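Using Python, I want to remove certain words from a set of texts stored as a list of lists, as shown below (in this example, text_list holds 5 texts of roughly 4 to 8 words each, and the list of words to remove holds 5 words):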
text_list = [["hello", "how", "are", "you", "fine", "thank", "you"],
             ["good", "morning", "have", "great", "breakfast"],
             ["you", "are", "a", "student", "I", "am", "a", "teacher"],
             ["trump", "it", "is", "a", "fake", "news"],
             ["obama", "yes", "we", "can"]]
remove_words = ["hello", "breakfast", "a", "obama", "you"]
With a small data set like the one above, this is a very simple problem:
new_text_list = list()
for text in text_list:
    # Collect the words of this text that are not in remove_words
    temp_list = list()
    for word in text:
        if word not in remove_words:
            temp_list.append(word)
    new_text_list.append(temp_list)
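But I am wondering how to handle a much larger data set: more than 10,000 texts, each containing more than 1,000 words, and a removal list of more than 20,000 words. Is there more efficient Python code that produces the same result, or some way to use multi-core processing? Thanks in advance!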
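
One possible direction, offered as a minimal sketch rather than a definitive answer: the `word not in remove_words` test scans the whole list on every lookup, so converting the removal list to a `set` once should make each lookup constant time on average. The function name `remove_listed_words` is hypothetical and chosen only for illustration; the sample data is a shortened version of the one above.

```python
def remove_listed_words(text_list, remove_words):
    """Return a new list of texts with every word in remove_words dropped."""
    # Build the set once: membership tests on a set are O(1) on average,
    # while the same test on a list is O(len(remove_words)).
    remove_set = set(remove_words)
    # Word order and duplicates inside each text are preserved.
    return [[word for word in text if word not in remove_set]
            for text in text_list]


# Shortened sample data taken from the question.
text_list = [["hello", "how", "are", "you", "fine", "thank", "you"],
             ["obama", "yes", "we", "can"]]
remove_words = ["hello", "breakfast", "a", "obama", "you"]

new_text_list = remove_listed_words(text_list, remove_words)
print(new_text_list)
```

It may be worth timing this on the real data before reaching for multiprocessing, since the set lookup removes the dependence on the length of the removal list.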