How do you count word frequencies?

詹惠儿

2019-07-02 Views: 576

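We will build a function that counts word frequencies in a text. We start with a short sample text and will later replace it with the text file of the book we just downloaded. Because we are counting word frequencies, uppercase and lowercase letters should be treated as the same, so we convert the whole text to lowercase and save it.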

text = "This is my test text. We're keeping this text short to keep things manageable."

text = text.lower()
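To see why lowercasing matters, here is a minimal sketch that counts the same sample text before and after the conversion. The variable names `raw_counts` and `lowered_counts` are illustrative only, and punctuation is deliberately left in place at this stage.

```python
from collections import Counter

sample = "This is my test text. We're keeping this text short to keep things manageable."

# Without lowercasing, "This" and "this" are counted as two different words.
raw_counts = Counter(sample.split())

# After lowercasing, both occurrences fall under the same key.
lowered_counts = Counter(sample.lower().split())

print(raw_counts["This"], raw_counts["this"])
print(lowered_counts["this"])
```

Note that "text." and "text" are still treated as different tokens here, which is why punctuation needs to be stripped as well.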

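Word frequencies can be counted in several ways. For learning purposes we will write the code in two ways: one uses a for loop, and the other uses Counter from collections, which generally turns out to be faster than the loop. Each function returns a dictionary whose keys are the unique words and whose values are their frequencies. So, we code: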

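from collections import Counter

def count_words(text):
    """Count word frequencies with a plain for loop; returns a dict of word -> count."""
    # Punctuation to strip before splitting ("," rather than ", " so that
    # words on either side of a comma are not glued together).
    skips = [".", ",", ":", ";", "'", '"']
    for ch in skips:
        text = text.replace(ch, "")
    word_counts = {}
    # split() with no argument also handles repeated spaces and newlines.
    for word in text.split():
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1
    return word_counts

# You can check the function with: count_words(text)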

def count_words_fast(text):
    """Count word frequencies using Counter from collections."""
    text = text.lower()
    # "," rather than ", " so that words around a comma stay separate.
    skips = [".", ",", ":", ";", "'", '"']
    for ch in skips:
        text = text.replace(ch, "")
    # Counter tallies every item of the list in one call.
    word_counts = Counter(text.split())
    return word_counts

# You can check the function with: count_words_fast(text)
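As a sanity check, the two approaches can be compared side by side. The sketch below is self-contained and uses its own hypothetical helpers (`clean`, `count_loop`) rather than the functions above; it confirms that a loop and Counter give the same counts and times both with `timeit`. Timings depend on input size and Python version, and on a text this short the difference may be negligible or even reversed.

```python
from collections import Counter
import timeit

SKIPS = [".", ",", ":", ";", "'", '"']

def clean(text):
    """Lowercase the text, strip the listed punctuation, and split into words."""
    text = text.lower()
    for ch in SKIPS:
        text = text.replace(ch, "")
    return text.split()

def count_loop(words):
    """Tally words with an explicit loop over a plain dict."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

sample = "This is my test text. We're keeping this text short to keep things manageable."
words = clean(sample)

# Both approaches should agree on every word count.
same_result = count_loop(words) == dict(Counter(words))
print("same result:", same_result)

# Rough timing comparison on the same input.
loop_time = timeit.timeit(lambda: count_loop(words), number=10_000)
counter_time = timeit.timeit(lambda: Counter(words), number=10_000)
print(f"loop: {loop_time:.4f}s, Counter: {counter_time:.4f}s")
```

For a more meaningful comparison, run the same timing on the full text of a book rather than on a single sentence.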
