What Is a Word Cloud? _CDA Q&A Community

啊啊啊啊啊吖

2018-12-06 Views: 787

What Is a Word Cloud?

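Data scientists generally don't think much of word clouds, in large part because the placement of the words carries no particular meaning. At most it says "there was some empty space here where a word could fit."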

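If you ever have to produce a word cloud anyway, consider whether the coordinates of the words can convey something. For example, suppose you have collected some buzzwords related to data science, and each buzzword is described by two numbers between 0 and 100: the first is how often it appears in job postings, the second is how often it appears on resumes: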

data = [("big data", 100, 15), ("Hadoop", 95, 25), ("Python", 75, 50),
        ("R", 50, 40), ("machine learning", 80, 20), ("statistics", 20, 60),
        ("data science", 60, 70), ("analytics", 90, 3),
        ("team player", 85, 85), ("dynamic", 2, 90), ("synergies", 70, 0),
        ("actionable insights", 40, 30), ("think out of the box", 45, 10),
        ("self-starter", 30, 50), ("customer focus", 65, 15),
        ("thought leadership", 35, 35)]

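The usual word cloud approach is nothing more than arranging the words on the page in a cool-looking font. The code below instead places each word so that its horizontal position reflects popularity in job postings and its vertical position reflects popularity on resumes: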

import matplotlib.pyplot as plt

def text_size(total):
    """Equals 8 if total is 0, 28 if total is 200."""
    return 8 + total / 200 * 20

# Draw each word at (job popularity, resume popularity);
# the font size grows with the sum of the two values.
for word, job_popularity, resume_popularity in data:
    plt.text(job_popularity, resume_popularity, word,
             ha='center', va='center',
             size=text_size(job_popularity + resume_popularity))

plt.xlabel("Popularity on Job Postings")
plt.ylabel("Popularity on Resumes")
plt.axis([0, 100, 0, 100])
plt.xticks([])  # hide tick marks; only relative position matters
plt.yticks([])
plt.show()
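
The linear mapping in text_size can be checked on its own, without drawing anything. The following is a minimal sketch, assuming Python 3 (where `/` is true division; under Python 2 the same expression would need `from __future__ import division`). The three-word sample and the helper name `font_sizes` are illustrative choices and are not part of the original post.

```python
def text_size(total):
    """Equals 8 if total is 0, 28 if total is 200."""
    return 8 + total / 200 * 20


# A small sample of the (word, job_popularity, resume_popularity) tuples.
sample = [("big data", 100, 15), ("team player", 85, 85), ("dynamic", 2, 90)]


def font_sizes(rows):
    """Hypothetical helper: map each word to the font size it would be drawn at."""
    return {word: text_size(job + resume) for word, job, resume in rows}


sizes = font_sizes(sample)
for word, size in sizes.items():
    print(f"{word}: {size:.1f}")
```

Because the size depends on the sum of the two values, a word that scores high on both axes (here "team player") is drawn largest, while a word that is popular on only one axis stays closer to the minimum size.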
