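I want to get a word count for every word in this Wikipedia dataset (people_wiki.csv). I was able to count each word and store the result as a dictionary, but I can't split the dictionary's key-value pairs into separate columns. I have tried several approaches (from_dict, from_records, to_frame, pivot_table, etc.). Is this doable in Python? I would appreciate any help.
Sample dataset: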
URI name text
<http://dbpedia.org/resource/George_Clooney> George Clooney 'george timothy clooney born may 6 1961 is an american actor writer producer director and activist he has received three golden globe awards for his work as an actor and two academy awards one for acting and the other for producingclooney made his...'
I tried:
clooney_word_count_table = pd.DataFrame.from_dict(clooney['word_count'], orient='index', columns=['word','count'])
I also tried:
clooney['word_count'].to_frame()
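The problem is that clooney is a DataFrame (with a single row, index 35817), so clooney['word_count'] is a Series, also indexed by 35817, that holds a single value (your count dictionary).
DataFrame.from_dict then treats this Series as equivalent to {35817: {'george': 1, ...}}, which gives you confusing results.
Try something like: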
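To make the point above concrete, here is a minimal sketch. It assumes a hand-built one-row frame standing in for clooney; the index label 35817 comes from the description above, while the name and the counts are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the one-row `clooney` selection.
# The counts below are invented, not taken from the real dataset.
clooney = pd.DataFrame(
    {"name": ["George Clooney"], "word_count": [{"george": 1, "actor": 2}]},
    index=[35817],
)

col = clooney["word_count"]
print(type(col).__name__, len(col))  # a Series holding a single element
print(col.index.tolist())            # the row label, not the words

# Pull the dictionary itself out of the Series before handing it to pandas.
counts = col.iloc[0]
print(type(counts).__name__)
```

The dictionary you actually want to reshape is the single element inside the Series, not the Series itself.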
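from collections import Counter
import pandas as pd
c = Counter()  # accumulates word counts across all rows
clooney['text'].apply(lambda x: c.update(x.split()))  # split each text on whitespace and count
pd.DataFrame.from_dict(c, orient='index', columns=['count'])  # words become the index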
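If you want the words in their own column rather than in the index, one option is to build the frame from the (word, count) pairs. This is only a sketch: the one-row frame and its shortened text are made up for illustration, and the name table is hypothetical.

```python
from collections import Counter
import pandas as pd

# Made-up stand-in for the one-row `clooney` frame (text shortened for illustration).
clooney = pd.DataFrame(
    {"text": ["george timothy clooney is an american actor and an activist"]},
    index=[35817],
)

# Count the words of every row's text (here there is only one row).
c = Counter()
for text in clooney["text"]:
    c.update(text.split())

# One row per word, with explicit 'word' and 'count' columns.
table = pd.DataFrame(list(c.items()), columns=["word", "count"])
table = table.sort_values("count", ascending=False).reset_index(drop=True)
print(table.head())
```

The same pattern should work for a dictionary you already have: pass list(your_dict.items()) instead of list(c.items()).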







