How to Analyze Text with NLP in Python (5)

詹惠儿

2019-02-27

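Step 5: Split the corpus into a training set and a test set. For this we need the train_test_split function from sklearn.model_selection (older scikit-learn releases exposed it under sklearn.cross_validation, a module that has since been removed). The split can be 70/30, 80/20, 85/15, or 75/25; here I choose 75/25 through the "test_size" parameter.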

X is the bag-of-words matrix, and y is the binary label, 0 or 1, marking each text as positive or negative.

# Split the dataset into the training set and the test set.
# train_test_split lives in sklearn.model_selection
# (sklearn.cross_validation was removed in scikit-learn 0.20).
from sklearn.model_selection import train_test_split

# Experiment with "test_size" to get better results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
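To see what the split produces, here is a minimal, self-contained sketch. The toy corpus, the labels (with 1 assumed to mean positive), and the `random_state` and `stratify` arguments are illustrative assumptions and are not part of the original walkthrough; the `_demo` names are hypothetical stand-ins for the article's X and y.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus and labels, invented for illustration only
# (assumption: 1 = positive, 0 = negative)
texts = [
    "great food and friendly staff",
    "terrible service never again",
    "loved the dessert",
    "the soup was cold and bland",
    "wonderful place will return",
    "rude waiter and long wait",
    "tasty pizza and fair prices",
    "worst meal of my life",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Bag-of-words matrix: one row per text, one column per vocabulary word
X_demo = CountVectorizer().fit_transform(texts).toarray()
y_demo = labels

# 75/25 split; random_state makes the shuffle repeatable,
# stratify keeps the class balance the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

print(X_tr.shape[0], X_te.shape[0])  # number of training and test rows
```

Fixing `random_state` is optional, but without it every run shuffles differently, which makes it harder to compare results while tuning "test_size".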

Step 6: Fit a predictive model (here, a random forest).

# Fit a random forest classifier to the training set
from sklearn.ensemble import RandomForestClassifier

# n_estimators is the number of trees in the forest;
# experiment with it to get better results
model = RandomForestClassifier(n_estimators=501,
                               criterion='entropy')

model.fit(X_train, y_train)
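The fitting step can be tried in isolation with the sketch below. It runs on synthetic word counts rather than the article's corpus, and the data, the labeling rule, the smaller `n_estimators=25`, and the `random_state` values are all assumptions chosen only to keep the example quick and repeatable. The final `predict` call is included just to show that the fitted model is usable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Hypothetical stand-in for a bag-of-words matrix:
# 40 documents x 10 word-count columns
X_demo = rng.randint(0, 3, size=(40, 10))
# Hypothetical labeling rule: 1 when word 0 occurs in the document, else 0
y_demo = (X_demo[:, 0] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# Fewer trees than the 501 used above, only so the sketch runs fast
demo_model = RandomForestClassifier(
    n_estimators=25, criterion='entropy', random_state=0
)
demo_model.fit(X_tr, y_tr)

# One predicted label (0 or 1) per test row
y_pred = demo_model.predict(X_te)
print(len(demo_model.estimators_), y_pred.shape)
```

More trees generally give more stable predictions at the cost of training time, which is why n_estimators is worth experimenting with.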
