2019-02-27
How to analyze text with NLP in Python (5)

Step 5: Split the corpus into a training set and a test set. For this we need the train_test_split function from sklearn.model_selection (older tutorials import it from sklearn.cross_validation, which has since been removed). The split can be 70/30, 80/20, 85/15 or 75/25; here I choose 75/25 via the "test_size" parameter.

X is the bag-of-words matrix and y is 0 or 1 (positive or negative).

# Splitting the dataset into
# the Training set and Test set
from sklearn.model_selection import train_test_split

# experiment with "test_size"
# to get better results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
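The bag-of-words matrix X and the label vector y come from the earlier parts of this series. As a rough, self-contained illustration only (the names corpus and labels and the max_features value are placeholders, not necessarily what was used earlier), they could be built like this:

# Hypothetical example: cleaned review texts and their 0/1 sentiment labels
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["loved the food", "terrible service", "great place", "would not come back"]
labels = [1, 0, 1, 0]

# Bag of words: one row per review, one column per word, values are counts
cv = CountVectorizer(max_features = 1500)
X = cv.fit_transform(corpus).toarray()
y = labels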

Step 6: Fit a predictive model (here, a random forest).

  • Since random forest is an ensemble model (made up of many trees), import the RandomForestClassifier class from sklearn.ensemble
  • Use 501 trees ("n_estimators") with 'entropy' as the criterion
  • Fit the model on X_train and y_train via the .fit() method

# Fitting Random Forest Classification
# to the Training set
from sklearn.ensemble import RandomForestClassifier

# n_estimators is the number of trees;
# experiment with n_estimators
# to get better results
model = RandomForestClassifier(n_estimators = 501,
                               criterion = 'entropy')

model.fit(X_train, y_train)
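Once fitted, the model can be applied to the held-out test set. As a quick usage sketch (the accuracy check below is an illustration added here, not part of the original step):

# Predict sentiment for the unseen test reviews and measure accuracy
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))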
