2019-04-15, views: 936
Too many coef_ values from LogisticRegression in a pipeline

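I am using the sklearn-pandas DataFrameMapper inside a sklearn pipeline. To evaluate how much each feature contributes in a feature-union pipeline, I would like to measure the coefficients of the estimator (logistic regression). In the code example below, the three text columns a, b and c are vectorized and selected for X_train: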

import pandas as pd
import numpy as np
import pickle
from sklearn_pandas import DataFrameMapper
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

np.random.seed(1)

data = pd.read_csv('https://pastebin.com/raw/WZHwqLWr')
#data.columns

# The mapper below picks out columns a, b and c, so X can stay a full copy
X = data.copy()
y = data.result

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# One CountVectorizer per text column; each emits one feature per vocabulary token
mapper = DataFrameMapper([
    ('a', CountVectorizer()),
    ('b', CountVectorizer()),
    ('c', CountVectorizer())
])

pipeline = Pipeline([
    ('featurize', mapper),
    ('clf', LogisticRegression(random_state=1))
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# coef_ holds one value per vectorized token, not one per input column
print(abs(pipeline.named_steps['clf'].coef_))
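
A likely reason for the large number of values: each CountVectorizer produces one feature per token in its vocabulary, so coef_ contains one coefficient per token across all three columns rather than one per column. The sketch below illustrates this and shows one possible way to group the coefficients back by source column. It rests on several assumptions: the toy DataFrame is hypothetical and stands in for the pastebin CSV, sklearn's ColumnTransformer is swapped in for DataFrameMapper so the example depends only on scikit-learn, and the features are assumed to be concatenated in the order the transformers are listed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data (not the original dataset)
toy = pd.DataFrame({
    "a": ["red apple", "green pear", "red plum", "green apple"],
    "b": ["big", "small", "big", "small"],
    "c": ["north south", "east", "north", "east west"],
    "result": [1, 0, 1, 0],
})
columns = ["a", "b", "c"]

# ColumnTransformer used here in place of DataFrameMapper (an assumption);
# a string column selector passes a 1-D series, which CountVectorizer expects
featurize = ColumnTransformer([(name, CountVectorizer(), name) for name in columns])

toy_pipeline = Pipeline([
    ("featurize", featurize),
    ("clf", LogisticRegression(random_state=1)),
])
toy_pipeline.fit(toy[columns], toy["result"])

# Binary problem: coef_ has shape (1, n_features); flatten to 1-D
coefs = np.abs(toy_pipeline.named_steps["clf"].coef_).ravel()

# Number of tokens (features) contributed by each source column
fitted = toy_pipeline.named_steps["featurize"]
vocab_sizes = {
    name: len(fitted.named_transformers_[name].vocabulary_) for name in columns
}

# Sum absolute coefficients over each column's slice of the feature vector
per_column = {}
start = 0
for name in columns:
    stop = start + vocab_sizes[name]
    per_column[name] = float(coefs[start:stop].sum())
    start = stop

print(vocab_sizes)
print(per_column)
```

Summing absolute coefficients per column is only one rough way to compare column contributions; it favors columns with large vocabularies, so a mean or a maximum per column may be a better fit depending on the goal.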
