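With sklearn's predict_proba() method, we usually look only at the class with the highest probability. How can I output the probabilities of the top n classes (n > 1)?
For example, given predict_proba() output like the one below, how can I return the 2 highest probabilities and their associated classes?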
result_prob = clf.predict_proba(X_test)
Returns:
array([
2.55420153e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
3.41739673e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 2.11688875e-05, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 8.02579585e-01, 0.00000000e+00,
0.00000000e+00, 1.37978949e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.15640553e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.76391638e-02,
9.06030431e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 3.56218448e-02, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
In this case, it should return the classes with probabilities 8.02579585e-01 and 6.76391638e-02.
Solution: this is really a NumPy question; you can use np.argpartition:
import numpy as np
x = np.array([
2.55420153e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
3.41739673e-02, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 2.11688875e-05, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 8.02579585e-01, 0.00000000e+00,
0.00000000e+00, 1.37978949e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.15640553e-02, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 6.76391638e-02,
9.06030431e-03, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 3.56218448e-02, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
k = 2 # top-k
ind = np.argpartition(x, -k)[-k:]
x[ind]
Result:
array([0.06763916, 0.80257959])
As requested, the indices of the corresponding classes are in ind:
ind
# array([27, 14])
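
Two caveats about the snippet above: np.argpartition does not promise any particular order among the k selected entries, and predict_proba() normally returns one row per sample, not a single 1-D vector. The following is a minimal sketch of how this could be extended to a 2-D probability matrix, with the top-k sorted from highest to lowest and column indices mapped to labels. The helper name top_k_classes is hypothetical (not part of sklearn or NumPy); it assumes the columns of the probability matrix follow the order of clf.classes_, which is how sklearn classifiers lay out predict_proba() output.

```python
import numpy as np

def top_k_classes(proba, classes, k=2):
    """Return (labels, probabilities) of the k most probable classes per row.

    Hypothetical helper for illustration. `proba` is a 1-D or 2-D array of
    class probabilities; `classes` holds the label of each column.
    """
    proba = np.atleast_2d(proba)
    # Column indices of the k largest entries in each row (unordered).
    part = np.argpartition(proba, -k, axis=1)[:, -k:]
    # Probabilities at those indices.
    part_p = np.take_along_axis(proba, part, axis=1)
    # Sort those k entries in descending order of probability.
    order = np.argsort(-part_p, axis=1)
    top_idx = np.take_along_axis(part, order, axis=1)
    top_p = np.take_along_axis(part_p, order, axis=1)
    # Map column indices to class labels.
    return np.asarray(classes)[top_idx], top_p

# Intended usage with a fitted classifier (clf and X_test as in the question):
# labels, probs = top_k_classes(clf.predict_proba(X_test), clf.classes_, k=2)
```

Sorting only the k selected entries keeps the cost close to that of argpartition alone, which matters when there are many classes and k is small.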







