Say I have a dataframe containing lists of strings and lists of floats:
Names Prob
[Anne, Mike, Anne] [10.0, 10.0, 80.0]
[Sophie, Andy, Vera, Kate] [30.0, 4.5, 5.5, 60.0]
[Josh, Anne, Sophie] [51, 24, 25]
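What I want to do is loop over Names and, if a name belongs to a predefined group, relabel it and then sum the corresponding numbers in Prob.
For example, if team1 = ['Anne', 'Mike', 'Sophie'], I want to end up with: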
Names Prob
[Team_One] [100.0]
[Andy, Kate, Team_One, Vera] [4.5, 60.0, 30.0, 5.5]
[Josh, Team_One] [51, 49]
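This is what I wrote, but TBH I think it is a bit ridiculous: inside the loop I create a temporary dataframe for every row and then run a groupby on it. That sounds like overkill and far too heavy to me.
Is there a more efficient way? (In case it matters, I am using Python 3.)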
import pandas as pd

def pool(df):
    team1 = ['Anne', 'Mike', 'Sophie']
    names = df['Names']
    prob = df['Prob']
    out_names = []
    out_prob = []
    for key, name in enumerate(names):
        # relabel if in team1 otherwise keep it the same
        name = ['Team_One' if x in team1 else x for x in name]
        # make a temp dataframe and group by name
        temp = pd.DataFrame({'name': name, 'prob': prob[key]})
        temp = temp.groupby('name').sum()
        # make the output
        out_names.append(temp.index.tolist())
        out_prob.append(temp['prob'].tolist())
    df['Names'] = out_names
    df['Prob'] = out_prob
    return df
df = pd.DataFrame({
    'Names': [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']
              ],
    'Prob': [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]
             ]
})
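If the goal is to avoid building a DataFrame for every row, one alternative (a different technique from the loop above, not the asker's code) is to explode the lists into long format once, relabel with a single vectorized call, and run one groupby over all rows together. This is a minimal sketch, assuming pandas 1.3 or later (needed to explode several columns at once) and that each row's Names and Prob lists have the same length; the names `long_df`, `row`, `summed` and `out` are illustrative only. Prob is cast to float, so integer inputs such as 51 come back as 51.0.

```python
import pandas as pd

team1 = ['Anne', 'Mike', 'Sophie']

df = pd.DataFrame({
    'Names': [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']],
    'Prob': [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]],
})

# One line per (name, prob) pair; the original row label is kept in a 'row' column.
# Exploding two columns together requires pandas >= 1.3.
long_df = df.explode(['Names', 'Prob']).rename_axis('row').reset_index()
long_df['Prob'] = long_df['Prob'].astype(float)

# Relabel every team1 member in one vectorized step.
long_df['Names'] = long_df['Names'].mask(long_df['Names'].isin(team1), 'Team_One')

# A single groupby sums the probabilities per (original row, label);
# groupby sorts the labels within each row.
summed = long_df.groupby(['row', 'Names'], as_index=False)['Prob'].sum()

# Collapse back to one list of names and one list of probs per original row.
out = summed.groupby('row')[['Names', 'Prob']].agg(list)
print(out)
```

Because groupby sorts the keys, each row's names come out alphabetically, which is the order of the expected output shown in the question (for example [Andy, Kate, Team_One, Vera]).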
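Solution: use a defaultdict to sum all the values in each pair of lists, then convert the results to a list of tuples and pass it to the DataFrame constructor: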
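import pandas as pd
from collections import defaultdict

team1 = ['Anne', 'Mike', 'Sophie']
# pair each list of names with its list of probabilities, row by row
zipped = zip(df['Names'], df['Prob'])

out = []
for a, b in zipped:
    d = defaultdict(int)
    for x, y in zip(a, b):
        if x in team1:
            # every member of team1 is pooled under one label
            d['Team_One'] += y
        else:
            # += (not =) so that repeated names outside the team are summed too
            d[x] += y
    # dicts keep insertion order (Python 3.7+), so names appear in first-seen order
    out.append((list(d.keys()), list(d.values())))

df = pd.DataFrame(out, columns=['Names', 'Prob'])
print(df)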
Names Prob
0 [Team_One] [100.0]
1 [Team_One, Andy, Vera, Kate] [30.0, 4.5, 5.5, 60.0]
2 [Josh, Team_One] [51, 49]
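The output above keeps names in first-seen order, whereas the expected output in the question lists them alphabetically. If that order matters, or if there are several predefined groups, the same per-row defaultdict loop can be sketched with a name-to-label dict and sorted keys. The function `relabel_and_sum` and the dict `mapping` are hypothetical names introduced for this sketch; only team1 comes from the question. Using `defaultdict(float)` means integer inputs are returned as floats.

```python
from collections import defaultdict

import pandas as pd


def relabel_and_sum(names, probs, mapping):
    """Relabel each name through `mapping` (unmapped names are kept as-is),
    sum the probabilities per label, and return both lists sorted by label."""
    totals = defaultdict(float)
    for name, p in zip(names, probs):
        totals[mapping.get(name, name)] += p
    labels = sorted(totals)
    return labels, [totals[label] for label in labels]


team1 = ['Anne', 'Mike', 'Sophie']
# Hypothetical mapping: every team1 member maps to 'Team_One'.
# Further groups could be added as extra name -> label entries.
mapping = {name: 'Team_One' for name in team1}

df = pd.DataFrame({
    'Names': [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']],
    'Prob': [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]],
})

out = [relabel_and_sum(n, p, mapping) for n, p in zip(df['Names'], df['Prob'])]
result = pd.DataFrame(out, columns=['Names', 'Prob'])
print(result)
```

The dict lookup `mapping.get(name, name)` replaces the `if x in team1` branch, so adding another group only means adding entries to `mapping` rather than another branch in the loop.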
A solution that works if there are no 0 values in Prob: