2019-02-17 Views: 772
Aggregating lists using keys from another list

Say I have a DataFrame whose columns contain lists of strings and lists of floats:

Names Prob

[Anne, Mike, Anne] [10.0, 10.0, 80.0]

[Sophie, Andy, Vera, Kate] [30.0, 4.5, 5.5, 60.0]

[Josh, Anne, Sophie] [51, 24, 25]

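What I want to do is loop over Names and, if a name belongs to a predefined group, relabel it and then aggregate the corresponding numbers in Prob.

For example, if team1 = ['Anne', 'Mike', 'Sophie'], I want to end up with: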

Names Prob

[Team_One] [100.0]

[Andy, Kate, Team_One, Vera] [4.5, 60.0, 30.0, 5.5]

[Josh, Team_One] [51, 49]

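This is what I wrote, but TBH I think it is a bit ridiculous: I create a temporary DataFrame inside the loop and then do a groupby, which seems like overkill and too heavy to me.

Is there a more efficient way? (In case it matters, I am using Python 3.)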

import pandas as pd

def pool(df):
    team1 = ['Anne', 'Mike', 'Sophie']
    names = df['Names']
    prob = df['Prob']
    out_names = []
    out_prob = []
    for key, name in enumerate(names):
        # relabel if in team1 otherwise keep it the same
        name = ['Team_One' if x in team1 else x for x in name]
        # make a temp dataframe and group by name
        temp = pd.DataFrame({'name': name, 'prob': prob[key]})
        temp = temp.groupby('name').sum()
        # make the output
        out_names.append(temp.index.tolist())
        out_prob.append(temp['prob'].tolist())
    df['Names'] = out_names
    df['Prob'] = out_prob
    return df

df = pd.DataFrame({
    'Names': [['Anne', 'Mike', 'Anne'],
              ['Sophie', 'Andy', 'Vera', 'Kate'],
              ['Josh', 'Anne', 'Sophie']
              ],
    'Prob': [[10., 10., 80.],
             [30., 4.5, 5.5, 60.],
             [51, 24, 25]
             ]
})

Solution: use a defaultdict to sum the values within each list, then convert the results to a list of tuples and pass it to the DataFrame constructor:

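from collections import defaultdict

team1 = ['Anne', 'Mike', 'Sophie']
# pair each row's list of names with its list of probabilities
zipped = zip(df['Names'], df['Prob'])

out = []
for a, b in zipped:
    # missing keys start at 0, so totals can be accumulated with +=
    d = defaultdict(int)
    for x, y in zip(a, b):
        if x in team1:
            d['Team_One'] += y
        else:
            # += rather than = so a repeated non-team name is summed, not overwritten
            d[x] += y
    out.append((list(d.keys()), list(d.values())))

df = pd.DataFrame(out, columns=['Names', 'Prob'])
print(df)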

Names Prob

0 [Team_One] [100.0]

1 [Team_One, Andy, Vera, Kate] [30.0, 4.5, 5.5, 60.0]

2 [Josh, Team_One] [51, 49]
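
Note that the output above keeps names in first-seen order, whereas the groupby in the question returns them sorted. If the sorted order matters, the same defaultdict idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name pool_rows and its label parameter are hypothetical and not part of the original answer, and it works on plain lists so it can be tried without a DataFrame.

```python
from collections import defaultdict

def pool_rows(names_col, prob_col, team, label="Team_One"):
    """Return one (names, probs) tuple per row, with team members merged under `label`.

    `pool_rows` and `label` are illustrative names, not part of the original answer.
    """
    out = []
    for names, probs in zip(names_col, prob_col):
        totals = defaultdict(int)  # missing names start at 0
        for name, p in zip(names, probs):
            key = label if name in team else name
            totals[key] += p  # += also merges repeated non-team names
        keys = sorted(totals)  # alphabetical order, like a sorted groupby index
        out.append((keys, [totals[k] for k in keys]))
    return out

team1 = {"Anne", "Mike", "Sophie"}  # a set makes the membership test cheap
names_col = [["Anne", "Mike", "Anne"],
             ["Sophie", "Andy", "Vera", "Kate"],
             ["Josh", "Anne", "Sophie"]]
prob_col = [[10.0, 10.0, 80.0],
            [30.0, 4.5, 5.5, 60.0],
            [51, 24, 25]]

rows = pool_rows(names_col, prob_col, team1)
for names, probs in rows:
    print(names, probs)
```

The resulting list of tuples can be passed to pd.DataFrame(rows, columns=['Names', 'Prob']) in the same way as out in the answer; df['Names'] and df['Prob'] can be supplied in place of the plain lists.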

An alternative solution works if Prob contains no 0 values.
