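I have two pandas DataFrames, df1 and df2. I need to create a new column df1['B'] by searching df2['B'] to see whether df1['A'] occurs in it as a substring. If there is a match, the value of df2['A'] from that row should be returned as the value of the new column df1['B'].
Here are the sample DataFrames: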
DF1
A                       B
9.female.ceo.,ceo,      ?
9.female.ned.,ned,
9.female.ned.,chair,
2.female.ed.,ned,
2.female.ned.,ed,
9.female.chair.,ceo,
2.female.chair.,chair,
DF2
A                  B
,ceo,ned,          2.male.chair.,ceo,ned,
,chair,ned,        2.male.ned.,chair,ned,
,ned,              2.female.ed.,ned,
,ceo,chair,        6.female.ed.,ceo,chair,
,ed,ceo,           6.male.chair.,ed,ceo,
,ceo,chair,        9.female.ed.,ceo,chair,
,ceo,ned,          9.female.chair.,ceo,ned,
,chair,(in ft10),  9.male.ceo.,chair,(in ft10),
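For anyone who wants to reproduce this, here is a minimal sketch that builds the two sample frames. It assumes the "?" in DF1 is only a placeholder for the column to be filled in, so df1 starts with column A alone.

```python
import pandas as pd

# Sample data copied from the DF1 / DF2 listings in the question.
# Assumption: the "?" under DF1's B column is a placeholder, so df1 has only column A.
df1 = pd.DataFrame({
    "A": [
        "9.female.ceo.,ceo,",
        "9.female.ned.,ned,",
        "9.female.ned.,chair,",
        "2.female.ed.,ned,",
        "2.female.ned.,ed,",
        "9.female.chair.,ceo,",
        "2.female.chair.,chair,",
    ]
})

df2 = pd.DataFrame({
    "A": [
        ",ceo,ned,",
        ",chair,ned,",
        ",ned,",
        ",ceo,chair,",
        ",ed,ceo,",
        ",ceo,chair,",
        ",ceo,ned,",
        ",chair,(in ft10),",
    ],
    "B": [
        "2.male.chair.,ceo,ned,",
        "2.male.ned.,chair,ned,",
        "2.female.ed.,ned,",
        "6.female.ed.,ceo,chair,",
        "6.male.chair.,ed,ceo,",
        "9.female.ed.,ceo,chair,",
        "9.female.chair.,ceo,ned,",
        "9.male.ceo.,chair,(in ft10),",
    ],
})
```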
Solution: the idea is to create sets by splitting on commas, then match with issubset:
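# Map each df2['A'] value to the set of comma-separated tokens in its df2['B'] value.
# Note: duplicate df2['A'] values collapse here, so only the last such row is kept.
d = {k: set(v.split(',')) for k, v in df2.set_index('A')['B'].items()}
# For each df1['A'], take the first key whose token set contains all of its tokens, else ''.
df1['B'] = [next(iter([k for k, v in d.items() if set(x.split(',')).issubset(v)]), '')
            for x in df1['A']]
print(df1)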
                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,
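One caveat with the issubset approach: a dict built from df2['A'] keeps only one entry per key, and the sample DF2 has repeated keys. The sketch below shows one way to keep every row by iterating over the pairs directly. The helper name lookup_by_tokens is hypothetical, and the two-row frame is a cut-down illustration rather than the full sample.

```python
import pandas as pd

def lookup_by_tokens(needle, keys, haystacks):
    """Return the first key whose haystack has every comma-separated token of needle."""
    wanted = set(needle.split(","))
    for key, hay in zip(keys, haystacks):
        if wanted.issubset(hay.split(",")):
            return key
    return ""

# Two rows sharing the same key, ordered so that the matching row comes first.
df2 = pd.DataFrame({
    "A": [",ceo,ned,", ",ceo,ned,"],
    "B": ["9.female.chair.,ceo,ned,", "2.male.chair.,ceo,ned,"],
})

# The dict keeps only the last row for the repeated key.
as_dict = {k: set(v.split(",")) for k, v in df2.set_index("A")["B"].items()}
dict_result = next(
    (k for k, v in as_dict.items() if set("9.female.chair.,ceo,".split(",")).issubset(v)),
    "",
)

# Iterating over the pairs sees both rows.
pair_result = lookup_by_tokens("9.female.chair.,ceo,", df2["A"], df2["B"])

print(repr(dict_result))   # ''
print(repr(pair_result))   # ',ceo,ned,'
```

With the row order in the sample data the dict version happens to give the same answer, so this only matters when the matching row is not the last one for its key.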
Solution that tests with in:
# Keep df2['B'] as a Series indexed by df2['A'], then test plain substring containment.
d = df2.set_index('A')['B']
df1['B'] = [next(iter([k for k, v in d.items() if x in v]), '') for x in df1['A']]
print(df1)
                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,
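Both solutions give the same result on the sample data, but they are not equivalent in general. A small sketch of the difference, using a made-up df2['B'] value that is not in the sample data:

```python
needle = "2.female.ed.,ned,"
haystack = "2.female.ed.,ceo,ned,"  # hypothetical value, not from the sample data

# Plain substring test: the characters of needle must appear contiguously.
substring_match = needle in haystack

# Token test: every comma-separated piece of needle must appear somewhere in haystack.
token_match = set(needle.split(",")).issubset(haystack.split(","))

print(substring_match, token_match)  # False True
```

So in is the stricter test and is the one that matches the "is a substring" wording of the question, while issubset ignores the order and adjacency of the tokens.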
merge does not work in this case, because df1['A'] is only a substring of df2['B'] rather than an exact match.
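If a merge-style approach is still preferred, one possible workaround is a cross join followed by a substring filter. This is only a sketch: it assumes pandas 1.2 or later (for how="cross"), it builds len(df1) * len(df2) intermediate rows, and the column names key and text are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["2.female.ed.,ned,", "9.female.ceo.,ceo,"]})
df2 = pd.DataFrame({
    "A": [",ned,", ",ceo,ned,"],
    "B": ["2.female.ed.,ned,", "9.female.chair.,ceo,ned,"],
})

# Rename df2's columns (hypothetical names) so they do not clash with df1['A'].
candidates = df2.rename(columns={"A": "key", "B": "text"})

# Pair every df1 row with every df2 row.
cross = df1.merge(candidates, how="cross")

# Keep only the pairs where df1['A'] is a substring of df2['B'].
mask = [a in t for a, t in zip(cross["A"], cross["text"])]
hits = cross.loc[mask].drop_duplicates("A")  # first match per df1['A']

# Map the matched key back onto df1; unmatched rows get an empty string.
df1["B"] = df1["A"].map(hits.set_index("A")["key"]).fillna("")
print(df1["B"].tolist())  # [',ned,', '']
```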







