2019-02-25
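Searching by substring

I have two pandas DataFrames, df1 and df2. I need to create a new column df1['B'] by searching df2['B'] to see whether df1['A'] is a substring of it. If there is a match, the corresponding df2['A'] value should be returned as the value of the new column df1['B'].

Here are the sample DataFrames: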

DF1

A                       B
9.female.ceo.,ceo,      ?
9.female.ned.,ned,
9.female.ned.,chair,
2.female.ed.,ned,
2.female.ned.,ed,
9.female.chair.,ceo,
2.female.chair.,chair,

DF2

A                   B
,ceo,ned,           2.male.chair.,ceo,ned,
,chair,ned,         2.male.ned.,chair,ned,
,ned,               2.female.ed.,ned,
,ceo,chair,         6.female.ed.,ceo,chair,
,ed,ceo,            6.male.chair.,ed,ceo,
,ceo,chair,         9.female.ed.,ceo,chair,
,ceo,ned,           9.female.chair.,ceo,ned,
,chair,(in ft10),   9.male.ceo.,chair,(in ft10),
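To try the answers below, the sample data first has to exist as real DataFrames. The following is a minimal sketch of how it could be built, assuming every cell is a plain string and that the "?" in DF1 only marks the column still to be computed (so df1 starts with column A alone):

```python
import pandas as pd

# Sample data transcribed from the DF1 / DF2 listings.
# Assumption: df1 has no 'B' column yet; it is what we want to compute.
df1 = pd.DataFrame({
    "A": [
        "9.female.ceo.,ceo,",
        "9.female.ned.,ned,",
        "9.female.ned.,chair,",
        "2.female.ed.,ned,",
        "2.female.ned.,ed,",
        "9.female.chair.,ceo,",
        "2.female.chair.,chair,",
    ]
})

df2 = pd.DataFrame({
    "A": [
        ",ceo,ned,",
        ",chair,ned,",
        ",ned,",
        ",ceo,chair,",
        ",ed,ceo,",
        ",ceo,chair,",
        ",ceo,ned,",
        ",chair,(in ft10),",
    ],
    "B": [
        "2.male.chair.,ceo,ned,",
        "2.male.ned.,chair,ned,",
        "2.female.ed.,ned,",
        "6.female.ed.,ceo,chair,",
        "6.male.chair.,ed,ceo,",
        "9.female.ed.,ceo,chair,",
        "9.female.chair.,ceo,ned,",
        "9.male.ceo.,chair,(in ft10),",
    ],
})

print(df1.shape, df2.shape)
```

Note that df2['A'] contains repeated values (',ceo,ned,' and ',ceo,chair,' each appear twice), which matters for any approach that uses it as a lookup key.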

Solution: the idea is to split each string on commas into a set of tokens, then match with issubset:

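# Map each df2['A'] value to the set of comma-separated tokens in its df2['B'] string.
d = {k: set(v.split(',')) for k, v in df2.set_index('A')['B'].items()}

# For each df1['A'], take the first key whose token set contains all of its tokens, else ''.
df1['B'] = [next((k for k, v in d.items() if set(x.split(',')).issubset(v)), '')
            for x in df1['A']]
print(df1)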

                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,
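One thing to watch with the dictionary d: df2['A'] has repeated values, and a dict keeps only the last value stored under a key. On the sample data the surviving row happens to be the one that matches, but that depends on row order. Here is a small sketch of the effect, using a hypothetical two-row df2 with the rows in the unlucky order, and a list of (key, token set) pairs as one possible way to keep every row:

```python
import pandas as pd

# Hypothetical data: the same key ',ceo,ned,' appears twice in df2['A'].
df2 = pd.DataFrame({
    "A": [",ceo,ned,", ",ceo,ned,"],
    "B": ["9.female.chair.,ceo,ned,", "2.male.chair.,ceo,ned,"],
})
df1 = pd.DataFrame({"A": ["9.female.chair.,ceo,"]})

# Dict version: the second row overwrites the first, so the matching row is lost.
d = {k: set(v.split(",")) for k, v in df2.set_index("A")["B"].items()}
via_dict = [next((k for k, v in d.items() if set(x.split(",")).issubset(v)), "")
            for x in df1["A"]]

# Pair version: every df2 row is kept and checked in its original order.
pairs = [(k, set(v.split(","))) for k, v in zip(df2["A"], df2["B"])]
via_pairs = [next((k for k, v in pairs if set(x.split(",")).issubset(v)), "")
             for x in df1["A"]]

print(via_dict)   # ['']
print(via_pairs)  # [',ceo,ned,']
```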

An alternative solution that tests for a substring with in:

# Keep df2['B'] as a Series indexed by df2['A'] and test plain substring membership.
d = df2.set_index('A')['B']
df1['B'] = [next((k for k, v in d.items() if x in v), '') for x in df1['A']]
print(df1)

                        A          B
0      9.female.ceo.,ceo,
1      9.female.ned.,ned,
2    9.female.ned.,chair,
3       2.female.ed.,ned,      ,ned,
4       2.female.ned.,ed,
5    9.female.chair.,ceo,  ,ceo,ned,
6  2.female.chair.,chair,
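Both approaches give the same result on the sample data, but they are not equivalent in general. The set version compares whole comma-separated tokens and ignores their order, while in requires one contiguous run of characters. A plain-Python sketch of the difference follows; the helper names subset_match and substring_match, and the inputs x_skip and x_partial, are made up for illustration:

```python
def subset_match(x: str, v: str) -> bool:
    """True if every comma-separated token of x is also a token of v."""
    return set(x.split(",")).issubset(set(v.split(",")))


def substring_match(x: str, v: str) -> bool:
    """True if x occurs in v as one contiguous substring."""
    return x in v


v = "9.female.chair.,ceo,ned,"     # a df2['B'] value from the sample
x_prefix = "9.female.chair.,ceo,"  # a df1['A'] value from the sample
x_skip = "9.female.chair.,ned,"    # hypothetical: skips the 'ceo' token
x_partial = "female"               # hypothetical: only part of a token

print(subset_match(x_prefix, v), substring_match(x_prefix, v))    # True True
print(subset_match(x_skip, v), substring_match(x_skip, v))        # True False
print(subset_match(x_partial, v), substring_match(x_partial, v))  # False True
```

Which behaviour is wanted depends on the data: if df1['A'] must appear exactly as written inside df2['B'], the in test is the stricter reading of "substring".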

merge does not work in this case, because df1['A'] is only a substring of df2['B'] rather than an exact match.
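To illustrate the point about merge, here is a sketch on a two-row excerpt of the sample data. merge joins on equality, so it only finds the row where df1['A'] happens to equal df2['B'] in full and misses the row where it is merely a prefix. The column name match is an arbitrary choice made here to avoid a clash with df1['A']:

```python
import pandas as pd

# Two rows taken from the sample data.
df1 = pd.DataFrame({"A": ["2.female.ed.,ned,", "9.female.chair.,ceo,"]})
df2 = pd.DataFrame({
    "A": [",ned,", ",ceo,ned,"],
    "B": ["2.female.ed.,ned,", "9.female.chair.,ceo,ned,"],
})

# Equality join: df1['A'] must be identical to df2['B'] to produce a match.
merged = df1.merge(
    df2.rename(columns={"A": "match"}),
    left_on="A",
    right_on="B",
    how="left",
)

# Row 0 matches exactly; row 1 is only a substring of df2['B'], so 'match' is NaN.
print(merged[["A", "match"]])
```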
