I want to extract the years from my DataFrame column data3['CopyRight'].
CopyRight
2015 Sony Music Entertainment
2015 Ultra Records , LLC under exclusive license
2014 , 2015 Epic Records , a division of Sony Music Entertainment
Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
2014 , 2015 Epic Records , a division of Sony Music Entertainment
I used the following code to extract the year:
data3['CopyRight_year'] = data3['CopyRight'].str.extract('([0-9]+)', expand=False).str.strip()
With this code I only get the first year that occurs in each row.
CopyRight_year
2015
2015
2014
2014
2014
2014
I want to extract all the years mentioned in the column.
Expected output:
CopyRight_year
2015
2015
2014,2015
2014
2014,2015
2014,2015
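Solution:
Use findall with a regex to collect every 4-digit integer in each row as a list, then combine each list into one string with join and a separator: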
data3['CopyRight_year'] = data3['CopyRight'].str.findall(r'\b\d{4}\b').str.join(',')
print (data3)
                                           CopyRight CopyRight_year
0                      2015 Sony Music Entertainment           2015
1   2015 Ultra Records , LLC under exclusive license           2015
2  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
3  Compilation ( P ) 2014 Epic Records , a divisi...           2014
4  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
5  2014 , 2015 Epic Records , a division of Sony ...      2014,2015
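For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds a small data3 frame from four of the rows shown in the question (an assumption, since the real frame is not posted) and compares str.extract, which keeps only the first match per row, with str.findall plus str.join. The variable names first_only and all_years are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question; the real data3 is assumed to look like this.
data3 = pd.DataFrame({
    "CopyRight": [
        "2015 Sony Music Entertainment",
        "2015 Ultra Records , LLC under exclusive license",
        "2014 , 2015 Epic Records , a division of Sony Music Entertainment",
        "Compilation ( P ) 2014 Epic Records , a division of Sony Music Entertainment",
    ]
})

# str.extract returns only the first match in each row.
first_only = data3["CopyRight"].str.extract(r"([0-9]+)", expand=False)

# str.findall returns a list of every match in each row;
# \b\d{4}\b matches runs of exactly four digits bounded by non-word characters.
# str.join then merges each list into one comma-separated string.
all_years = data3["CopyRight"].str.findall(r"\b\d{4}\b").str.join(",")

data3["CopyRight_year"] = all_years

print(first_only.tolist())               # ['2015', '2015', '2014', '2014']
print(data3["CopyRight_year"].tolist())  # ['2015', '2015', '2014,2015', '2014']
```

The word boundaries in the pattern keep it from picking four digits out of a longer number, which the original [0-9]+ pattern would not guard against.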







