A professional data science Q&A community: good articles, every word worth a thousand pieces of gold | CDA Q&A Community

Python data cleaning case code (quick reference, helpful!!!)

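Reading and organizing the data:
with open('QQ.txt', mode='r', encoding='utf-8') as file:
    txt = file.read()
txt
Extracting the time, user name and chat content separately:
# Extract the information with a regular expression
import re
re_pat = r'20[\d-]{8}\s[\d:]{7,8}\s.*[\)>]'  # regular expression; '.*' matches any sequence of characters
log_title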
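
A minimal sketch of how the extraction step could continue, assuming the export format that the pattern implies: a header line "date time nickname(QQ number)" or "nickname<email>", followed by the message text. The sample text, the `rows` list and the column names `time`/`user`/`content` are hypothetical; the post's own code is cut off after `log_title`.

```python
import re
import pandas as pd

# Hypothetical sample in the assumed QQ export format
txt = (
    "2021-03-05 9:08:01 张三(123456)\n"
    "你好\n"
    "2021-03-05 10:15:22 李四<li@example.com>\n"
    "在吗\n"
)

re_pat = r'20[\d-]{8}\s[\d:]{7,8}\s.*[\)>]'

# Header lines: date, time and user name in one string each
log_title = re.findall(re_pat, txt)
# Message bodies: the text between consecutive headers
# ([1:] drops the empty piece before the first header)
log_content = re.split(re_pat, txt)[1:]

rows = []
for title, content in zip(log_title, log_content):
    date, time, user = title.split(' ', 2)  # user name may itself contain spaces
    rows.append({'time': pd.to_datetime(f'{date} {time}'),
                 'user': user,
                 'content': content.strip()})

df = pd.DataFrame(rows)
print(df)
```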

CDA117513

2022-02-22

Python data cleaning case code (quick reference, helpful!!!) <Google papers data cleaning>

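Reading the data:
import pandas as pd
df = pd.read_csv('.csv', index_col=0)
df.head()
df.info()
df.shape
Data cleaning
Removing duplicate data (count the duplicate rows first):
df.duplicated().sum()
df['标题'].unique()
Cleaning the titles:
df['标题'] = df['标题'].str.replace(r'^\[PDF\]\s*', '', regex=True)  # remove a leading "[PDF] " prefix; str.strip('[PDF] ') would strip those characters, not the prefix
Extracting the citation count:
df['引用次数'] = df['引用次数'].str.findall(r'\d+').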
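
A small self-contained check of the two title/citation steps, on hypothetical rows (标题 = title, 引用次数 = citation text; the "Cited by" wording is an assumption). It shows why `str.strip('[PDF] ')` is risky: strip treats its argument as a set of characters and also eats the "D" of "Deep" and the trailing "PDF". Taking the first number with `.str[0]` is one possible way to finish the truncated line, not necessarily the author's.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    '标题': ['[PDF] Deep Learning for PDF', 'Attention Is All You Need'],
    '引用次数': ['Cited by 1234', 'Cited by 56'],
})

# str.strip removes any of the characters [ P D F ] and space from both ends
stripped = df['标题'].str.strip('[PDF] ').tolist()
print(stripped)

# An anchored regex removes only a leading "[PDF] " prefix
df['标题'] = df['标题'].str.replace(r'^\[PDF\]\s*', '', regex=True)

# findall gives a list of digit runs per row; take the first one and cast to int
df['引用次数'] = df['引用次数'].str.findall(r'\d+').str[0].astype(int)
print(df)
```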

CDA117513

2022-02-21

Python data cleaning case code (quick reference, helpful!!!) <Restaurant order details table analysis>

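Importing the data:
import pandas as pd
df1 = pd.read_excel('.xlsx', sheet_name='')
df2 = ...
df = pd.concat([df1, df2, df3], axis=0)  # axis=0 stacks the sheets vertically, appending more detail_id rows
df.reset_index(drop=True, inplace=True)
df.shape
df.info()
Cleaning the data
Checking for duplicate rows:
df.duplicated().sum()
Dropping fields with missing values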
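
A sketch of the concat/dedup/drop steps on in-memory stand-ins for the three sheets, so it runs without the Excel file. The frames, the `dishes_name` and `note` columns are hypothetical, and "dropping fields with missing values" is assumed here to mean dropping columns that are completely empty.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for three sheets of the order-details table
df1 = pd.DataFrame({'detail_id': [1, 2], 'dishes_name': ['rice', 'soup'], 'note': [np.nan, np.nan]})
df2 = pd.DataFrame({'detail_id': [3, 4], 'dishes_name': ['tofu', 'tofu'], 'note': [np.nan, np.nan]})
df3 = pd.DataFrame({'detail_id': [4], 'dishes_name': ['tofu'], 'note': [np.nan]})

# ignore_index=True renumbers rows 0..n-1, same effect as reset_index(drop=True) afterwards
df = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

n_dup = df.duplicated().sum()  # the repeated detail_id 4 row
df = df.drop_duplicates()

# Drop columns that contain no values at all
df = df.dropna(axis=1, how='all')
print(n_dup, df.shape)
```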

CDA117513

2022-02-20

Python data cleaning case code (quick reference, helpful!!!) <Taobao e-commerce case>

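Reading and exploring the data:
import pandas as pd
df = pd.read_csv('.csv')
df.shape
df.info()
Descriptive statistics of the fields (by default describe() covers only numeric columns; include='all' adds the others):
df.describe(include='all')  # If include='all' is provided as an option, the result will include a union of attributes of each type. The inclu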
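
A quick comparison of the default `describe()` and `describe(include='all')` on a hypothetical two-column frame (`price`, `category` are made-up names): the default returns statistics for the numeric column only, while `include='all'` adds count/unique/top/freq rows for the text column and leaves non-applicable cells as NaN.

```python
import pandas as pd

# Hypothetical order data: one numeric and one text column
df = pd.DataFrame({'price': [9.9, 19.9, 9.9], 'category': ['toy', 'book', 'toy']})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
summary_num = df.describe()
print(summary_num)

# include='all': every column; text columns get count/unique/top/freq
summary_all = df.describe(include='all')
print(summary_all)
```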

CDA117513

2022-02-20

Python data cleaning case code (quick reference, helpful!!!) <Data analysis of Stockholm temperatures>

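Data cleaning
Outdoor data:
import pandas as pd
df1 = pd.read_csv('.csv', index_col=0)  # no index yet: use the first column as the index
df1.columns = ['A', 'B']  # the file already has column names: rename them
df1.shape
df1.info()
df1[''] = pd.to_datetime(df1[''], unit='s')  # convert Unix timestamps (seconds) to datetimes
Indoor data:
df2 = pd.read_csv('.tsv', sep='\t', names=['A'.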
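
A self-contained sketch of the indoor-data step, using an in-memory TSV instead of the real file. The two columns (`timestamp`, `temperature`) and their values are hypothetical; `names=` supplies labels because the assumed file has no header row, and `unit='s'` reads the integers as seconds since the Unix epoch.

```python
import io
import pandas as pd

# Hypothetical indoor readings: tab-separated, no header, Unix seconds + temperature
tsv = "1483228800\t21.5\n1483232400\t21.7\n"

df2 = pd.read_csv(io.StringIO(tsv), sep='\t', names=['timestamp', 'temperature'])
# unit='s': seconds since 1970-01-01; the result is timezone-naive (in UTC)
df2['timestamp'] = pd.to_datetime(df2['timestamp'], unit='s')
df2 = df2.set_index('timestamp')
print(df2)
```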

CDA117513

2022-02-20

10 tips for quickly converting data types in pandas (recommended!!!)

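Contents:
1. Converting strings/int to int/float
2. Converting floats to int
3. Converting a column of mixed data types
4. Handling missing values
5. Converting a currency column to float
6. Converting booleans to 0/1
7. Converting multiple columns at once
8. Defining data types when reading a CSV file
9. Creating a custom function to convert data types
10. astype() vs. to_numeric()
(This article is translated from: https://towardsdatascience.com/converting-data-to-a-numeric-t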
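
A compact illustration of three items from the list (currency to float, astype() vs. to_numeric(), booleans to 0/1) on hypothetical columns `price`, `qty` and `paid`. It is a sketch of the general techniques, not the translated article's exact code.

```python
import pandas as pd

# Hypothetical sample covering several of the cases listed above
df = pd.DataFrame({
    'price': ['$1,200.50', '$35.00', '$7'],  # currency strings
    'qty': ['3', '4', 'n/a'],                # numbers stored as strings, one bad value
    'paid': [True, False, True],             # booleans
})

# Currency -> float: remove "$" and thousands separators, then cast
df['price'] = df['price'].str.replace(r'[$,]', '', regex=True).astype(float)

# to_numeric(errors='coerce') turns unparseable values into NaN,
# whereas df['qty'].astype(float) would raise ValueError on 'n/a'
df['qty'] = pd.to_numeric(df['qty'], errors='coerce')

# Booleans -> 0/1
df['paid'] = df['paid'].astype(int)
print(df.dtypes)
```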

CDA117513

2022-02-20

Python data cleaning case code (quick reference, helpful!!!) <European city population census>

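1. Import the data
import pandas as pd
df = pd.read_csv('.csv')
df.head()
2. Initial exploration of the data
2.1. Check the data types
df.info()
2.2. Check the shape
df.shape
2.3. Check for missing values
3. Clean the data
3.1. City
df[''].unique()
df[''] = df[''].str.strip('[2107]')  # strips any of the characters [ ] 2 1 0 7 from both ends
3.2. Country
df[''] = df[''].str.strip()
3.3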
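
A sketch of steps 2.3, 3.1 and 3.2 on hypothetical rows; the real column names are elided in the post, so `City`, `Country` and `Population` are made up. It assumes the city names carry footnote markers like "[1]": `str.strip('[2107]')` only handles markers built from those exact characters, while an anchored regex removes any trailing "[n]".

```python
import pandas as pd

# Hypothetical rows; the real column names are not shown in the post
df = pd.DataFrame({
    'City': ['Berlin[1]', 'Madrid[7]', 'Warsaw[3]'],
    'Country': [' Germany', 'Spain ', 'Poland'],
    'Population': [3_600_000, None, 1_800_000],
})

# 2.3 Count missing values per column
missing = df.isnull().sum()
print(missing)

# strip('[2107]') removes only the characters [ ] 2 1 0 7, so "[3" survives
print(df['City'].str.strip('[2107]').tolist())
# A regex removes any trailing "[n]" footnote marker
df['City'] = df['City'].str.replace(r'\[\d+\]$', '', regex=True)

# 3.2 Trim surrounding whitespace from country names
df['Country'] = df['Country'].str.strip()
print(df)
```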

CDA117513

2022-02-20

SQL interview question practice Q&A (asking the whole community for help! Thanks! Solved)

Question:
My answer:
select tag, CONCAT(avg_play_progress, '%') as avg_play_progress
from (select tag, round(avg(if(TIMESTAMPDIFF(SECOND, start_time, end_time) > duration, 1, TIMESTAMPDIFF(SECOND, start_time, end_time) / durati
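
The core of the query is "watched seconds / video duration, capped at 1, averaged per tag". A pandas sketch of that logic on a hypothetical play log that reuses the column names from the query; the `* 100` and two-decimal rounding are assumptions about the truncated part of the SQL.

```python
import pandas as pd

# Hypothetical play log using the column names from the query; duration in seconds
plays = pd.DataFrame({
    'tag': ['tech', 'tech', 'food'],
    'start_time': pd.to_datetime(['2021-10-01 10:00:00', '2021-10-01 11:00:00', '2021-10-01 12:00:00']),
    'end_time': pd.to_datetime(['2021-10-01 10:00:30', '2021-10-01 11:02:00', '2021-10-01 12:00:45']),
    'duration': [60, 60, 90],
})

# Same idea as if(TIMESTAMPDIFF(...) > duration, 1, TIMESTAMPDIFF(...) / duration)
elapsed = (plays['end_time'] - plays['start_time']).dt.total_seconds()
plays['progress'] = (elapsed / plays['duration']).clip(upper=1)

# Average per tag; *100 and 2-decimal rounding are assumed, then append '%' like CONCAT
result = plays.groupby('tag')['progress'].mean().mul(100).round(2)
result = result.map(lambda v: f'{v:.2f}%')
print(result)
```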

CDA117513

2022-02-20

Pandas file reading conventions

5 - Pandas - Reading CSV and Basic Plotting.pdf

CDA117513

2022-02-20

Pandas DataFrame exercises (to reinforce memory)

4 - Pandas DataFrames exercises.pdf

CDA117513

2022-02-19

Pandas - DataFrame (condensed notes <cheat sheet>)

3 - Pandas - DataFrames.pdf

CDA117513

2022-02-19

Pandas Series exercises (to reinforce memory)

2 - Pandas Series exercises.pdf

CDA117513

2022-02-19

Pandas - Series (condensed notes <cheat sheet>)

1 - Pandas - Series.pdf

CDA117513

2022-02-19

NumPy (condensed notes <cheat sheet>)

2. NumPy.pdf

CDA117513

2022-02-19

NumPy exercises (to reinforce memory)

NumPy exercises.pdf

CDA117513

2022-02-19

pandas2 notes (code)

Pandas2-课堂.md

CDA117513

2022-02-16

Python File methods

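1.1 Reading non-tabular files
# Non-tabular files (text files): TXT, CSV
# CSV and TXT files are not fundamentally different: both are plain text
1.1.1 Using the open() method
Working with a file takes three actions: open (open) >>> read/write (read) >>> close (close)
Python file-reading syntax: open(file, mode='r')
For more, see: https://www.runoob.com/python/file-methods.html
Note: open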
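
The open >>> read >>> close sequence, written out by hand and then with a `with` block, which closes the file automatically even if an error occurs. The file `demo.txt` and its contents are hypothetical and created in a temporary directory so the example runs anywhere.

```python
import os
import tempfile

# Write a small hypothetical text file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, mode='w', encoding='utf-8') as f:
    f.write('name,score\n张三,90\n')

# The three actions done by hand: open -> read -> close
f = open(path, mode='r', encoding='utf-8')
content = f.read()
f.close()

# The with statement closes the file for you when the block ends
with open(path, mode='r', encoding='utf-8') as f:
    lines = f.readlines()

print(content)
print(lines)
```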

CDA117513

2022-02-16

Python functions: what is the difference between a lambda function and a custom function defined with def?

Taking a DataFrame as an example:
from pandas import DataFrame
df4_1 = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=['a', 'c', 'b'], columns=['one', 'two', 'three'])
Task: add 10 to every element of df4_1
solution1:
df4_1.apply(lambda x: x + 10)  # the lambda function's
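
A side-by-side sketch of the same task with a lambda and with a `def` function (`add_ten` is a hypothetical name). `apply` passes each column as a Series, so both return the same frame; the difference is that a lambda is a single inline expression, while a `def` function has a name, can hold several statements and a docstring, and can be reused or tested on its own.

```python
from pandas import DataFrame

df4_1 = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=['a', 'c', 'b'], columns=['one', 'two', 'three'])

# solution 1: anonymous lambda, one expression written inline
r1 = df4_1.apply(lambda x: x + 10)

# solution 2: named function defined with def
def add_ten(x):
    """Add 10 to every value of one column (a Series)."""
    return x + 10

r2 = df4_1.apply(add_ten)

# Both give the same result; here the vectorized df4_1 + 10 does too
print(r1.equals(r2), r1.equals(df4_1 + 10))
```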

CDA117513

2022-02-16

Python pandas Series: how do you filter the data you want based on the Series index? (string-typed index)

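Example: select the entries whose index name starts with '李' or contains '李'.
The code from the post:
# Checking: build boolean masks
s.index.str.startswith('李')
'李四'.startswith('李')
s.index.str.contains('李')
# Filtering: index the Series with the masks
s[s.index.str.startswith('李')]
s[s.index.str.contains('李')]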
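
A runnable version with a hypothetical Series `s` of scores indexed by name (the post does not show how `s` was built). The masks can also be combined with `|` for "starts with OR contains".

```python
import pandas as pd

# Hypothetical scores indexed by name
s = pd.Series([90, 85, 77, 60], index=['李四', '张三', '王李', '李明'])

starts = s.index.str.startswith('李')   # boolean mask: name starts with 李
contains = s.index.str.contains('李')   # boolean mask: 李 anywhere in the name

print(s[starts])             # 李四, 李明
print(s[contains])           # 李四, 王李, 李明
print(s[starts | contains])  # "starts with OR contains"; here the same as contains
```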

CDA117513

2022-02-15

Python pandas file reading: why does a string column show up as object when viewing the file info?

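For example, in the following student grade table:
** A string column stores addresses (references) in its cells, because the width of a string is not fixed and the storage space cannot be determined in advance, so the column is displayed as object.
** But an object dtype does not necessarily mean strings, because a column may contain a mix of numbers and strings.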
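
A small demonstration of the second point, on a hypothetical in-memory grade table whose `score` column mixes an int and a str. A mixed column always falls back to object, so checking the values (not just the dtype) tells you whether it really holds only strings. Note that recent pandas versions may give a column of only strings a dedicated string dtype instead of object.

```python
import pandas as pd

# Hypothetical grade table; 'score' mixes an int and a str
df = pd.DataFrame({'name': ['张三', '李四'], 'score': [90, 'absent']})

print(df['score'].dtype)  # object

# "object" alone does not mean "all strings": check the values themselves
is_all_str = df['score'].map(lambda v: isinstance(v, str)).all()
print(is_all_str)  # False

# Converting makes the column numeric; 'absent' becomes NaN
df['score'] = pd.to_numeric(df['score'], errors='coerce')
print(df['score'].dtype)  # float64
```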

CDA117513

2022-02-15