pandas Data Structures: Series (CDA Data Analyst official website)

2020-06-16

pandas has two main data structures, Series and DataFrame. We covered DataFrame earlier; this article introduces the other one, Series.

What is a Series?


[ 0.67962276  0.76999562  0.95308305  0.66162424  0.93883112]

0    0.679623
1    0.769996
2    0.953083
3    0.661624
4    0.938831
dtype: float64

RangeIndex(start=0, stop=5, step=1) <class 'pandas.core.indexes.range.RangeIndex'>

[0, 1, 2, 3, 4]

[ 0.67962276  0.76999562  0.95308305  0.66162424  0.93883112]
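The printed values above look like what you get when inspecting a Series that uses the default integer index: the raw array, the Series itself, its index, the index as a list, and its values. A minimal sketch that prints the same kinds of things, assuming a seeded generator so the run is repeatable (the seed and variable names are illustrative and not from the original):

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator is used so that repeated runs match.
rng = np.random.default_rng(0)
arr = rng.random(5)

s = pd.Series(arr)             # no index given, so pandas builds a RangeIndex
print(arr)                     # the underlying NumPy array
print(s)                       # index labels on the left, values on the right
print(s.index, type(s.index))  # RangeIndex(start=0, stop=5, step=1)
print(list(s.index))           # [0, 1, 2, 3, 4]
print(s.values)                # the values as a NumPy array
```

A Series can therefore be read as a one-dimensional array of values paired with an index of labels.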

# Custom Series index

import numpy as np
import pandas as pd

arr = np.random.rand(5)
s = pd.Series(arr, index=list("abcde"))  # one label per value
print(s)

a    0.239432
b    0.554542
c    0.058231
d    0.211549
e    0.362285
dtype: float64

Ways to create a Series

# Create from a scalar: the value is repeated for every index label

s = pd.Series(100, index=range(5))
print(s)

0    100
1    100
2    100
3    100
4    100
dtype: int64
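Besides a scalar, a Series is commonly built from a dict, a NumPy array, or a plain list. The sketch below shows those three routes; the sample data and variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# From a dict: keys become the index labels, values become the data.
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})
print(s_dict)

# From a one-dimensional NumPy array, with explicit labels.
s_arr = pd.Series(np.arange(3), index=["x", "y", "z"])
print(s_arr)

# From a plain Python list; the default RangeIndex is used.
s_list = pd.Series([1.5, 2.5, 3.5])
print(s_list)
```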


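Series subscript indexing

arr = np.random.rand(5)*100
s = pd.Series(arr, index=[chr(i) for i in range(97, 97+len(arr))])  # labels "a", "b", "c", ...
print(s)
print("")

bool_index = s > 50  # boolean mask: True where the value is greater than 50
print(bool_index)
print("")

print(s[bool_index])  # use the mask to select the values in s greater than 50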

a    24.447599
b     0.795073
c    49.464825
d     9.987239
e    86.314340
dtype: float64

a    False
b    False
c    False
d    False
e     True
dtype: bool

e    86.31434
dtype: float64
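Boolean masks like the one above can also be combined. A small sketch, assuming fixed sample values (rounded from the output above) so the result is predictable:

```python
import pandas as pd

# Illustrative values; the labels match the example above.
s = pd.Series([24.4, 0.8, 49.5, 10.0, 86.3], index=list("abcde"))

# Use & (and), | (or) and ~ (not); wrap each comparison in parentheses.
mid = s[(s > 5) & (s < 50)]      # keeps a, c, d
extreme = s[(s < 5) | (s > 50)]  # keeps b, e
not_big = s[~(s > 50)]           # keeps a, b, c, d
print(mid)
print(extreme)
print(not_big)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.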

a    0.001694
b    0.107466
c    0.272233
d    0.637616
e    0.875348
dtype: float64

0.107465887721

0.107465887721

b    0.107466
d    0.637616
dtype: float64

a    0.001694
c    0.272233
dtype: float64
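The output above appears to show a Series followed by two single-element lookups and two multi-element lookups; the code that produced it is not shown. A sketch that would give output of that shape, assuming the values are copied from that output and using `.iloc` for positional access (recent pandas versions warn when a plain integer such as `s[1]` is used as a position on a labelled index):

```python
import pandas as pd

# Values copied from the printed output; the lookups are an assumed reconstruction.
s = pd.Series([0.001694, 0.107466, 0.272233, 0.637616, 0.875348],
              index=list("abcde"))

print(s.iloc[1])       # by position: the second element
print(s["b"])          # by label: the same element
print(s[["b", "d"]])   # a list of labels returns a Series
print(s.iloc[[0, 2]])  # a list of positions returns a Series
```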

Series slicing
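A short sketch of slicing, with made-up sample values. The main thing to notice is that positional slices exclude the end point, while label slices include it:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

pos = s.iloc[1:3]    # positional slice, end excluded: b, c
lab = s["b":"d"]     # label slice, end included: b, c, d
step = s.iloc[::2]   # every second element: a, c, e
last = s.iloc[-2:]   # the last two elements: d, e
print(pos)
print(lab)
print(step)
print(last)
```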


Series boolean indexing

print(s)
s["f"] = None  # add a missing value to s
s["g"] = np.nan  # np.nan (not a number) is also treated as a missing value
print("")

print(s)
print("")

bool_index1 = s.isnull()  # True where the value is missing, False otherwise
print(bool_index1)
print("")

print(s[bool_index1])  # select the missing values
print("")

bool_index2 = s.notnull()  # True where the value is present, False where it is missing
print(bool_index2)
print("")

print(s[bool_index2])  # select the non-missing values

a     24.4476
b    0.795073
c     49.4648
d     9.98724
e     86.3143
f        None
g         NaN
dtype: object

a     24.4476
b    0.795073
c     49.4648
d     9.98724
e     86.3143
f        None
g         NaN
dtype: object

a    False
b    False
c    False
d    False
e    False
f     True
g     True
dtype: bool

f    None
g     NaN
dtype: object

a     True
b     True
c     True
d     True
e     True
f    False
g    False
dtype: bool

a     24.4476
b    0.795073
c     49.4648
d     9.98724
e     86.3143
dtype: object
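The two masks above are exact opposites of each other, and `isna` / `notna` are aliases for `isnull` / `notnull`. A small sketch with made-up values that checks this and counts the missing entries:

```python
import numpy as np
import pandas as pd

# In a float Series built from a list, None is stored as NaN.
s = pd.Series([1.0, None, np.nan, 4.0], index=list("abcd"))

mask = s.isnull()
n_missing = int(mask.sum())                # True counts as 1
same_alias = s.isna().equals(mask)         # isna is an alias of isnull
inverse = (~mask).equals(s.notnull())      # notnull is the negation of isnull
print(n_missing, same_alias, inverse)
```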


Basic Series techniques

Viewing data

import numpy as np
import pandas as pd

s = pd.Series(np.random.rand(15))
print(s)
print("")

print(s.head())  # first 5 rows (the default)
print("")

print(s.head(2))  # first 2 rows
print("")

print(s.tail())  # last 5 rows (the default)
print("")

print(s.tail(2))  # last 2 rows

0     0.049732
1     0.281123
2     0.398361
3     0.492084
4     0.555350
5     0.729037
6     0.603854
7     0.643413
8     0.951804
9     0.459948
10    0.261974
11    0.897656
12    0.428898
13    0.426533
14    0.301044
dtype: float64

0    0.049732
1    0.281123
2    0.398361
3    0.492084
4    0.555350
dtype: float64

0    0.049732
1    0.281123
dtype: float64

10    0.261974
11    0.897656
12    0.428898
13    0.426533
14    0.301044
dtype: float64

13    0.426533
14    0.301044
dtype: float64

Reindexing

# reindex is different from renaming the index labels

s = pd.Series(np.random.rand(5), index=list("bdeac"))
print(s)
print("")

s1 = s.reindex(list("abcdef"))  # conform the Series to the new index; labels that do not exist are filled with NaN
print(s1)
print("")

print(s)  # the original Series is unchanged
print("")

s2 = s.reindex(list("abcdef"), fill_value=0)  # use fill_value for labels that do not exist
print(s2)

b    0.539124
d    0.853346
e    0.065577
a    0.406689
c    0.562758
dtype: float64

a    0.406689
b    0.539124
c    0.562758
d    0.853346
e    0.065577
f         NaN
dtype: float64

b    0.539124
d    0.853346
e    0.065577
a    0.406689
c    0.562758
dtype: float64

a    0.406689
b    0.539124
c    0.562758
d    0.853346
e    0.065577
f    0.000000
dtype: float64
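To make the difference between reindexing and renaming concrete, here is a sketch with made-up values. `reindex` looks values up by their existing labels, while `rename` only changes the labels and leaves the values in place:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=list("bca"))

# reindex: values follow their labels; the unknown label "d" becomes NaN.
r = s.reindex(list("abcd"))
# rename: each label is replaced, values keep their order.
n = s.rename({"b": "x", "c": "y", "a": "z"})
print(r)
print(n)
```

Because `reindex` introduced a NaN here, the result is stored as float64 even though the input held integers.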

Alignment

s1 = pd.Series(np.random.rand(3), index=list("abc"))
s2 = pd.Series(np.random.rand(3), index=list("cbd"))
print(s1)
print("")

print(s2)
print("")

print(s1+s2)  # values with the same label are added; a missing value plus anything is still missing (NaN)

a    0.514657
b    0.618971
c    0.456840
dtype: float64

c    0.083065
b    0.893543
d    0.125063
dtype: float64

a         NaN
b    1.512513
c    0.539905
d         NaN
dtype: float64
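If NaN for unmatched labels is not what you want, `Series.add` accepts a `fill_value` that stands in for the missing side. A sketch with made-up values:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=list("abc"))
s2 = pd.Series([10.0, 20.0, 30.0], index=list("cbd"))

plain = s1 + s2                    # "a" and "d" exist on one side only, so they are NaN
filled = s1.add(s2, fill_value=0)  # the missing side is treated as 0
print(plain)
print(filled)
```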

Deleting

# Series.drop("label")

s = pd.Series(np.random.rand(5), index=list("abcde"))
print(s)
print("")

s1 = s.drop("b")  # drop one label; returns a new Series
print(s1)
print("")

s2 = s.drop(["d", "e"])  # drop two labels at once; returns a new Series
print(s2)
print("")

print(s)  # confirm the original Series is unchanged

a    0.149823
b    0.330215
c    0.069852
d    0.967414
e    0.867417
dtype: float64

a    0.149823
c    0.069852
d    0.967414
e    0.867417
dtype: float64

a    0.149823
b    0.330215
c    0.069852
dtype: float64

a    0.149823
b    0.330215
c    0.069852
d    0.967414
e    0.867417
dtype: float64

s = pd.Series(np.random.rand(5), index=list("abcde"))
print(s)
print("")

s1 = s.drop(["b", "c"], inplace=True)  # inplace defaults to False; with inplace=True, drop modifies the original Series and returns None
print(s1)
print("")

print(s)  # confirm the original Series has changed

a    0.753187
b    0.077156
c    0.626230
d    0.428064
e    0.809005
dtype: float64

None

a    0.753187
d    0.428064
e    0.809005
dtype: float64

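Adding

s1 = pd.Series(np.random.rand(5), index=list("abcde"))
print(s1)
print("")

# Add an element by assigning to a new index label
s1["f"] = 100
print(s1)
print("")

# Concatenate another Series to get a new Series
# (pd.concat replaces Series.append, which was removed in pandas 2.0)

s2 = pd.concat([s1, pd.Series(np.random.rand(2), index=list("mn"))])
print(s2)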
print(s2)

a    0.860190
b    0.351980
c    0.237463
d    0.159595
e    0.119875
dtype: float64

a      0.860190
b      0.351980
c      0.237463
d      0.159595
e      0.119875
f    100.000000
dtype: float64

a      0.860190
b      0.351980
c      0.237463
d      0.159595
e      0.119875
f    100.000000
m      0.983410
n      0.293722
dtype: float64
