2019-03-06 Views: 688
Pandas: thresholding a data sequence based on the length of a pattern

I have this DataFrame:

    A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2

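[image]

I want to threshold the data based on the length of each sequence, so that in the example above section B is flattened (set to zero) because its length is less than 3, as shown below:

[image]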
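To make "length of a sequence" concrete, here is a minimal sketch that lists every maximal run of non-zero values in the posted column together with its length. The helper name `nonzero_runs` is hypothetical and only used for illustration. The question does not say whether the isolated -2 at index 0 should also count as a short run, so the sketch only reports the runs and leaves that decision open.

```python
from itertools import groupby

# The 23 values of column A from the question.
values = [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0,
          2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2]

def nonzero_runs(seq):
    """Return (start_index, length) for each maximal run of non-zero values."""
    runs = []
    pos = 0
    # groupby splits the sequence into consecutive zero / non-zero chunks.
    for is_nonzero, group in groupby(seq, key=lambda v: v != 0):
        length = len(list(group))
        if is_nonzero:
            runs.append((pos, length))
        pos += length
    return runs

runs = nonzero_runs(values)
short_runs = [r for r in runs if r[1] < 3]  # candidates for flattening
print(runs)
print(short_runs)
```

The run starting at index 2 has length 2, which is below the threshold of 3 described above.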

Solution:

    import pandas as pd

    df = pd.DataFrame([-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 0, 3, 3, 0, 4, 0])
    df.columns = ['A']
    df

As a sanity check, I appended two 3s and a 4 at the end, which gives us:

    A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 3
25 3
26 0
27 4
28 0

Now we have to find which elements need to be set to zero, using:

    prev = None          # index of the most recent zero (assumes the default integer index)
    flag = 0             # 1 once a non-zero value has been seen since prev
    terminationLst = []  # one list of indices per short run

    for val, i in zip(df['A'], df.index):
        if val != 0:
            if prev is not None:  # non-zero element after having seen a zero
                flag = 1
            continue
        # val == 0 from here on
        if prev is not None and flag == 1 and i - prev <= 3:
            # zero reached after fewer than 3 consecutive non-zeros
            terminationLst.append(list(range(prev + 1, i)))
        prev = i  # always remember the latest zero
        flag = 0

    print(terminationLst)

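This gives us the indices of the elements that need to be set to zero: [[2, 3], [24, 25], [27]]

Now we just have to set them to zero, which is simple:

    for elem in terminationLst:
        df.loc[elem, 'A'] = 0  # a single .loc call avoids chained assignment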

The DataFrame now becomes:

    A
0 -2
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 0
25 0
26 0
27 0
28 0
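
For comparison, the same result can probably be obtained without an explicit loop. The following is a sketch of a vectorized alternative, not the method from the answer: it labels each run of zeros or non-zeros, looks up each run's length, and masks the short non-zero runs. The name `MIN_LEN` is hypothetical, and the sample list is assumed to end with 0, 3, 3, 0, 4, 0, consistent with the 29-row result shown above. Runs that touch the first or last row are skipped so that the leading -2 stays untouched, as it does in the loop.

```python
import pandas as pd

# Sample column, including the sanity-check values assumed to end the list.
df = pd.DataFrame({'A': [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0,
                         2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2,
                         0, 3, 3, 0, 4, 0]})

MIN_LEN = 3  # hypothetical name: non-zero runs shorter than this are flattened

nonzero = df['A'].ne(0)
# A new run starts wherever the zero / non-zero state changes.
run_id = nonzero.ne(nonzero.shift()).cumsum()
# Length of the run that each row belongs to.
run_len = run_id.map(run_id.value_counts())
# Runs touching the first or last row are not enclosed by zeros on both sides.
at_edge = run_id.isin([run_id.iloc[0], run_id.iloc[-1]])

# Replace values in short, enclosed, non-zero runs with 0.
flattened = df['A'].mask(nonzero & run_len.lt(MIN_LEN) & ~at_edge, 0)
print(flattened.tolist())
```

Dropping the `at_edge` condition would also flatten short runs at the very start or end of the column, which may or may not be what is wanted.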
