I have this DataFrame
A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
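I want to threshold the data based on the length of each sequence, so that in the example above section B is flattened because its length is less than 3, as shown below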
Solution:
import pandas as pd
df = pd.DataFrame([-2,0,2,2,0,0,0,0,0,0,0,0,2,2,2,2,2,3,2,0,2,2,2,0,3,3,0,4,0])
df.columns = ['A']
df
As a sanity check, I appended two 3s and a 4 at the end, which gives us
A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 3
25 3
26 0
27 4
28 0
Now we need to work out which elements have to be set to zero, using
prev = None          # index of the most recent zero
flag = 0             # 1 once a non-zero value has been seen after that zero
terminationLst = []  # groups of index labels to set to zero
for val, i in zip(df['A'], df.index):
    if val == 0 and prev is None:  # First time encountering a zero element
        prev = i
        continue
    if val != 0 and prev is not None:  # Encountering a non-zero element after having seen a zero
        flag = 1
    elif val == 0 and flag == 1 and i - prev <= 3:  # Zero that closes a run of fewer than 3 non-zeros
        terminationLst.append(list(range(prev + 1, i)))
        prev = i
        flag = 0
    elif val == 0:  # Any other zero (after a run of 3 or more, or directly after another zero)
        prev = i
        flag = 0
print(terminationLst)
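This gives us the indices of the elements that need to be set to zero: [[2, 3], [24, 25], [27]]
Now we just set them to zero, which can be done simply with
for elem in terminationLst:
    df.loc[elem, 'A'] = 0  # .loc with index labels avoids chained assignment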
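The nested list could also be flattened first, so that the assignment happens in a single step. This is only a small sketch: it assumes the 29-row frame from this answer and hard-codes `terminationLst` to the value printed above, and the name `flat` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2,
                         2, 2, 3, 2, 0, 2, 2, 2, 0, 3, 3, 0, 4, 0]})

# Assumed to be the value printed by the loop above
terminationLst = [[2, 3], [24, 25], [27]]

# Flatten the groups into one list of index labels
flat = [i for group in terminationLst for i in group]

# One label-based assignment instead of one per group
df.loc[flat, 'A'] = 0
```

Since the values in `terminationLst` come from `df.index`, label-based `.loc` looks like the natural fit here.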
The DataFrame now becomes
A
0 -2
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 0
25 0
26 0
27 0
28 0
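For comparison, the same thresholding can probably be done without an explicit loop. The following is only a sketch and not the method used above: it labels runs with `cumsum`, measures them with `groupby(...).transform('size')`, and, to mimic the loop, only touches non-zero runs that have a zero on both sides. The function name `flatten_short_runs` and the `min_len` parameter are hypothetical names chosen for illustration.

```python
import pandas as pd

def flatten_short_runs(s: pd.Series, min_len: int = 3) -> pd.Series:
    """Zero out non-zero runs shorter than min_len that are enclosed by zeros."""
    nonzero = s.ne(0)
    # Start a new run id whenever the zero/non-zero state changes
    run_id = nonzero.ne(nonzero.shift()).cumsum()
    # Length of the run each element belongs to
    run_len = nonzero.groupby(run_id).transform('size')
    is_zero = ~nonzero
    # True from the first zero onward
    zero_before = is_zero.cumsum().gt(0)
    # True up to and including the last zero
    zero_after = is_zero[::-1].cumsum()[::-1].gt(0)
    short = nonzero & (run_len < min_len) & zero_before & zero_after
    return s.mask(short, 0)

df = pd.DataFrame({'A': [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2,
                         2, 2, 3, 2, 0, 2, 2, 2, 0, 3, 3, 0, 4, 0]})
df['A'] = flatten_short_runs(df['A'])
```

The two extra masks are there so that the leading -2, which has no zero before it, is left alone, just as in the loop version. Dropping them would flatten every short run, including ones at the edges.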







