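I have made a reproducible dataset.
In this dataset, I am trying to group the rows by "label" and "category" and get the maximum "value" in each "category", but only if that group contains a "value" greater than 4.
Another way to put the question: for each "category" within each label, get the largest "value", but only when a "value" greater than 4 exists for that label.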
das <- data.frame(val=1:24,
weigh=c(10,10,10,11,11,11,20,20,20,21,21,21,30,30,30,31,31,31,40,40,40,41,41,41),
value=c(4.1,3.2,4.3,1.1,2.2,5.3,2.1,2.2,3.3,3.1,8.2,1.3,3.6,2.1,3.1,3.1,3.1,1.1,7.2,4.5,5.1,3.2,2.5,9.1),
label=c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4),
category=c("A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C","A","B","C"))
val weigh value label category
1 1 10 4.1 1 A
2 2 10 3.2 1 B
3 3 10 4.3 1 C
4 4 11 1.1 1 A
5 5 11 2.2 1 B
6 6 11 5.3 1 C
7 7 20 2.1 2 A
8 8 20 2.2 2 B
9 9 20 3.3 2 C
10 10 21 3.1 2 A
11 11 21 8.2 2 B
12 12 21 1.3 2 C
13 13 30 3.6 3 A
14 14 30 2.1 3 B
15 15 30 3.1 3 C
16 16 31 3.1 3 A
17 17 31 3.1 3 B
18 18 31 1.1 3 C
19 19 40 7.2 4 A
20 20 40 4.5 4 B
21 21 40 5.1 4 C
22 22 41 3.2 4 A
23 23 41 2.5 4 B
24 24 41 9.1 4 C
This is the expected output:
val weigh value label category
1 1 10 4.1 1 A
5 6 11 5.3 1 C
2 2 10 3.2 1 B
10 10 21 3.1 2 A
3 11 21 8.2 2 B
9 9 20 3.3 2 C
2 19 40 7.2 4 A
4 20 40 4.5 4 B
6 24 41 9.1 4 C
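I tried the following, but did not get the expected output. Here I only get the rows with value > 4, rather than the maximum of every category under each label that qualifies: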
library(dplyr)

# Keep only rows with value > 4, then take the max row per category/label
das1 <- das[das$value > 4, ]
result <- das1 %>%
  group_by(category, label) %>%
  slice(which.max(value))
val weigh value label category
1 1 10 4.1 1 A
5 6 11 5.3 1 C
3 11 21 8.2 2 B
2 19 40 7.2 4 A
4 20 40 4.5 4 B
6 24 41 9.1 4 C
Solution: we can first group_by label and filter to the groups in which any value > 4, then keep only the row with the max value within each label and category.
library(dplyr)
das %>%
  group_by(label) %>%
  filter(any(value > 4)) %>%      # keep whole labels that contain a value > 4
  ungroup() %>%
  group_by(label, category) %>%
  slice(which.max(value))         # one row per label/category: the max value
# val weigh value label category
# <int> <dbl> <dbl> <dbl> <fct>
#1 1 10 4.1 1 A
#2 2 10 3.2 1 B
#3 6 11 5.3 1 C
#4 10 21 3.1 2 A
#5 11 21 8.2 2 B
#6 9 20 3.3 2 C
#7 19 40 7.2 4 A
#8 20 40 4.5 4 B
#9 24 41 9.1 4 C
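
For readers who would rather not depend on dplyr, the same two-step logic can be sketched in base R with ave(). This is only an illustration and not part of the original answer: the names label_max, kept and group_max are introduced here, and the sketch assumes "value" has no NAs. Unlike slice(which.max(value)), it would keep every tied row if two rows shared a group maximum (the example data has no ties).

```r
# Rebuild the example data from the question.
das <- data.frame(
  val = 1:24,
  weigh = rep(c(10, 11, 20, 21, 30, 31, 40, 41), each = 3),
  value = c(4.1, 3.2, 4.3, 1.1, 2.2, 5.3, 2.1, 2.2, 3.3, 3.1, 8.2, 1.3,
            3.6, 2.1, 3.1, 3.1, 3.1, 1.1, 7.2, 4.5, 5.1, 3.2, 2.5, 9.1),
  label = rep(1:4, each = 6),
  category = rep(c("A", "B", "C"), times = 8)
)

# Step 1: keep only labels whose largest value exceeds 4
# (equivalent to any(value > 4) when there are no NAs).
label_max <- ave(das$value, das$label, FUN = max)
kept <- das[label_max > 4, ]

# Step 2: within each label/category pair, keep the row holding the maximum.
group_max <- ave(kept$value, kept$label, kept$category, FUN = max)
result <- kept[kept$value == group_max, ]

# Order by label, then category, to match the grouped dplyr output.
result <- result[order(result$label, result$category), ]
print(result)
```

Label 3 drops out in step 1 because none of its values exceeds 4, which is the part the filter-first attempt in the question could not express.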







