2023-07-20
阅读量:
603
Cda数据分析——Sql淘宝案例(五)留存分析
新增/次日/七日/三十日留存情况:以用户id和日期分组后,min()作为开窗函数得到每名用户的新增日日期,然后将排好序的日期字段和firstday字段的日期数据相减(利用datediff),得到相差天数,以此为子表,用sum()if()聚合判断,并以新增日为分组条件,得到每日的留存,再除以新增日,得到各个日期的留存率。
select firstday,
sum(if(diff=0,1,0)) as 新增用户,
sum(if(diff=1,1,0)) as 次日留存用户,
sum(if(diff=6,1,0)) as 七日留存用户,
sum(if(diff=29,1,0)) as 三十日留存用户,
sum(if(diff=1,1,0))/sum(if(diff=0,1,0)) as 次日留存率,
sum(if(diff=6,1,0))/sum(if(diff=0,1,0)) as 7日留存率,
sum(if(diff=29,1,0))/sum(if(diff=0,1,0)) as 30日留存率
from( select user_id ,
日期,
min(日期)over(partition by user_id order by user_id ,日期) as firstday,
datediff(日期,min(日期)over(partition by user_id order by user_id ,日期)) as diff
from userbehavior_new
group by user_id ,日期
order by firstday,diff
) as T
group by firstday;2.查询每个用户的最长连续登录天数:首先以用户id和日期分组,然后用row_number函数添加序号,可以发现在连续的登录天数日期与序号同为等差数列,将两者相减(使用date_sub(interval day))可以为连续登录的天数打上相同的日期标签,将其作为子表,将用户id和分组计数标签作为分组,为标签计数,可以得到用户各个时期的连续登录天数,求最长连续登录天数,只需将此作为子表再进行对每个用户分组下的连续登陆天数求最大值即可。
select user_id,max(连续登录天数) as 最长连续登录天数
from( select user_id,分组计数标签,count(分组计数标签) as 连续登录天数
from(select user_id,
日期,
row_number()over(partition by user_id order by 日期) as 序号,
date_sub(日期,interval row_number()over(partition by user_id order by 日期) day) as 分组计数标签
from userbehavior_new
group by user_id ,日期
) as T1
group by user_id,分组计数标签
) as T2
group by user_id
having 最长连续登录天数 > 1
order by 最长连续登录天数 desc;注:这里的having能使用select里的别名
附链接:
(22条消息) MySQL的having和select顺序问题_sql中having和select的执行顺序_铁血阿张的博客-CSDN博客
0.0000
0
0
关注作者
收藏
评论(0)
发表评论
暂无数据

