select count(date2)/count(date1) as avg_ret from (
(SELECT device_id, date AS date1 
FROM question_practice_detail 
GROUP BY device_id, date) as t1
left join
(SELECT device_id, date_add(date,interval 1 day) AS date2
FROM question_practice_detail 
GROUP BY device_id, date2) as t2
on 
t1.device_id = t2.device_id and t1.date1 = t2.date2)

# select avg(if(datediff(date2,date1)=1,1,0))from(
# select 
# device_id,
# date as date1,
# lead(date,1,null) over (partition by device_id order by date) date2
# from 
# (select distinct device_id,date from question_practice_detail) t1)t

条件概率平均:AVG(IF(...))AVG() 函数计算所有返回的 10 的平均值。假设数据集有 100 行,如果有 40 行符合 DATEDIFF(date2, date1) = 1,即返回 1,其余 60 行返回 0。那么 AVG() 会计算 (40 * 1 + 60 * 0) / 100 = 0.4,即这 100 行数据中有 40% 满足日期差为 1 的条件。

日期加减

date_add:date_add(date,interval 1 day)

datediff:datediff(date2,date1)

lead()的用法:lead(date) over (partition by device_id order by date)https://blog.csdn.net/u013536232/article/details/103545895

为什么要使用 GROUP BY 替代 DISTINCT?

性能优化:GROUP BY 通常比 DISTINCT 更高效,特别是在处理大量数据时。

避免重复结果:在进行 JOIN 操作时,DISTINCT 可能无法完全控制返回的行数,而 GROUP BY 则更容易确保每个组合唯一。

对日期进行加法,要用date_add()函数