select count(date2)/count(date1) as avg_ret from ( (SELECT device_id, date AS date1 FROM question_practice_detail GROUP BY device_id, date) as t1 left join (SELECT device_id, date_add(date,interval 1 day) AS date2 FROM question_practice_detail GROUP BY device_id, date2) as t2 on t1.device_id = t2.device_id and t1.date1 = t2.date2) # select avg(if(datediff(date2,date1)=1,1,0))from( # select # device_id, # date as date1, # lead(date,1,null) over (partition by device_id order by date) date2 # from # (select distinct device_id,date from question_practice_detail) t1)t
条件概率平均:AVG(IF(...)):AVG() 函数计算所有返回的 1 和 0 的平均值。假设数据集有 100 行,如果有 40 行符合 DATEDIFF(date2, date1) = 1,即返回 1,其余 60 行返回 0。那么 AVG() 会计算 (40 * 1 + 60 * 0) / 100 = 0.4,即这 100 行数据中有 40% 满足日期差为 1 的条件。
日期加减
date_add:date_add(date,interval 1 day)
datediff:datediff(date2,date1)
lead()的用法:lead(date) over (partition by device_id order by date)https://blog.csdn.net/u013536232/article/details/103545895
为什么要使用 GROUP BY 替代 DISTINCT?
性能优化:GROUP BY 通常比 DISTINCT 更高效,特别是在处理大量数据时。
避免重复结果:在进行 JOIN 操作时,DISTINCT 可能无法完全控制返回的行数,而 GROUP BY 则更容易确保每个组合唯一。
对日期进行加法,要用date_add()函数

京公网安备 11010502036488号