WITH t1 AS(
    SELECT order_id, uid, product_id, event_time, tag
            ,DENSE_RANK() OVER(PARTITION BY uid, product_id ORDER BY event_time DESC, order_id) AS buy_num
            ,MAX(event_time) OVER() AS max_date
    FROM tb_order_detail
    JOIN tb_product_info USING(product_id)
    JOIN tb_order_overall USING(order_id)
)
, t2 AS(
    SELECT product_id
            ,ROUND(COUNT(DISTINCT uid)/uid_product, 3) AS rate
    FROM t1
    JOIN (
        SELECT product_id
        , COUNT(DISTINCT uid) AS uid_product
        FROM t1
        GROUP BY product_id
    ) data1 USING(product_id) 
    WHERE DATEDIFF(max_date, event_time) < 90 AND buy_num >= 2
    GROUP BY product_id
)

SELECT product_id, ROUND(IFNULL(rate, 0), 3) AS repurchase_rate
FROM tb_product_info
LEFT JOIN t2 USING(product_id)
WHERE tag = "零食" 
ORDER BY repurchase_rate DESC, product_id
LIMIT 3

  • 注意,最大时间是全局最大时间,不是各个产品最近销售的最大时间,t1中添加:MAX(event_time) OVER(PARTITION BY product_id ) AS max_date反而会出错
  • 错误1:
  • 本身想在t1中使窗口函数计算各产品product的总用户数,然后直接在t2进行比值并输出,结果报错
  • 报错:程序异常退出, 请检查代码"是否有数组越界等异常"或者"是否有语法错误",SQL_ERROR_INFO: "This version of MySQL doesn't yet support '
  • 原因在于: 1、t1窗口函数在使用窗口函数时,不能使用COUNT(DISTINCT ),解决办法:可以使用DENSErank替代2、t1成功后,由于t2需要GROUP,因此在GROUP中也不可以直接使用uid_product,考虑可以加上GROUP BY product_id,uid_product
  • 实现:显然本文没有使用这些修改方法,这是后期想到的
  • 错误2:
  • 由于t2步需要剔除没有复购的样本,因此t2中不存在产品复购率为0的样本,但结果要求即使没有复购的也需要输出复购率为0,因此使用LEFT JOIN