with sub1 as(
    select b.city as city,
    round(sum(a.loan_amount),2) as total_loan_amount,
    round(sum(a.loan_amount)/count(distinct b.customer_id),2) as average_loan_amount,
    count(distinct b.customer_id) as total_customers
    from loan_applications a join customers b using(customer_id)
    group by b.city),
    sub2 as(
    select city,loan_type_name as most_applied_loan_type
    from(
    select b.city as city,
    c.loan_type_name as loan_type_name,
    row_number()over(partition by b.city order by count(c.loan_type_name) desc,c.loan_type_id) as rk
    from loan_applications a join customers b using(customer_id)
    join loan_application_types d using(application_id)
    join loan_types c using(loan_type_id)
    group by b.city,c.loan_type_id,c.loan_type_name) as sub3
    where sub3.rk=1)
    select sub1.city,total_loan_amount,average_loan_amount,total_customers,most_applied_loan_type
    from sub1 join sub2 using(city)
    order by sub1.city

存在误区:

1、客户数量,务必customer_id去重求值;

2、贷款均值=贷款总额/客户数量,而非“每笔贷款的均值”;

3、每个城市最常见的贷款类型,partition by city 是根据城市分区,order by count(c.loan_type_name)再计算该城市下,每个贷款类型数量,进行排序;partition by city,c.loan_type_name是直接根据城市且贷款类型组合分区,即使排序后,每个是单独一个序号,都为1——这种是错误的。

4、分组颗粒度,group by b.city,c.loan_type_id,c.loan_type_name和group by b.city,c.loan_type_name区别:如果c.loan_type_name一致,后者其实两个不同的类型就会划到一组,所以最好加上“类型id”