with sub1 as( select b.city as city, round(sum(a.loan_amount),2) as total_loan_amount, round(sum(a.loan_amount)/count(distinct b.customer_id),2) as average_loan_amount, count(distinct b.customer_id) as total_customers from loan_applications a join customers b using(customer_id) group by b.city), sub2 as( select city,loan_type_name as most_applied_loan_type from( select b.city as city, c.loan_type_name as loan_type_name, row_number()over(partition by b.city order by count(c.loan_type_name) desc,c.loan_type_id) as rk from loan_applications a join customers b using(customer_id) join loan_application_types d using(application_id) join loan_types c using(loan_type_id) group by b.city,c.loan_type_id,c.loan_type_name) as sub3 where sub3.rk=1) select sub1.city,total_loan_amount,average_loan_amount,total_customers,most_applied_loan_type from sub1 join sub2 using(city) order by sub1.city
存在误区:
1、客户数量,务必customer_id去重求值;
2、贷款均值=贷款总额/客户数量,而非“每笔贷款的均值”;
3、每个城市最常见的贷款类型,partition by city 是根据城市分区,order by count(c.loan_type_name)再计算该城市下,每个贷款类型数量,进行排序;partition by city,c.loan_type_name是直接根据城市且贷款类型组合分区,即使排序后,每个是单独一个序号,都为1——这种是错误的。
4、分组颗粒度,group by b.city,c.loan_type_id,c.loan_type_name和group by b.city,c.loan_type_name区别:如果c.loan_type_name一致,后者其实两个不同的类型就会划到一组,所以最好加上“类型id”