Attention
Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)

Transformer

The Illustrated Transformer

Transformer Implementation

BERT

Understanding the Principles of BERT in One Article
Text Classification in Practice
BERT Fine-Tuning Tutorial with PyTorch