https://towardsai.net/p/artificial-intelligence/a-visual-walkthrough-of-deepseeks-multi-head-latent-attention-mla-️

https://github.com/deepseek-ai/DeepSeek-V2

https://arxiv.org/html/2405.04434v5

https://www.youtube.com/watch?v=t509sv5MT0w

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

https://github.com/tspeterkim/flash-attention-minimal/tree/main