DeepSeek V1, V2 and V3 Deep Dive
Avyav Singh presented the collection of papers behind the innovative approach used to create DeepSeek V1, V2, and V3 (Guo et al., 2025; Liu et al., 2024; Liu et al., 2024; Gloeckle et al., 2024; Shao et al., 2024).
Abstract
This talk explored the rapid evolution of DeepSeek’s foundational models from V1 through V3, highlighting key architectural advancements and capabilities. We examined in depth the key innovations that enabled DeepSeek’s success: the Mixture-of-Experts (MoE) architecture for scalable compute and dynamic routing, Multi-head Latent Attention (MLA), and the auxiliary-loss-free strategy for load balancing.
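To make the routing idea concrete, below is a minimal PyTorch sketch of a top-k MoE layer combined with the bias-based (auxiliary-loss-free) load-balancing adjustment described for DeepSeek-V3: a per-expert bias shifts which experts are selected, while the gate weights themselves stay unbiased, and the bias is nudged online toward balanced load instead of adding a balancing term to the loss. All dimensions, the `bias_lr` update rule, and the class and parameter names here are illustrative assumptions, not the papers' exact implementation.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer with a bias-based
    (auxiliary-loss-free) load-balancing adjustment. A sketch under
    assumed dimensions, not DeepSeek's actual configuration."""

    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2, bias_lr=0.01):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Per-expert routing bias, adjusted online to balance load;
        # no auxiliary balancing term ever enters the training objective.
        self.register_buffer("route_bias", torch.zeros(n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr  # assumed update step size (gamma in the paper)

    def forward(self, x):                     # x: (tokens, d_model)
        scores = torch.sigmoid(self.gate(x))  # token-to-expert affinity
        # The bias affects WHICH experts are chosen, not how they are weighted.
        topk = torch.topk(scores + self.route_bias, self.top_k, dim=-1).indices
        out = torch.zeros_like(x)
        load = torch.zeros_like(self.route_bias)
        for e, expert in enumerate(self.experts):
            mask = (topk == e).any(dim=-1)    # tokens routed to expert e
            if mask.any():
                w = scores[mask, e].unsqueeze(-1)  # original, unbiased gate weight
                out[mask] += w * expert(x[mask])
                load[e] = mask.float().sum()
        # Auxiliary-loss-free balancing: push the bias down for overloaded
        # experts and up for underloaded ones.
        if self.training:
            self.route_bias -= self.bias_lr * torch.sign(load - load.mean())
        return out

x = torch.randn(16, 64)
moe = TopKMoE()
print(moe(x).shape)  # torch.Size([16, 64])
```

The design point the sketch tries to capture is the decoupling: conventional MoE recipes add an auxiliary balancing loss that can trade model quality for balance, whereas here the balancing signal only perturbs expert selection, leaving the gradient of the task loss untouched.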
References
- Guo et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948, 2025.
- Liu et al. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv preprint arXiv:2405.04434, 2024.
- Liu et al. DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437, 2024.
- Gloeckle et al. Better & Faster Large Language Models via Multi-Token Prediction. arXiv preprint arXiv:2404.19737, 2024.
- Shao et al. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300, 2024.