DeepSeek V1, V2 and V3 Deep Dive
Avyav Singh presented the collection of papers behind the innovative approach used to create DeepSeek V1, V2, and V3 (Guo et al., 2025; Liu et al., 2024; Liu et al., 2024; Gloeckle et al., 2024; Shao et al., 2024).
Abstract
This talk explored the rapid evolution of DeepSeek’s foundational models from V1 through V3, highlighting key architectural advancements and capabilities. We examined in depth the key innovations that enabled DeepSeek’s success: the Mixture-of-Experts (MoE) architecture for scalable compute and dynamic routing, Multi-head Latent Attention (MLA), and the auxiliary-loss-free strategy for load balancing.
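To make the routing idea concrete, below is a minimal PyTorch sketch of a top-k MoE layer combined with the bias-based (auxiliary-loss-free) load-balancing adjustment described for DeepSeek-V3: a per-expert bias shifts which experts are selected, while the gate weights themselves stay unbiased, and the bias is nudged online toward balanced load instead of adding a balancing term to the loss. All dimensions, the `bias_lr` update rule, and the class and parameter names here are illustrative assumptions, not the papers' exact implementation.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k Mixture-of-Experts layer with a bias-based
    (auxiliary-loss-free) load-balancing adjustment. A sketch under
    assumed dimensions, not DeepSeek's actual configuration."""

    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2, bias_lr=0.01):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        # Per-expert routing bias, adjusted online to balance load;
        # no auxiliary balancing term ever enters the training objective.
        self.register_buffer("route_bias", torch.zeros(n_experts))
        self.top_k = top_k
        self.bias_lr = bias_lr  # assumed update step size (gamma in the paper)

    def forward(self, x):                     # x: (tokens, d_model)
        scores = torch.sigmoid(self.gate(x))  # token-to-expert affinity
        # The bias affects WHICH experts are chosen, not how they are weighted.
        topk = torch.topk(scores + self.route_bias, self.top_k, dim=-1).indices
        out = torch.zeros_like(x)
        load = torch.zeros_like(self.route_bias)
        for e, expert in enumerate(self.experts):
            mask = (topk == e).any(dim=-1)    # tokens routed to expert e
            if mask.any():
                w = scores[mask, e].unsqueeze(-1)  # original, unbiased gate weight
                out[mask] += w * expert(x[mask])
                load[e] = mask.float().sum()
        # Auxiliary-loss-free balancing: push the bias down for overloaded
        # experts and up for underloaded ones.
        if self.training:
            self.route_bias -= self.bias_lr * torch.sign(load - load.mean())
        return out

x = torch.randn(16, 64)
moe = TopKMoE()
print(moe(x).shape)  # torch.Size([16, 64])
```

The design point the sketch tries to capture is the decoupling: conventional MoE recipes add an auxiliary balancing loss that can trade model quality for balance, whereas here the balancing signal only perturbs expert selection, leaving the gradient of the task loss untouched.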
References
- Guo et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv preprint arXiv:2501.12948, 2025.
- Liu et al. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv preprint arXiv:2405.04434, 2024.
- Liu et al. DeepSeek-V3 Technical Report. arXiv preprint arXiv:2412.19437, 2024.
- Gloeckle et al. Better & Faster Large Language Models via Multi-Token Prediction. arXiv preprint arXiv:2404.19737, 2024.
- Shao et al. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300, 2024.