DeepSeek V1, V2 and V3 Deep Dive

Avyav Singh presented the collection of papers behind the innovative approach used to create DeepSeek V1 through V3 (Guo et al., 2025; Liu et al., 2024; Liu et al., 2024; Gloeckle et al., 2024; Shao et al., 2024).

Abstract

This talk explored the rapid evolution of DeepSeek’s foundational models from V1 through V3, highlighting key architectural advances and capabilities. We examined in depth the innovations that enabled DeepSeek’s success: the Mixture-of-Experts (MoE) architecture for scalable compute and dynamic routing, Multi-head Latent Attention, and the auxiliary-loss-free strategy for load balancing.
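To make the MoE routing idea concrete, here is a minimal sketch of top-k expert selection with gate-weight renormalisation. All names, sizes, and values are illustrative assumptions for this post, not DeepSeek's actual implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(token_logits, k=2):
    """Pick the top-k experts for one token and renormalise their gate weights.

    Returns a list of (expert_index, gate_weight) pairs whose weights sum to 1;
    the token's output is the gate-weighted sum of those experts' outputs.
    """
    probs = softmax(token_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in topk)
    return [(i, probs[i] / total) for i in topk]

# One token's hypothetical router logits over 4 experts:
print(route([1.0, 3.0, 0.5, 2.0], k=2))  # experts 1 and 3 carry this token
```

Because only k experts run per token, compute stays roughly constant as the total expert count grows; the auxiliary-loss-free strategy discussed in the talk then keeps tokens spread across experts without adding a balancing term to the training loss.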

References

  1. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning
    Daya Guo, Dejian Yang, Haowei Zhang, and 8 more authors
    arXiv preprint arXiv:2501.12948, 2025
  2. DeepSeek-V2: A strong, economical, and efficient Mixture-of-Experts language model
    Aixin Liu, Bei Feng, Bin Wang, and 8 more authors
    arXiv preprint arXiv:2405.04434, 2024
  3. DeepSeek-V3 technical report
    Aixin Liu, Bei Feng, Bing Xue, and 8 more authors
    arXiv preprint arXiv:2412.19437, 2024
  4. Better & faster large language models via multi-token prediction
    Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, and 2 more authors
    arXiv preprint arXiv:2404.19737, 2024
  5. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models
    Zhihong Shao, Peiyi Wang, Qihao Zhu, and 8 more authors
    arXiv preprint arXiv:2402.03300, 2024


