PAPER · 5 min read

DeepSeek Publishes V4 Architecture Notes

The Chinese lab reveals mixture-of-experts details and training compute breakdowns for DeepSeek-V4.

By KolayVibe Editorial · Jun 5, 2026 · updated Jun 5

DeepSeek's 38-page V4 technical report walks through the model's MoE routing, its expert specialization patterns, and a compute breakdown that puts pretraining at 3.8M H800-equivalent hours.

Most interesting: the routing analysis suggests certain experts effectively specialize as language-pair translators, supporting the 'innate multilingual structure' hypothesis.
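For readers who want a feel for what a routing analysis of this kind involves, here is a minimal sketch. It is not DeepSeek's method: it assumes conventional top-k gating, a hypothetical log of (language tag, router scores) pairs, and made-up toy numbers, and it tallies usage per language rather than per language pair.

```python
from collections import Counter, defaultdict


def top_k_experts(router_scores, k=2):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )
    return ranked[:k]


def expert_usage_by_language(routed_tokens, k=2):
    """Share of each language's routing slots that went to each expert.

    routed_tokens: iterable of (language_tag, router_scores) pairs.
    This input format is an assumption made for illustration.
    """
    counts = defaultdict(Counter)
    for lang, scores in routed_tokens:
        for expert in top_k_experts(scores, k):
            counts[lang][expert] += 1

    usage = {}
    for lang, counter in counts.items():
        total = sum(counter.values())
        usage[lang] = {expert: n / total for expert, n in counter.items()}
    return usage


# Toy data, invented for this example: four tokens, four experts.
toy_log = [
    ("en", [0.1, 0.7, 0.2, 0.0]),
    ("en", [0.2, 0.6, 0.1, 0.1]),
    ("tr", [0.0, 0.1, 0.3, 0.6]),
    ("tr", [0.1, 0.0, 0.5, 0.4]),
]

usage = expert_usage_by_language(toy_log, k=1)
for lang, shares in usage.items():
    print(lang, shares)
```

An expert whose share is high for one language and near zero for the others would be a candidate for the kind of specialization the report describes.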

The weights have not been released; only the paper is public.

FAQ

Are DeepSeek-V4 weights publicly available?

No. DeepSeek released only the 38-page technical report. Earlier DeepSeek releases were open-weight, so the closed launch represents a notable change of posture.

How much compute went into DeepSeek-V4 pretraining?

The paper reports 3.8M H800-equivalent hours for pretraining, plus a separate breakdown covering the MoE routing fine-tuning and supervised post-training stages.
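To put that figure in perspective, the sketch below converts it to wall-clock time. It assumes "H800-equivalent hours" means GPU-hours summed across the cluster, and the cluster sizes are hypothetical; the article does not state how many GPUs were used.

```python
# Pretraining figure as reported in the paper (H800-equivalent hours).
GPU_HOURS = 3.8e6


def wall_clock_days(gpu_hours, num_gpus):
    """Days needed to spend gpu_hours on num_gpus GPUs running in parallel."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus / 24


# Hypothetical cluster sizes, chosen only for illustration.
for n in (1024, 2048, 4096):
    print(f"{n} GPUs: {wall_clock_days(GPU_HOURS, n):.1f} days")
```

Under these assumptions, a 2,048-GPU cluster would need roughly 77 days of continuous training.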

Primary source: arXiv