Every projected window for DeepSeek V4 has passed. Reuters reported that DeepSeek has held back its forthcoming model as it moves away from US chipmakers' hardware, confirming a delay the AI community had been tracking since the Lunar New Year window closed without a release.
What the delay doesn’t obscure is the architectural ambition. In January 2026, DeepSeek and Peking University published arXiv paper 2601.07372, titled “Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models.” The paper describes a system called Engram, a conditional memory module that provides O(1) knowledge retrieval by separating stored knowledge from the reasoning process. That’s a significant architectural departure from standard attention mechanisms, and it’s the most direct technical signal available about what V4 is being built to do.
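The paper's details aside, the core mechanism is simple enough to sketch. Below is a minimal, hypothetical PyTorch illustration of lookup-based conditional memory, not DeepSeek's actual implementation: knowledge sits in a large embedding table addressed by hashed n-gram keys, so retrieval is a constant-time table lookup rather than an attention pass. The table size, hashing scheme, and gating here are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ConditionalMemory(nn.Module):
    """Minimal sketch of an Engram-style lookup memory.

    Hypothetical illustration only -- the actual design in arXiv
    2601.07372 may differ. Knowledge lives in an embedding table
    addressed by hashed n-gram keys, so retrieval is a single O(1)
    lookup instead of an attention pass over the whole context.
    """

    def __init__(self, table_size: int = 65_536, d_model: int = 256, ngram: int = 2):
        super().__init__()
        self.ngram = ngram
        self.table_size = table_size
        self.memory = nn.Embedding(table_size, d_model)  # stored knowledge
        self.gate = nn.Linear(d_model, 1)                # lets the model discount bad hits

    def keys(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Hash each trailing n-gram of token ids into a table slot.
        # torch.roll wraps at the sequence start; fine for a sketch.
        key = torch.zeros_like(token_ids)
        for i in range(self.ngram):
            key = key * 1_000_003 + torch.roll(token_ids, shifts=i, dims=-1)
        return key.abs() % self.table_size

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        retrieved = self.memory(self.keys(token_ids))    # O(1) lookup per token
        g = torch.sigmoid(self.gate(hidden))             # context-conditioned gate
        return hidden + g * retrieved                    # fold knowledge into the residual stream

# Usage: mix retrieved knowledge into one layer's hidden states.
mem = ConditionalMemory()
ids = torch.randint(0, 50_000, (2, 16))   # batch of token ids
h = torch.randn(2, 16, 256)               # hidden states from the backbone
print(mem(ids, h).shape)                  # torch.Size([2, 16, 256])
```

The point of the sketch is the cost profile: the memory table can grow enormous without increasing per-token compute, which is the "new axis of sparsity" the paper's title refers to.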
Analysis suggests the delay may be linked to challenges training on Huawei’s Ascend AI chips, including hardware stability problems and immature software tooling. DeepSeek has not publicly confirmed this as the cause; the inference rests on secondary analysis rather than any primary technical disclosure.
Meanwhile, community reports point to a quieter development: a “V4 Lite” update that reportedly expanded DeepSeek’s production model’s context window to 1 million tokens around March 9, 2026. The full V4 model is widely expected to pair a Mixture-of-Experts architecture with roughly 1 trillion parameters, though neither the Lite update nor those specifications has been officially confirmed by DeepSeek.
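For readers wondering why a trillion-parameter model is even practical to serve, the answer is in the routing: a Mixture-of-Experts layer activates only a few experts per token. The toy sketch below illustrates top-k routing; the expert count, k, and router design are placeholders, since DeepSeek has published no V4 specifications.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer. Illustrative only -- the rumored
    ~1T-parameter V4 design is unconfirmed, so every hyperparameter
    here is an assumption."""

    def __init__(self, d_model: int = 256, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Route each token to its top-k experts; only those experts run,
        # which is why total parameters can dwarf per-token compute.
        scores = self.router(x)                       # (batch, seq, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)    # top-k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
x = torch.randn(2, 16, 256)
print(moe(x).shape)   # torch.Size([2, 16, 256])
```

Because only k experts run per token, parameter count and inference compute scale independently, which is how trillion-parameter totals can coexist with manageable serving costs.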
For developers evaluating open-weight options, V4 remains a significant pending release. The Engram paper is worth reading directly; it is the primary technical source on what DeepSeek is building while the community waits.