20250731-The_best_available_open_weight_LLMs_now_come_from_

Original summary

Something that has become undeniable this month is that the best available open weight models now come from the Chinese AI labs.

I continue to have a lot of love for Mistral, Gemma and Llama but my feeling is that Qwen, Moonshot and Z.ai have positively smoked them over the course of July.

Here's what came out this month, with links to my notes on each one:

Moonshot Kimi-K2-Instruct - 11th July, 1 trillion parameters
Qwen Qwen3-235B-A22B-Instruct-2507 - 21st July, 235 billion
Qwen Qwen3-Coder-480B-A35B-Instruct - 22nd July, 480 billion
Qwen Qwen3-235B-A22B-Thinking-2507 - 25th July, 235 billion
Z.ai GLM-4.5 and GLM-4.5 Air - 28th July, 355 and 106 billion
Qwen Qwen3-30B-A3B-Instruct-2507 - 29th July, 30 billion
Qwen Qwen3-30B-A3B-Thinking-2507 - 30th July, 30 billion

Notably absent from this list is DeepSeek, but that's only because their last model release was DeepSeek-R1-0528 back in May.

The only janky license among them is Kimi K2, which uses a non-OSI-compliant modified MIT. Qwen's models are all Apache 2 and Z.ai's are MIT.

The larger Chinese models all offer their own APIs and are increasingly available from other providers. I've been able to run versions of the Qwen 30B and GLM-4.5 Air 106B models on my own laptop.

I can't help but wonder if part of the reason for the delay in release of OpenAI's open weights model comes from a desire to be notably better than this truly impressive lineup of Chinese models.

<p>Tags: <a href="https://simonwillison.net/tags/open-source">open-source</a>, <a href="https://simonwillison.net/tags/qwen">qwen</a>, <a href="https://simonwillison.net/tags/openai">openai</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/local-llms">local-llms</a>, <a href="https://simonwillison.net/tags/llms">llms</a>, <a href="https://simonwillison.net/tags/ai-in-china">ai-in-china</a></p>

Original link

Further speculation

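Open weight models from Chinese AI labs have overtaken mainstream Western ones: the models from Qwen, Moonshot and Z.ai clearly outperformed Mistral, Gemma and Llama in July, which suggests China may have built a technical lead in open weight large models, although mainstream media have not widely covered the trend.
A quiet race on parameter count: the models released by the Chinese labs (1 trillion and 480 billion parameters, for example) are far larger than Western open weight models from the same period, which may reflect advances in compute resources or training methods, though the technical details have not been made public.
Differences in licensing strategy: Kimi K2 uses a non-OSI-compliant modified MIT license, which may imply restrictions on commercial use or an intellectual property protection strategy, while Qwen and Z.ai use the more permissive Apache 2 and MIT licenses, reflecting different approaches to commercializing open releases.
Feasibility of local deployment: versions of large models such as Qwen 30B and GLM-4.5 Air 106B can run on a personal laptop, which suggests progress in making these models lighter or better optimized, though documentation of the compression techniques involved has not been published.
OpenAI may be delaying its release because of the Chinese models: the author speculates that part of the reason for the delay of OpenAI's open weights model is a desire to be notably better than this lineup, which indirectly shows how sensitive the industry is to technical competition and that Western companies may be feeling pressure from China.
DeepSeek's absence may hint at a strategic shift: DeepSeek has not released a new model since DeepSeek-R1-0528 and may be iterating on its technology or moving toward closed models; this silence is worth watching in a field that moves so quickly.
APIs and ecosystem: the larger Chinese models are reaching the market through their own APIs and through third-party providers, so their actual commercial push may be more aggressive than public reporting suggests, though specific partnership terms and revenue-sharing arrangements have not been disclosed.
The split between "Thinking" and "Instruct" versions: Qwen released two variants of the same base model (for example Qwen3-30B-A3B-Thinking-2507 and Qwen3-30B-A3B-Instruct-2507), which may correspond to different inference optimization goals, though the technical differences have not been explained in detail.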
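
On the local deployment point above, a rough back-of-the-envelope calculation shows why a 30 billion or 106 billion parameter model can plausibly fit on a high-memory laptop. The sketch below estimates the size of the weights alone as parameters times bits per weight divided by 8. The bit widths (16, 8 and 4) are assumptions chosen for illustration, not figures from the post, and the estimate ignores KV cache, activations and runtime overhead. The function and variable names are hypothetical.

```python
# Rough weight-memory estimate: parameters x bits per weight / 8 bytes.
# Assumptions (not from the original post): the quantization widths used
# below, and that KV cache, activations and runtime overhead are ignored.

# Parameter counts in billions, as listed in the post.
MODELS_BILLIONS = {
    "Qwen3-30B-A3B": 30,
    "GLM-4.5 Air": 106,
    "Qwen3-235B-A22B": 235,
    "Kimi-K2": 1000,
}


def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of parameters * bits / 8 bits-per-byte = billions of bytes = GB
    return params_billions * bits_per_weight / 8


def estimate_table(bit_widths=(16, 8, 4)):
    """Return {model name: {bits per weight: estimated GB}}."""
    return {
        name: {bits: weight_gb(params, bits) for bits in bit_widths}
        for name, params in MODELS_BILLIONS.items()
    }


if __name__ == "__main__":
    for name, sizes in estimate_table().items():
        cells = ", ".join(f"{bits}-bit: {gb:.0f} GB" for bits, gb in sizes.items())
        print(f"{name}: {cells}")
```

Under these assumptions, the 4-bit estimates are about 15 GB for the 30 billion parameter model and about 53 GB for the 106 billion parameter one, while the 1 trillion parameter model stays far beyond laptop memory at any of these widths. Real quantized files will differ somewhat, so treat these numbers as an order-of-magnitude guide only.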