20250731-QwenQwen3-30B-A3B-Instruct-2507

Original summary

Qwen/Qwen3-30B-A3B-Instruct-2507

New model update from Qwen, improving on their previous Qwen3-30B-A3B release from late April. In their tweet they said:

Smarter, faster, and local deployment-friendly.

✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more <think> blocks — now operating exclusively in non-thinking mode

🔧 With 3B activated parameters, it's approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking

I tried the chat.qwen.ai hosted model with "Generate an SVG of a pelican riding a bicycle" and got this:

This one is cute: blue sky, green grass, the sun is shining. The bicycle, a red block on wheels, looks more like a toy car. The pelican doesn't look much like a pelican, and it has a quirky smile drawn onto its beak.

I particularly enjoyed this detail from the SVG source code:

<!-- Bonus: Pelican's smile -->
<path d="M245,145 Q250,150 255,145" fill="none" stroke="#d4a037" stroke-width="2"/>

I went looking for quantized versions that could fit on my Mac and found lmstudio-community/Qwen3-30B-A3B-Instruct-2507-MLX-8bit from LM Studio. Getting that up and running was a 32.46GB download and it appears to use just over 30GB of RAM.

The pelican I got from that one wasn't as good:

It looks more like a tall yellow hen chick riding a Segway

I then tried that local model on the "Write an HTML and JavaScript page implementing space invaders" task that I ran against GLM-4.5 Air. The output looked promising, in particular it seemed to be putting more effort into the design of the invaders (GLM-4.5 Air just used rectangles):

// Draw enemy ship
ctx.fillStyle = this.color;

// Ship body
ctx.fillRect(this.x, this.y, this.width, this.height);

// Enemy eyes
ctx.fillStyle = '#fff';
ctx.fillRect(this.x + 6, this.y + 5, 4, 4);
ctx.fillRect(this.x + this.width - 10, this.y + 5, 4, 4);

// Enemy antennae
ctx.fillStyle = '#f00';
if (this.type === 1) {
    // Basic enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 5, 2, 5);
} else if (this.type === 2) {
    // Fast enemy
    ctx.fillRect(this.x + this.width / 4 - 1, this.y - 5, 2, 5);
    ctx.fillRect(this.x + (3 * this.width) / 4 - 1, this.y - 5, 2, 5);
} else if (this.type === 3) {
    // Armored enemy
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 8, 2, 8);
    ctx.fillStyle = '#0f0';
    ctx.fillRect(this.x + this.width / 2 - 1, this.y - 6, 2, 3);
}

But the resulting code didn't actually work:

Black screen - a row of good looking space invaders advances across the screen for a moment... and then the entire screen goes blank.

That same prompt against the unquantized Qwen-hosted model produced a different implementation which sadly was also unplayable - this time because everything moved too fast.
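For what it's worth, "everything moved too fast" is a classic symptom of per-frame movement that ignores the display's refresh rate: a game that adds a fixed number of pixels per frame runs twice as fast on a 120Hz screen as on a 60Hz one. A minimal sketch of the usual fix, scaling movement by elapsed time (the names here are illustrative, not taken from the model's output):

```javascript
// Per-frame movement ("x += 5") couples game speed to refresh rate.
// Scaling by elapsed time keeps speed constant on any display.
// PLAYER_SPEED and updatePosition are hypothetical names for illustration.

const PLAYER_SPEED = 200; // pixels per second, independent of frame rate

// Advance a coordinate by speed * elapsed time (dt in seconds)
function updatePosition(x, dt, speed) {
  return x + speed * dt;
}

// In a browser game loop, dt comes from requestAnimationFrame's timestamp:
let last = null;
function loop(timestamp) {
  const dt = last === null ? 0 : (timestamp - last) / 1000;
  last = timestamp;
  // playerX = updatePosition(playerX, dt, PLAYER_SPEED);
  requestAnimationFrame(loop);
}
```

Whether the Qwen-generated game made exactly this mistake I can't say without the source, but it's the first thing to check when generated games run at the wrong speed.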

This new Qwen model is a non-reasoning model, whereas GLM-4.5 and GLM-4.5 Air are both reasoners. It looks like at this scale the "reasoning" may make a material difference in terms of getting code that works out of the box.

Tags: ai, generative-ai, llms, qwen, mlx, llm-reasoning, llm-release, lm-studio

[Original link](https://simonwillison.net/2025/Jul/29/qwen3-30b-a3b-instruct-2507/#atom-everything)

Further speculation

- **模型性能与宣传差异**:官方宣称Qwen3-30B-A3B-Instruct-2507接近GPT-4o性能,但实际测试(如SVG生成)显示其输出质量不稳定,尤其在本地量化版本(如8bit-MLX)中表现更差,说明宣传可能夸大或未覆盖边缘场景。 - **量化版本的隐性成本**:本地部署的8bit量化模型需下载32.46GB文件并占用30GB+内存,但对生成质量有显著折损(如Pelican图像劣化),暗示量化虽降低硬件门槛但牺牲精度,需权衡资源与效果。 - **模型对齐的隐藏缺陷**:尽管声称“更好对齐用户意图”,但测试案例(如Space Invaders代码)显示模型仍倾向于过度设计细节(如敌人飞船的眼睛),可能偏离实用需求,反映对齐目标与实际输出的偏差。 - **私有部署的未公开挑战**:文章未提及但隐含本地模型部署的复杂性(如依赖LM Studio等第三方工具链),暗示用户需额外技术栈或付费服务支持,增加隐性学习成本。 - **行业竞争内幕**:模型强调“去除``块”以简化流程,可能暗指此前版本因复杂机制导致用户体验差,同行(如GPT系列)已通过类似简化赢得市场,反映行业向易用性倾斜的趋势。 - **开源社区的灰色协作**:LM Studio社区快速提供量化版本,侧面说明官方未覆盖的细分需求(如轻量化部署)依赖第三方生态填补,但质量参差不齐,存在版权或支持风险。 - **测试案例的选择性偏差**:作者仅展示创意任务(SVG、游戏代码),回避商业场景(如数据分析、API集成)的测评,可能暗示模型在复杂逻辑或生产环境中的局限性未被公开讨论。