20250723-Qwen3-Coder_Agentic_Coding_in_the_World

Summary of the original article

Qwen3-Coder: Agentic Coding in the World

It turns out that as I was typing up my notes on Qwen3-235B-A22B-Instruct-2507 the Qwen team were unleashing something much bigger:

Today, we’re announcing Qwen3-Coder, our most agentic code model to date. Qwen3-Coder is available in multiple sizes, but we’re excited to introduce its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct — a 480B-parameter Mixture-of-Experts model with 35B active parameters which supports the context length of 256K tokens natively and 1M tokens with extrapolation methods, offering exceptional performance in both coding and agentic tasks.

This is another Apache 2.0 licensed open weights model, available as Qwen3-Coder-480B-A35B-Instruct and Qwen3-Coder-480B-A35B-Instruct-FP8 on Hugging Face.

I used qwen3-coder-480b-a35b-instruct on the Hyperbolic playground to run my "Generate an SVG of a pelican riding a bicycle" test prompt:

The bicycle has no spokes. The pelican is light yellow and is overlapping the middle of the bicycle, not perching on it - it has a large yellow beak and a weird red lower beak or wattle.

I actually slightly prefer the one I got from qwen3-235b-a22b-07-25.

It's also available as qwen3-coder on OpenRouter.

In addition to the new model, Qwen released their own take on an agentic terminal coding assistant called qwen-code, which they describe in their blog post as being "Forked from Gemini Code" (they mean gemini-cli) - which is Apache 2.0 so a fork is in keeping with the license.

They focused really hard on code performance for this release, including generating synthetic data tested using 20,000 parallel environments on Alibaba Cloud:

In the post-training phase of Qwen3-Coder, we introduced long-horizon RL (Agent RL) to encourage the model to solve real-world tasks through multi-turn interactions using tools. The key challenge of Agent RL lies in environment scaling. To address this, we built a scalable system capable of running 20,000 independent environments in parallel, leveraging Alibaba Cloud’s infrastructure. The infrastructure provides the necessary feedback for large-scale reinforcement learning and supports evaluation at scale. As a result, Qwen3-Coder achieves state-of-the-art performance among open-source models on SWE-Bench Verified without test-time scaling.
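The quoted passage describes fanning rollouts out across many independent environments and collecting their feedback as the reward signal. As a toy sketch only (Qwen's actual system runs 20,000 cloud environments; the environment and reward below are stand-ins invented for illustration), the fan-out pattern looks like:

```python
# Toy sketch of the fan-out pattern behind "environment scaling":
# run many independent coding environments in parallel and collect
# per-episode reward feedback for RL. The episode logic and reward
# here are fake stand-ins, not Qwen's actual rollout code.
from concurrent.futures import ThreadPoolExecutor

def run_episode(env_id: int) -> dict:
    """Stand-in for one multi-turn, tool-using rollout in one environment."""
    # A real rollout would let the model edit files, run tests, etc.,
    # and return the test outcome as the reward signal.
    reward = 1.0 if env_id % 2 == 0 else 0.0  # fake pass/fail outcome
    return {"env_id": env_id, "reward": reward}

def collect_feedback(n_envs: int, max_workers: int = 32) -> list[dict]:
    """Fan episodes out across n_envs independent environments."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_episode, range(n_envs)))

results = collect_feedback(100)
print(sum(r["reward"] for r in results))  # half the fake episodes "pass"
```

The point of the pattern is that episodes are independent, so throughput scales with the number of environments rather than with any single rollout's speed.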

To further burnish their coding credentials, the announcement includes instructions for running their new model using both Claude Code and Cline using custom API base URLs that point to Qwen's own compatibility proxies.
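The post doesn't reproduce those instructions, but the mechanism is the standard endpoint override: Claude Code reads the real `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment variables. A hedged sketch, where the proxy URL is a placeholder rather than Qwen's actual endpoint:

```shell
# Point Claude Code at an Anthropic-compatible proxy instead of
# api.anthropic.com. The URL below is a PLACEHOLDER -- substitute the
# endpoint from Qwen's announcement, and use your Alibaba Cloud API key.
export ANTHROPIC_BASE_URL="https://example.com/anthropic-compatible-proxy"
export ANTHROPIC_AUTH_TOKEN="your-qwen-api-key"
claude  # requests now go through the compatibility proxy
```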

Pricing for Qwen's own hosted models (through Alibaba Cloud) looks competitive. This is the first model I've seen that sets different prices for four different sizes of input:

| Input token count | Input price ($/M tokens) | Output price ($/M tokens) |
|---|---|---|
| 0–32K | $1 | $5 |
| 32K–128K | $1.8 | $9 |
| 128K–256K | $3 | $15 |
| 256K–1M | $6 | $60 |

This kind of pricing reflects the fact that inference over longer inputs is more expensive to process. Gemini 2.5 Pro has two different prices depending on whether the prompt is above or below 200,000 tokens.
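To make the tiering concrete, here is a small cost calculator for the table above. It is my own illustration, not an official Qwen billing formula: in particular, I'm assuming the tier is chosen by the request's total input size and applied as a single rate, and that "K" means thousands of tokens.

```python
# Tiered pricing from the table above (dollars per million tokens).
# Assumption: one rate applies to the whole request, selected by the
# total input token count; "K" is read as 1,000 tokens.
TIERS = [
    # (max input tokens in tier, input $/M, output $/M)
    (32_000, 1.0, 5.0),
    (128_000, 1.8, 9.0),
    (256_000, 3.0, 15.0),
    (1_000_000, 6.0, 60.0),
]

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request under the tiered prices."""
    for max_tokens, in_price, out_price in TIERS:
        if input_tokens <= max_tokens:
            return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    raise ValueError("input exceeds the 1M-token limit")

print(request_cost(10_000, 1_000))   # cheapest tier: 0.015
print(request_cost(300_000, 1_000))  # 256K-1M tier: 1.86
```

Note how the same 1,000 output tokens cost 12× more ($60 vs $5 per million) once the input crosses into the top tier.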

Awni Hannun reports running a 4-bit quantized MLX version on a 512GB M3 Ultra Mac Studio at 24 tokens/second using 272GB of RAM, getting great results for "write a python script for a bouncing yellow ball within a square, make sure to handle collision detection properly. make the square slowly rotate. implement it in python. make sure ball stays within the square".
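Runs like Awni's are typically launched with the `mlx-lm` package's `mlx_lm.generate` command. A sketch only: the model repo name below is my guess at the mlx-community naming convention, so verify the actual quantized repo on Hugging Face before running (and note the ~272GB RAM requirement).

```shell
# Sketch: run a 4-bit MLX quantization locally on Apple silicon.
# The model identifier is an ASSUMED name -- check Hugging Face for
# the real mlx-community repo before running.
pip install mlx-lm
mlx_lm.generate \
  --model mlx-community/Qwen3-Coder-480B-A35B-Instruct-4bit \
  --prompt "write a python script for a bouncing yellow ball within a square, make sure to handle collision detection properly. make the square slowly rotate. implement it in python. make sure ball stays within the square" \
  --max-tokens 2048
```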

Via @Alibaba_Qwen

Tags: ai, generative-ai, llms, ai-assisted-programming, qwen, llm-pricing, pelican-riding-a-bicycle, llm-release, openrouter, coding-agents

[Original link](https://simonwillison.net/2025/Jul/22/qwen3-coder/#atom-everything)

Further speculation

- **Training infrastructure**: Qwen3-Coder's large-scale reinforcement learning (Agent RL) ran 20,000 parallel environments on Alibaba Cloud; marshalling resources at this scale typically requires in-house enterprise infrastructure, and is hard for ordinary developers to reproduce.
- **Hidden cost of synthetic data**: generating and validating synthetic data at this scale (again, 20,000 parallel environments) demands enormous compute, suggesting the data-quality work rests on heavy cloud spending that an open-source community could not easily shoulder.
- **Pricing strategy**: input is priced by token-count tier, with output jumping to $60/M tokens in the 256K–1M bracket. This hints that the true cost of long-context processing may far exceed what headline prices suggest; enterprise users should watch for these jumps.
- **Compatibility proxies as a gray area**: swapping the model in via custom API base URLs (e.g. under Claude Code and Cline) may route around some API restrictions, but this kind of shimming carries service-stability risk.
- **Performance trade-offs**: despite the claimed 1M-token extrapolation, the actual demo (the SVG generation) shows detail flaws (a spokeless bicycle, an oddly posed pelican), hinting that output quality can still be unstable.
- **License edge cases**: qwen-code is explicitly forked from Gemini CLI (Apache 2.0), but the post doesn't say whether derivative-work requirements (attribution, change notices) are fully met, so there may be a compliance question.
- **Infrastructure lock-in**: the emphasis on Alibaba Cloud's infrastructure suggests that deployments outside that ecosystem (e.g. self-built clusters) may not reproduce the claimed performance.
- **Subtext of the comparison**: the author slightly prefers Qwen3-235B's output, hinting that the larger 480B model, for all its parameters, may be overfit or under-optimized in some scenarios.