Original summary
Here are a few more model releases from today, to round out a very busy July:
- Cohere released Command A Vision, their first multi-modal (image input) LLM. Like their others it's open weights under Creative Commons Attribution Non-Commercial, so you need to license it (or use their paid API) if you want to use it commercially.
- San Francisco AI startup Deep Cogito released four open weights hybrid reasoning models, cogito-v2-preview-deepseek-671B-MoE, cogito-v2-preview-llama-405B, cogito-v2-preview-llama-109B-MoE and cogito-v2-preview-llama-70B. These follow their v1 preview models in April at smaller 3B, 8B, 14B, 32B and 70B sizes. It looks like their unique contribution here is "distilling inference-time reasoning back into the model’s parameters" - demonstrating a form of self-improvement. I haven't tried any of their models myself yet.
- Mistral released Codestral 25.08, an update to their Codestral model which is specialized for fill-in-the-middle autocomplete as seen in text editors like VS Code, Zed and Cursor.
- And an anonymous stealth preview model called Horizon Alpha running on OpenRouter was released yesterday and is attracting a lot of attention.
<p>Tags: <a href="https://simonwillison.net/tags/llm-release">llm-release</a>, <a href="https://simonwillison.net/tags/openrouter">openrouter</a>, <a href="https://simonwillison.net/tags/mistral">mistral</a>, <a href="https://simonwillison.net/tags/generative-ai">generative-ai</a>, <a href="https://simonwillison.net/tags/cohere">cohere</a>, <a href="https://simonwillison.net/tags/ai">ai</a>, <a href="https://simonwillison.net/tags/llms">llms</a></p>
Further speculation
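- Cohere's commercial restrictions: Command A Vision is open weights, but it is released under a Creative Commons Attribution Non-Commercial license, so commercial use requires a separate license or the paid API. Companies that overlook this could face hidden costs or legal risk.
- Deep Cogito's "reasoning distillation": the models are described as self-improving by "distilling inference-time reasoning back into the model's parameters". How this is implemented is not described here. It may be the company's core differentiator, but its effectiveness has not been verified by third parties.
- The hype around the anonymous Horizon Alpha model: releasing it anonymously on OpenRouter and quickly attracting attention may be deliberate "mystery marketing" that uses community curiosity to drive early testing, while the model's actual performance remains unverified.
- Mistral Codestral's vertical focus: the model targets code completion in editors such as VS Code and Zed, which suggests it is optimized for developer toolchains. Nothing is said about whether Mistral has private partnerships or custom agreements with these editors.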
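- Deep Cogito's jump in model scale: the v1 previews ranged from 3B to 70B, while the v2 previews range from 70B to 671B. This may rely on undisclosed compute resources or training techniques, and such rapid iteration could carry a risk of technical debt.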
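- Commercialization paths for open-weight models: Cohere pairs open weights with commercial licensing, and Deep Cogito also releases open weights, although its license terms are not stated in the summary above. This reflects an industry trend: attract developers with open weights, then earn revenue from enterprise services. The practical barrier to commercial use may be higher than expected.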