20250722-Advanced_version_of_Gemini_with_Deep_Think_officia

Summary of the original article

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad

OpenAI beat them to the punch in terms of publicity by publishing their results on Saturday, but a team from Google Gemini achieved an equally impressive result on this year's International Mathematical Olympiad, scoring a gold medal performance with their custom research model.

(I saw an unconfirmed rumor that the Gemini team had to wait until Monday for approval from Google PR - this turns out to be inaccurate, see update below.)

It's interesting that Gemini achieved the exact same score as OpenAI, 35/42, and was able to solve the same set of questions - 1 through 5, failing only to answer 6, which is designed to be the hardest question.

Each question is worth seven points, so 35/42 corresponds to full marks on five out of the six problems.
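Spelled out, the score arithmetic is simply:

$$
35 = 5 \times 7 \quad\text{out of a possible}\quad 42 = 6 \times 7
$$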

Only 6 of the 630 human contestants this year scored all 7 points for question 6, and just 55 more scored more than 0 points on that question.

OpenAI claimed their model had not been optimized for IMO questions. Gemini's model was different - emphasis mine:

We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.

To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.
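Google hasn't published how Deep Think's "parallel thinking" actually works, but the general pattern it describes - sample several independent solution attempts and then pick or combine the best - can be sketched in a few lines of Python. Everything below (the function names, the scoring heuristic, the thread pool) is my own illustrative assumption, not Gemini's actual machinery:

```python
import concurrent.futures

def generate_candidate(problem: str, seed: int) -> str:
    """Stand-in for one independent reasoning pass (e.g. one sampled
    model generation). Here it just returns a labelled string."""
    return f"candidate {seed} for: {problem}"

def score_candidate(candidate: str) -> float:
    """Stand-in verifier/judge that rates how promising a candidate is."""
    return float(len(candidate))  # placeholder heuristic, not a real grader

def parallel_think(problem: str, n_paths: int = 4) -> str:
    """Explore several solution paths at once, then keep the best-scoring
    one, rather than committing to a single linear chain of thought."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_paths) as pool:
        candidates = list(pool.map(
            lambda seed: generate_candidate(problem, seed), range(n_paths)))
    return max(candidates, key=score_candidate)

if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```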

The Gemini team, like the OpenAI team, achieved this result with no tool use or internet access for the model.

Gemini's solutions are listed in this PDF. If you are mathematically inclined you can compare them with OpenAI's solutions on GitHub.

Last year Google DeepMind achieved a silver medal at the IMO, solving four of the six problems using custom models called AlphaProof and AlphaGeometry 2:

First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

This year's result, scoring gold with a single model, within the allotted time and with no manual step to translate the problems first, is much more impressive.

Update: Concerning the timing of the news, DeepMind CEO Demis Hassabis says:

Btw as an aside, we didn’t announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightly received the acclamation they deserved

We've now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!

OpenAI's Noam Brown:

Before we shared our results, we spoke with an IMO board member, who asked us to wait until after the award ceremony to make it public, a request we happily honored.

We announced at ~1am PT (6pm AEST), after the award ceremony concluded. At no point did anyone request that we announce later than that.

As far as I can tell the Gemini team was participating in an official capacity, while OpenAI were not. Noam again:

~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

Neither OpenAI nor Gemini used Lean in their attempts, which would have counted as tool use.
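For context on what a "formal (Lean) version" would have involved: the problems and proofs are written in a machine-checkable proof language rather than in prose. A deliberately trivial toy statement (not an actual IMO problem) looks like this:

```lean
-- A toy, machine-checkable Lean 4 statement and proof (not an IMO problem):
-- every natural number is at most its successor.
theorem toy_example (n : Nat) : n ≤ n + 1 :=
  Nat.le_succ n
```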

Via Hacker News

Tags: mathematics, ai, openai, generative-ai, llms, gemini, llm-reasoning

[Original article link](https://simonwillison.net/2025/Jul/21/gemini-imo/#atom-everything)

Further speculation

- **PR strategy differences**: OpenAI chose to publish its results on Saturday to seize the news cycle, while the Gemini team may have been delayed until Monday by internal approval processes (such as PR review). The official account denies that rumor, but the episode still hints that tech companies weigh announcement timing strategically.
- **Model optimization details**: Gemini's Deep Think mode was not a "general-purpose" effort; it was specifically tuned for the IMO, including training against a **curated corpus of high-quality solutions to mathematics problems** and **hints and tips for approaching IMO problems**. That contrasts with OpenAI's claim that its model was "not optimized for IMO questions", though in practice both teams made targeted adjustments.
- **Technical details withheld**: The "parallel thinking" technique and the novel reinforcement learning methods Gemini used (such as leveraging multi-step reasoning data) are not described in detail in any public paper or documentation; they remain internal research and may involve unpublished algorithmic advances.
- **Competitive undercurrents**: Although both models scored the same 35/42, Gemini stressed its "single model, no manual translation, within the allotted time" advantages, indirectly downplaying any extra machinery or manual intervention OpenAI might rely on (as with DeepMind's manual translation of the problems last year).
- **IMO problem design**: Problem 6 is the deliberately hardest question: only 6 of the 630 human contestants earned full marks on it and just 55 more earned any points, suggesting the gap between AI and the very best human contestants is still concentrated on the most extreme problems, and that the IMO may intentionally set such problems as a dividing line.
- **Time-cost comparison**: Last year DeepMind's models needed up to three days for a single problem; this year Gemini finished within the competition time limit, which suggests the pace of iteration outstrips public reporting, though no details of compute cost or hardware upgrades were disclosed.
- **Grey areas in academic cooperation**: The Gemini team noted that the IMO Board asked AI labs to publish their results in a coordinated way, yet OpenAI and Gemini still competed on announcement timing, revealing potential conflicts of interest or informal agreements between academia and industry.