20250728 - Introducing OSS Rebuild: Open Source, Rebuilt to Last

Original post summary

Introducing OSS Rebuild: Open Source, Rebuilt to Last

Major news on the Reproducible Builds front: the Google Security team have announced OSS Rebuild, their project to provide build attestations for open source packages released through the NPM, PyPI and Crates ecosystems (and more to come).

They currently run builds against the "most popular" packages from those ecosystems:

Through automation and heuristics, we determine a prospective build definition for a target package and rebuild it. We semantically compare the result with the existing upstream artifact, normalizing each one to remove instabilities that cause bit-for-bit comparisons to fail (e.g. archive compression). Once we reproduce the package, we publish the build definition and outcome via SLSA Provenance. This attestation allows consumers to reliably verify a package's origin within the source history, understand and repeat its build process, and customize the build from a known-functional baseline.

The only way to interact with the Rebuild data right now is through their Go CLI tool. I reverse-engineered it using Gemini 2.5 Pro and derived this command to get a list of all of their built packages:

 gsutil ls -r 'gs://google-rebuild-attestations/**'
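If you don't want to install gsutil, the same listing should also be reachable over the Cloud Storage JSON API. A minimal sketch, assuming the bucket grants public (allUsers) list access (which the gsutil command working would imply), runnable on Node 18+ or Deno with their built-in fetch():

```typescript
// Hedged sketch: list the attestation objects via the Cloud Storage JSON API
// instead of gsutil, paging through results 1,000 at a time.
const BUCKET = "google-rebuild-attestations";

async function listAllObjects(): Promise<string[]> {
  const names: string[] = [];
  let pageToken: string | undefined;
  do {
    const url = new URL(`https://storage.googleapis.com/storage/v1/b/${BUCKET}/o`);
    url.searchParams.set("maxResults", "1000");
    if (pageToken) url.searchParams.set("pageToken", pageToken);
    const resp = await fetch(url);
    if (!resp.ok) throw new Error(`Listing failed: ${resp.status}`);
    const data = await resp.json();
    for (const item of data.items ?? []) names.push(item.name);
    pageToken = data.nextPageToken; // undefined on the last page
  } while (pageToken);
  return names;
}

listAllObjects().then((names) => {
  console.log(`${names.length} objects`);
  for (const n of names) console.log(n);
});
```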

There are 9,513 total lines; here's a Gist. I used Claude Code to count them across the different ecosystems, discounting duplicates for different versions of the same package (a sketch of that counting logic follows the list):

  • pypi: 5,028 packages
  • cratesio: 2,437 packages
  • npm: 2,048 packages
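That count is easy to reproduce without Claude Code. A minimal sketch, assuming the listing was first saved to a local file (listing.txt is a hypothetical name, e.g. via `gsutil ls -r 'gs://google-rebuild-attestations/**' > listing.txt`) and that object paths follow the ecosystem/package/version/... layout visible in the screenshot below; scoped npm names like @scope/pkg would need extra handling:

```typescript
// Hedged sketch: count unique packages per ecosystem from the gsutil listing.
import { readFileSync } from "node:fs";

const lines = readFileSync("listing.txt", "utf8").split("\n");
const perEcosystem = new Map<string, Set<string>>();

for (const line of lines) {
  const path = line.replace("gs://google-rebuild-attestations/", "").trim();
  const [ecosystem, pkg] = path.split("/");
  if (!ecosystem || !pkg) continue; // skip blank lines and directory markers
  if (!perEcosystem.has(ecosystem)) perEcosystem.set(ecosystem, new Set());
  perEcosystem.get(ecosystem)!.add(pkg); // Set dedupes versions of one package
}

for (const [ecosystem, pkgs] of perEcosystem) {
  console.log(`${ecosystem}: ${pkgs.size} packages`);
}
```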

Then I got a bit ambitious... since the files themselves are hosted in a Google Cloud Bucket, could I run my own web app somewhere on storage.googleapis.com that could use fetch() to retrieve that data, working around the lack of open CORS headers?

I got Claude Code to try that for me (I didn't want to have to figure out how to create a bucket and configure it for web access just for this one experiment) and it built and then deployed https://storage.googleapis.com/rebuild-ui/index.html, which did indeed work!
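The reason this works is the same-origin policy, not a CORS loophole as such: a page served from https://storage.googleapis.com/rebuild-ui/index.html shares its origin with every other path on storage.googleapis.com, including other public buckets, so its fetch() calls never trigger a cross-origin check. A minimal sketch of the idea (not the actual app's code; the attestation path is the one visible in the screenshot, and public read access to it is an assumption):

```typescript
// Minimal sketch: runs inside a page served from
// https://storage.googleapis.com/rebuild-ui/index.html, so fetching a path
// under the same host (another public bucket) is same-origin -- no CORS
// headers needed on the target bucket.
const ATTESTATION_PATH =
  "/google-rebuild-attestations/pypi/accelerate/0.21.0/" +
  "accelerate-0.21.0-py3-none-any.whl/rebuild.intoto.jsonl";

async function loadAttestation(): Promise<void> {
  // Relative URL resolves against the page's origin (storage.googleapis.com).
  const resp = await fetch(ATTESTATION_PATH);
  if (!resp.ok) throw new Error(`Fetch failed: ${resp.status}`);
  const text = await resp.text();
  // The file is newline-delimited JSON: one envelope per line.
  for (const line of text.trim().split("\n")) {
    console.log(JSON.stringify(JSON.parse(line), null, 2));
  }
}

loadAttestation();
```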

[Screenshot: the Google Rebuild Explorer interface. Under "Search rebuild attestations:" is a search box with the placeholder text "Type to search packages (e.g., 'adler', 'python-slugify')...". Below it, the loaded file path pypi/accelerate/0.21.0/accelerate-0.21.0-py3-none-any.whl/rebuild.intoto.jsonl and "Object 1", a JSON envelope with "payloadType" (an in-toto.io Statement v1 URL), a "payload" string, and "signatures" containing a keyid pointing at a Google Cloud KMS signing key.]

It lets you search against that list of packages from the Gist and then select one to view the pretty-printed newline-delimited JSON that was stored for that package.
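Those objects appear to be DSSE envelopes: the screenshot shows payloadType, payload and signatures fields, which is the standard layout for signed in-toto attestations. A hedged sketch of unwrapping one to get at the SLSA provenance statement (per the DSSE spec the payload is base64-encoded; verifying the signature against the Google Cloud KMS key is out of scope here):

```typescript
// Hedged sketch: decode one DSSE envelope line from rebuild.intoto.jsonl.
interface DsseEnvelope {
  payloadType: string;
  payload: string; // base64-encoded in-toto statement
  signatures: { keyid: string; sig: string }[];
}

function decodeStatement(envelopeJson: string): unknown {
  const envelope: DsseEnvelope = JSON.parse(envelopeJson);
  // Buffer is a Node global; in a browser, use atob() instead.
  const decoded = Buffer.from(envelope.payload, "base64").toString("utf8");
  return JSON.parse(decoded); // the SLSA provenance statement
}

// Usage: each line of the .jsonl file is one envelope.
// console.log(JSON.stringify(decodeStatement(firstLine), null, 2));
```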

The output isn't as interesting as I was expecting, but it was fun demonstrating that it's possible to build and deploy web apps to Google Cloud that can then make fetch() requests to other public buckets.

Hopefully the OSS Rebuild team will add a web UI to their project at some point in the future.

Via Hacker News

Tags: google, packaging, pypi, security, npm, ai, generative-ai, llms, ai-assisted-programming, supply-chain, vibe-coding, claude-code

[Original link](https://simonwillison.net/2025/Jul/23/oss-rebuild/#atom-everything)

Further speculation

- **A practical CORS workaround**: Deploying a web page to Google Cloud Storage (under the `storage.googleapis.com` domain) lets it call same-origin endpoints directly, such as reading data from other public buckets, with no CORS headers configured anywhere. An unofficial but effective trick.
- **Reverse-engineering Google's tooling**: Using AI tools such as Gemini 2.5 Pro to reverse-engineer the Go CLI surfaced an undocumented way to access the data (the `gsutil ls -r` command listing every rebuilt package's GS path).
- **SLSA Provenance in practice**: Google normalizes artifacts (e.g. removing instabilities like archive compression) before semantically comparing them, but the specific heuristics and automation details are unpublished, and they are the core of what makes the rebuilds reliable.
- **A lesser-known use of cloud storage**: Google Cloud Buckets are not just for storage; they can host static web pages directly (as with `rebuild-ui`) that dynamically load data from other buckets on the same domain, a usage the official documentation does not emphasize.
- **Unofficial coverage statistics**: The per-ecosystem package counts derived via Claude Code (PyPI 5,028; Crates 2,437; NPM 2,048) are third-party numbers; Google has not published its exact coverage.
- **Limits of the published verification data**: The current output (JSON files) carries less information than expected; more detailed build logs or verification logic may exist but is not exposed to outside users, pending future API updates.
- **Ecosystem prioritization**: The project supports the mainstream NPM/PyPI/Crates ecosystems first, suggesting other registries (such as Maven) would need community pressure or commercial partnership to be included, reflecting how open source security investment gets prioritized.