Provenance Brief
Primary Source

Official announcement from OpenAI. These are their claims, and they have marketing incentives.

Why we no longer evaluate SWE-bench Verified

SWE-bench Verified is increasingly contaminated and mismeasures frontier coding progress.
