OpenAI wants to retire the AI coding benchmark that everyone has been competing on

OpenAI says the popular SWE-bench Verified coding benchmark is broken: most tasks are flawed enough to reject correct solutions, and leading AI models have likely seen the answers during training.

The Decoder · Feb 23, 2026 19:08 UTC · ~4 min read
