AI Digest

# OpenAI Abandons SWE-bench Verified Over Contamination Concerns

OpenAI announced it will no longer use SWE-bench Verified to evaluate its AI coding models, citing serious quality issues that undermine the benchmark's reliability.

The company identified two critical problems with the popular coding benchmark: flawed tests and training data leakage. According to OpenAI, these issues mean SWE-bench Verified no longer accurately measures how well cutting-edge AI models can write and fix code.

Training data contamination occurs when benchmark tasks or their solutions appear in the data used to train AI models, allowing them to essentially "cheat" by memorizing answers rather than demonstrating genuine problem-solving ability. Once that happens, a high score no longer shows whether a model can actually solve the coding problem or has simply seen the solution before.
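To make the idea concrete, here is a minimal sketch of one common way to screen for contamination: measuring how many word n-grams from a benchmark item also appear verbatim in a training document. This is an illustration only, not OpenAI's method; the function names `ngrams` and `overlap_ratio`, the window size of 8 words, and the sample strings are all assumptions made for the example.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text."""
    tokens = text.lower().split()
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(benchmark_item: str, training_doc: str, n: int = 8) -> float:
    """Fraction of the benchmark item's n-grams found verbatim in the training doc."""
    item_grams = ngrams(benchmark_item, n)
    if not item_grams:
        return 0.0
    return len(item_grams & ngrams(training_doc, n)) / len(item_grams)


# Hypothetical benchmark item and training documents, invented for illustration.
item = "fix the off by one error in the pagination helper so the last page is included"
leaked_doc = "commit message: " + item + " closes the issue"
clean_doc = "add dark mode toggle to the settings page and update the docs for it"

print(overlap_ratio(item, leaked_doc))  # 1.0: every n-gram of the item was seen
print(overlap_ratio(item, clean_doc))   # 0.0: no shared n-grams
```

A ratio near 1.0 suggests the item was likely present in the training data, while a ratio near 0.0 only rules out verbatim copying; paraphrased or reformatted leaks would slip past a check this simple.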

OpenAI is now recommending SWE-bench Pro as an alternative benchmark for evaluating AI coding capabilities.