SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

340 points | by kmdupree 4 days ago

206 comments