Hacker News
The SWE-Bench Illusion (microsoft.com)
11 points by louiereederson 20 days ago | 2 comments


It seems inevitable that metrics become targets and then cease to be valuable.


This is a good reminder that benchmark results don’t always translate to real engineering work. Solving a scoped task inside a controlled setup is very different from working in a live codebase with missing context and messy history. Benchmarks are still useful, but they should be treated as one signal, not the full picture.



