Berkeley Researchers Break Every Major AI Benchmark: The Scores Are Meaningless
A zero-capability agent scored 100% on SWE-bench, WebArena, and Terminal-Bench without writing a single line of solution code. Berkeley's research exposes systemic failures in how AI is measured and marketed.