joelceth

@joelceth

anthropic and openai released their new models on the same day at the same time. opus 4.6 thinks deep, codex 5.3 runs fast. benchmarks show no clear winner. so which one actually gets the job done?

opus 4.6 offers a 1 million token context window for the first time. it can read an entire project's codebase in one go. sixteen independent agents worked together to build a c compiler from scratch, and that compiler successfully compiled the linux kernel. on top of that, it found 500 previously unknown security vulnerabilities in open source software without anyone asking it to.

codex 5.3 is the first model that helped build itself. it debugged its own training runs and managed its deployment infrastructure. it runs 25 percent faster than its predecessor and does the same job with less than half the tokens. it scores 77.3 percent on terminal tasks, beating opus at 65.4 percent.