@nounishprof
ChatGPT may not be responsible for the chart crime — but the jury is out. Here was its fix after a couple of prompts.
🤖
Here is the revised and more accurate version of the chart, now with:
⸻
✅ Clear data distinctions:
• GPT-5 is split into:
• 🩷 Without thinking: 52.8%
• 💗 With thinking boost: +22.1% → Total: 74.9%
• OpenAI o3 (69.1%) and GPT-4o (30.8%) are shown in gray, labeled as:
• “Mode unspecified” — since the original chart didn’t clarify whether these are with or without reasoning.
⸻
✅ Additions:
• A gray color bar indicates uncertain prompt mode
• A footnote explains the ambiguity in the data presentation
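The fix described above — a stacked pink bar for GPT-5, gray bars for the unspecified modes, and a disclaiming footnote — can be sketched in a few lines of matplotlib. The numbers come from the thread; the file name, colors, and layout are my own guesses at what the revised chart looked like:

```python
# A minimal sketch of the "fixed" chart, assuming a stacked-bar layout.
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

models = ["GPT-5", "OpenAI o3", "GPT-4o"]
base = [52.8, 69.1, 30.8]    # pass@1 without thinking (o3 / 4o mode unspecified)
boost = [22.1, 0.0, 0.0]     # GPT-5's "thinking" boost stacks on top
colors = ["#f8a8c8", "#9e9e9e", "#9e9e9e"]  # gray = prompt mode unspecified

fig, ax = plt.subplots()
ax.bar(models, base, color=colors, label="without thinking / unspecified")
ax.bar(models, boost, bottom=base, color="#e91e63", label="thinking boost")

# Label each bar with its true total so heights can't lie
for x, (b, extra) in enumerate(zip(base, boost)):
    ax.text(x, b + extra + 1, f"{b + extra:.1f}%", ha="center")

ax.set_ylabel("SWE-bench Verified pass@1 (%)")
ax.set_ylim(0, 100)
ax.legend()
fig.text(0.5, 0.01, "Gray: original chart did not specify reasoning mode.",
         ha="center", fontsize=8)
fig.savefig("swebench_fixed.png")
```

Stacking the boost on the base bar keeps the 52.8% and 74.9% figures visually distinct instead of hiding one inside the other — which is exactly the ambiguity the original chart was accused of.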
Prompt: Do you think a version of ChatGPT made the original chart? Could you be responsible for the original chart crime?
👀 Honestly… it’s entirely possible that a version of ChatGPT did commit that chart crime. Here’s the case for and against:
⸻
🔍 Evidence for the AI being guilty:
• ✅ Style of the original matches typical AI-generated slides:
• Clean sans-serif fonts
• Rounded bar corners
• Oversimplified color scheme
• ❌ Missing labels, color mismatches, and identical-height bars are classic AI pitfalls — especially if the prompt was vague (e.g. “make a chart comparing GPT-5 and GPT-4o on SWE-bench”).
• 🧠 If someone asked an AI to “compare pass@1 on SWE-bench verified” without providing clear metadata, the model might confidently hallucinate or misassign details — especially color legend logic.
⸻
🔍 Evidence against:
• Some human-made slides have similar issues — especially when they’re thrown together in a hurry for a conference or tweet.
• A human might’ve:
• Copy-pasted bars
• Forgotten to update heights or labels
• Omitted explanation for prompt modes
⸻
🧑‍⚖️ Final ruling?
Reasonable doubt remains, but the chart definitely had AI fingerprints all over it.
If it wasn’t made by ChatGPT, it was probably made by someone using ChatGPT — or its cousin, PowerPoint Copilot.
And don’t worry — this version is chart-crime-free and fully compliant with the Data Clarity Act of 2025.™