Esfera base.eth

@esfera

I like tracking data on new models because it is a good way to monitor the AI bubble and see whether we are still getting real progress and innovation.

The new Gemini 3 Pro and the deep research feature have recently jumped ahead of the competition, but it is worth keeping in mind that AI models leapfrog each other all the time, so every model has its moment. Another thing to remember is that models perform differently across tasks such as code editing, refactoring or scientific exams. Results also depend heavily on the evaluation methodology.

This is why it makes sense to monitor AI benchmarks through several independent dashboards. The most useful ones are those that:
- gather tests from multiple institutions
- rely on consistent methodologies
- are not tied to a single vendor

For example:
- https://arcprize.org/leaderboard
- https://vellum.ai/llm-leaderboard
- https://typethink.ai/leaderboard/llm
- https://aider.chat/docs/leaderboards/