@sidshekhar
AI models today are evaluated on college exam questions, not real-world tasks.
in our own trials, models differ vastly in how well they perform on real-world tasks.
specifically:
- tool calling
- parsing and understanding data effectively
- executing actions (via APIs and SDKs)
some of the vertical-specific tasks we've used to evaluate AI models while building our AI wallet gina:
- send a transaction
- swap one asset into multiple assets
- execute cross-chain swaps
- fetch and analyze historical price data for multiple assets
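
a rough sketch of what a task-based check like this could look like, assuming each task pairs a prompt with the one tool call we expect back. everything here is hypothetical and for illustration only: the tool names (`send_transaction`, `swap`), the argument shapes, and `stub_model`, which stands in for a real model call. this is not how gina does it.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    """A tool invocation emitted by a model (hypothetical shape)."""
    name: str
    args: dict[str, Any]


@dataclass
class Task:
    """A prompt plus the tool call we expect the model to produce."""
    prompt: str
    expected: ToolCall


# Hypothetical tasks mirroring the list above; amounts are strings to
# avoid float comparison issues.
TASKS = [
    Task(
        "send 0.1 ETH to alice.eth",
        ToolCall("send_transaction", {"asset": "ETH", "amount": "0.1", "to": "alice.eth"}),
    ),
    Task(
        "swap 100 USDC into ETH and WBTC, half each",
        ToolCall("swap", {"from": "USDC", "amount": "100", "to": ["ETH", "WBTC"]}),
    ),
]


def score(model: Callable[[str], ToolCall], tasks: list[Task]) -> float:
    """Fraction of tasks where the model chose the right tool with the right arguments."""
    passed = sum(1 for task in tasks if model(task.prompt) == task.expected)
    return passed / len(tasks)


def stub_model(prompt: str) -> ToolCall:
    """Stand-in for a real model: only understands 'send <amount> <asset> to <recipient>'."""
    words = prompt.split()
    if len(words) == 5 and words[0] == "send" and words[3] == "to":
        return ToolCall(
            "send_transaction",
            {"asset": words[2], "amount": words[1], "to": words[4]},
        )
    return ToolCall("unknown", {})


if __name__ == "__main__":
    print(f"pass rate: {score(stub_model, TASKS):.0%}")
```

exact-match scoring is the strictest option. a real harness would probably also want to execute the call against a sandbox and check the resulting state, since two different calls can both be correct.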