@eito
Loved hearing @clefourrier on @latentspacepod talk about what's missing in current LLM benchmarks!
In particular, calibration - in QA contexts, how well calibrated are the log-likelihood probabilities for the correct answers?
This is key for "measuring hallucination" in LLMs, and definitely the way forward.
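A minimal sketch of one way to check this, assuming a multiple-choice QA setup where the model gives per-option log-likelihoods; the toy data here is made up, and ECE (expected calibration error) is a standard stand-in for the calibration measure being discussed:

```python
import numpy as np

def softmax(logps):
    # Turn per-option log-likelihoods into a probability distribution
    z = np.exp(logps - np.max(logps))
    return z / z.sum()

def expected_calibration_error(confidences, correct, n_bins=10):
    # Standard ECE: bin predictions by confidence, then compare each bin's
    # average confidence to its empirical accuracy, weighted by bin size.
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Toy example: per-question log-likelihoods over 4 answer options,
# plus the index of the gold answer for each question.
option_logps = [np.array([-1.2, -0.3, -2.5, -3.0]),
                np.array([-0.1, -2.0, -1.8, -2.2])]
gold = [1, 0]

probs = [softmax(lp) for lp in option_logps]
confidences = [p.max() for p in probs]                        # confidence in the top pick
correct = [p.argmax() == g for p, g in zip(probs, gold)]      # was the top pick right?

print(f"ECE over {len(gold)} questions: {expected_calibration_error(confidences, correct):.3f}")
```

A well-calibrated model's 80%-confidence answers should be right about 80% of the time; big gaps between confidence and accuracy are exactly the signal for the hallucination measurement idea above.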