@s5eeo
This video nicely illustrates why preparing relevant context for LLMs - rather than indiscriminately dumping a ton of text into the context window - matters for LLM-based applications, beyond just optimizing cost and response times: performance degrades with increasing input length more than many devs might expect.
This is still true for models touted to support 1 million or even 10 million token context windows, with great results on needle-in-a-haystack benchmarks.
https://youtu.be/TUjQuC4ugak
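For a rough idea of what "preparing relevant context" can look like in practice, here is a minimal sketch of an embedding-based filter that keeps only the chunks most similar to the user's question instead of concatenating everything. The embed() callable, the top_k value, and the function names are placeholders for illustration, not anything specific from the video:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_context(question: str, chunks: list[str], embed, top_k: int = 5) -> str:
    """Keep only the top_k chunks most similar to the question,
    rather than dumping every chunk into the prompt."""
    q_vec = embed(question)
    scored = [(cosine_sim(q_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n\n".join(chunk for _, chunk in scored[:top_k])

# Usage sketch (embed() is whatever embedding function your stack provides):
# context = select_context(user_question, document_chunks, embed)
# prompt = f"Context:\n{context}\n\nQuestion: {user_question}"
```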