Web3Gen0
@web3gen0
Hey folks! I recently came across something super interesting and wanted to share it with you all.

Paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

The authors introduce LLaVA-Scissor, a training-free token compression strategy for video large language models. Unlike most methods that rely on attention scores (which often miss important semantic areas), it uses Semantic Connected Components (SCC) to group tokens into meaningful, non-overlapping regions across both space and time. It's an elegant approach that retains semantic richness while significantly cutting token redundancy. Impressively, it holds up across challenging tasks like video question answering, long video understanding, and multi-choice benchmarks, especially at low token counts.

🔗 Check out the paper here: arxiv.org/abs/2506.21862

If you're exploring video LLMs or care about efficient token usage, this is definitely worth a read!
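To give a feel for the core idea, here's a rough sketch of connected-component token grouping: threshold a cosine-similarity graph over token embeddings, find connected components, and keep one representative token per component. This is my own toy illustration, not the authors' implementation; the `tau` threshold and the mean-pooling of each component are assumptions, and the real method operates jointly over spatial and temporal tokens.

```python
import numpy as np

def compress_tokens(tokens: np.ndarray, tau: float = 0.8) -> np.ndarray:
    """Group tokens into connected components of a thresholded
    cosine-similarity graph and keep one mean token per component.

    tokens: (N, D) array of token embeddings.
    tau: similarity threshold for connecting two tokens (illustrative value).
    """
    # Normalize rows and build a boolean adjacency matrix.
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    adj = (normed @ normed.T) >= tau  # (N, N)

    # Union-find (with path halving) to extract connected components.
    parent = list(range(len(tokens)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            if adj[i, j]:
                parent[find(i)] = find(j)

    # Collapse each component to its mean embedding.
    comps: dict[int, list[int]] = {}
    for i in range(len(tokens)):
        comps.setdefault(find(i), []).append(i)
    return np.stack([tokens[idx].mean(axis=0) for idx in comps.values()])
```

With two well-separated clusters of tokens, this compresses N tokens down to 2 representatives, which is the redundancy-reduction effect the paper exploits at scale.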