qt

@qt

2637 Following
24764 Followers


"We introduce \model, a multimodal, late-interaction retriever that jointly indexes four modalities: video frames, transcribed speech, on-screen text, and other metadata. \model jointly encodes all modalities within a unified multimodal backbone for improved contextualization and is trained to enhance dynamic modality selection via two key innovations. First, to overcome the lack of training data for multimodal retrieval, we introduce MultiVENT 2.0++, a large-scale synthetic training dataset built on MultiVENT 2.0 (a dataset of event-centric videos in various languages paired with English queries) with modality-targeted queries to teach modality selection. Next, we propose a modality-aware contrastive loss that jointly trains according to a standard contrastive objective alongside an objective for learning correct modality usage." https://arxiv.org/html/2506.06144v1
1 reply
0 recast
2 reactions
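
For readers unfamiliar with the "late-interaction" scoring mentioned in the quoted abstract, here is a minimal sketch of generic ColBERT-style MaxSim scoring over a document whose token vectors come from several modalities. This is not the paper's implementation: the function names, the modality keys, and the toy 2-d vectors are all hypothetical, and the sketch assumes non-zero embedding vectors and cosine similarity.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs_by_modality):
    """Generic late-interaction (MaxSim) score.

    query_vecs: list of query token vectors.
    doc_vecs_by_modality: dict mapping a modality name (hypothetical keys
        such as "frames", "speech", "ocr", "metadata") to a list of token
        vectors. All modalities are pooled into one candidate set, so each
        query token can match whichever modality fits it best.
    """
    query = [_normalize(v) for v in query_vecs]
    doc = [
        _normalize(v)
        for vecs in doc_vecs_by_modality.values()
        for v in vecs
    ]
    total = 0.0
    for q in query:
        # Best cosine similarity of this query token against any doc token.
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc)
    return total


# Toy example: two query tokens, a document with two modalities.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = {"frames": [[1.0, 0.0]], "speech": [[0.0, 2.0]]}
score = maxsim_score(query, doc)
```

In this toy case each query token finds an exact match in a different modality, which is the behaviour a jointly indexed retriever relies on; the modality-aware loss described in the abstract is a training-time objective and is not modelled here.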
