Content
@
https://warpcast.com/~/channel/theai

Web3Gen0
@web3gen0
MultiTalk: Generating Multi-Person Conversations from Just Audio

Researchers from Meituan, HKUST, and Sun Yat-sen University have introduced MultiTalk, a framework for generating audio-driven videos of multi-person conversations. Earlier methods focused on a single talking head. MultiTalk takes multiple audio streams as input, keeps each person's lip movements in sync with their own stream, and follows scene instructions in the prompt, such as “a man and a woman were talking, and then they kissed.” Its key innovation is Label Rotary Position Embedding (L-RoPE), which binds each audio stream to the correct person in the video. The model also keeps its ability to follow instructions through two training strategies: partial parameter training and multi-task training. Potential applications range from virtual actors to e-commerce livestreams. https://arxiv.org/pdf/2505.22647v1
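L-RoPE builds on rotary position embedding (RoPE), in which the attention score between a rotated query and key depends only on the difference between their positions. Here is a minimal sketch of that idea under stated assumptions. It is not MultiTalk's implementation. It applies standard 1D RoPE using integer "labels" in place of positions, and gives each person's video tokens and that person's audio stream nearby labels. The label ranges (`PERSON_LABELS`, `AUDIO_LABELS`) and the helper names are hypothetical and chosen only for illustration.

```python
import math
from typing import List

def rope_rotate(vec: List[float], label: float, base: float = 10000.0) -> List[float]:
    """Apply standard 1D rotary position embedding, using `label` as the position."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out: List[float] = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # rotation frequency for this pair of dims
        angle = label * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = vec[2 * i], vec[2 * i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2D rotation of the pair
    return out

def attention_score(q: List[float], q_label: float, k: List[float], k_label: float) -> float:
    """Dot product of rotated query and key; depends only on q_label - k_label."""
    rq, rk = rope_rotate(q, q_label), rope_rotate(k, k_label)
    return sum(a * b for a, b in zip(rq, rk))

# Hypothetical label assignment (illustrative values, not taken from the paper):
# video tokens in person A's region carry labels 0..4, person B's carry 20..24,
# and each audio stream gets the middle label of its person's range.
PERSON_LABELS = {"A": range(0, 5), "B": range(20, 25)}
AUDIO_LABELS = {"A": 2, "B": 22}

if __name__ == "__main__":
    vec = [1.0] * 8  # identical content vectors isolate the effect of the labels
    video_label = PERSON_LABELS["A"][2]  # a video token inside person A's region
    for speaker, audio_label in AUDIO_LABELS.items():
        score = attention_score(vec, video_label, vec, audio_label)
        print(f"audio {speaker}: {score:.3f}")
```

When query and key contents are identical, a zero label difference gives the largest score, so the person-A token scores higher against audio stream A than against audio stream B. In a trained model, the learned query and key contents interact with this label-dependent term rather than being identical.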