Web3Gen0

@web3gen0

214 Following
129 Followers


Web3Gen0
Hey folks, I recently came across this and wanted to share!

VMoBA: Mixture-of-Block Attention for Video Diffusion Models

This paper tackles one of the core bottlenecks in Video Diffusion Models (VDMs): the quadratic complexity of full attention that slows down training and inference, especially for long-duration, high-resolution videos.

👉 The proposed solution, VMoBA (Video Mixture of Block Attention), introduces a smart sparse attention mechanism that:
✅ Adapts to spatio-temporal patterns
✅ Selects important blocks globally
✅ Dynamically reduces attention complexity

💡 The results?
✔️ ~3x FLOPs speedup in training
✔️ ~1.5x faster inference latency
✔️ Maintains or even improves video generation quality

Super exciting direction for scaling up video generation efficiently!

Check it out: https://arxiv.org/abs/2506.22347
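If it helps make the block-selection idea concrete, here's a rough PyTorch sketch of generic block-sparse attention. It's a simplified stand-in for the idea, not VMoBA's actual layer (the paper partitions blocks along 1D/2D/3D spatio-temporal axes and has its own global, dynamic selection rules); block_size, keep_ratio, and the mean-pooled block scores are all illustrative choices on my part.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, keep_ratio=0.25):
    """Toy block-sparse attention: each query block attends only to the
    key/value blocks whose mean-pooled key best matches its mean-pooled query.
    q, k, v: (seq_len, dim) with seq_len divisible by block_size."""
    seq_len, dim = q.shape
    n_blocks = seq_len // block_size
    k_blocks = k.view(n_blocks, block_size, dim)
    v_blocks = v.view(n_blocks, block_size, dim)

    # Block-level summaries via mean pooling, then block-to-block relevance.
    q_summary = q.view(n_blocks, block_size, dim).mean(dim=1)   # (n_blocks, dim)
    k_summary = k_blocks.mean(dim=1)                            # (n_blocks, dim)
    scores = q_summary @ k_summary.T / dim ** 0.5               # (n_blocks, n_blocks)

    # Keep only the top fraction of key blocks for each query block.
    k_keep = max(1, int(keep_ratio * n_blocks))
    topk = scores.topk(k_keep, dim=-1).indices                  # (n_blocks, k_keep)

    out = torch.empty_like(q)
    for i in range(n_blocks):
        q_blk = q[i * block_size:(i + 1) * block_size]          # (block_size, dim)
        k_sel = k_blocks[topk[i]].reshape(-1, dim)              # selected keys
        v_sel = v_blocks[topk[i]].reshape(-1, dim)              # selected values
        attn = F.softmax(q_blk @ k_sel.T / dim ** 0.5, dim=-1)
        out[i * block_size:(i + 1) * block_size] = attn @ v_sel
    return out

# Example: 1024 tokens, 64-dim head -> each query block attends to ~25% of keys.
q = k = v = torch.randn(1024, 64)
print(block_sparse_attention(q, k, v).shape)   # torch.Size([1024, 64])
```

The FLOPs saving comes straight from the loop: each query block touches only k_keep key blocks instead of all of them.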
0 reply
0 recast
2 reactions

Web3Gen0
Hey folks! I recently came across something super interesting and wanted to share it with you all.

Paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

The authors introduce LLaVA-Scissor, a training-free token compression strategy for video large language models. Unlike most methods that rely on attention scores (which often miss out on important semantic areas), this one smartly uses Semantic Connected Components (SCC) to group tokens into meaningful, non-overlapping regions across both space and time!

It’s an elegant approach that retains semantic richness while significantly cutting down token redundancy. Impressively, it holds up across challenging tasks like video question answering, long video understanding, and multi-choice benchmarks, especially when working with low token counts.

🔗 Check out their project here: arxiv.org/abs/2506.21862

If you’re exploring video LLMs or care about efficient token usage, this is definitely worth a read!
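For intuition, here's a tiny sketch of the general "group similar tokens into connected components and keep one representative each" idea. The cosine threshold, union-find grouping, and mean-pooled representatives are my simplifications, not the paper's actual SCC algorithm or its two-step spatial/temporal pipeline.

```python
import torch
import torch.nn.functional as F

def scc_compress(tokens, sim_threshold=0.8):
    """Toy semantic-connected-components compression: link tokens whose cosine
    similarity exceeds a threshold, find connected components via union-find,
    and keep one mean token per component. tokens: (n, dim)."""
    n = tokens.shape[0]
    normed = F.normalize(tokens, dim=-1)
    adj = (normed @ normed.T) > sim_threshold      # (n, n) bool adjacency

    # Union-find over the thresholded similarity graph.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]          # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # One representative (mean) token per connected component.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return torch.stack([tokens[idx].mean(dim=0) for idx in groups.values()])

# Example: 200 video tokens collapse to however many semantic groups emerge.
print(scc_compress(torch.randn(200, 512)).shape)
```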
0 reply
0 recast
1 reaction

Web3Gen0
In this episode of the series on strategic and functional approaches to scale Generative AI, I walk you through computational scaling: what it really means, why it’s a pressing topic today, and the different ways we can approach it.

You’ll learn:
- What computational scaling means in the context of Generative AI
- Why computational scaling is crucial as AI models and agent-based systems grow more complex
- Key challenges including resource demands, data bottlenecks, model limitations, and coordination issues
- Three major scaling strategies: scaling up, scaling down, and scaling out
- Practical tips for each approach and how to choose the right strategy for your system
- A comparison of scaling up, down, and out to help you make informed decisions

This series is designed for business leaders, technical teams, and curious minds who want to understand how to scale GenAI systems in a sustainable and efficient way.

👉 Future episodes will explore other dimensions like hardware selection, storage policies, and architectural scaling in more detail.

https://youtu.be/ZM2F7WyhQbY?si=zfkHbkdL2ydp4doX
0 reply
0 recast
1 reaction

Web3Gen0
In this first episode of the series on strategic and functional approaches to scale Generative AI, I break down what scaling really means in the context of GenAI and why it’s a critical topic in 2025 and beyond.

You’ll learn:
- What scaling Generative AI actually involves
- Why scaling is essential as GenAI moves from prototypes to production
- Real-world challenges across infrastructure, model behavior, and governance
- A high-level overview of the 4-part framework I use to think about scaling: computation, architecture, deployment, and operations

This series is designed for business leaders, technical teams, and anyone curious about how to build and manage GenAI systems that work at scale.

👉 Future episodes will dive deeper into each scaling dimension with practical insights and examples.

https://youtu.be/NB5VOoefMII?si=k0KlxNdq-wBEcELv

Please subscribe to my channel 😇
0 reply
0 recast
1 reaction

Web3Gen0
Here’s a paper on making Large Language Models more efficient for low-bit quantization by preventing outliers right from the training phase.

Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

What’s interesting? Outliers have been a major issue when quantizing LLMs, especially for on-device deployment. This paper introduces Outlier-Safe Pre-Training (OSP), a proactive approach to stop outliers from forming in the first place.

Key Highlights:
- Muon Optimizer: Improves training without privileged bases.
- Single-Scale RMSNorm: Controls channel-wise amplification.
- Learnable Embedding Projection: Balances activation magnitudes.

The result?
- 1.4B parameter model trained on 1 trillion tokens – without activation outliers.
- Achieved 35.7 avg. score across 10 benchmarks under 4-bit quantization (baseline: 26.5).
- Only 2% extra training cost.

Turns out, outliers aren’t an unavoidable part of LLMs; they’re just a result of how we train them.

https://www.arxiv.org/abs/2506.19697
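To get a feel for the single-scale idea, here's a minimal sketch of an RMSNorm with one shared learnable scalar instead of a per-channel gain vector. This is my reading of the concept, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SingleScaleRMSNorm(nn.Module):
    """Illustrative 'single-scale' RMSNorm: one shared learnable scalar instead
    of a per-channel gain, so no individual channel can be selectively amplified
    into an activation outlier."""
    def __init__(self, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.scale = nn.Parameter(torch.ones(()))   # single scalar, not shape (dim,)

    def forward(self, x):
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.scale * x * rms

x = torch.randn(2, 16, 1024)
print(SingleScaleRMSNorm()(x).shape)   # torch.Size([2, 16, 1024])
```

The intuition: a standard per-channel gain can learn to blow up a handful of channels, which is exactly where channel-wise quantization outliers tend to come from.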
0 reply
0 recast
0 reaction

Web3Gen0
Good read. On one side, it’s encouraging to see such AI training being supported when it genuinely assists and amplifies human creativity. That said, I do wonder how exactly they extracted the “uncopyrightable information” 😅 and what parameters were used to classify it as such.

While I was part of Responsible AI framework development for an MNC, I often found that compliance can feel like a barrier to AI solution development. But honestly, it’s always better to stay cautious and follow the guidelines than to end up facing lawsuits later.

I also hope the original creators are compensated in some way, similar to how patent systems work. This feels necessary to me… otherwise, the dead internet theory might creep closer to reality, and we could see less genuine, original work from humans in the future. Finding that perfect balance is indeed tricky.

https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/
1 reply
0 recast
2 reactions

Web3Gen0
$1 Trillion AI Manufacturing Hub in the U.S.?

Just read this news. SoftBank CEO Masayoshi Son is reportedly planning a $1 trillion AI-focused industrial complex in the U.S., in partnership with the Trump administration and possibly TSMC. The project, called Project Crystal Land, aims to bring high-tech manufacturing back to the U.S. and could rival China’s Shenzhen in scale.

The plan includes production lines for AI-powered industrial robots, and Son has already been in discussions with officials like Secretary of Commerce Howard Lutnick, as well as other major tech players like Samsung. This follows SoftBank’s earlier $500B initiative, Project Stargate, with OpenAI to expand AI data centers, and their recent $40B funding round in OpenAI.

Son clearly isn’t holding back on his vision of an AI-powered future, from infrastructure to investment. What do you think about this ambitious plan? Will it reshape global tech manufacturing, or face the same hurdles as past mega-projects? Curious to hear your thoughts.

https://uk.investing.com/news/stock-market-news/softbank-ceo-son-pitches-1-trln-us-ai-hub-to-tsmc-trump-admin-bloomberg-4139730
0 reply
0 recast
3 reactions

Web3Gen0
Anthropic is on fire with their technical posts. If you’re an AI dev, stop and read this. It breaks down how they built Claude’s new multi-agent Research feature.

Key highlights:
• Orchestrator-Worker Design: A lead agent breaks down queries, spins up tool- and memory-equipped subagents, and integrates their findings, leading to 90% better performance than single-agent Claude.
• Token-Efficient Scaling: By distributing tasks, Claude scales reasoning effectively, though at 15× token cost, which makes it best suited to complex, high-value queries.
• Prompt Engineering Lives On: They refined agent behavior through heuristics in prompt design and even used Claude to optimize its own prompts, cutting task time by 40%.
• Robust Evaluation & Reliability: Combines LLM-as-judge scoring, human checks, and production-grade tools like checkpoints and full traceability to ensure reliability in long, non-deterministic tasks.
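The orchestrator-worker pattern itself is easy to prototype. Here's a bare-bones sketch; the call_model stub is a placeholder for whatever LLM client you use, and Anthropic's real system layers tools, memory, checkpoints, and tracing on top of this.

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(prompt: str) -> str:
    """Stand-in for a real LLM call via your provider's chat/messages API;
    the orchestration pattern is the point here, not this stub."""
    raise NotImplementedError

def research(query: str, max_subagents: int = 3) -> str:
    # 1. Lead agent decomposes the query into focused, independent sub-tasks.
    plan = call_model(
        f"Break this research question into at most {max_subagents} "
        f"independent sub-questions, one per line:\n{query}"
    )
    sub_tasks = [s.strip() for s in plan.splitlines() if s.strip()][:max_subagents]

    # 2. Subagents run in parallel, each with its own narrow prompt
    #    (and, in a real system, its own tools and memory).
    with ThreadPoolExecutor(max_workers=max_subagents) as pool:
        findings = list(pool.map(
            lambda task: call_model(f"Research this and report key findings:\n{task}"),
            sub_tasks,
        ))

    # 3. Lead agent integrates the subagents' findings into a single answer.
    notes = "\n\n".join(f"Sub-task: {t}\nFindings: {f}" for t, f in zip(sub_tasks, findings))
    return call_model(f"Using only the notes below, answer: {query}\n\n{notes}")
```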
0 reply
0 recast
1 reaction

Web3Gen0
MultiTalk: Generating Multi-Person Conversations from Just Audio

Researchers from Meituan, HKUST, and Sun Yat-sen University have introduced MultiTalk, a new framework that brings multi-person audio-driven conversational videos to life. Unlike earlier methods that only focused on single-person talking heads, MultiTalk handles multi-stream audio, ensures correct lip sync for each individual, and follows detailed scene instructions like “a man and a woman were talking, and then they kissed.”

A key innovation is Label Rotary Position Embedding (L-RoPE), which helps bind the right audio stream to the right person. The model also preserves instruction-following through clever training strategies like partial parameter and multi-task training.

From virtual actors to e-commerce livestreams, the potential use cases are huge.

https://arxiv.org/pdf/2505.22647v1
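Rough intuition for the label-binding trick, sketched with a plain 1D rotary embedding where the "position" is a per-person label shared between that person's visual tokens and their audio stream, so rotary phases only line up between a person and their own audio. The shapes, label values, and rotation variant here are illustrative, not the paper's actual L-RoPE formulation.

```python
import torch

def rope_rotate(x, pos, base=10000.0):
    """Apply a 1D rotary embedding to x (..., n, dim) at positions pos (..., n).
    dim must be even; uses the rotate-half formulation."""
    half = x.shape[-1] // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)   # (half,)
    angles = pos.unsqueeze(-1) * freqs                                  # (..., n, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Toy label binding: each person's visual tokens get a distinct, well-separated
# label value, and that same value is reused for their audio stream.
person_labels = torch.tensor([0.0, 25.0])           # person 0 vs person 1 (arbitrary)
video_tokens = torch.randn(2, 16, 64)               # (person, tokens, dim), illustrative
audio_tokens = torch.randn(2, 16, 64)               # (audio stream, tokens, dim)
labels = person_labels[:, None].expand(2, 16)
video_q = rope_rotate(video_tokens, labels)          # queries from the video side
audio_k = rope_rotate(audio_tokens, labels)          # keys from the matching audio stream
```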
0 reply
0 recast
3 reactions