eggman 🔵

@eggman.eth

I think whatever’s going on at xAI will probably be featured in a Wolf of Wall Street-type video someday. Their developments make zero sense.

Grok used to be open-weights, and kinda garbage. Then DeepSeek V3 came out. Suddenly, Grok was smart - but no longer releasing its weights (or any model info).

Grok Image used to be barebones FLUX.1; xAI acknowledged this at the time. After Kontext was released by BFL (the creators of FLUX.1), suddenly Grok could edit images. But it’s “Grok Image” now.

Grok Video is the worst offender. It released shortly after the WAN 2.1 weights were open-sourced, and was a carbon copy of it - except it knew what a Cybertruck was. Same framerate, same artifacting, same everything. Then the WAN 2.2 weights were released, and suddenly Grok Video got a massive upgrade, out of nowhere. Keep in mind this was after Veo 3, and not far off from Sora 2. xAI, with their monstrous GPU capacity, should be competing directly. But there’s not even a research paper, and the upgrade remains in line with WAN 2.2.

MMAudio starts getting popular as a tool for adding audio in post to videos from models that don’t generate it. Once again, it’s all open weights - and suddenly, Grok has audio. But it’s not very good. Much like MMAudio.

Recently, LTX-2 debuted, bringing video with audio to the masses via an open-weights release. A few days later, Grok Video suddenly has full audio capability - plus a bunch of other capabilities that shipped with LTX-2, like longer video durations and far broader support for a wide range of aspect ratios.

So, what’s actually going on? How was xAI so shockingly far behind Google and OpenAI for video, yet managed to catch up exactly when open-source/open-weights models were released to the public? How have they been around this long without contributing a single research paper to the industry?
It’s genuinely as if there’s one dude sitting there with a big collection of videos, waiting to fine-tune open-source models on it the moment they get unveiled. And given how much GPU compute xAI has available, they’d be able to fine-tune a model extremely fast.

fwiw, there’s nothing wrong with fine-tunes; that’s how 99% of the world “creates” their own models at home. But the key phrase is “at home” - as in, you DON’T have billions of dollars and a supposed expert R&D team with access to one of the world’s largest compute clusters.

I’ve always been shocked at how low-quality most of Grok’s stuff is compared to SOTA providers - but the recent video leap coming just days after the LTX-2 release makes for way too many coincidences in a row. I genuinely think all those GPUs are mostly sitting idle except for inference, plus a quick fine-tune whenever there’s a new open-source/open-weights model to try to pass off as your own.