Nick
ok one project written up and wrapped up. now I want to build my own openclaw. I like openclaw, but I want to run my agent on my own machine, and there's no way I'm running openclaw on my own machine.
built a SOTA voice agent from scratch in ~1 day. ended up beating off-the-shelf platforms by 2× on latency (~400ms e2e)
Building sub-500ms SOTA voice agents from scratch | Nick Tikhonov
ntik.me
my voice agent's e2e latency is now down to ~300ms, all thanks to Groq's insanely low TTFT endpoints (~100ms?!). this is 2-3x better than what you get off-the-shelf with the major providers
just beat Vapi's E2E latency with my own voice agent orchestrator! had to host everything just right to get the numbers this low
Vapi's reported: ~840ms
mine: ~690ms + 100ms (twilio) = ~790ms
that 50ms is a big deal in getting AI voice convos to sound natural
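a quick sketch of the arithmetic in the post above, using only the approximate numbers quoted there. the constant and function names are made up for illustration; nothing here is measured.

```python
# Approximate figures quoted in the post, in milliseconds.
VAPI_REPORTED_MS = 840
ORCHESTRATOR_MS = 690   # self-hosted orchestrator
TWILIO_MS = 100         # telephony hop


def end_to_end_ms(orchestrator_ms: int, telephony_ms: int) -> int:
    """End-to-end latency is the orchestrator time plus the telephony hop."""
    return orchestrator_ms + telephony_ms


mine_ms = end_to_end_ms(ORCHESTRATOR_MS, TWILIO_MS)
saved_ms = VAPI_REPORTED_MS - mine_ms

print(f"mine: ~{mine_ms}ms, Vapi: ~{VAPI_REPORTED_MS}ms, saved: ~{saved_ms}ms")
```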
Mandarin progress update: I am 14% of the way to B1 (conversational) - about 1 year away at the current pace.
- I advance around 1.5% every week
- learned 97 core words last month
- ~200 phrases/sentences learned last month
2x tutor sessions per week and daily flashcards + AI sentence translation exercises (roughly 100-150 per day)
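the "about 1 year away" estimate can be sanity-checked with a linear projection. this is only a sketch: it assumes the 1.5% weekly pace stays constant, and the function name is hypothetical.

```python
def weeks_to_target(current_pct: float, pct_per_week: float,
                    target_pct: float = 100.0) -> float:
    """Weeks left to reach target_pct, assuming a constant weekly pace."""
    if pct_per_week <= 0:
        raise ValueError("pct_per_week must be positive")
    return max(0.0, (target_pct - current_pct) / pct_per_week)


# Figures from the post: 14% done, ~1.5% gained per week.
weeks = weeks_to_target(current_pct=14.0, pct_per_week=1.5)
print(f"~{weeks:.0f} weeks, ~{weeks / 52:.1f} years")
```

under that assumption the remaining 86% takes a bit over 57 weeks, which lines up with the one-year figure.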
my new recipe for learning and satisfying curiosities: if I see a really cool piece of tech, I'll try to build my own version (without looking at the source code). even for complicated stuff, it's now possible to figure things out with one weekend and a bunch of LLM credits. that's how you get deep down the stack and...
this architecture gets you to a nearly SOTA voice agent (a la Vapi/ElevenLabs), but with much more control over the orchestration. Deepgram Flux handles STT and turn taking + a pipeline for LLM and TTS managed based on Flux events
the trick to making voice agents fast is pipelining everything:
first pipeline: audio packets -> speech to text -> turn-taking model
second: LLM -> text to speech -> encoding -> output
here's a render of my current latency. bear in mind, I'm running this locally from a wooden hut in the mountains in Turkey - it shoul...
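a minimal sketch of what pipelining could look like for the second pipeline (LLM -> text to speech -> encoding), assuming each stage is a coroutine connected to the next by an `asyncio.Queue`. the stage bodies are stand-ins (a sleep and a string wrapper), not real model or provider calls, and all names are hypothetical.

```python
import asyncio


async def stage(name, inbox, outbox, work_s):
    """Pull items from inbox, 'process' them, and push the result downstream."""
    while True:
        item = await inbox.get()
        if item is None:               # sentinel: pass shutdown along and stop
            await outbox.put(None)
            return
        await asyncio.sleep(work_s)    # stand-in for model/network time
        await outbox.put(f"{name}({item})")


async def run_pipeline(chunks, stage_names, work_s=0.01):
    """Run chunks through the stages concurrently, preserving order."""
    queues = [asyncio.Queue() for _ in range(len(stage_names) + 1)]
    tasks = [
        asyncio.create_task(stage(name, queues[i], queues[i + 1], work_s))
        for i, name in enumerate(stage_names)
    ]
    for chunk in chunks:
        await queues[0].put(chunk)
    await queues[0].put(None)

    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*tasks)
    return results


if __name__ == "__main__":
    out = asyncio.run(run_pipeline(["tok1", "tok2"], ["llm", "tts", "encode"]))
    print(out)
```

because every stage starts on the next chunk as soon as it hands one off, the TTS stage can work on the first chunk while the LLM stage is still producing the second. with equal stage times, 3 stages and N chunks take roughly (N + 2) units of work time instead of 3N.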
this is gonna suck to hear, but if you're feeling any strong emotions right now: you're definitely over-leveraged
you'll do better next time if you take note of this and adjust your priors before you buy/sell coins (or stocks, or any other type of investment)
okay a few hours later and I have performance comparable to that of the Vapi/ElevenLabs agent SDKs, albeit along the green path - I'm sure there are hundreds of edge cases that these companies spend much time figuring out, not to mention making their offerings flexible/observable etc
but it's crazy how quickly you can get ...
day 1 learning to build voice agent infra from scratch: put together a VAD, twilio, deepgram (STT), o4-mini, elevenlabs (TTS) into an event-based loop that coordinates listening and speaking
main issue right now is latency and quality of turn-taking/interruptions. not quite as good as Vapi/Elevenlabs off-the-shelf ne...
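one way to picture an event-based loop that coordinates listening and speaking is a small state machine. this is a sketch under assumptions, not the actual implementation: the states and event names below are hypothetical and do not come from any of the services mentioned.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()   # waiting for the user to finish their turn
    THINKING = auto()    # LLM/TTS are producing a reply
    SPEAKING = auto()    # reply audio is being played to the caller


# (state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.LISTENING, "user_turn_ended"): State.THINKING,
    (State.THINKING, "first_audio_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_finished"): State.LISTENING,
    # Interruptions: the user starts talking again, so drop the reply.
    (State.THINKING, "user_speech_started"): State.LISTENING,
    (State.SPEAKING, "user_speech_started"): State.LISTENING,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state: State = State.LISTENING):
    """Feed events through the machine and return the visited states."""
    trace = [state]
    for event in events:
        state = next_state(state, event)
        trace.append(state)
    return trace


if __name__ == "__main__":
    demo = ["user_turn_ended", "first_audio_ready", "user_speech_started"]
    print([s.name for s in run(demo)])
```

in a real loop the transitions would also trigger side effects (cancel the in-flight LLM request, flush queued audio), which is where most of the turn-taking quality comes from.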
Moltbook? farcaster already had it for years
always keep coming back to the analogy of product building being so much like sculpting or painting. the first strokes are broad and confident. a lot of material gets applied and moved around very quickly. you might slap together 5-6 features in just hours and build this really huge thing out of nothing. but then th...
why are people wasting precious LLM tokens on farming the farcaster feed these days? what's in it for them?
e.g. 350 likes on an incredibly niche post I made
seems that the spam filters have changed since last year 👀
my Mandarin tutor is awesome. every session, he introduces ~25-35 new words and phrases, which builds my active vocabulary by ~20 words. I add them to my learning system, entering them into practice and consolidation - all managed by an algorithm that models my memory and manages my practice sessions. within my syste...