@nt
day 1 learning to build voice agent infra from scratch:
put together a VAD, Twilio, Deepgram (STT), o4-mini, ElevenLabs (TTS) into an event-based loop that coordinates listening and speaking
main issue right now is latency and the quality of turn-taking/interruptions. not quite as good as Vapi/ElevenLabs off-the-shelf
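rough sketch of what the listen/speak coordination could look like as a tiny state machine. this is not the actual code: `TurnLoop`, the event names (`speech_start`, `final_transcript`, `llm_reply`, `tts_done`) and the action strings are all made up for illustration, and nothing here calls Twilio, Deepgram, or ElevenLabs. the idea is just that a VAD speech-start while the agent is thinking or speaking counts as a barge-in, and stale events get dropped:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()


@dataclass
class TurnLoop:
    """Hypothetical turn-taking loop; records actions instead of doing real I/O."""
    state: State = State.LISTENING
    actions: list = field(default_factory=list)  # side effects the caller would perform

    def handle(self, event: str, payload: str = "") -> None:
        if event == "speech_start" and self.state is not State.LISTENING:
            # barge-in: caller spoke while we were thinking or speaking
            self.actions.append("cancel_output")
            self.state = State.LISTENING
        elif event == "final_transcript" and self.state is State.LISTENING:
            # endpoint reached: hand the user's turn to the LLM
            self.actions.append(f"ask_llm:{payload}")
            self.state = State.THINKING
        elif event == "llm_reply" and self.state is State.THINKING:
            # reply is ready: start TTS playback
            self.actions.append(f"speak:{payload}")
            self.state = State.SPEAKING
        elif event == "tts_done" and self.state is State.SPEAKING:
            # finished speaking without interruption: back to listening
            self.state = State.LISTENING
        # anything else is stale (e.g. an llm_reply arriving after a barge-in) and is dropped


loop = TurnLoop()
for ev, data in [
    ("final_transcript", "hi"),
    ("llm_reply", "hello!"),
    ("speech_start", ""),   # user interrupts mid-playback
    ("llm_reply", "late"),  # stale reply, ignored
]:
    loop.handle(ev, data)
```

keeping the state transitions separate from the network calls makes it easier to test interruption handling without a live phone call, though it says nothing about latency, which depends on the real streaming pipeline.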
next, I'm going to use Deepgram's Flux for both STT and endpointing