@nt
the trick to making voice agents fast is pipelining everything:
first pipeline: audio packets -> speech to text -> turn-taking model
second pipeline: LLM -> text to speech -> encoding -> output
here's a render of my current latency. bear in mind, I'm running this locally from a wooden hut in the mountains in Turkey - it should be significantly faster once I host it!
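
for anyone who wants to see the shape of it, here's a minimal sketch of the second pipeline as asyncio stages joined by queues. it is not my actual code: `fake_tts`, `fake_encode`, `stage` and `run_pipeline` are made-up names, and the "models" just sleep and tag their input. the point is only that each stage starts on a chunk as soon as it arrives, instead of waiting for the previous stage to finish the whole response.

```python
import asyncio

DONE = object()  # sentinel marking the end of a stream


async def stage(fn, inbox, outbox):
    """Apply fn to each item as soon as it arrives, then pass it downstream."""
    while True:
        item = await inbox.get()
        if item is DONE:
            await outbox.put(DONE)  # propagate end-of-stream to the next stage
            return
        await outbox.put(await fn(item))


# Hypothetical stand-ins for the real models: each sleeps, then tags its input.
async def fake_tts(text):
    await asyncio.sleep(0.01)
    return f"audio({text})"


async def fake_encode(audio):
    await asyncio.sleep(0.01)
    return f"encoded({audio})"


async def run_pipeline(chunks):
    q_text, q_audio, q_out = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    tasks = [
        asyncio.create_task(stage(fake_tts, q_text, q_audio)),
        asyncio.create_task(stage(fake_encode, q_audio, q_out)),
    ]
    # Simulate an LLM streaming text chunks into the first queue.
    for chunk in chunks:
        await q_text.put(chunk)
    await q_text.put(DONE)

    # Drain the final queue; in a real agent this is where audio gets sent out.
    results = []
    while (item := await q_out.get()) is not DONE:
        results.append(item)
    await asyncio.gather(*tasks)
    return results


def demo():
    return asyncio.run(run_pipeline(["hello", "there"]))
```

calling `demo()` pushes two text chunks through both stages. while the encoder is working on the first chunk, the TTS stand-in is already working on the second, which is where the latency win comes from. the first pipeline (audio packets -> speech to text -> turn-taking model) could be wired up the same way.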