I made a real-time voice agent with FastRTC, smolagents, and Hugging Face Inference Providers. Check it out in this Space:
🔗 burtenshaw/coworking_agent
Thanks. Should be fixed now.
Thanks. Back up now.
This looks great! But can we attach VAD, or interrupt the agent when the human speaks?
Cool! I posted it on GitHub, but it looks like the model is buggy if you don't provide all the necessary information at the beginning.
As an example, if you just say "hi", the model will reason and try to ask you for the location, but you won't hear anything, because such questions are printed and not returned to FastRTC's TTS.
Do you know how to possibly fix this? I don't know much about the smolagents library
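One possible workaround, as a rough sketch only: capture whatever the agent prints while it runs and fall back to that text when nothing is returned, so the follow-up question still reaches the TTS step. This uses only the Python standard library. `run_and_collect_speech` and `fake_agent` are hypothetical names made up for illustration; they are not part of smolagents or FastRTC, and the real agent call would go where `fake_agent` is.

```python
import contextlib
import io


def run_and_collect_speech(agent_fn, user_text):
    """Run agent_fn and return the text that should be spoken.

    agent_fn is a hypothetical callable standing in for the real agent call.
    """
    buffer = io.StringIO()
    # Capture anything written to sys.stdout while the agent runs.
    with contextlib.redirect_stdout(buffer):
        result = agent_fn(user_text)
    printed = buffer.getvalue().strip()
    # Prefer the returned answer; fall back to the printed text.
    if result:
        return str(result)
    return printed


def fake_agent(user_text):
    # Stand-in for an agent that prints a follow-up question
    # instead of returning it (assumed behavior, for the demo only).
    if "location" not in user_text:
        print("Which location are you interested in?")
        return None
    return "Here is the answer for your location."


if __name__ == "__main__":
    # The printed question is recovered and could be passed on to TTS.
    print(run_and_collect_speech(fake_agent, "hi"))
```

Note that `contextlib.redirect_stdout` only catches writes that go through `sys.stdout`. If the agent writes its questions some other way (a logger, or a console object that holds its own stream), this will not pick them up, and a fix inside the agent's prompt or tools would be needed instead.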