Talk over your AI
like a person.
A drop-in Python voice engine for agents: Whisper ASR, Piper TTS, barge-in detection, turn-taking, audio routing, runtime invariants, and a validated state machine. This is the engine that runs Buddy, Gary, Lil Homie, and Ada.
Barge-in detection
VAD-powered interruption with debounced handoff. Talk over the AI mid-sentence and it stops immediately.
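To make "debounced handoff" concrete, here is a minimal sketch of one way to debounce a barge-in: require several consecutive speech frames while the agent is speaking before firing an interrupt. `BargeInDebouncer`, `min_speech_frames`, and `update` are hypothetical names chosen for illustration; they are not taken from the engine's API, and the engine's actual thresholds may differ.

```python
from dataclasses import dataclass


@dataclass
class BargeInDebouncer:
    """Fires once after `min_speech_frames` consecutive speech frames.

    Hypothetical helper for illustration; not part of the voice2 API.
    """
    min_speech_frames: int = 3
    _run: int = 0          # current streak of speech frames
    _fired: bool = False   # True once this streak has triggered

    def update(self, is_speech: bool, agent_speaking: bool) -> bool:
        # Only count speech that overlaps the agent's own output.
        if not (is_speech and agent_speaking):
            self._run = 0
            self._fired = False
            return False
        self._run += 1
        if self._run >= self.min_speech_frames and not self._fired:
            self._fired = True
            return True
        return False


debouncer = BargeInDebouncer(min_speech_frames=3)
# One VAD decision per audio frame; the isolated first frame is ignored.
vad_frames = [True, False, True, True, True, True]
events = [debouncer.update(f, agent_speaking=True) for f in vad_frames]
```

The interrupt fires on the third frame of the sustained streak and only once per streak, so a cough or a click does not cut the agent off.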
Floor management
Turn-taking arbitration. When user and agent overlap, one side yields instead of both talking on. Floor ownership is explicit and logged.
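Explicit, logged floor ownership could look roughly like the sketch below. The `Floor` class and its policy (the user may always take the floor, the agent has to wait) are assumptions made for illustration, not the engine's documented behavior.

```python
import logging
import threading

log = logging.getLogger("floor")


class Floor:
    """Tracks who may speak: None, "user", or "agent" (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.owner = None

    def request(self, who: str) -> bool:
        with self._lock:
            # Assumed policy: the user may take the floor from the agent,
            # but the agent must wait until the user releases it.
            if self.owner is None or self.owner == who or who == "user":
                if self.owner != who:
                    log.info("floor: %s -> %s", self.owner, who)
                self.owner = who
                return True
            log.info("floor: %s denied, held by %s", who, self.owner)
            return False

    def release(self, who: str) -> bool:
        with self._lock:
            if self.owner != who:
                return False  # only the current owner can release
            log.info("floor: %s -> None", who)
            self.owner = None
            return True


floor = Floor()
agent_got_it = floor.request("agent")  # agent starts speaking
user_got_it = floor.request("user")    # user barges in and takes over
agent_retry = floor.request("agent")   # agent has to wait its turn
```

Every grant, denial, and release goes through one lock and one logger, which is what makes ownership auditable after the fact.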
Validated state machine
IDLE → LISTENING → THINKING → SPEAKING → INTERRUPTING. Every transition validated. Illegal moves rejected.
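A validated state machine of this kind can be sketched as a lookup table of allowed moves. The five state names come from the line above; the `ALLOWED` table, `StateMachine`, and `IllegalTransition` are assumptions for illustration, and the engine's real transition table may differ.

```python
import threading
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    THINKING = auto()
    SPEAKING = auto()
    INTERRUPTING = auto()


# Assumed transition table; the engine's real table may differ.
ALLOWED = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.THINKING, State.IDLE},
    State.THINKING: {State.SPEAKING, State.IDLE},
    State.SPEAKING: {State.INTERRUPTING, State.LISTENING, State.IDLE},
    State.INTERRUPTING: {State.LISTENING},
}


class IllegalTransition(Exception):
    """Raised when a requested move is not in the table."""


class StateMachine:
    def __init__(self) -> None:
        self._lock = threading.Lock()  # workers may call from many threads
        self.state = State.IDLE

    def transition(self, new: State) -> None:
        with self._lock:
            if new not in ALLOWED[self.state]:
                raise IllegalTransition(f"{self.state.name} -> {new.name}")
            self.state = new


sm = StateMachine()
sm.transition(State.LISTENING)
try:
    sm.transition(State.SPEAKING)  # skips THINKING, so it is rejected
    rejected = False
except IllegalTransition:
    rejected = True
```

A rejected move leaves the current state untouched, so a misbehaving worker cannot push the engine into an undefined state.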
Worker pipeline
Listen, think, playback, keyboard interrupt, and invariant monitor workers. All threaded, all coordinated.
Audio pub-sub routing
Broadcaster routes mic audio to multiple consumers simultaneously. Ring buffer for configurable capture duration.
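The pub-sub idea can be sketched with one queue per consumer plus a bounded deque acting as the ring buffer. This `Broadcaster` is a hypothetical stand-in written for illustration; its method names and its drop-on-full behavior are assumptions, not the engine's actual interface.

```python
import queue
import threading
from collections import deque


class Broadcaster:
    """Fans each audio frame out to every subscriber (hypothetical)."""

    def __init__(self, ring_frames: int) -> None:
        self._lock = threading.Lock()
        self._subscribers = []
        # A deque with maxlen discards the oldest frame when full.
        self._ring = deque(maxlen=ring_frames)

    def subscribe(self, maxsize: int = 64) -> queue.Queue:
        q = queue.Queue(maxsize=maxsize)
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, frame: bytes) -> None:
        with self._lock:
            self._ring.append(frame)
            for q in self._subscribers:
                try:
                    q.put_nowait(frame)
                except queue.Full:
                    pass  # drop for a slow consumer; never block the mic

    def recent(self) -> list:
        """Return the buffered frames, oldest first."""
        with self._lock:
            return list(self._ring)


bus = Broadcaster(ring_frames=2)
vad_q = bus.subscribe()   # e.g. a barge-in detector
asr_q = bus.subscribe()   # e.g. the transcriber
for frame in (b"f1", b"f2", b"f3"):
    bus.publish(frame)
```

Capture duration follows from the ring size: with fixed-length frames, `ring_frames` times the frame duration gives the number of seconds retained.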
Runtime health
Invariant checker catches illegal states, dead workers, and broken transitions on a 2-second loop.
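A periodic invariant check might be structured like this: a pure function that reports violations, and a loop that runs it on an interval. `check_invariants`, `monitor`, and the worker names are hypothetical; only the state names and the 2-second interval come from this page.

```python
import threading

VALID_STATES = {"IDLE", "LISTENING", "THINKING", "SPEAKING", "INTERRUPTING"}


def check_invariants(state: str, workers: dict) -> list:
    """Return human-readable violations; an empty list means healthy."""
    violations = []
    if state not in VALID_STATES:
        violations.append(f"illegal state: {state}")
    for name, thread in workers.items():
        if not thread.is_alive():
            violations.append(f"dead worker: {name}")
    return violations


def monitor(get_state, workers, stop, on_violation, interval=2.0):
    """Run the check every `interval` seconds until `stop` is set."""
    # Event.wait returns False on timeout and True once the event is set,
    # so this sleeps between checks and exits promptly on shutdown.
    while not stop.wait(interval):
        for violation in check_invariants(get_state(), workers):
            on_violation(violation)
```

Keeping the check itself free of side effects makes it easy to test without waiting on the timer; `monitor` would typically run in its own daemon thread.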
Four production agents run on this engine daily.
from voice2 import VoiceEngine, VoiceConfig

def my_agent(txt: str) -> str:
    # Your LLM call: Claude, GPT, a local model, whatever
    return "You said: " + txt

config = VoiceConfig()
engine = VoiceEngine(config, ask_fn=my_agent)
engine.start()
engine.join()

Mic opens. Whisper listens. Your agent thinks. Piper speaks. User interrupts at any point. That's the whole loop.
faster-whisper (MIT) · Piper (MIT) · sounddevice (MIT) · numpy (BSD)
Third-party components remain under their original licenses. AIIT-Voice2 packages the orchestration, state control, floor management, interrupt handling, and worker pipeline.
MIT License. 2,300 lines of engine code. Full source. Full support via email.
Engine architecture — Rhet Wike + Claude Opus 4.6, Council Hill OK, 2026.
Production validation — Buddy, Gary, Lil Homie, Ada — daily use since April 2026.