An open-source TTS model distilled from the ZipVoice architecture into four inference steps, delivering voice cloning with 1 GB of VRAM at 150x real-time speed. The post also compares it with the other TTS models covered on this blog.
A local-first voice cloning, TTS, and audiobook app that brings Qwen3-TTS, Chatterbox, Kokoro, and IndexTTS-2 into a single GUI. It uses a FastAPI backend, a Flutter UI, and an MCP server.