VoxCPM2 sits in the tokenizer-free corner. This post maps it against F5-TTS, CosyVoice2, Irodori-TTS, and Style-Bert-VITS2, and explains why Japanese TTS still leans on OpenJTalk.
SB Intuitions released sarashina2.2-tts, an LLM-based TTS model focused on Japanese. It clones a speaker's voice and style from a short reference clip without fine-tuning, and it handles Japanese-English code-switching.
An open-source TTS model distilled from the ZipVoice architecture down to four inference steps, delivering voice cloning with 1 GB of VRAM at 150x real-time speed. The post also compares it with the other TTS models covered on this blog.
A local-first voice cloning, TTS, and audiobook app that brings Qwen3-TTS, Chatterbox, Kokoro, and IndexTTS-2 into a single GUI. It uses a FastAPI backend, a Flutter UI, and an MCP server.