#TTS

11 articles

Tech · Jun 14, 2026 · updated · 12 min

ZONOS2 on an 8GB RTX 4060 Laptop (WSL2): it runs, but ~20x slower than realtime

Tested ZONOS2 on an 8GB RTX 4060 Laptop (WSL2): the 15.3GB bf16 weights run at ~1/20 realtime via Windows system-memory fallback, a KV-cache override, and the CUDA toolkit. Plus a Japanese name-accent gotcha with A/B audio.

AI TTS Speech Synthesis ZONOS2 Zyphra HuggingFace Japanese Experiment

Tech · May 13, 2026 · 11 min

VoxCPM2 and OSS TTS in 2026: Irodori-TTS, F5-TTS, and Japanese fine-tune notes

VoxCPM2 sits in the tokenizer-free corner. Mapped against F5-TTS, CosyVoice2, Irodori-TTS, and Style-Bert-VITS2, plus why Japanese TTS still leans on OpenJTalk.

AI TTS Speech Synthesis Voice Cloning Local AI Open Source Fine-tuning

Tech · Apr 30, 2026 · updated · 10 min

NII's 48,000-Hour Audio Dataset Is Raw Material for TTS

NII/LLMC released the CC Audio and Archive.org Audio Dataset: URL lists, metadata, and a downloader covering 48,000+ hours of Japanese audio. What it actually contains and how it fits into TTS, ASR, and audio model training.

AI Voice AI Speech Synthesis Speech Recognition TTS STT LLM Machine Learning

Tech · Apr 28, 2026 · 6 min

Sarashina2.2-TTS Is a Japanese-First Zero-Shot Voice Synthesis Model

SB Intuitions released sarashina2.2-tts, an LLM-based TTS model focused on Japanese. It clones speaker voice and style from short reference audio without fine-tuning, and handles Japanese-English code-switching.

AI TTS Voice Synthesis LLM Voice Cloning

Tech · Mar 17, 2026 · 4 min

LuxTTS - lightweight ZipVoice-based voice cloning that runs in 1 GB of VRAM

An open-source TTS model distilled from the ZipVoice architecture into four inference steps, delivering voice cloning with 1 GB of VRAM and 150x real-time speed. The article also compares it with the other TTS models covered on this blog.

AI TTS Speech Synthesis OSS Voice Cloning

Tech · Feb 14, 2026 · 6 min

MimikaStudio - a local TTS app that unifies multiple engines in one GUI

A local-first voice cloning, TTS, and audiobook app that brings Qwen3-TTS, Chatterbox, Kokoro, and IndexTTS-2 into a single GUI. It uses a FastAPI backend, Flutter UI, and an MCP server.

AI TTS Speech Synthesis Voice Cloning Flutter

Tech · Feb 12, 2026 · 7 min

MioTTS - a lightweight LLM-based TTS built from a custom codec

MioTTS from Aratako is a family of 0.1B to 2.6B Japanese-English TTS models built from scratch around the custom MioCodec. Its key feature is that it runs directly in llama.cpp and Ollama.

AI TTS Speech Synthesis Open Source LLM

Tech · Feb 7, 2026 · 6 min

Qwen3-TTS — Open-source speech synthesis with a single pip install

A technical overview of Qwen3‑TTS from Alibaba’s Qwen team: one‑line pip install, 3‑second voice cloning, natural‑language voice design, and support for 10 languages including Japanese. Apache 2.0 licensed.

AI TTS Speech Synthesis Open Source LLM

Tech · Feb 3, 2026 · updated · 3 min

KugelAudio — Open‑Source 7B‑Parameter TTS (ComfyUI‑Compatible)

Text‑to‑Speech covering 24 European languages with voice cloning. An open‑source model that outperformed ElevenLabs in the authors’ A/B tests.

ComfyUI TTS Speech Synthesis AI

Tech · Jan 15, 2026 · updated · 4 min

Pocket TTS — Lightweight Text-to-Speech on CPU

Open-source TTS with 100M parameters that runs faster than real time on CPU. Supports voice cloning.

AI Speech Synthesis TTS Open Source

Tech · Jan 10, 2026 · 6 min

Building a Voice-Chat AI (1): Voice API Survey

Aiming for a characterful AI with an avatar and voice chat, I started by comparing voice APIs.

AI Speech Synthesis Speech Recognition TTS STT Gemini OpenAI ChatGPT VOICEVOX Google Cloud