#Voice AI

2 articles

Tech · Apr 30, 2026 · updated · 10 min

NII's 48,000-Hour Audio Dataset Is Raw Material for TTS

NII/LLMC released CC Audio and Archive.org Audio Dataset: URL lists, metadata, and a downloader covering 48,000+ hours of Japanese audio. This article looks at what the release actually contains and how it fits into TTS, ASR, and audio model training.

AI · Voice AI · Speech Synthesis · Speech Recognition · TTS · STT · LLM · Machine Learning

Tech · Feb 6, 2026 · 6 min

Qwen3-Omni: An omni-modal MoE that unifies text, image, speech, and video with 3B active parameters

A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B, an omni-modal model that activates only 3B of its 30B parameters and responds with speech to text, image, audio, and video inputs. The article covers the Thinker–Talker architecture, benchmarks, and the overall Qwen3 MoE family.

AI · LLM · Open Source · Multimodal · Voice AI