A local-first voice cloning, TTS, and audiobook app that brings Qwen3-TTS, Chatterbox, Kokoro, and IndexTTS-2 into a single GUI. It uses a FastAPI backend, Flutter UI, and an MCP server.
Upscaling images loaded via the Load Image node was producing garbled output. The cause was a non-contiguous tensor, fixed with a one-line patch to comfy/utils.py.
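A minimal sketch of the failure mode (illustrative only, not the actual comfy/utils.py patch): tensors that come out of a channel permute are non-contiguous, and kernels that assume contiguous memory can then read pixels in the wrong order. The fix pattern is to materialize a contiguous copy first.

```python
import torch

# An image loader often permutes CHW -> HWC; the result shares storage
# with the original tensor and is no longer contiguous in memory.
img = torch.rand(3, 64, 64).permute(1, 2, 0)
assert not img.is_contiguous()

# The one-line style of fix: copy into contiguous memory before handing
# the tensor to code that assumes row-major layout.
img = img.contiguous()
assert img.is_contiguous()
print(img.shape)  # torch.Size([64, 64, 3])
```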
MioTTS from Aratako is a family of 0.1B to 2.6B Japanese-English TTS models built from scratch around the custom MioCodec. Its key feature is that it runs directly in llama.cpp and Ollama.
Went 0-for-13 trying to train an Illustrious-XL LoRA on a Mac Studio M1 Max 64GB. With help from multiple AI agents, pinpointed the root causes and finally succeeded on a RunPod RTX 4090. The full record: three fatal parameters and the sd-scripts trap.
A look at ActionMesh from Meta AI Research. Feed it a video and it outputs animated 3D meshes in .glb format for tools like Blender and Unity. This article covers input limits, runtime requirements, and practical use with AI video generation.
A deep dive into the claude-code-best-practice repository (1,500+ stars on GitHub), covering CLAUDE.md tips, the Command-Agent-Skills 3-layer architecture, Hooks-based notifications, RPI workflow, and more.
Liquid AI's LFM2.5 uses a hybrid of short-range convolutions and attention, achieving edge-device efficiency without resorting to SSMs. This article covers the architecture, benchmarks, and community use cases.
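The general pattern can be sketched as follows. This is an illustration of the hybrid idea only, with made-up layer sizes, not LFM2.5's actual implementation: a cheap depthwise short convolution handles local mixing, and standard attention handles global mixing.

```python
import torch
import torch.nn as nn

class ShortConvBlock(nn.Module):
    """Depthwise short-range convolution over the sequence axis."""
    def __init__(self, dim=32, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel, padding=kernel - 1, groups=dim)

    def forward(self, x):                 # x: (batch, seq, dim)
        y = self.conv(x.transpose(1, 2))  # conv along the sequence
        y = y[..., : x.shape[1]]          # trim right pad: each step sees only the past
        return x + y.transpose(1, 2)

class HybridBlock(nn.Module):
    """One hybrid unit: local conv mixing followed by global attention."""
    def __init__(self, dim=32, heads=4):
        super().__init__()
        self.conv_block = ShortConvBlock(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        x = self.conv_block(x)            # cheap local context
        a, _ = self.attn(x, x, x)         # global context where needed
        return x + a

x = torch.rand(2, 16, 32)
out = HybridBlock()(x)
print(out.shape)  # torch.Size([2, 16, 32])
```

Stacking many conv blocks per attention block is what keeps inference light on edge hardware; the exact ratio is a per-model design choice.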
ByteDance's Seedance 2.0 has been released on Dreamina. From the perspective of someone running Wan 2.x and ComfyUI locally, this article examines how "ease of use" differs between local and cloud-based video generation services.
A summary of ComfyUI's 'The Complete AI Upscaling Handbook' covering the difference between conservative and creative upscaling, model selection by use case, and benchmarks for both image and video.
A technical overview of Qwen3‑TTS from Alibaba’s Qwen team: one‑line pip install, 3‑second voice cloning, natural‑language voice design, and support for 10 languages including Japanese. Apache 2.0 licensed.
Technical details of UltraFlux-v1, a model that pushes FLUX.1-dev into native 4K generation. It covers the differences from Z-Image and FLUX.2 Klein, its RoPE extensions and VAE improvements, and practical caveats.
A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B, an omni-modal model that activates only 3B of its 30B parameters and responds with speech to text, image, audio, and video inputs. The article covers the Thinker–Talker architecture, benchmarks, and the broader Qwen3 MoE family.
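The "3B active of 30B" idea is standard sparse MoE routing; here is a generic toy sketch (not Qwen3-Omni's code, and the dimensions are invented): a router scores experts per token and only the top-k experts actually run, so most parameters stay idle on any given forward pass.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-k mixture-of-experts layer for illustration."""
    def __init__(self, dim=16, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                            # x: (tokens, dim)
        scores = self.router(x)                      # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)     # k experts per token
        w = topv.softmax(dim=-1)                     # mixing weights
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topi[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.rand(4, 16)
out = TinyMoE()(x)
print(out.shape)  # torch.Size([4, 16])
```

With k=2 of 8 experts, only a quarter of the expert parameters run per token, which is the same cost-saving principle behind activating 3B of 30B.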