#HuggingFace

4 articles

Tech · Jun 14, 2026 · updated · 12 min

ZONOS2 on an 8GB RTX 4060 Laptop (WSL2): it runs, but ~20x slower than realtime

Tested ZONOS2 on an 8GB RTX 4060 Laptop (WSL2): the 15.3GB bf16 weights run via Windows system-memory fallback, a KV-cache override, and the CUDA toolkit at ~1/20 realtime. Plus a Japanese name-accent gotcha with A/B audio.

AI TTS Speech Synthesis ZONOS2 Zyphra HuggingFace Japanese Experiment

Tech · Jun 9, 2026 · 10 min

Laxhar's SenseNova U1 LoRA trainer: bf16 on 32GB GPU, ~20GB peak VRAM

Laxhar's U1 trainer needs a 32GB+ GPU and runs in bf16 only, since 4-bit quantization broke the generation tower. Prefix offload keeps peak VRAM at about 20GB. Also covers the 8-step LoRA stack, A3B MoE compatibility, and the official training code gap.

AI Image Generation LoRA HuggingFace MoE

Tech · May 19, 2026 · updated · 9 min

Lance 3B unified multimodal: 40GB VRAM, RunPod costs, and why weights are split

40GB+ VRAM for a 3B model. VBench 85.11 beats dedicated 14B video generators. RunPod GPU costs from $2.2/session. The 'unified' model still ships as two checkpoint files.

AI Multimodal Image Generation Video Generation VLM Open Source HuggingFace

Tech · Apr 10, 2026 · 10 min

Sentence Transformers v5.4 Adds Unified Embeddings for Text, Image, Audio, and Video

Sentence Transformers v5.4 adds multimodal support. Eight embedding models and four rerankers including Qwen3-VL and NVIDIA Nemotron can now be used through a unified API.

AI Embedding Multimodal RAG HuggingFace Python