#AI

253 articles

Tech · Feb 6, 2026 · 6 min

UltraFlux-v1 - a native 4K image generation model based on FLUX.1-dev

Technical details of UltraFlux-v1, a model that pushes FLUX.1-dev into native 4K generation. It covers the differences from Z-Image and FLUX.2 Klein, its RoPE extensions and VAE improvements, and practical caveats.

AI Image Generation FLUX 4K

Tech · Feb 6, 2026 · 6 min

Qwen3-Omni: An omni-modal MoE that unifies text, image, speech, and video with 3B active parameters

A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B, an omni-modal model that activates only 3B of its 30B parameters and responds with speech to text, image, audio, and video inputs. The article covers the Thinker–Talker architecture, benchmarks, and the overall Qwen3 MoE family.

AI LLM Open Source Multimodal Voice AI

Tech · Feb 5, 2026 · 4 min

UI-TARS-1.5-7B: a vision AI agent that reached SOTA in GUI grounding

A technical look at ByteDance's UI-TARS-1.5-7B, which beats OpenAI CUA and Claude 3.7 by a wide margin at identifying GUI elements from screenshots, and can run locally with a desktop app.

AI LLM Agent Open Source

Tech · Feb 4, 2026 · 5 min

Qwen3-Coder-Next: A Local Coding Agent with 3B Active Parameters

Technical overview of Alibaba’s Qwen3-Coder-Next, an ultra-efficient MoE with 80B parameters of which only 3B are activated. It runs even on a single RTX 4090 and brings 70%+ SWE-Bench performance to local use.

AI LLM Open Source Agent

Tech · Feb 4, 2026 · 3 min

ACE-Step 1.5: AI Music Generation Gets a Full Architecture Overhaul

ACE-Step V1.5 has been released with a hybrid LM+DiT architecture, support for 50+ languages, and a 4GB VRAM minimum, a major evolution from V1.0.

AI Audio Generation Local LLM

Tech · Feb 4, 2026 (updated) · 3 min

InfiniteTalk: Audio-Driven Lip Sync Built on Wan 2.1

Published as an official ComfyUI workflow, InfiniteTalk is a lip-sync model that specializes in generating mouth animation from audio files. This article covers how it differs from MOVA and Vidu Q3 and which models it requires.

AI Video Generation ComfyUI Lip Sync

Tech · Feb 4, 2026 · 5 min

UI UX Pro Max Skill: Comparing an AI UI-generation skill with my earlier articles

A comparison between the 'UI UX Pro Max Skill' for AI coding assistants such as Claude Code and the UI/UX improvement articles I wrote earlier. Which works better: automatic inference or explicit human intent?

Claude Code AI UI UX

Tech · Feb 4, 2026 · 3 min

AnimeGamer: AI That Understands Game State to Generate Anime Videos

AnimeGamer, developed by Tencent ARC Lab, generates anime-style videos while tracking game-state transitions. It takes a fundamentally different approach from general-purpose video generation models.

AI Video Generation Game Anime

Tech · Feb 3, 2026 · 4 min

MOVA: the first open-source model that generates video and audio together

MOVA-720p from the OpenMOSS team is an open-source model that generates video and audio in a single pass. This article covers how it differs from closed models like Vidu Q3 and what its architecture looks like.

AI Video Generation Audio Generation Open Source

Tech · Feb 3, 2026 · 3 min

LingBot-World: Ant Group's open-source real-time world model

Robbyant, an Ant Group subsidiary, released LingBot-World, a world model that generates interactive video in real time from a single image. This article covers how it differs from conventional video generators, its technical features, and Apple Silicon support.

AI Video Generation World Model Robotics

Tech · Feb 3, 2026 (updated) · 5 min

How to reproduce NovelAI Precise Reference locally: ComfyUI + IP-Adapter

Preparatory notes for reproducing NovelAI's Precise Reference feature locally with ComfyUI + IP-Adapter. Covers setup steps and parameter tuning.

AI Image Generation ComfyUI IP-Adapter NovelAI Apple Silicon

Tech · Feb 3, 2026 · 3 min

planning-with-files: a Manus-style plan management skill for Claude Code

A file-system-based approach to carrying plans, research, and progress across sessions by using files as extended memory.

Claude Code AI