Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop GPU with 8GB of VRAM.
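The core of the perplexity-based detection step can be sketched as follows. This is a minimal illustration, not the article's code: it assumes you have already obtained a per-token log-probability for each token (e.g. by masking each token in turn with a fill-mask model such as BERT/LUKE and reading off the probability of the original token), and the flagging threshold is an arbitrary placeholder.

```python
import math

def pseudo_perplexity(token_logprobs):
    """Pseudo-perplexity of a sentence from per-token log-probs.

    Each log-prob is assumed to come from masking that token with a
    fill-mask model and scoring the original token. Higher values
    mean the model found the sentence more surprising.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

def flag_errors(scored_sentences, threshold=50.0):
    """Return sentences whose pseudo-perplexity exceeds the threshold.

    scored_sentences: list of (text, token_logprobs) pairs.
    threshold: hypothetical cutoff; in practice it would be tuned
    on held-out data before escalating flagged sentences downstream.
    """
    return [text for text, lps in scored_sentences
            if pseudo_perplexity(lps) > threshold]

# Toy usage: a "surprising" sentence is flagged, a fluent one is not.
scored = [
    ("this sentense has an error", [-5.0, -5.0, -5.0, -5.0]),
    ("this sentence is fine",      [-0.1, -0.1, -0.1, -0.1]),
]
print(flag_errors(scored))  # ['this sentense has an error']
```

Flagged sentences would then be passed to the correction-judgment model, with disagreements escalated to a human reviewer as described above.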
Anthropic accused three Chinese AI companies of distilling Claude, and on the same day OpenAI retired SWE-bench Verified. Training fraud and evaluation flaws exposed simultaneously on February 23, 2026.
Andrej Karpathy coined "Claws" as an upper layer for AI agents, and June Kim answered the same question from a different angle with the Cord framework implemented with MCP and SQLite. This piece organizes the shift from single-shot agents to autonomous coordination systems from both conceptual and implementation perspectives.
Two February 2026 papers on reducing inference cost: Together AI’s Consistency DLM (up to 14.5× faster) and MIT/Harvard’s Attention Matching KV compaction (50× compaction in seconds).
NVIDIA has released Nemotron-Nano-9B-v2-Japanese. It takes first place in the sub-10B category on Nejumi Leaderboard 4, delivering strong performance in Japanese knowledge, QA, and tool calling.
Anthropic has released the mid-sized model Claude Sonnet 4.6. In Claude Code evaluations, 70% of users preferred it over Sonnet 4.5, and 59% preferred it over Opus 4.5, while pricing remains unchanged.
How to configure the VRAM/main-memory split on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6GB model ran fine with only 8GB of dedicated VRAM.
Running an NSFW-capable local LLM on the GMKtec EVO-X2 (Strix Halo), achieving GPU inference at ~11 tokens/s with LM Studio and MS3.2-24B-Magnum-Diamond.
MioTTS from Aratako is a family of 0.1B to 2.6B Japanese-English TTS models built from scratch around the custom MioCodec. Its key feature is that it runs directly in llama.cpp and Ollama.
Liquid AI's LFM2.5 uses a hybrid of short-range convolutions and attention, achieving edge optimization without SSMs. This article covers the architecture, benchmarks, and community use cases.
A technical overview of Qwen3‑TTS from Alibaba’s Qwen team: one‑line pip install, 3‑second voice cloning, natural‑language voice design, and support for 10 languages including Japanese. Apache 2.0 licensed.