Sarvam AI released 30B and 105B models trained entirely in India—from pretraining through RL—featuring support for 22 constitutionally recognized Indian languages and inference optimizations.
A summary of GPT-5.3 Instant’s hallucination reductions and safety regressions, GPT-5.4’s computer use, Tool Search, and 1M-token context, plus Saguaro’s 5× inference speedups.
All variants of huihui-ai's Qwen 3.5 abliterated produced garbage tokens. GLM-4.7-Flash abliterated had a broken chat template. The official version with thinking disabled turned out to be the right answer.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.
Andrej Karpathy coined "Claws" as an upper layer for AI agents, and June Kim answered the same question from a different angle with the Cord framework implemented with MCP and SQLite. This piece organizes the shift from single-shot agents to autonomous coordination systems from both conceptual and implementation perspectives.
Two February 2026 papers on reducing inference cost: Together AI’s Consistency DLM (up to 14.5× faster) and MIT/Harvard’s Attention Matching KV compaction (50× compaction in seconds).
NVIDIA has released Nemotron-Nano-9B-v2-Japanese. It takes first place in the sub-10B category on Nejumi Leaderboard 4, delivering strong performance in Japanese knowledge, QA, and tool calling.
How to configure VRAM/main memory split on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6GB model ran fine with just 8GB of dedicated VRAM.
Building an NSFW-capable local LLM on the GMKtec EVO-X2 (Strix Halo). Getting GPU inference at ~11 tokens/s with LM Studio and MS3.2-24B-Magnum-Diamond.
A technical overview of Qwen3‑TTS from Alibaba’s Qwen team: one‑line pip install, 3‑second voice cloning, natural‑language voice design, and support for 10 languages including Japanese. Apache 2.0 licensed.