LLM articles - Page 7 | lilting channel

Tech Feb 28, 2026 15 min

Automated OCR Error Detection and Correction with Encoder Models + Local LLM

Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to having Qwen2.5 7B judge each correction, with escalation to a human on mismatch. A complete pipeline running on a single RTX 4060 Laptop GPU with 8GB of VRAM.

NLP OCR Machine Learning Python BERT LUKE Ollama LLM WSL2 NDLOCR-Lite Experiment

Tech Feb 24, 2026 8 min

Large-Scale Unauthorized Distillation of Claude and the Collapse of SWE-bench Hit on the Same Day

Anthropic accused three Chinese AI companies of distilling Claude, and on the same day OpenAI retired SWE-bench Verified. Training fraud and evaluation flaws exposed simultaneously on February 23, 2026.

AI Security Anthropic DeepSeek Benchmark LLM OpenAI SWE-bench

Tech Feb 22, 2026 updated 7 min

AI Agent Orchestration: Claws and Cord

Andrej Karpathy coined "Claws" as an upper layer for AI agents, and June Kim answered the same question from a different angle with the Cord framework, implemented with MCP and SQLite. This piece traces the shift from single-shot agents to autonomous coordination systems from both the conceptual and the implementation side.

AI AI Agents MCP LLM Architecture Karpathy

Tech Feb 20, 2026 updated 13 min

Accelerating LLM Inference: CDLM and Attention Matching KV Compaction

Two February 2026 papers on reducing inference cost: Together AI’s Consistency DLM (up to 14.5× faster) and MIT/Harvard’s Attention Matching KV compaction (50× compaction in seconds).

AI LLM Inference Optimization KV Cache Diffusion models

Tech Feb 18, 2026 3 min

NVIDIA Nemotron 2 Nano 9B Japanese - The No.1 Japanese Model Under 10B for Sovereign AI

NVIDIA has released Nemotron-Nano-9B-v2-Japanese. It takes first place in the sub-10B category on Nejumi Leaderboard 4, delivering strong performance in Japanese knowledge, QA, and tool calling.

NVIDIA LLM Nemotron Japanese AI

Tech Feb 18, 2026 3 min

Claude Sonnet 4.6 Released, Sometimes Beating Opus 4.5 in Coding

Anthropic has released the mid-sized model Claude Sonnet 4.6. In Claude Code evaluations, 70% of users preferred it over Sonnet 4.5, and 59% preferred it over Opus 4.5, while pricing remains unchanged.

Claude Anthropic LLM Claude Code

Tech Feb 15, 2026 5 min

Exposing a Local LLM as an External API via Tailscale VPN

How I used Tailscale VPN and a ConoHa VPS to make a local LLM accessible over the internet through an OpenAI-compatible API.

AI LLM Tailscale VPN VPS Experiment

Tech Feb 15, 2026 updated 5 min

Optimizing VRAM and Memory Allocation on Strix Halo for Local LLMs

How to configure the split between dedicated VRAM and main memory on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6GB model ran fine with just 8GB of dedicated VRAM.

AI LLM Memory Optimization AMD LM Studio Experiment

Tech Feb 15, 2026 6 min

Setting Up a Local LLM on the GMKtec EVO-X2 (Strix Halo)

Building an NSFW-capable local LLM on the GMKtec EVO-X2 (Strix Halo). Getting GPU inference at ~11 tokens/s with LM Studio and MS3.2-24B-Magnum-Diamond.

AI LLM Local LLM LM Studio AMD Experiment

Tech Feb 12, 2026 7 min

MioTTS - a lightweight LLM-based TTS built from a custom codec

MioTTS from Aratako is a family of 0.1B to 2.6B Japanese-English TTS models built from scratch around the custom MioCodec. Its key feature is that it runs directly in llama.cpp and Ollama.

AI TTS Speech Synthesis Open Source LLM

Tech Feb 8, 2026 5 min

LFM2.5 - a hybrid architecture that's neither Transformer nor Mamba

Liquid AI's LFM2.5 uses a hybrid of short-range convolutions and attention, achieving edge optimization without SSMs. This article covers the architecture, benchmarks, and community use cases.

AI LLM Edge AI Architecture

Tech Feb 7, 2026 6 min

Qwen3-TTS - Open-source speech synthesis with a single pip install

A technical overview of Qwen3-TTS from Alibaba’s Qwen team: one-line pip install, 3-second voice cloning, natural-language voice design, and support for 10 languages including Japanese. Apache 2.0 licensed.