#LLM

127 articles

Tech · Mar 6, 2026 · 10 min

Back-to-back releases of OpenAI GPT-5.3/5.4 and Saguaro-driven inference speedups

A summary of GPT-5.3 Instant’s hallucination reductions and safety regressions, GPT-5.4’s computer use, Tool Search, and 1M-token context, plus Saguaro’s 5× inference speedups.

LLM OpenAI GPT Inference Optimization Speculative Decoding AI Safety Computer Use

Tech · Mar 4, 2026 · 7 min

Amazon Bedrock Mantle's OpenAI-compatible API is now generally available

AWS has made OpenAI API compatibility for the Bedrock Mantle distributed inference engine generally available, letting existing OpenAI SDK code run against open-weight models such as DeepSeek and Mistral.

AWS Amazon Bedrock OpenAI API LLM

Tech · Mar 1, 2026 · 11 min

The Reason Qwen 3.5 Failed on Radeon 8060S Was an Outdated AMD Driver

Isolating the cause of Qwen 3.5 failing on ROCm/Vulkan via CPU inference, llama-server, and LM Studio — an AMD driver update resolved everything.

AI LLM Local LLM AMD llama.cpp Ollama LM Studio Experiment

Tech · Feb 28, 2026 (updated) · 12 min

Qwen 3.5 abliterated in Ollama: broken outputs, chat-template failures, and the official-model workaround

Hands-on test of huihui-ai Qwen 3.5 abliterated models in Ollama: garbage-token failures, GLM-4.7-Flash chat-template breakage, and why the official model with thinking disabled worked better.

AI LLM Ollama Local LLM AMD LM Studio Vulkan ROCm Experiment

Tech · Feb 28, 2026 · 15 min

Automated OCR Error Detection and Correction with Encoder Models + Local LLM

Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.

NLP OCR Machine Learning Python BERT LUKE Ollama LLM WSL2 NDLOCR-Lite Experiment

Tech · Feb 24, 2026 · 8 min

Large-Scale Unauthorized Distillation of Claude and the Collapse of SWE-bench Hit on the Same Day

On February 23, 2026, Anthropic accused three Chinese AI companies of distilling Claude, and OpenAI retired SWE-bench Verified the same day, exposing training fraud and evaluation flaws at once.

AI Security Anthropic DeepSeek Benchmark LLM OpenAI SWE-bench

Tech · Feb 22, 2026 (updated) · 7 min

AI Agent Orchestration: Claws and Cord

Andrej Karpathy coined "Claws" as an upper layer for AI agents, and June Kim answered the same question from a different angle with Cord, a framework built on MCP and SQLite. This piece traces the shift from single-shot agents to autonomous coordination systems, covering both the concepts and the implementation.

AI AI Agents MCP LLM Architecture Karpathy

Tech · Feb 20, 2026 (updated) · 13 min

Accelerating LLM Inference: CDLM and Attention Matching KV Compaction

Two February 2026 papers on reducing inference cost: Together AI’s Consistency DLM (up to 14.5× faster) and MIT/Harvard’s Attention Matching KV compaction (50× compaction in seconds).

AI LLM Inference Optimization KV Cache Diffusion models

Tech · Feb 18, 2026 · 3 min

NVIDIA Nemotron 2 Nano 9B Japanese - The No.1 Japanese Model Under 10B for Sovereign AI

NVIDIA has released Nemotron-Nano-9B-v2-Japanese. It takes first place in the sub-10B category on Nejumi Leaderboard 4, delivering strong performance in Japanese knowledge, QA, and tool calling.

NVIDIA LLM Nemotron Japanese AI

Tech · Feb 18, 2026 · 3 min

Claude Sonnet 4.6 Released, Sometimes Beating Opus 4.5 in Coding

Anthropic has released the mid-sized model Claude Sonnet 4.6. In Claude Code evaluations, 70% of users preferred it over Sonnet 4.5, and 59% preferred it over Opus 4.5, while pricing remains unchanged.

Claude Anthropic LLM Claude Code

Tech · Feb 15, 2026 · 5 min

Exposing a Local LLM as an External API via Tailscale VPN

How I used Tailscale VPN and a ConoHa VPS to make a local LLM accessible over the internet through an OpenAI-compatible API.

AI LLM Tailscale VPN VPS Experiment

Tech · Feb 15, 2026 (updated) · 5 min

Optimizing VRAM and Memory Allocation on Strix Halo for Local LLMs

How to configure VRAM/main memory split on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6GB model ran fine with just 8GB of dedicated VRAM.

AI LLM Memory Optimization AMD LM Studio Experiment