#LLM

115 articles

Tech Apr 25, 2026 5 min

OpenAI ships GPT-5.5 and GPT-5.5 Pro on the API

A practical rundown of GPT-5.5 and GPT-5.5 Pro on the API: the 1M+ context, the new reasoning.effort default, image input behavior, prompt caching, and pricing.

OpenAI LLM API AI

Tech Apr 25, 2026 updated 11 min

Ling-flash-2.0 MXFP4 (bailing_moe) on SwiftLM + M1 Max 64GB: working config, support check, --stream-experts notes

Hands-on running inclusionAI Ling-flash-2.0 (100B / 6.1B active, MXFP4 quant, 54.7GB) on SwiftLM via mlx-swift-lm on an M1 Max 64GB. Covers bailing_moe + MXFP4 support check in mlx-swift, the startup surprise, and what --stream-experts actually saves.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE MXFP4 Ant Group Experiment

Tech Apr 24, 2026 13 min

Running SwiftLM on M1 Max 64GB and Comparing It to Ollama and MLX-lm

A hands-on build and run of the Swift-based LLM inference server SwiftLM on an M1 Max 64GB. Covers Qwen3.6-35B-A3B and Qwen3.5-122B-A10B, with the same BST, BBS, and persona tests used in the existing Ollama and MLX-lm write-ups.

Apple Silicon LLM MLX Local LLM Swift SwiftLM MoE Experiment

Tech Apr 24, 2026 9 min

TRACER trains a surrogate from LLM classification API logs and swaps in via a parity gate

TRACER, a recent arXiv paper, takes the input/output logs of an LLM classification endpoint and reuses them as training data, then swaps in a lightweight surrogate only on regions that pass a parity gate to cut inference cost. The surrogate absorbs 83–100% of traffic on a 77-class intent dataset and 100% on a 150-class one, while correctly refusing to deploy on an NLI task — that refusal behavior is the interesting part.

AI LLM Machine Learning Paper Inference Optimization

Tech Apr 24, 2026 9 min

Japan's Digital Agency open-sources its government AI "Gennai" with RAG, self-hosted LLM, and legal-AI templates under commercial-friendly licenses

Japan's Digital Agency released parts of Gennai, the generative AI platform it runs for central-government staff, on GitHub under MIT / CC BY 4.0. The web app and cloud-specific AI templates for AWS, Azure, and Google Cloud are bundled together so local governments and private companies can redeploy the same stack.

AI LLM RAG Open Source National strategy AWS Azure Google Cloud

Tech Apr 24, 2026 updated 11 min

DeepSeek V4 Preview specs: V4-Pro 1.6T and V4-Flash 284B open under MIT, 1M context, 27% inference FLOPs of V3.2

DeepSeek V4 Preview ships V4-Pro (1.6T/49B active) and V4-Flash (284B/13B active) as open weights under MIT, both with 1M context. CSA+HCA hybrid attention, mHC, and the Muon optimizer cut per-token FLOPs at 1M tokens to 27% of V3.2. Day-one API and chat.deepseek.com mode switch covered.

LLM DeepSeek Chinese AI MoE Open Model AI Agent

Tech Apr 24, 2026 updated 14 min

Tencent Hy3-preview (295B) vs Ant Ling-2.6-flash (104B): two open Chinese MoEs released the same week

Two open-weight Chinese MoEs landed within 24 hours: Ant Ling-2.6-flash (104B/7.4B active, 7x token-efficiency claim) and Tencent Hy3-preview (295B/21B active, frontier-tier open weights). Specs, licenses, and how they line up against DeepSeek-V3 and GLM-4.5.

LLM Chinese AI MoE Open Model AI Agent Local LLM OpenRouter

Tech Apr 23, 2026 updated 9 min

Xiaomi ships MiMo-V2.5 and MiMo-V2.5-Pro together — 1M omnimodal and 1,000-tool agent, API only

Xiaomi launched two MiMo-V2.5 models at once. MiMo-V2.5-Pro hits SWE-bench Pro 57.2, Claw-Eval 63.8, and τ3-Bench 72.9 — frontier-tier — while MiMo-V2.5 brings native omnimodality plus a 1M context. Both are API-only for now; open weights are promised but unscheduled.

AI LLM Chinese AI MoE AI Agents Multimodal Xiaomi

Tech Apr 23, 2026 8 min

NVIDIA NIM opens free hosted inference across 100+ models on an OpenAI-compatible endpoint that OpenClaw and Cursor plug into directly

NVIDIA's build.nvidia.com serves a free inference API that covers 100+ models including MiniMax M2.7, GLM-5, Kimi K2.5, DeepSeek, GPT-OSS, and Sarvam-M. Because integrate.api.nvidia.com/v1 is OpenAI-compatible, OpenClaw, OpenCode, Zed, and Cursor can call it directly.

NVIDIA LLM API OpenAI AI Coding OpenClaw

Tech Apr 23, 2026 21 min

Running open-notebook on M1 Max Without Docker or Cloud APIs, and Letting qwen3.6:35b Read Its Own Article

The NotebookLM clone open-notebook assumes Docker and cloud APIs by default. I installed SurrealDB natively, ran four processes in tmux, and wired everything through Ollama's qwen3.6:35b and bge-m3. I fed it the Qwen3.6 benchmark article I wrote this morning, and it answered with the correct numbers.

AI LLM Local LLM Ollama Qwen Apple Silicon RAG OSS Experiment

Tech Apr 23, 2026 13 min

Qwen3.6-27B Dense vs Qwen3.6-35B-A3B MoE on M1 Max — MLX Was 2× Faster Than Ollama

Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, while MLX ran it at 11 tok/s. On the side, running 35B-A3B under MLX was roughly 2× faster than the Ollama GGUF. Also had both models build a BBS to gauge intent handling.

LLM Local LLM Qwen Ollama MLX Apple Silicon MoE Experiment

Tech Apr 23, 2026 6 min

Math for reading AI articles: the full 5-article series

A hub for the 5-article series that organizes math symbols in AI and LLM articles for reading, not solving. Covers equations, vectors and matrices, probability and statistics, derivatives, and gradient descent with backprop, plus a reading-order guide for different backgrounds.

AI LLM Machine Learning Math Primer