Gradient descent, SGD and Adam, backpropagation, vanishing/exploding gradients with residual connections, and learning rate schedules — organized around what each piece is doing at a high level. The goal is reading training logs and model card numbers, not computing anything.
A minimum set of calculus for reading AI and LLM articles — d/dx, e, the chain rule, partial derivatives, and gradients. Focus on what the symbols are doing, not on solving the formulas.
A minimum set of probability and statistics for reading AI and LLM articles — conditional probability, cross-entropy, perplexity, and temperature are the main ones; rigorous Bayes and MLE derivations stay out of scope.
A minimum set of vectors and matrices for reading AI and LLM articles — the dot product and matrix product are the main two; determinants, inverses, and eigenvalues stay out of scope.
A minimum set of math for reading AI, LLM, and image-generation articles — the aim isn't to derive anything, just to recognize weighted sums, S-curves, probabilities, and the 'nudge toward the answer' step of training.
Hands-on Qwen3.6-35B-A3B (23GB 4bit GGUF) on M1 Max 64GB via Ollama 0.20.6. Generation speed stays at 27 tok/s — same as Qwen3.5-35B-A3B — but the same prompt produces 13× more thinking tokens. Multi-turn behavior, persona handling, and a three-tier NSFW probe included.
Alibaba's Qwen3.6-Max-Preview and Moonshot AI's Kimi K2.6 were released within a 24-hour window on April 20–21, 2026. A side-by-side look at specs, benchmarks, distribution, and agent-side features for the two flagships.
A three-link chain of mmap → MTLBuffer(bytesNoCopy) → Wasmtime MemoryCreator that makes a Wasm linear memory share the same physical bytes as a Metal GPU buffer. Llama 3.2 1B runs at 9ms/token on M1.
Two simultaneous announcements from Cloudflare Agents Week 2026: Agent Memory manages agent recall via Durable Objects, Vectorize, and Workers AI, while isitagentready.com scores how well sites are prepared for agents.
Alibaba's Qwen team released Qwen3.6-35B-A3B as open weights. A 40-layer hybrid of Gated DeltaNet, Gated Attention, and MoE hits 73.4 on SWE-bench Verified, 37.0 on MCPMark, and 1397 on QwenWebBench.
LLM safety stacks five layers — input filter, system prompt, RLHF, Constitutional AI, output filter — and each provider blocks at different layers. A breakdown of where abliterated vs uncensored models cut, and the default censorship level baked into local LLMs.
Google DeepMind's AI writing tool Fabula was demoed at CHI 2026 by Piotr Mirowski. Co-designed with 42 professional writers using convergent iteration for story structure. But the timeline shows Fabula was first demoed in May 2025 and entered early access in September 2025 — still a research prototype with no general availability.