The NotebookLM clone open-notebook assumes Docker and cloud APIs by default. I installed SurrealDB natively, ran four processes in tmux, and wired everything through Ollama's qwen3.6:35b and bge-m3. I fed it the Qwen3.6 benchmark article I wrote this morning, and it answered with the correct numbers.
Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, MLX ran it at 11 tok/s. On the side, running 35B-A3B under MLX was roughly 2× faster than the Ollama GGUF. Also had both models build a BBS to gauge intent handling.
Hands-on Qwen3.6-35B-A3B (23GB 4bit GGUF) on M1 Max 64GB via Ollama 0.20.6. Generation speed stays at 27 tok/s — same as Qwen3.5-35B-A3B — but the same prompt produces 13× more thinking tokens. Multi-turn behavior, persona handling, and a three-tier NSFW probe included.
Alibaba's Qwen3.6-Max-Preview and Moonshot AI's Kimi K2.6 were released within a 24-hour window on April 20–21, 2026. A side-by-side look at specs, benchmarks, distribution, and agent-side features for the two flagships.
Alibaba's Qwen team released Qwen3.6-35B-A3B as open weights. A 40-layer hybrid of Gated DeltaNet, Gated Attention, and MoE hits 73.4 on SWE-bench Verified, 37.0 on MCPMark, and 1397 on QwenWebBench.
Tested WAI-Anima v1, Anima preview3-base, and WAI-Illustrious v160 side by side on M1 Max 64GB ComfyUI with same seed/prompt. WAI-Anima inherits Anima's atmospheric lighting and natural running poses but still loses to WAI-Illustrious on tag control and character consistency. Includes i2i pipeline test (denoise 0.5), ~275s generation times, and how the Anima derivative ecosystem (WAI-Anima, CottonAnima, Kirazuri, RDBT) expanded in two months.
Tested 5 approaches including Qwen Image Edit, JS color reduction, and Illustrious i2i + LoRA. Illustrious i2i alone turned out to be the fastest and lightest solution for pixel art conversion.
I tested local Vision LLMs (Gemma 3, Qwen2.5-VL, Llama 3.2 Vision, Gemma 4) to see if they could look at character illustrations and pixel art and generate RPG-style stats in JSON format.
Benchmarking NII's LLM-jp-4-32B-A3B-thinking on EVO-X2 (Ryzen AI Max+ 395) with ROCm. 62.9 t/s vs Qwen3.5-35B-A3B's 44.7 t/s. Covers thinking control issues, KV cache trade-offs, knowledge cutoff, Japanese quality comparisons, code generation tests, and training data composition.
Qwen3.5-35B-A3B is an SSM+Attention hybrid where only 10 of 40 layers consume KV cache. Going from ctx-size 4096 to 65536 on llama-server + Vulkan added just 800MB VRAM with zero throughput loss. Tested on Strix Halo (Ryzen AI Max+ 395), with q8_0 KV quant benchmarks.
Diagnosed a 7x speed regression for Qwen Image Edit on M1 Max 64GB ComfyUI after an update. Root cause: MPS BF16 matmul runs ~2x slower than FP16, compounded by an FP16 attention bug. Benchmark numbers and the working fix.
Flash-MoE is a C/Metal inference engine that runs Qwen3.5-397B-A17B on a MacBook Pro M3 Max at 4.36 tokens/s. With expert streaming from SSD and hand-written Metal shaders, it fits the 209GB model into a 48GB memory budget.