#Qwen

47 articles

Tech · Apr 23, 2026 · 21 min

Running open-notebook on M1 Max Without Docker or Cloud APIs, and Letting qwen3.6:35b Read Its Own Article

The NotebookLM clone open-notebook assumes Docker and cloud APIs by default. I installed SurrealDB natively, ran four processes in tmux, and wired everything through Ollama's qwen3.6:35b and bge-m3. I fed it the Qwen3.6 benchmark article I wrote this morning, and it answered with the correct numbers.

AI LLM Local LLM Ollama Qwen Apple Silicon RAG OSS Experiment

Tech · Apr 23, 2026 · 13 min

Qwen3.6-27B Dense vs Qwen3.6-35B-A3B MoE on M1 Max — MLX Was 2× Faster Than Ollama

Tried Qwen3.6-27B on both Ollama and MLX. Ollama couldn't load the VL-projector-embedded GGUF, while MLX ran it at 11 tok/s. Separately, 35B-A3B under MLX ran roughly 2× faster than the Ollama GGUF. I also had both models build a BBS to gauge intent handling.

LLM Local LLM Qwen Ollama MLX Apple Silicon MoE Experiment

Tech · Apr 21, 2026 (updated) · 11 min

Qwen3.6-35B-A3B on M1 Max via Ollama 0.20.6: 27 tok/s same as 3.5, but 13× thinking tokens

Hands-on with Qwen3.6-35B-A3B (23GB 4-bit GGUF) on M1 Max 64GB via Ollama 0.20.6. Generation speed stays at 27 tok/s, the same as Qwen3.5-35B-A3B, but the same prompt produces 13× more thinking tokens. Multi-turn behavior, persona handling, and a three-tier NSFW probe are included.

LLM Local LLM Qwen Ollama Apple Silicon MoE Experiment

Tech · Apr 21, 2026 (updated) · 11 min

Qwen3.6-Max-Preview and Kimi K2.6 landed nearly back-to-back — lining up both flagship coding models

Alibaba's Qwen3.6-Max-Preview and Moonshot AI's Kimi K2.6 were released within a 24-hour window on April 20–21, 2026. A side-by-side look at specs, benchmarks, distribution, and agent-side features for the two flagships.

LLM Qwen Kimi Moonshot AI MoE Agent Coding

Tech · Apr 17, 2026 (updated) · 10 min

Qwen3.6-35B-A3B pairs Gated DeltaNet with MoE and raises the bar on agentic coding

Alibaba's Qwen team released Qwen3.6-35B-A3B as open weights. A 40-layer hybrid of Gated DeltaNet, Gated Attention, and MoE hits 73.4 on SWE-bench Verified, 37.0 on MCPMark, and 1397 on QwenWebBench.

LLM Local LLM Qwen MoE Agent Coding

Tech · Apr 16, 2026 (updated) · 13 min

WAI-Anima v1 vs WAI-Illustrious on M1 Max ComfyUI: brings Anima's atmospheric backgrounds but loses on tag control and character consistency

Tested WAI-Anima v1, Anima preview3-base, and WAI-Illustrious v160 side by side on M1 Max 64GB ComfyUI with the same seed and prompt. WAI-Anima inherits Anima's atmospheric lighting and natural running poses but still loses to WAI-Illustrious on tag control and character consistency. Includes an i2i pipeline test (denoise 0.5), ~275s generation times, and a look at how the Anima derivative ecosystem (WAI-Anima, CottonAnima, Kirazuri, RDBT) expanded in two months.

AI Image Generation ComfyUI Qwen Apple Silicon Stable Diffusion LoRA Experiment Anima WAI-Anima

Tech · Apr 14, 2026 · 10 min

Can Qwen Image Edit Convert Photos to Pixel Art?

Tested 5 approaches including Qwen Image Edit, JS color reduction, and Illustrious i2i + LoRA. Illustrious i2i alone turned out to be the fastest and lightest solution for pixel art conversion.

Qwen Image Generation Apple Silicon Experiment

Tech · Apr 14, 2026 · 10 min

Can Local Vision LLMs Extract RPG Stats from Character Art?

I tested local Vision LLMs (Gemma 3, Qwen2.5-VL, Llama 3.2 Vision, Gemma 4) to see if they could look at character illustrations and pixel art and generate RPG-style stats in JSON format.

AI Local LLM VLM Image Recognition Ollama Gemma Qwen Apple Silicon Experiment

Tech · Apr 6, 2026 · 11 min

LLM-jp-4-32B-A3B on ROCm + Strix Halo: 41% Faster Than Qwen3.5

Benchmarking NII's LLM-jp-4-32B-A3B-thinking on EVO-X2 (Ryzen AI Max+ 395) with ROCm. 62.9 t/s vs Qwen3.5-35B-A3B's 44.7 t/s. Covers thinking control issues, KV cache trade-offs, knowledge cutoff, Japanese quality comparisons, code generation tests, and training data composition.

AI LLM Local LLM llama.cpp AMD ROCm MoE Qwen Experiment

Tech · Mar 31, 2026 (updated) · 8 min

Qwen3.5-35B-A3B on llama-server (Vulkan + Strix Halo): 4K → 65K context for only 800MB more VRAM

Qwen3.5-35B-A3B is an SSM+Attention hybrid where only 10 of 40 layers consume KV cache. Going from ctx-size 4096 to 65536 on llama-server + Vulkan added just 800MB VRAM with zero throughput loss. Tested on Strix Halo (Ryzen AI Max+ 395), with q8_0 KV quant benchmarks.

LLM Local LLM llama.cpp AMD Vulkan KV Cache Qwen Benchmark

Tech · Mar 26, 2026 (updated) · 11 min

Qwen Image Edit on M1 Max went 80s→10min after a ComfyUI update: MPS BF16 is the cause

Diagnosed a 7x speed regression for Qwen Image Edit on M1 Max 64GB ComfyUI after an update. Root cause: MPS BF16 matmul runs ~2x slower than FP16, compounded by an FP16 attention bug. Benchmark numbers and the working fix.

ComfyUI Qwen Apple Silicon MPS PyTorch Experiment

Tech · Mar 23, 2026 · 7 min

Flash-MoE: Running a 397B-parameter model on a 48GB MacBook

Flash-MoE is a C/Metal inference engine that runs Qwen3.5-397B-A17B on a MacBook Pro M3 Max at 4.36 tokens/s. With expert streaming from SSD and hand-written Metal shaders, it fits the 209GB model into a 48GB memory budget.

Inference MPS LLM Qwen MoE Local LLM