Tech · Apr 3, 2026 · 8 min
Running Lemonade on Strix Halo (EVO-X2): Vulkan Shared Memory Leaks and ROCm Stability
Real-world testing of AMD Lemonade v10.0.1 on the Ryzen AI Max+ 395: LLM, image generation, speech recognition, and TTS running simultaneously; NPU Hybrid execution; Vulkan vs. ROCm benchmarks; and the discovery of shared memory leaks.
Tags: AMD, Local LLM, Vulkan, ROCm, NPU, llama.cpp, GPU, Inference Optimization, Benchmark, Experiment
Tech · Mar 31, 2026 · 8 min
Scaling Qwen3.5-35B-A3B from 4K to 65K Context with Only 800MB Extra VRAM
Qwen3.5-35B-A3B is an SSM+Attention hybrid in which only 10 of 40 layers use a KV cache. Expanding ctx-size from 4096 to 65536 on llama-server added just 800MB of VRAM with zero speed loss. Includes q8_0 KV quantization benchmarks and TurboQuant status.
Tags: LLM, Local LLM, llama.cpp, AMD, Vulkan, KV Cache, Qwen, Benchmark
Tech · Feb 26, 2026 · 5 min
Running ComfyUI + WAI-Illustrious on an RTX 4060 Laptop (8GB VRAM)
ComfyUI + WAI-Illustrious v16.0 on an RTX 4060 Laptop (8GB VRAM): generates 1024x1024 images in 15 seconds without lowvram mode, even with LoRA.
Tags: ComfyUI, Stable Diffusion, Image Generation, GPU, Benchmark