Lemonade is AMD's open-source local AI server that manages multiple backends like llama.cpp and FastFlowLM across GPU/NPU/CPU, serving text, image, and audio generation through an OpenAI-compatible API.
Qwen3.5-35B-A3B is an SSM+Attention hybrid in which only 10 of 40 layers consume KV cache. Going from ctx-size 4096 to 65536 on llama-server + Vulkan added just 800 MB of VRAM with no throughput loss. Tested on Strix Halo (Ryzen AI Max+ 395); includes q8_0 KV-cache quantization benchmarks.
After updating to AMD Software 26.3.1 on a GMKtec EVO-X2 (Ryzen AI Max+ 395), the Vulkan backend fails to allocate device memory properly and falls back to CPU. Covers the investigation and a workaround: changing the BIOS VRAM allocation from 48GB/16GB to 32GB/32GB.
Hands-on test of the huihui-ai Qwen 3.5 abliterated models in Ollama: garbage-token failures, GLM-4.7-Flash chat-template breakage, and why the official model with thinking disabled worked better.