Radeon 8060S (gfx1151) Vulkan Broke Again After AMD Driver Update
Last month, updating the AMD driver to 26.2.2 finally got Vulkan GPU inference working properly. The abliterated Qwen 3.5 hit 54 t/s, the VRAM shared memory priority issue was resolved, and I thought the EVO-X2’s GPU was finally usable.
Then I updated to AMD Software 26.3.1, and it broke again. Loading a model via Vulkan reports “success,” but in reality the data never lands in VRAM and everything falls back to CPU processing through system RAM. The same model with the same settings that got 34-54 t/s on driver 26.2.2 degraded to 4-9 t/s. Further investigation revealed that repeated Vulkan load failures leak device memory, and it never recovers until you reboot. Right after a fresh reboot you get 50+ t/s, so there’s almost certainly a bug in the driver’s memory management. Ultimately, changing the BIOS VRAM allocation from 48GB/16GB to 32GB/32GB got Q6_K running stably at 53.7 t/s.
Environment
| Item | Value |
|---|---|
| PC | GMKtec EVO-X2 (NucBox_EVO-X2) |
| CPU | AMD Ryzen AI Max+ 395 (Zen 5 / 16C 32T) |
| GPU | AMD Radeon 8060S (gfx1151, RDNA 3.5) |
| Memory | 64GB UMA (System 16GB / VRAM 48GB) |
| OS | Windows 11 Pro 10.0.26200 |
| AMD Driver | 32.0.23033.1002 (built: 2026-03-09) |
| AMD Software | 26.3.1 (installed: 2026-03-22) |
| LM Studio | 0.4.8 |
| llama-server | b8183 (66d65ec29) |
The Ryzen AI Max+ 395 is an APU with CPU and GPU on the same die, sharing memory in a UMA (Unified Memory Architecture) configuration. The 64GB of physical memory is shared between system and VRAM, with BIOS set to allocate 48GB to VRAM. UMA details and VRAM allocation optimization are covered in a previous article. The “VRAM gets placed in shared memory instead of dedicated” issue reported there was fixed in driver 26.2.2, but it’s back in 26.3.1. Unlike discrete GPUs, the CPU and GPU share the same memory bus, so when the Vulkan driver’s memory management goes wrong, the impact is severe.
Symptoms
1. GPU Not Used (CPU Fallback)
Loading a model with the Vulkan backend reports success, but it actually runs on CPU through system RAM instead of GPU compute.
| Test | Speed (26.3.1) | Measured on 26.2.2 |
|---|---|---|
| b8183 Vulkan Q4_K_XL (20.7GB) | 9.5 t/s | 35 t/s |
| b8183 Vulkan Q6_K + --no-direct-io | 4.4 t/s | 41 t/s |
| b8183 CPU IQ2_XXS (9.8GB) | 13.4 t/s | 14.4 t/s |
In the previous benchmarks with driver 26.2.2, Q4_K_M hit 35 t/s, Q6_K hit 41 t/s, and the abliterated Q6_K went up to 54 t/s. Now through Vulkan it’s slower than pure CPU inference. The model data is sitting in system RAM instead of VRAM, and GPU kernels are reading from system RAM (or not running at all). Main memory usage is abnormally high, confirming that data isn’t in VRAM.
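Whether a load actually landed in VRAM can be checked from llama-server's buffer log lines instead of waiting for a slow benchmark. Below is a small sketch that parses those lines; the log format is assumed from the `Vulkan0 model buffer size = 19905.15 MiB` lines quoted later in this article, so the regex may need adjusting for your build.

```python
import re

# Parse llama-server stderr for backend buffer sizes and flag loads where
# most of the model did not land in device (Vulkan0) memory.
# Log format assumed from lines quoted in this article; adjust as needed.
BUF_RE = re.compile(r"(\w+) model buffer size\s*=\s*([\d.]+)\s*MiB")

def buffer_placement(log_text: str) -> dict:
    """Return {backend_name: MiB} for every model buffer line found."""
    return {m.group(1): float(m.group(2)) for m in BUF_RE.finditer(log_text)}

def looks_like_cpu_fallback(placement: dict, threshold: float = 0.5) -> bool:
    """True if less than `threshold` of the model ended up in Vulkan0 VRAM."""
    total = sum(placement.values())
    if total == 0:
        return True  # nothing parsed: treat as suspicious
    return placement.get("Vulkan0", 0.0) / total < threshold

log = """
llm_load_tensors: Vulkan0 model buffer size = 19905.15 MiB
llm_load_tensors: Vulkan_Host model buffer size = 272.00 MiB
"""
p = buffer_placement(log)
print(p["Vulkan0"], looks_like_cpu_fallback(p))  # 19905.15 False
```

Run against a degraded session, the Vulkan0 share drops toward zero, which matches the "reports success but runs on CPU" symptom above.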
2. ErrorOutOfDeviceMemory
Vulkan reports 54,305 MiB free, yet can’t allocate the 26.8GB Q6_K model.
vk::Queue::submit: ErrorOutOfDeviceMemory
VK_ERROR_OUT_OF_DEVICE_MEMORY is a Vulkan API error code meaning device (GPU) memory allocation failed. Getting this error with plenty of free space means the driver is either mismanaging memory pools, or mapping memory types incorrectly in the UMA environment.
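One way an application can survive this error is to fall back to a host-visible memory type instead of failing the load outright; the Vulkan_Host buffers seen later in this article suggest ggml's Vulkan backend does something similar. The sketch below is purely illustrative, with a fake allocator standing in for real Vulkan calls.

```python
# Device-first allocation with a host-memory fallback. `fake_device_alloc`
# simulates the broken driver; MemoryError stands in for
# VK_ERROR_OUT_OF_DEVICE_MEMORY. Not real Vulkan API calls.
def allocate_with_fallback(size, device_alloc, host_alloc):
    """Try device-local memory first, then host-visible as a fallback."""
    try:
        return ("device", device_alloc(size))
    except MemoryError:                       # the Vulkan error, simulated
        return ("host", host_alloc(size))     # slower path via system RAM

def fake_device_alloc(size):
    raise MemoryError("ErrorOutOfDeviceMemory")

kind, buf = allocate_with_fallback(1024, fake_device_alloc, bytearray)
print(kind, len(buf))  # host 1024
```

The fallback keeps the load alive, but as the benchmarks here show, "alive in system RAM" can be slower than pure CPU inference.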
3. Garbage Output (mmap=true)
Loading Q4_K_XL with mmap=true produces completely garbled model output.
说完ak觉得 ,一ば った跟不会 . sieme 只言却 ...
A meaningless jumble of Chinese, Japanese, and English. This is a classic symptom of model weight data not being correctly transferred to VRAM (or getting corrupted during transfer). The driver is likely miscalculating addresses or DMA transfers when copying mmap’d host memory data to Vulkan buffers.
Setting mmap=false fixes the output but severely degrades speed. In the previous benchmarks, using --no-mmap reduced system memory usage from 28GB to 8.8GB and slightly improved speed, so it’s natural to conclude the mmap-related driver implementation regressed in 26.3.1.
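The two load paths behind that toggle can be sketched in a few lines. This is a simplified illustration of what mmap=true versus mmap=false means for where the bytes live (file-backed pages versus a private copy in system RAM), not llama.cpp's actual loader.

```python
import mmap, os, tempfile

# mmap=true: map the GGUF file into the address space; pages are faulted in
# on demand and stay file-backed. mmap=false: read a private copy into RAM.
# The real loader hands these bytes on to Vulkan staging buffers.
def load_mmap(path):
    f = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(f, 0, access=mmap.ACCESS_READ)  # file-backed pages
    finally:
        os.close(f)  # mmap holds its own duplicate of the descriptor

def load_copy(path):
    with open(path, "rb") as f:
        return f.read()  # private copy in regular system RAM

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"GGUF" + b"\x00" * 1020)  # stand-in for a model file

mapped = load_mmap(tmp.name)
copied = load_copy(tmp.name)
print(mapped[:4] == copied[:4] == b"GGUF")  # True: same bytes either way
mapped.close()
os.remove(tmp.name)
```

Both paths deliver identical bytes to the application, which is why garbage output with mmap=true points at the driver's host-to-device transfer of the mapped pages rather than the file contents.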
4. Latest Build (b8560) Makes Things Worse
Updating to llama.cpp’s latest build b8560 made things even worse.
ggml_vulkan: Device memory allocation of size 907872256 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
Can’t even allocate 0.9GB (~907MB). On b8183, 20GB-class models could load (slowly), but on b8560 even gigabyte-scale allocations fail. Could be a change in llama.cpp’s Vulkan memory allocation strategy, but it’s compounding with the driver-side issue.
5. Device Memory Leak
While repeatedly testing the above symptoms, another nasty problem surfaced. Repeated Vulkan load failures leak device memory with no recovery.
Running the same test right after a reboot reproduces the article’s benchmark performance (50+ t/s). But after a few load failures or ErrorOutOfDeviceMemory errors, no model can properly allocate VRAM anymore. Even after killing LM Studio or llama-server processes, VRAM usage doesn’t return to normal.
So the speed degradation and allocation errors above aren’t “always happening” — they’re “once it fails, the whole session is poisoned.” The driver fails to release Vulkan resources and never returns the allocated device memory. In a UMA environment, VRAM leaks directly impact system-wide memory availability since VRAM and system RAM share the same physical memory pool.
This discovery changes the interpretation of the test results. The 9.5 t/s and 4.4 t/s numbers from earlier were likely measured in a state with accumulated memory leaks. In a clean state right after reboot, the numbers might be somewhat better. Still unstable either way, though.
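A toy model makes the failure mode concrete: free-memory reporting and actual allocation can draw on different bookkeeping once leaked blocks accumulate. This is purely illustrative and does not reflect the AMD driver's real internal structure.

```python
class LeakyPool:
    """Toy model of the observed mismatch: the driver *reports* free memory
    from live allocations only, but real allocations also compete with
    blocks leaked by earlier failed loads. Illustrative, not the actual
    driver design."""
    def __init__(self, heap_mib):
        self.heap_mib = heap_mib
        self.leaked_mib = 0   # blocks never returned after failed loads
        self.live_mib = 0

    def reported_free(self):
        # What a free-memory query shows: heap minus live allocations only.
        return self.heap_mib - self.live_mib

    def allocate(self, mib):
        # Actual allocatable space is also reduced by leaked blocks.
        if self.live_mib + self.leaked_mib + mib > self.heap_mib:
            return "ErrorOutOfDeviceMemory"
        self.live_mib += mib
        return "ok"

    def failed_load(self, mib):
        # A failed load leaks its partial allocations (the observed bug).
        self.leaked_mib += mib

pool = LeakyPool(54305)
pool.failed_load(30000)          # failed Q6_K attempts leak ~30GB
print(pool.reported_free())      # 54305 -- still reports everything free
print(pool.allocate(27000))      # ErrorOutOfDeviceMemory
```

This reproduces both observations at once: plenty of "free" memory on paper, and allocations that keep failing until a reboot zeroes the leaked pool.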
Timeline
graph TD
A["3/1: Driver 26.2.2<br/>Q6_K Vulkan 34-48 t/s<br/>Working normally"] --> B["3/9: Current driver build date"]
B --> C["3/13: AMD chipset driver update"]
C --> D["3/16: Q6_K load attempt<br/>ErrorOutOfDeviceMemory"]
D --> E["3/22: AMD Software 26.3.1<br/>installed"]
E --> F["3/23: Q4_K_XL mmap=false<br/>19.5 t/s works but slow"]
F --> G["3/27: LM Studio 0.4.8 update"]
G --> H["3/28: Q4_K_XL mmap=true<br/>22.6 t/s but garbage output"]
H --> I["3/28: Q5_K_XL / Q6_K<br/>ErrorOutOfDeviceMemory"]
I --> J["3/28: Memory leak discovered<br/>Reboot restores 50+ t/s"]
style A fill:#4CAF50,color:#fff
style D fill:#FF9800,color:#fff
style H fill:#f44336,color:#fff
style I fill:#f44336,color:#fff
style J fill:#2196F3,color:#fff
| Date | Event | Vulkan Performance |
|---|---|---|
| 3/1 | Article written. Driver 26.2.2 | 34-48 t/s, working normally |
| 3/9 | Current driver build date | - |
| 3/13 | AMD chipset driver update | - |
| 3/16 | Q6_K load attempt | ErrorOutOfDeviceMemory |
| 3/22 | AMD Software 26.3.1 installed | - |
| 3/23 | Q4_K_XL load (mmap=false) | 19.5 t/s (works but halved) |
| 3/27 | LM Studio 0.4.8 update | - |
| 3/28 | Q4_K_XL (mmap=true) | 22.6 t/s but garbage output |
| 3/28 | Q5_K_XL / Q6_K | ErrorOutOfDeviceMemory |
| 3/28 | Memory leak discovered | Reboot restores 50+ t/s |
A few notable points.
Q6_K was already failing with ErrorOutOfDeviceMemory on 3/16. The chipset driver update on 3/13 might have been the trigger. AMD Software 26.3.1 wasn’t installed until 3/22, so the problem predates that.
On 3/23, Q4_K_XL ran at 19.5 t/s, but the same model was doing 35 t/s in early March. It looked like it was “working,” but performance had already halved. VRAM placement was probably only partially functional. Given the memory leak issue, leaks had likely already accumulated by this point.
On 3/28 I confirmed the memory leak. Since 50+ t/s comes back right after a reboot, the driver’s memory management does work correctly in its initial state. Leaks start after load failures or errors and accumulate throughout the session.
Isolating the Problem
Here’s the evidence for whether this is a driver issue or a llama.cpp issue.
Evidence Pointing to Driver
- Same llama.cpp build (b8183) and same models worked fine on driver 26.2.2 in early March. Measured values of 35 t/s for Q4_K_M, 41 t/s for Q6_K, and 54 t/s for abliterated Q6_K are on record
- The shared memory priority issue fixed in driver 26.2.2 has resurfaced with the same symptoms
- `ErrorOutOfDeviceMemory` with plenty of free VRAM (driver memory management issue)
- Data corruption with `mmap=true` (host-device memory transfer issue). In the previous benchmarks, `--no-mmap` could avoid double-mapping system memory, but this time things are broken at an earlier stage
- 50+ t/s right after reboot. Degrades as leaks accumulate during the session (driver resource release issue)
- gfx1151 is a new RDNA 3.5 chip with immature driver support
Possible llama.cpp Contribution
- Things got worse on b8560 (possible Vulkan memory allocation strategy change)
- But b8183 was already degraded, so it’s not the root cause
UMA-Specific Issues
- UMA environments handle Vulkan memory types (`DEVICE_LOCAL`, `HOST_VISIBLE`, etc.) differently from discrete GPUs
- The driver may not be correctly advertising `DEVICE_LOCAL` memory in the UMA environment, or selecting the wrong memory type during allocation
- Driver issues in this UMA environment have been a consistent problem since the initial EVO-X2 setup. The “placed in shared memory instead of VRAM” issue found in the VRAM allocation article was confirmed fixed in driver 26.2.2, but the same symptoms are back in 26.3.1
- Memory leaks are especially severe in UMA because leaked VRAM also impacts system RAM availability. With a discrete GPU, a GPU memory leak leaves system RAM untouched, but with UMA it eats into the shared pool
Related Issues
The gfx1151 Vulkan problems aren’t just me — they’ve been reported across multiple projects.
| Issue | Description | Status |
|---|---|---|
| lmstudio-ai/lmstudio-bug-tracker#1048 | Vulkan v1.52.0 doesn’t use VRAM on gfx1151. 3x slower | fixed-in-next-update |
| ggml-org/llama.cpp#16832 | Vulkan UMA memory detection bug (only recognizes 32GB) | Closed (PR #17110) |
| ggml-org/llama.cpp#20354 | Missing GATED_DELTA_NET shader in Vulkan (gfx1151) | Open |
| ggml-org/llama.cpp#18741 | Strix Halo Vulkan load failure. Workaround: --no-direct-io | Closed |
| GPUOpen-Drivers/AMDVLK#413 | AMDVLK 2GB allocation limit | Open |
| LostRuins/koboldcpp#1980 | koboldcpp also can’t detect 8060S VRAM | - |
The same problems appear in koboldcpp and LM Studio, not just llama.cpp, which confirms this is a driver-side issue. The AMDVLK#413 2GB allocation limit could be directly related to the ErrorOutOfDeviceMemory seen here.
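If AMDVLK#413's ~2GB per-allocation cap is in play, the standard mitigation is to split one logical weight buffer across several device allocations, roughly what llama.cpp's Vulkan backend does with its own chunking. A minimal sketch of the splitting arithmetic, with illustrative sizes:

```python
# Split a logical buffer into chunks that each fit under a per-allocation
# cap (AMDVLK#413 reports ~2GB). Sizes below are illustrative.
MAX_ALLOC = 2 * 1024**3          # 2 GiB per allocation call

def split_allocations(total_bytes, max_alloc=MAX_ALLOC):
    """Split a logical buffer into chunks that each fit one allocation."""
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, max_alloc)
        chunks.append(size)
        remaining -= size
    return chunks

model_bytes = int(26.8 * 1024**3)            # Q6_K, ~26.8 GiB
chunks = split_allocations(model_bytes)
print(len(chunks))                           # 14 allocations
print(all(c <= MAX_ALLOC for c in chunks))   # True
```

Chunking works around a per-allocation cap, but it cannot help when the pool itself is exhausted by leaks, which is why even the sub-1GB allocations fail here.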
What I Tried
| Workaround | Result |
|---|---|
| `--no-mmap` | Load succeeds but GPU not used (9.5 t/s) |
| `--no-direct-io` | Q6_K loads but GPU not used (4.4 t/s) |
| llama.cpp b8560 (latest) | Worse. Can’t even allocate 0.9GB |
| LM Studio mmap OFF | mmap=false is applied, but still ErrorOutOfDeviceMemory |
| PC reboot | Clears memory leak, 50+ t/s restored |
Post-Reboot Test Results
After discovering the memory leak, I rebooted and ran the same command from the article in a clean state.
llama-server.exe -m "Huihui-Qwen3.5-35B-A3B-abliterated.Q4_K_M.gguf" --port 8080 --ctx-size 4096 --reasoning-budget 0 --n-gpu-layers 99 --no-mmap
| Item | Result |
|---|---|
| Load | Success (Vulkan0 = 19,905 MiB) |
| Generation speed | 54.9 - 57.7 t/s |
| Article benchmark | 49.18 t/s |
| Output quality | Normal (Japanese output fine) |
Right after reboot, speeds exceeded the article’s benchmark. However, 272 MiB was placed in Vulkan_Host, so data isn’t entirely in VRAM (some goes through shared memory). On driver 26.2.2, no Vulkan_Host spillover occurred, confirming that 26.3.1 changed the memory placement logic.
Reproducing the Memory Leak
Comparison before and after reboot.
| State | Result |
|---|---|
| Before reboot | Device memory allocation of size 1025582592 failed — 1GB allocation failed |
| After reboot | Vulkan0 model buffer size = 19905.15 MiB — 19.9GB allocated successfully, 54.9 t/s |
Repeated Vulkan load failures (ErrorOutOfDeviceMemory) cause the driver to never release device memory. Eventually even 0.9GB can’t be allocated. Only a PC reboot recovers it. This means that “fail → retry” loops during testing progressively worsen the state. Large model load attempts need to be done carefully.
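Since every failed load worsens the leak, a load script should fail closed rather than retry. Below is a hypothetical wrapper, not a real llama.cpp tool; the failure markers are taken from the logs quoted above, and the simulated command stands in for the actual llama-server invocation.

```python
import subprocess

# Because a failed Vulkan load leaks device memory that only a reboot
# reclaims, blind "fail -> retry" loops make things strictly worse. Run the
# load command once and refuse to retry on the failure signature.
FAILURE_MARKERS = ("ErrorOutOfDeviceMemory", "Device memory allocation")

def try_load_once(cmd: list[str]) -> str:
    """Run cmd once; return 'ok', or 'reboot-needed' on a Vulkan OOM."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    if any(marker in output for marker in FAILURE_MARKERS):
        return "reboot-needed"   # do NOT retry: each attempt leaks more VRAM
    return "ok" if proc.returncode == 0 else "failed"

# Simulated failing load (a real call would be the llama-server command line):
import sys
status = try_load_once([sys.executable, "-c",
    "print('ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory')"])
print(status)  # reboot-needed
```

The point of the wrapper is the missing retry loop: once the marker appears, the only productive next step on this driver is a reboot.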
Root Cause
Investigation revealed multiple overlapping issues.
- AMD Driver 26.3.1 Vulkan Memory Management Regression: The driver (built 2026-03-09) regressed Vulkan device memory management for the Radeon 8060S UMA. The driver reports 54GB of free Vulkan memory, but in practice some allocations land in shared memory, and large models (Q5_K_XL 24.6GB+) fail with ErrorOutOfDeviceMemory
- Vulkan Device Memory Leak: The driver doesn’t release memory on load failure. Repeated attempts exhaust device memory until even small allocations fail. Only recoverable by PC reboot
- Shared Memory Fallback: Part of the model data (the Vulkan_Host buffer) gets placed in shared memory instead of VRAM. This didn’t happen on driver 26.2.2. Performance is acceptable for now, but it’s not the clean all-in-VRAM state that 26.2.2 achieved
Fix: Change BIOS VRAM Allocation to 32GB/32GB
Changing the BIOS from 48GB VRAM / 16GB system to 32GB VRAM / 32GB system resolved the ErrorOutOfDeviceMemory.
| Model | 48/16 | 32/32 |
|---|---|---|
| Q4_K_M abliterated (19.7GB) | 54.9 t/s (only right after reboot) | Works fine |
| Q6_K (26.8GB) | ErrorOutOfDeviceMemory | 53.7 t/s |
Vulkan free memory reported values.
| Config | free memory | Q6_K |
|---|---|---|
| 48/16 | 54,305 MiB | Won’t load |
| 32/32 | 46,522 MiB | Loads (53.7 t/s) |
The reported VRAM is lower, yet the loadable model size is larger.
Why Less VRAM Loads Bigger Models
AMD driver 26.3.1 uses system RAM as a transfer buffer during model loading. With 48/16, the system side only has 16GB, and after OS + Parsec + Tailscale etc. consume about 7GB, the remaining ~9GB isn’t enough for transfer buffers, causing ErrorOutOfDeviceMemory.
With 32/32, the system side has about 25GB free, providing ample room for transfer buffers. Vulkan reports less VRAM on paper, but the effectively loadable model size increases.
graph LR
subgraph "48/16 Config"
A1["VRAM 48GB"] --> B1["Q6_K 26.8GB<br/>Load attempt"]
C1["System 16GB<br/>~9GB free"] --> D1["Transfer buffer<br/>insufficient"]
D1 --> E1["ErrorOutOfDeviceMemory"]
end
subgraph "32/32 Config"
A2["VRAM 32GB"] --> B2["Q6_K 26.8GB<br/>Load success"]
C2["System 32GB<br/>~25GB free"] --> D2["Transfer buffer<br/>sufficient"]
D2 --> B2
end
style E1 fill:#f44336,color:#fff
style B2 fill:#4CAF50,color:#fff
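The arithmetic above can be written down explicitly. The ~7GB OS baseline comes from the observation in this article; the assumption that the staging buffer needs roughly half the model size is a guess that happens to fit both data points, not documented driver behavior.

```python
# Back-of-the-envelope check for the 48/16 vs 32/32 result. Assumes driver
# 26.3.1 stages the model through system RAM during load (as observed here);
# the 7GB baseline and the half-model-size staging heuristic are assumptions.
def can_load(model_gb, vram_gb, system_gb, os_overhead_gb=7.0,
             transfer_frac=0.5):
    """True if the model fits VRAM and free system RAM can stage it."""
    fits_vram = model_gb <= vram_gb
    free_system = system_gb - os_overhead_gb
    has_staging = free_system >= model_gb * transfer_frac
    return fits_vram and has_staging

q6k = 26.8
print(can_load(q6k, vram_gb=48, system_gb=16))  # False: only ~9GB to stage
print(can_load(q6k, vram_gb=32, system_gb=32))  # True: ~25GB to stage
```

Under these assumptions the 32/32 split wins not because the GPU has more memory, but because the load path stops starving for system-side headroom.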
In the previous VRAM allocation analysis, the conclusion was “more VRAM is better.” But with driver 26.3.1, you also need headroom on the system RAM side. In a UMA environment, the VRAM/system RAM split directly affects driver behavior, so the optimal configuration can change with each driver update.
A few caveats.
- 397 MiB is placed in Vulkan_Host, so it’s not running entirely from VRAM
- On driver 26.2.2, Q6_K worked fine with 48/16, so this workaround is specific to driver 26.3.1
- Repeated Vulkan load failures cause memory leaks, so reboot immediately after a failure
Operating Procedure
- Set BIOS to 32GB VRAM / 32GB system
- Run llama-server directly with b8183 + `--no-mmap`
- If Vulkan load fails, reboot immediately (memory leaks make things progressively worse)
- LM Studio has separate Vulkan backend issues — direct llama-server execution recommended
Didn’t expect a BIOS settings change to fix this. Reducing VRAM from 48GB to 32GB and having Q6_K load successfully is counterintuitive. The previous VRAM allocation analysis concluded “more VRAM is better,” but after a driver change, the system RAM side needs headroom too. From the initial EVO-X2 setup to VRAM allocation analysis, the Ollama total failure, the driver update fixing everything, and now this regression plus BIOS workaround — this is the fourth round of troubleshooting. UMA + RDNA 3.5 is not mature yet.
Related Articles
- Qwen 3.5’s Radeon 8060S Total Failure Was Caused by AMD Drivers — Benchmark records from driver 26.2.2 getting 34-54 t/s. The baseline for this regression
- Optimizing VRAM and Memory Allocation on Strix Halo — VRAM allocation and memory management in UMA. First report of the shared memory priority issue
- Setting Up Local LLMs on the EVO-X2 — Initial setup and early Vulkan troubles
- Trying to Run abliterated Models on Ollama and Failing Completely — Qwen 3.5 total failure on the old driver
- Exposing Local LLM as an External API via VPN — EVO-X2 GPU inference exposed via Tailscale. When Vulkan breaks, this also falls back to CPU