
Radeon 8060S (gfx1151) Vulkan Broke Again After AMD Driver Update

Last month, updating the AMD driver to 26.2.2 finally got Vulkan GPU inference working properly. The abliterated Qwen 3.5 hit 54 t/s, the VRAM shared memory priority issue was resolved, and I thought the EVO-X2’s GPU was finally usable.

Then I updated to AMD Software 26.3.1, and it broke again. Loading a model via Vulkan reports “success,” but in reality the data never lands in VRAM and everything falls back to CPU processing through system RAM. The same model with the same settings that got 34-54 t/s on driver 26.2.2 degraded to 4-9 t/s. Further investigation revealed that repeated Vulkan load failures leak device memory, and it never recovers until you reboot. Right after a fresh reboot you get 50+ t/s, so there’s almost certainly a bug in the driver’s memory management. Ultimately, changing the BIOS VRAM allocation from 48GB/16GB to 32GB/32GB got Q6_K running stably at 53.7 t/s.

Environment

| Item | Value |
|---|---|
| PC | GMKtec EVO-X2 (NucBox_EVO-X2) |
| CPU | AMD Ryzen AI Max+ 395 (Zen 5 / 16C 32T) |
| GPU | AMD Radeon 8060S (gfx1151, RDNA 3.5) |
| Memory | 64GB UMA (System 16GB / VRAM 48GB) |
| OS | Windows 11 Pro 10.0.26200 |
| AMD Driver | 32.0.23033.1002 (built: 2026-03-09) |
| AMD Software | 26.3.1 (installed: 2026-03-22) |
| LM Studio | 0.4.8 |
| llama-server | b8183 (66d65ec29) |

The Ryzen AI Max+ 395 is an APU with CPU and GPU on the same die, sharing memory in a UMA (Unified Memory Architecture) configuration. The 64GB of physical memory is shared between system and VRAM, with BIOS set to allocate 48GB to VRAM. UMA details and VRAM allocation optimization are covered in a previous article. The “VRAM gets placed in shared memory instead of dedicated” issue reported there was fixed in driver 26.2.2, but it’s back in 26.3.1. Unlike discrete GPUs, the CPU and GPU share the same memory bus, so when the Vulkan driver’s memory management goes wrong, the impact is severe.

Symptoms

1. GPU Not Used (CPU Fallback)

Loading a model with the Vulkan backend reports success, but it actually runs on CPU through system RAM instead of GPU compute.

| Test | Speed (26.3.1) | Speed (26.2.2) |
|---|---|---|
| b8183 Vulkan Q4_K_XL (20.7GB) | 9.5 t/s | 35 t/s |
| b8183 Vulkan Q6_K + --no-direct-io | 4.4 t/s | 41 t/s |
| b8183 CPU IQ2_XXS (9.8GB) | 13.4 t/s | 14.4 t/s |

In the previous benchmarks with driver 26.2.2, Q4_K_M hit 35 t/s, Q6_K hit 41 t/s, and the abliterated Q6_K went up to 54 t/s. Now through Vulkan it’s slower than pure CPU inference. The model data is sitting in system RAM instead of VRAM, and GPU kernels are reading from system RAM (or not running at all). Main memory usage is abnormally high, confirming that data isn’t in VRAM.

2. ErrorOutOfDeviceMemory

Vulkan reports 54,305 MiB free, yet can’t allocate the 26.8GB Q6_K model.

vk::Queue::submit: ErrorOutOfDeviceMemory

VK_ERROR_OUT_OF_DEVICE_MEMORY is a Vulkan API error code meaning device (GPU) memory allocation failed. Getting this error with plenty of free space means the driver is either mismanaging memory pools, or mapping memory types incorrectly in the UMA environment.

3. Garbage Output (mmap=true)

Loading Q4_K_XL with mmap=true produces completely garbled model output.

说完ak觉得 ,一ば った跟不会 . sieme 只言却 ...

A meaningless jumble of Chinese, Japanese, and English. This is a classic symptom of model weight data not being correctly transferred to VRAM (or getting corrupted during transfer). The driver is likely miscalculating addresses or DMA transfers when copying mmap’d host memory data to Vulkan buffers.

Setting mmap=false fixes the output but severely degrades speed. In the previous benchmarks, using --no-mmap reduced system memory usage from 28GB to 8.8GB and slightly improved speed, so it’s natural to conclude the mmap-related driver implementation regressed in 26.3.1.
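To make the mmap distinction concrete, here is a minimal Python sketch (illustrative only, not llama.cpp's actual loader) of what mmap=true means: the model file is mapped into the process address space and pages are faulted in on demand, instead of being copied into an anonymous buffer up front.

```python
import mmap
import os
import tempfile

# Create a small stand-in for a model file (4 KiB of zeros).
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)
os.close(fd)

with open(path, "rb") as f:
    # Length 0 maps the whole file, read-only.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # With mmap=true, the backend hands pointers into this file-backed
    # mapping to the GPU upload path. If the driver mishandles such host
    # pointers (addresses, DMA), the weights arrive in VRAM corrupted,
    # which matches the garbage output seen above.
    print(len(mapped))  # 4096
    mapped.close()
os.unlink(path)
```

With mmap=false, the loader instead reads the file into ordinary allocated memory first, which sidesteps the file-backed-pointer path at the cost of extra copies and RAM pressure.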

4. Latest Build (b8560) Makes Things Worse

Updating to llama.cpp’s latest build b8560 made things even worse.

ggml_vulkan: Device memory allocation of size 907872256 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory

It can’t even allocate 0.9GB (~907MB). On b8183, 20GB-class models could load (slowly), but on b8560 even sub-gigabyte allocations fail. This could stem from a change in llama.cpp’s Vulkan memory allocation strategy, but it compounds the driver-side issue.
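For reference, plain arithmetic on the sizes quoted in the logs above shows just how small the failed allocation is relative to the memory the driver claims is free:

```python
# Values taken from the logs in this article.
failed_bytes = 907_872_256             # b8560 failed allocation
print(round(failed_bytes / 1e9, 2))    # 0.91 GB (decimal)
print(round(failed_bytes / 2**20, 1))  # 865.8 MiB (binary)

free_mib = 54_305                      # Vulkan's reported free memory
print(round(free_mib * 2**20 / 1e9, 1))  # ~56.9 GB reported free
```

A sub-1GB allocation failing against ~57GB of reported free memory is hard to explain by anything other than driver-side bookkeeping.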

5. Device Memory Leak

While repeatedly testing the above symptoms, another nasty problem surfaced. Repeated Vulkan load failures leak device memory with no recovery.

Running the same test right after a reboot reproduces the article’s benchmark performance (50+ t/s). But after a few load failures or ErrorOutOfDeviceMemory errors, no model can properly allocate VRAM anymore. Even after killing LM Studio or llama-server processes, VRAM usage doesn’t return to normal.

So the speed degradation and allocation errors above aren’t “always happening” — they’re “once it fails, the whole session is poisoned.” The driver fails to release Vulkan resources and never returns the allocated device memory. In a UMA environment, VRAM leaks directly impact system-wide memory availability since VRAM and system RAM share the same physical memory pool.

This discovery changes the interpretation of the test results. The 9.5 t/s and 4.4 t/s numbers from earlier were likely measured in a state with accumulated memory leaks. In a clean state right after reboot, the numbers might be somewhat better. Still unstable either way, though.

Timeline

graph TD
    A["3/1: Driver 26.2.2<br/>Q6_K Vulkan 34-48 t/s<br/>Working normally"] --> B["3/9: Current driver build date"]
    B --> C["3/13: AMD chipset driver update"]
    C --> D["3/16: Q6_K load attempt<br/>ErrorOutOfDeviceMemory"]
    D --> E["3/22: AMD Software 26.3.1<br/>installed"]
    E --> F["3/23: Q4_K_XL mmap=false<br/>19.5 t/s works but slow"]
    F --> G["3/27: LM Studio 0.4.8 update"]
    G --> H["3/28: Q4_K_XL mmap=true<br/>22.6 t/s but garbage output"]
    H --> I["3/28: Q5_K_XL / Q6_K<br/>ErrorOutOfDeviceMemory"]
    I --> J["3/28: Memory leak discovered<br/>Reboot restores 50+ t/s"]

    style A fill:#4CAF50,color:#fff
    style D fill:#FF9800,color:#fff
    style H fill:#f44336,color:#fff
    style I fill:#f44336,color:#fff
    style J fill:#2196F3,color:#fff

| Date | Event | Vulkan Performance |
|---|---|---|
| 3/1 | Article written. Driver 26.2.2 | 34-48 t/s, working normally |
| 3/9 | Current driver build date | - |
| 3/13 | AMD chipset driver update | - |
| 3/16 | Q6_K load attempt | ErrorOutOfDeviceMemory |
| 3/22 | AMD Software 26.3.1 installed | - |
| 3/23 | Q4_K_XL load (mmap=false) | 19.5 t/s (works but halved) |
| 3/27 | LM Studio 0.4.8 update | - |
| 3/28 | Q4_K_XL (mmap=true) | 22.6 t/s but garbage output |
| 3/28 | Q5_K_XL / Q6_K | ErrorOutOfDeviceMemory |
| 3/28 | Memory leak discovered | Reboot restores 50+ t/s |

A few notable points.

Q6_K was already failing with ErrorOutOfDeviceMemory on 3/16. The chipset driver update on 3/13 might have been the trigger. AMD Software 26.3.1 wasn’t installed until 3/22, so the problem predates that.

On 3/23, Q4_K_XL ran at 19.5 t/s, but the same model was doing 35 t/s in early March. It looked like it was “working,” but performance had already halved. VRAM placement was probably only partially functional. Given the memory leak issue, leaks had likely already accumulated by this point.

On 3/28 I confirmed the memory leak. Since 50+ t/s comes back right after a reboot, the driver’s memory management does work correctly in its initial state. Leaks start after load failures or errors and accumulate throughout the session.

Isolating the Problem

Here’s the evidence for whether this is a driver issue or a llama.cpp issue.

Evidence Pointing to Driver

  • The same llama.cpp build (b8183) hit 34-54 t/s on driver 26.2.2 and degraded after the driver update, with no change on the llama.cpp side
  • A reboot restores 50+ t/s, so the software stack is capable of full speed whenever driver state is clean
  • Vulkan reports 54GB free yet fails sub-gigabyte allocations, and failed loads leak device memory until reboot
  • The same symptoms appear in LM Studio and koboldcpp, not just llama.cpp

Possible llama.cpp Contribution

  • Things got worse on b8560 (possible Vulkan memory allocation strategy change)
  • But b8183 was already degraded, so it’s not the root cause

UMA-Specific Issues

  • UMA environments handle Vulkan memory types (DEVICE_LOCAL, HOST_VISIBLE, etc.) differently from discrete GPUs
  • The driver may not be correctly advertising DEVICE_LOCAL memory in the UMA environment, or selecting the wrong memory type during allocation
  • Driver issues in this UMA environment have been a consistent problem since the initial EVO-X2 setup. The “placed in shared memory instead of VRAM” issue found in the VRAM allocation article was confirmed fixed in driver 26.2.2, but the same symptoms are back in 26.3.1
  • Memory leaks are especially severe in UMA because leaked VRAM also impacts system RAM availability. With a discrete GPU, a GPU memory leak leaves system RAM untouched, but with UMA it eats into the shared pool

The gfx1151 Vulkan problems aren’t just me — they’ve been reported across multiple projects.

| Issue | Description | Status |
|---|---|---|
| lmstudio-ai/lmstudio-bug-tracker#1048 | Vulkan v1.52.0 doesn’t use VRAM on gfx1151; 3x slower | fixed-in-next-update |
| ggml-org/llama.cpp#16832 | Vulkan UMA memory detection bug (only recognizes 32GB) | Closed (PR #17110) |
| ggml-org/llama.cpp#20354 | Missing GATED_DELTA_NET shader in Vulkan (gfx1151) | Open |
| ggml-org/llama.cpp#18741 | Strix Halo Vulkan load failure; workaround: --no-direct-io | Closed |
| GPUOpen-Drivers/AMDVLK#413 | AMDVLK 2GB allocation limit | Open |
| LostRuins/koboldcpp#1980 | koboldcpp also can’t detect 8060S VRAM | - |

The same problems appear in koboldcpp and LM Studio, not just llama.cpp, which confirms this is a driver-side issue. The AMDVLK#413 2GB allocation limit could be directly related to the ErrorOutOfDeviceMemory seen here.

What I Tried

| Workaround | Result |
|---|---|
| --no-mmap | Load succeeds but GPU not used (9.5 t/s) |
| --no-direct-io | Q6_K loads but GPU not used (4.4 t/s) |
| llama.cpp b8560 (latest) | Worse; can’t even allocate 0.9GB |
| LM Studio mmap OFF | mmap=false applied, but still ErrorOutOfDeviceMemory |
| PC reboot | Clears memory leak, 50+ t/s restored |

Post-Reboot Test Results

After discovering the memory leak, I rebooted and ran the same command from the article in a clean state.

llama-server.exe -m "Huihui-Qwen3.5-35B-A3B-abliterated.Q4_K_M.gguf" --port 8080 --ctx-size 4096 --reasoning-budget 0 --n-gpu-layers 99 --no-mmap

| Item | Result |
|---|---|
| Load | Success (Vulkan0 = 19,905 MiB) |
| Generation speed | 54.9 - 57.7 t/s |
| Article benchmark | 49.18 t/s |
| Output quality | Normal (Japanese output fine) |

Right after reboot, speeds exceeded the article’s benchmark. However, 272 MiB was placed in Vulkan_Host, so data isn’t entirely in VRAM (some goes through shared memory). On driver 26.2.2, no Vulkan_Host spillover occurred, confirming that 26.3.1 changed the memory placement logic.

Reproducing the Memory Leak

Comparison before and after reboot.

| State | Log evidence | Result |
|---|---|---|
| Before reboot | Device memory allocation of size 1025582592 failed | 1GB allocation fails |
| After reboot | Vulkan0 model buffer size = 19905.15 MiB | 19.9GB allocated successfully, 54.9 t/s |

Repeated Vulkan load failures (ErrorOutOfDeviceMemory) cause the driver to never release device memory. Eventually even 0.9GB can’t be allocated. Only a PC reboot recovers it. This means that “fail → retry” loops during testing progressively worsen the state. Large model load attempts need to be done carefully.
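Since blind fail-and-retry loops actively poison the session, scripted test runs should bail out on the first device-memory error instead of retrying. A minimal sketch in Python (the marker strings come from the logs in this article; the wrapper itself is hypothetical):

```python
# Error strings observed in the llama.cpp / ggml_vulkan logs above.
OOM_MARKERS = (
    "ErrorOutOfDeviceMemory",            # vk::Queue::submit / vk::Device::allocateMemory
    "Device memory allocation of size",  # ggml_vulkan allocation failure
)

def should_abort(log_text: str) -> bool:
    """Return True if the server log shows a device-memory failure.
    At that point the driver has likely leaked memory, and per the
    observations above only a reboot recovers it."""
    return any(marker in log_text for marker in OOM_MARKERS)

print(should_abort("ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory"))  # True
print(should_abort("Vulkan0 model buffer size = 19905.15 MiB"))                         # False
```

A launcher script can run this check over the server's output and refuse further load attempts until the machine has been rebooted.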

Root Cause

Investigation revealed multiple overlapping issues.

  1. AMD Driver 26.3.1 Vulkan Memory Management Regression: The driver (built 2026-03-09) regressed Vulkan device memory management for the Radeon 8060S UMA. The driver reports 54GB of free Vulkan memory, but in practice some allocations land in shared memory, and large models (Q5_K_XL 24.6GB+) fail with ErrorOutOfDeviceMemory
  2. Vulkan Device Memory Leak: The driver doesn’t release memory on load failure. Repeated attempts exhaust device memory until even small allocations fail. Only recoverable by PC reboot
  3. Shared Memory Fallback: Part of the model data (the Vulkan_Host buffer) is placed in shared memory instead of VRAM. This didn’t happen on driver 26.2.2. Performance is acceptable for now, but it’s no longer the clean state in which shared memory goes untouched

Fix: Change BIOS VRAM Allocation to 32GB/32GB

Changing the BIOS from 48GB VRAM / 16GB system to 32GB VRAM / 32GB system resolved the ErrorOutOfDeviceMemory.

| Model | 48/16 | 32/32 |
|---|---|---|
| Q4_K_M abliterated (19.7GB) | 54.9 t/s (only right after reboot) | Works fine |
| Q6_K (26.8GB) | ErrorOutOfDeviceMemory | 53.7 t/s |

Vulkan free memory reported values.

| Config | Reported free memory | Q6_K |
|---|---|---|
| 48/16 | 54,305 MiB | Won’t load |
| 32/32 | 46,522 MiB | Loads (53.7 t/s) |

The reported VRAM is lower, yet the loadable model size is larger.

Why Less VRAM Loads Bigger Models

AMD driver 26.3.1 uses system RAM as a transfer buffer during model loading. With 48/16, the system side only has 16GB, and after OS + Parsec + Tailscale etc. consume about 7GB, the remaining ~9GB isn’t enough for transfer buffers, causing ErrorOutOfDeviceMemory.

With 32/32, the system side has about 25GB free, providing ample room for transfer buffers. Vulkan reports less VRAM on paper, but the effectively loadable model size increases.

graph LR
    subgraph "48/16 Config"
        A1["VRAM 48GB"] --> B1["Q6_K 26.8GB<br/>Load attempt"]
        C1["System 16GB<br/>~9GB free"] --> D1["Transfer buffer<br/>insufficient"]
        D1 --> E1["ErrorOutOfDeviceMemory"]
    end

    subgraph "32/32 Config"
        A2["VRAM 32GB"] --> B2["Q6_K 26.8GB<br/>Load success"]
        C2["System 32GB<br/>~25GB free"] --> D2["Transfer buffer<br/>sufficient"]
        D2 --> B2
    end

    style E1 fill:#f44336,color:#fff
    style B2 fill:#4CAF50,color:#fff
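The budget above can be checked with back-of-envelope arithmetic. The totals come from this article; the 10GB staging requirement is an assumed threshold for illustration, since the driver's actual transfer-buffer needs aren't documented:

```python
# Numbers from this article; STAGING_NEEDED_GB is a hypothetical threshold.
TOTAL_GB = 64          # physical UMA pool
MODEL_GB = 26.8        # Q6_K
OS_OVERHEAD_GB = 7     # OS + Parsec + Tailscale etc.
STAGING_NEEDED_GB = 10 # assumed system-RAM headroom for transfer buffers

def q6k_loads(vram_gb):
    """Load succeeds only if the model fits in VRAM AND enough system
    RAM remains for the driver's staging buffers."""
    system_free = TOTAL_GB - vram_gb - OS_OVERHEAD_GB
    return vram_gb >= MODEL_GB and system_free >= STAGING_NEEDED_GB

print(q6k_loads(48))   # False: only ~9GB of system RAM free for staging
print(q6k_loads(32))   # True: ~25GB free, ample room for transfer buffers
```

Under this model, shrinking VRAM from 48GB to 32GB helps precisely because the binding constraint was system-RAM headroom, not VRAM capacity.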

In the previous VRAM allocation analysis, the conclusion was “more VRAM is better.” But with driver 26.3.1, you also need headroom on the system RAM side. In a UMA environment, the VRAM/system RAM split directly affects driver behavior, so the optimal configuration can change with each driver update.

A few caveats.

  • 397 MiB is placed in Vulkan_Host, so it’s not running entirely from VRAM
  • On driver 26.2.2, Q6_K worked fine with 48/16, so this workaround is specific to driver 26.3.1
  • Repeated Vulkan load failures cause memory leaks, so reboot immediately after a failure

Operating Procedure

  1. Set BIOS to 32GB VRAM / 32GB system
  2. Run llama-server directly with b8183 + --no-mmap
  3. If Vulkan load fails, reboot immediately (memory leaks make things progressively worse)
  4. LM Studio has separate Vulkan backend issues — direct llama-server execution recommended

Didn’t expect a BIOS settings change to fix this. Reducing VRAM from 48GB to 32GB and having Q6_K load successfully is counterintuitive. The previous VRAM allocation analysis concluded “more VRAM is better,” but after a driver change, the system RAM side needs headroom too. From the initial EVO-X2 setup to VRAM allocation analysis, the Ollama total failure, the driver update fixing everything, and now this regression plus BIOS workaround — this is the fourth round of troubleshooting. UMA + RDNA 3.5 is not mature yet.