Based on EE Times' interview with AMD AI Software VP Anush Elangovan, we assess the ROCm vs CUDA ecosystem gap. Includes hands-on experience with ROCm breaking four times on Strix Halo, plus practical guidance on choosing between NVIDIA, AMD, and Apple Silicon.
MegaTrain flips the GPU-centric paradigm by treating CPU memory as primary storage and the GPU as a transient compute device, enabling full-precision training of 100B+ LLMs on a single GPU with up to 12.2x throughput over DeepSpeed ZeRO-3.
Lemonade is AMD's open-source local AI server that manages multiple backends like llama.cpp and FastFlowLM across GPU/NPU/CPU, serving text, image, and audio generation through an OpenAI-compatible API.
After updating to AMD Software 26.3.1 on a GMKtec EVO-X2 (Ryzen AI Max+ 395), the Vulkan backend fails to allocate device memory properly and falls back to CPU. Covers the investigation and a workaround: changing the BIOS VRAM allocation from 48GB/16GB to 32GB/32GB.
Why ComfyUI breaks on NVIDIA Blackwell (sm_120) GPUs with 'no kernel image is available for execution' errors, and a working setup that uses PyTorch Nightly, removes xformers, and adds SageAttention and NVFP4 quantization. Tested on RTX PRO 6000 Blackwell.
Setup notes for running WAI-Illustrious SDXL v16 on ComfyUI with an 8GB RTX 4060 Laptop. A 1024x1024 image generates in ~15 seconds without --lowvram, and a LoRA still loads. Covers the CUDA 12.8 portable build and path gotchas.
VectorWare has announced the first implementation of Rust's Future trait and async/await running on GPUs by adapting the Embassy executor to a GPU environment.