#PyTorch

6 articles

Tech · Jun 26, 2026 · 13 min

Wan 14B & FastWan on Radeon 8060S: ZLUDA fails, TheRock gfx1151 wheel works

Tested local Wan video generation on a Radeon 8060S (Strix Halo, 48GB UMA, Windows). ZLUDA can't run stock PyTorch; AMD's TheRock gfx1151 wheel provides native ROCm. FastWan 1.3B finishes in 4 min and Wan 14B I2V in 13.6 min; VAE decode and 16GB-RAM segfaults are the real limits.

AI Video Generation Wan ROCm AMD PyTorch Experiment

Tech · Apr 21, 2026 (updated) · 19 min

TRELLIS.2 trellis-mac port tested on M1 Max 64GB: setup, generation time, MPS bottlenecks

Hands-on run of trellis-mac (the CUDA-free port of TRELLIS.2) on M1 Max 64GB. Set up via uv with PyTorch 2.11.0 on MPS, applied the mps_compat.py patches, and recorded the actual generation time against the M4 Pro 24GB 3.5-minute reference, plus where the bottlenecks land on Apple Silicon.

AppleSilicon MPS PyTorch 3D ML Experiment

Tech · Apr 20, 2026 (updated) · 9 min

Running TRELLIS.2 on Apple Silicon MPS: a CUDA-free port

A port that replaces TRELLIS.2's CUDA-only libraries (flash_attn, nvdiffrast, sparse 3D convolution) with pure-PyTorch equivalents and runs Microsoft's 4B image-to-3D model on an M4 Pro in about 3.5 minutes without any NVIDIA GPU.

AppleSilicon MPS PyTorch 3D Local LLM ML

Tech · Apr 16, 2026 · 14 min

How Far Has AMD ROCm Come in Catching Up to CUDA?

Based on EE Times' interview with AMD AI Software VP Anush Elangovan, we assess the ROCm vs CUDA ecosystem gap. Includes hands-on experience with ROCm breaking four times on Strix Halo, plus practical guidance on choosing between NVIDIA, AMD, and Apple Silicon.

AMD NVIDIA ROCm CUDA GPU AI Infrastructure PyTorch MLX Apple Silicon

Tech · Mar 26, 2026 (updated) · 11 min

Qwen Image Edit on M1 Max went 80s→10min after a ComfyUI update: MPS BF16 is the cause

Diagnosed a 7x speed regression for Qwen Image Edit on M1 Max 64GB ComfyUI after an update. Root cause: MPS BF16 matmul runs ~2x slower than FP16, compounded by an FP16 attention bug. Benchmark numbers and the working fix.

ComfyUI Qwen Apple Silicon MPS PyTorch Experiment

Tech · Feb 13, 2026 (updated) · 5 min

Fixing Corrupted ComfyUI Upscale Output on Mac MPS with contiguous()

Upscaling images loaded via the Load Image node was producing garbled output. Fixed it by addressing the non-contiguous tensor issue with a one-line patch to comfy/utils.py. Added a 2026-04-29 follow-up after a ComfyUI update wiped the patch and the bug came back, with the upstream PyTorch issue and a recurrence-detection snippet.

ComfyUI Apple Silicon PyTorch MPS Experiment