#Memory Optimization

3 articles

Tech Apr 9, 2026 15 min

MegaTrain Trains a 120B-Parameter LLM on a Single GPU at Full Precision

MegaTrain flips the GPU-centric paradigm by treating CPU memory as primary storage and the GPU as a transient compute device, enabling full-precision training of 100B+ LLMs on a single GPU with up to 12.2x throughput over DeepSpeed ZeRO-3.

LLM Machine Learning GPU DeepSpeed Memory Optimization

Tech Feb 15, 2026 updated 5 min

Optimizing VRAM and Memory Allocation on Strix Halo for Local LLMs

How to configure the split between dedicated VRAM and main memory on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6GB model ran fine with just 8GB of dedicated VRAM.

AI LLM Memory Optimization AMD LM Studio Experiment

Tech Feb 14, 2026 5 min

Why are image-generation VAEs so heavy? Comparing the Qwen-Image and HunyuanImage architectures

An explanation of why Qwen-Image-Edit's VAE is so heavy, how HunyuanImage 2.1 chose a 32x high-compression VAE instead, and how Kohya's memory-optimization work fits in.

AI Image Generation VAE Qwen HunyuanImage Memory Optimization