
FramePack F1 on RTX 4060 Laptop 8GB: 5s clip in 56min, the wall is RAM not VRAM

Ikesan

This started with FastWan, FastVideo’s accelerated take on the Wan family (Wan is an open video-generation model), which promises video in just a few steps. If it could cut generation time that much, it was worth a try. That was all the motivation I needed to check whether it would run on the laptop I had on hand.

The machine: RTX 4060 Laptop (8GB VRAM) / 32GB RAM / Windows 11. As an AI video-generation rig, it’s underpowered. The question was how far local video generation goes on a box like this.

Short version up front: VRAM was never the problem — it sat at 5.75GB the whole time. The wall was RAM. A 26GB model that won’t fit in 32GB of system memory spills to the pagefile, and a 5-second clip ended up taking 56 minutes.

FastWan’s speed doesn’t materialize on a 4060

I started by pinning down where FastWan’s speed actually comes from. The key is VSA (Video Sparse Attention). Ordinary dense attention computes attention across every frame pair, brute force, so its cost grows roughly with the square of the frame count. VSA drops the brute force and computes attention sparsely, which softens that quadratic growth. That’s where the speedup comes from.
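To make the quadratic claim concrete, here is a minimal sketch that counts query-key pairs under dense attention. The tokens-per-frame value is a made-up number for illustration only, not a figure from Wan or FastVideo.

```python
def dense_attention_pairs(frames: int, tokens_per_frame: int) -> int:
    """Number of query-key pairs when every token attends to every token."""
    tokens = frames * tokens_per_frame
    return tokens * tokens


# tokens_per_frame=100 is a hypothetical value chosen only for illustration.
short_clip = dense_attention_pairs(frames=16, tokens_per_frame=100)
long_clip = dense_attention_pairs(frames=32, tokens_per_frame=100)

# Doubling the frame count quadruples the number of pairs.
print(long_clip / short_clip)  # 4.0
```

A sparse scheme such as VSA avoids computing most of these pairs, which is why it only pays off when the kernel that implements it actually runs on your GPU.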

But the GPUs VSA targets are the H100 / A100 / 4090 — the 4060 isn’t on the list. On 8GB you fall back to dense attention plus offloading anyway, so you never see VSA’s speed. FastWan’s main selling point doesn’t come alive on the hardware I had.

With no reason left to pick it for speed, I changed the target. Pumping out short clips fast isn’t interesting. Even at some cost to quality, being able to produce a longer video in a realistic amount of time is worth more on a machine like this. What fits that direction is FramePack (by lllyasviel, based on HunyuanVideo 13B). Its design philosophy is the opposite of Wan’s.

| Trait | Wan/FastWan (dense) | FramePack |
| --- | --- | --- |
| Time vs length | ~quadratic (explodes for long clips) | linear (just proportional to length) |
| VRAM vs length | grows with length → OOM | constant (independent of length) |
| Min VRAM | - | 6GB |
| Method | clip generation | next-frame prediction (i2v) |

Memory is constant with respect to length, and time is linear. The floor is 6GB, so 8GB has room to spare. It’s i2v (image to video), so it’s continuous with the kana-chan i2v workflow I did last time. If length is the goal, this is it — so I went with FramePack’s F1 variant (an improved version that predicts forward one frame at a time).

I wasn’t worried about VRAM to begin with. Last time, in “Running WAN 2.2 on an RTX 4060 (8GB VRAM) in ComfyUI,” I squeezed the 22GB Wan 14B Rapid into 8GB with --lowvram and ran it at 4 steps / 111 seconds. Even on 8GB VRAM, a largish model loads as long as you can offload. I assumed the same would hold here.

Setup and the 40GB of models

Installation is the easy part. FramePack ships an official one-click Windows package (framepack_cu126_torch26, bundling CUDA 12.6 + PyTorch 2.6). Unpack it, bring the code up to date with a git pull (the update.bat equivalent), and you get the F1 launch script, demo_gradio_f1.py.

The heavy part is the initial model download — about 40GB total, pulled from three repositories.

| Repository | Size | Contents |
| --- | --- | --- |
| hunyuanvideo-community/HunyuanVideo | ~16GB | Llama-based text encoder + CLIP + VAE |
| lllyasviel/FramePack_F1_I2V_HY_20250503 | ~26GB | 13B transformer (bf16) |
| lllyasviel/flux_redux_bfl | ~0.8GB | SigLIP image encoder |

Something already feels off at this point: VRAM never comes up once. What matters is disk, and the RAM I’d get stuck on next.

It crashes the moment it launches (WinError 1455)

Once the download finished and I launched it, it crashed during model loading, before generation even started.

OSError: The paging file is too small for this operation to complete. (os error 1455)

The error: the paging file (virtual memory — the disk-backed area that covers what physical RAM can’t) is too small. The launch script loads models into CPU RAM in the order text encoder → VAE → main transformer, but it reads the transformer (26GB) while the text encoder (16GB) is still resident, so it demands about 43GB of RAM at peak. The test machine has 32GB physical, and the auto-managed pagefile had reserved only 7GB, putting the combined commit limit at about 39GB. 43GB doesn’t fit, and the safetensors load dies with a MemoryError.
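The arithmetic behind the crash can be written out as a small sketch. It assumes the commit limit is simply physical RAM plus the current pagefile size, which is the approximation used above. The function names are mine, not part of FramePack.

```python
def commit_limit_gb(physical_gb: float, pagefile_gb: float) -> float:
    """Approximate commit limit: physical RAM plus the current pagefile size."""
    return physical_gb + pagefile_gb


def shortfall_gb(peak_demand_gb: float, physical_gb: float, pagefile_gb: float) -> float:
    """How many GB the peak demand exceeds the commit limit by (0 if it fits)."""
    return max(0.0, peak_demand_gb - commit_limit_gb(physical_gb, pagefile_gb))


# Numbers from this machine: 32GB RAM, 7GB auto-managed pagefile, ~43GB peak demand.
print(commit_limit_gb(32, 7))   # 39
print(shortfall_gb(43, 32, 7))  # 4.0
```

A shortfall of only about 4GB is enough to kill the load, because the allocation either fits under the commit limit or fails outright.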

Back to the original premise: 8GB of VRAM is comfortably enough. FramePack floors at 6GB and runs on DynamicSwap (keeping the model weights in CPU RAM and sending only the layers it needs to the GPU each inference step, then pulling them back), so only a few GB ever sit on the GPU at once. What was stuck wasn’t the GPU but the CPU-side memory. Behind “runs on 6GB VRAM,” it demands 20GB-plus resident on the CPU side. That’s the part you can’t read off the official tagline.
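The following toy simulation illustrates the swapping idea described above. It is not FramePack’s DynamicSwap code: the ToyLayer class, the one-layer-at-a-time policy, and the 40 × 0.65GB split are all assumptions made for illustration. The real implementation may keep several layers on the GPU, depending on free VRAM.

```python
from dataclasses import dataclass


@dataclass
class ToyLayer:
    name: str
    size_gb: float
    device: str = "cpu"  # every layer starts resident in CPU RAM


def run_step(layers: list) -> float:
    """Simulate one inference step; return the peak GB held on the GPU."""
    peak_gpu_gb = 0.0
    for layer in layers:
        layer.device = "gpu"  # upload only the layer that is about to run
        on_gpu = sum(l.size_gb for l in layers if l.device == "gpu")
        peak_gpu_gb = max(peak_gpu_gb, on_gpu)
        # ... the layer's forward pass would run here ...
        layer.device = "cpu"  # pull it back before moving to the next layer
    return peak_gpu_gb


# Hypothetical split: 40 blocks of 0.65GB each, roughly a 26GB model.
layers = [ToyLayer(f"block{i}", 0.65) for i in range(40)]
peak_gpu = run_step(layers)
cpu_resident = sum(l.size_gb for l in layers)
print(peak_gpu, round(cpu_resident, 1))
```

The GPU-side peak stays at the size of one block, while the full model has to stay reachable on the CPU side for the whole run. That asymmetry is exactly what the 6GB figure hides.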

At first I thought I could dodge it by reworking the load order. I bypassed the Gradio UI and called the generation routine directly from a script, processing the prompt with the text encoder and freeing it immediately, then loading the transformer afterward — reordering the steps. If 16GB and 26GB never sit resident at the same time, the peak drops.
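The effect of the reordering on peak memory can be sketched as a small event simulation. The sizes are the two quoted above (16GB text encoder, 26GB transformer). The event format and function name are hypothetical, and the sketch ignores the smaller components and the rest of the process.

```python
def peak_resident_gb(events):
    """Walk (action, name, size_gb) events and return the peak total resident."""
    resident = {}
    peak = 0
    for action, name, size_gb in events:
        if action == "load":
            resident[name] = size_gb
        elif action == "free":
            resident.pop(name, None)
        peak = max(peak, sum(resident.values()))
    return peak


# Original order: the transformer is read while the text encoder is still resident.
original_order = [
    ("load", "text_encoder", 16),
    ("load", "transformer", 26),
]

# Reordered: encode the prompt, free the text encoder, then load the transformer.
staged_order = [
    ("load", "text_encoder", 16),
    ("free", "text_encoder", 16),
    ("load", "transformer", 26),
]

print(peak_resident_gb(original_order))  # 42
print(peak_resident_gb(staged_order))    # 26
```

On paper the peak falls from about 42GB to 26GB, which is under the 39GB commit limit.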

The text-encoder stage started passing. But it crashed again on the transformer load. The reason was DynamicSwap’s design itself. The transformer’s 26GB has to stay resident in CPU RAM not just during loading but throughout inference. Shaving the load-time peak doesn’t change the fact that 26GB remains in the end. Subtract what the OS and resident apps use (measured at a dozen-plus GB) from 32GB physical, and the effective free space is about 19GB. 26GB won’t fit no matter what.

So on a 32GB-RAM machine, the fix isn’t a trick — you have to increase virtual memory itself. Even someone who just runs the GUI’s run.bat should hit this WinError 1455 on 32GB. “8GB VRAM is plenty, yet it crashes on RAM” was the biggest snag this time. The straightforward fixes are these.

| Fix | What it is | Cost |
| --- | --- | --- |
| Enlarge the pagefile | manually grow virtual memory (e.g. initial 24GB / max 40GB) | eats disk; needs admin rights + a reboot |
| Add physical RAM | 64GB or more removes the need for a pagefile | hardware spend |
| Quantized model | fp8/GGUF versions via ComfyUI FramePackWrapper | leaves the official package; separate investigation |

Getting it through without touching settings

The proper fix is enlarging the pagefile. But for someone who just wants to make a video, pulling up admin rights, configuring virtual memory, and rebooting is a hassle. I couldn’t be bothered to go that far just for video generation, so I forced it through with two moves that touch no settings.

The first is freeing up disk. An auto-managed pagefile grows on demand as long as there’s free disk to grow into. I deleted unused old models to widen free space to about 54GB, creating room for it to expand.
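A quick way to check how much room the pagefile has to grow is to ask Python for the free space on the drive that holds it. This is a minimal sketch using only the standard library. The helper names are mine, and which drive holds the pagefile is something you need to confirm on your own machine.

```python
import shutil


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_room_to_grow(path: str, needed_gb: float) -> bool:
    """True if the drive has at least `needed_gb` GiB free."""
    return free_disk_gb(path) >= needed_gb


print(f"{free_disk_gb('.'):.1f} GiB free on the current drive")
```

On a typical Windows install you would call has_room_to_grow("C:\\", 26) to see whether the drive could absorb a pagefile large enough for the transformer.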

The second, and the one that actually did the work, is pre-warming the pagefile. Before loading the main model, I inserted a routine that allocates and frees memory from Python in 256MB increments. Demand 26GB all at once and the auto-managed pagefile can’t expand fast enough, so the process dies instantly (it exits with code 5 without even a traceback). But raise the commit charge a little at a time, in small steps, and the pagefile grows smoothly. Once it’s big enough, the real load lands its 26GB in the headroom already made.
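A pre-warm routine along these lines might look like the sketch below. It is a reconstruction under assumptions, not the exact routine used here: it holds the 256MB blocks cumulatively until the target is reached and then releases them all at once, and the function name and return value are my own.

```python
def prewarm_commit(target_bytes: int, step_bytes: int = 256 * 1024**2) -> int:
    """Raise committed memory step by step, then release it all.

    Returns the number of bytes that were held at the peak.
    """
    blocks = []
    held = 0
    try:
        while held < target_bytes:
            size = min(step_bytes, target_bytes - held)
            blocks.append(bytearray(size))  # zero-filled allocation of `size` bytes
            held += size
    except MemoryError:
        # The system could not provide more memory; stop at what was reached.
        pass
    finally:
        blocks.clear()  # release every block so the real load can use the space
    return held


# Small demonstration: 8MB in 2MB steps, so the example is safe to run anywhere.
reached = prewarm_commit(8 * 1024**2, step_bytes=2 * 1024**2)
print(reached // 1024**2, "MB held at peak")
```

Before the real transformer load you would call something like prewarm_commit(26 * 1024**3). If the returned value is lower than the target, the pagefile did not grow far enough and the load is likely to fail as well.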

With these two, I ran it to the end without touching any virtual-memory settings. That said, it’s nothing more than a stopgap to make it run; the speed problem remains entirely.

56 minutes for 5 seconds

Starting from a front-facing peace-sign image of kana-chan (1024×1024), I generated 5 seconds’ worth at F1’s default settings (25 steps, distilled CFG 10, TeaCache on, 640 resolution bucket). Here are the numbers that came out.

| Item | Measured |
| --- | --- |
| Output | 4.83s / 145 frames / 608×640 |
| Total time | 3363s (56 min) |
| Effective speed | 23.2 s/frame |
| Per section | 806 / 794 / 764 / 1000s (nearly constant = linear) |
| Peak VRAM | 5.75GB (reserved) / 4.42GB (allocated) |

The official laptop reference figures (3060 / 3070ti laptop) are 1.5–2.5 s/frame. This is roughly 10x slower than that. Peak VRAM is 5.75GB, still 2GB-plus of headroom against 8GB. It isn’t slow from maxing out VRAM.
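The figures can be cross-checked with a few lines of arithmetic. Every input below comes from the measurements and the official reference range quoted here; only the variable names are mine.

```python
total_seconds = 3363
frames = 145
section_seconds = [806, 794, 764, 1000]

seconds_per_frame = total_seconds / frames

# Official laptop reference range: 1.5 to 2.5 s/frame.
slowdown_vs_slowest_ref = seconds_per_frame / 2.5
slowdown_vs_fastest_ref = seconds_per_frame / 1.5

print(round(seconds_per_frame, 1))        # 23.2
print(round(slowdown_vs_slowest_ref, 1))  # 9.3
print(round(slowdown_vs_fastest_ref, 1))  # 15.5
print(sum(section_seconds))               # 3364
```

So "roughly 10x" is the charitable end of the comparison. Against the fastest reference figure the gap is closer to 15x. The one-second difference between the section sum and the total is rounding in the per-section values.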

Why it’s 10x slower

Peeking at nvidia-smi during generation, the answer was plain. GPU utilization was 2–12%, and VRAM use was 3–4GB out of 8GB. The GPU is barely working.
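If you would rather log these numbers than watch them by eye, nvidia-smi can emit them as CSV. The sketch below is one way to read a sample from Python. It assumes nvidia-smi is on PATH and that only the first GPU matters, and the helper names are mine.

```python
import subprocess

# Ask nvidia-smi for GPU utilization (%) and memory used (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '7, 3412' into (utilization %, memory MiB)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def read_gpu() -> tuple[int, int]:
    """Query the first GPU once. Requires nvidia-smi on PATH."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_sample(out.splitlines()[0])
```

Calling read_gpu() in a loop with a short sleep during generation gives a utilization trace you can line up against the per-section timings.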

The reason is memory. The 26GB transformer doesn’t fit in physical RAM, so part of it gets evicted to the pagefile (disk). DynamicSwap walks through all the layers on every step, and each time it has to re-read the evicted portion from disk. The GPU just waits for the next layer to arrive, spending more time waiting than computing. That’s why utilization never gets past 12%.

FramePack’s “linear cost, low VRAM” isn’t a lie in itself. The per-section times are nearly constant — genuinely linear in length — and VRAM in the 6GB class is enough. But on 32GB RAM, the slope of that line is an order of magnitude steeper. If 5 seconds takes 56 minutes, a one-minute video is roughly 11 hours by simple arithmetic. The original goal — “long clips at practical speed” — doesn’t hold up on this memory configuration.
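The extrapolation is plain linear scaling, which the near-constant per-section times justify. Here is a minimal sketch, with a helper name of my own and the measured run (5 seconds in 56 minutes) as defaults.

```python
def estimated_hours(target_clip_s: float,
                    measured_clip_s: float = 5.0,
                    measured_minutes: float = 56.0) -> float:
    """Scale the measured run linearly to a longer clip; returns hours."""
    return target_clip_s / measured_clip_s * measured_minutes / 60


print(round(estimated_hours(60), 1))  # 11.2
```

This assumes the paging behavior stays the same for the whole run. If anything else competes for RAM during those hours, the real figure could be worse.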

Quality holds up

The speed is rough, but the footage that came out was good.

Across all 145 frames, kana-chan’s face, hair, and clothes are preserved. F1 stacks predictions in the forward direction, which makes it resistant to the drift where the image wanders away from the original the longer it runs. As advertised, it doesn’t fall apart even toward the end. The motion is natural too — lowering the hand from the peace sign and resting it on the cheek connects in one smooth sequence. Hand it a single still image, and it moves this much.

So, can you do it on a laptop?

You can. The quality is good, too. But it’s slow. That’s the answer on a 4060 Laptop 8GB / 32GB RAM.

The point worth holding onto is that the bottleneck was RAM, not VRAM. The 8GB of VRAM had headroom to spare the whole way. What got stuck was the physical memory that can’t hold the entire 26GB model. To land in the official speed range (1.5–2.5 s/frame), you need enough RAM to avoid evicting the model to disk — effectively 48–64GB. At 32GB, a few-second test clip is the limit of patience, and minute-long footage isn’t realistic.

“Runs on 6GB VRAM” is true. But the real requirement is CPU-side memory, not the GPU, so if you’re weighing video generation on a laptop, look at RAM capacity before VRAM. The next thing to pursue is either packing in 64GB of RAM, or using the fp8/GGUF versions via FramePackWrapper for ComfyUI to cut the CPU-resident footprint itself.
