Can HappyHorse Run Locally?
Alibaba apparently released a video generation model called HappyHorse. The name sounds like another sketchy site, but this time there’s more to it.
TechNode reported that on April 10, 2026, Alibaba confirmed HappyHorse belongs to their ATH (Alibaba Token Hub) Innovation Unit. On Artificial Analysis’s video generation leaderboard, HappyHorse-1.0 sits near the top under the Alibaba-ATH name for Text-to-Video.
What matters to me isn’t “apparently it’s amazing” but: can I use it now, can I run it locally, and what GPU would I need? From my experience running LTX-2 and Wan 2.2 on an M1 Max 64GB, video generation has a huge gap between “fits in memory” and “runs at a usable speed.”
API-First as of April 27, 2026
Starting with what’s nearly confirmed.
Odaily’s April 20 article says HappyHorse-1.0 begins phased API testing from April 27, 2026, via Alibaba Cloud Bailian (Model Studio), initially targeting enterprise customers with commercial launch planned for May. The modelstudio.console.alibabacloud.com/ap-southeast-1 URL is the international region entry point for Model Studio.
However, the public docs still center on Wan-series models for video generation. Alibaba Cloud’s video generation documentation covers text-to-video, image-to-video, reference-based video, video editing, and digital humans. The video generation model list shows wan2.7-t2v, wan2.7-i2v, wan2.6-i2v-flash, and others as the recommended models.
So my read of the situation today:
| Aspect | Status |
|---|---|
| Made by Alibaba/ATH | Practically confirmed |
| Model Studio access | Phased API test from April 27 per reports |
| Individual account access | Unconfirmed. Likely enterprise-first |
| Open weights | Not found |
| Local execution | Not possible right now; no execution path exists |
"API Available" and "Runs Locally" Are Different Things
Mixing these up leads to wrong conclusions.
On the Artificial Analysis page, HappyHorse-1.0’s API Pricing shows “Coming soon,” with no Open Weights link like LTX-2 or Wan 2.2 have. Since LTX-2 and Wan 2.2 A14B are listed as Open Weights on the same page, the leaderboard treatment is clearly different.
Third-party API provider Runware lists alibaba:happyhorse@1.0 as a model ID in its docs, but the status is coming-soon. Specs mention T2V/I2V, 720p/1080p, 3-15 seconds, seed, watermark, and first-frame conditioning. But this describes an "API-callable design," not a model you can download and load into ComfyUI.
Running locally requires at minimum:
- Model weights
- Inference code
- Peripheral files (VAE, text encoder, etc.)
- ComfyUI nodes, diffusers support, or a dedicated CLI
- License and usage terms
None of these are available yet. So the answer to “Can I run HappyHorse locally?” as of April 27, 2026 is “No, because there are no open weights.” Spec estimates are possible, but there’s no execution path yet.
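For what it's worth, if all of that ever does show up, the load path would presumably look like any other diffusers pipeline. The sketch below is pure assumption: the repo ID and the generation arguments are placeholders I made up, and as of today there is nothing behind them to download.

```python
# Hypothetical sketch only: what a local HappyHorse load COULD look like if
# open weights and diffusers support ever ship. The repo ID
# "alibaba/happyhorse-1.0" and the call arguments are placeholders, not a
# real release; as of 2026-04-27 this downloads nothing and simply fails.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "alibaba/happyhorse-1.0",      # placeholder repo ID
    torch_dtype=torch.bfloat16,    # ~28 GB of weights alone if it really is 15B
).to("cuda")                       # MPS would be its own fight, per the Wan/LTX experience

out = pipe(
    prompt="a horse galloping across a beach at sunset",
    height=720, width=1280,
    num_frames=81,                 # assumed: roughly 5 s at 16 fps
)
```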
A 15B-Class Model Would Be Tough on M1 Max 64GB
Multiple community sources and peripheral articles suggest HappyHorse is 15B-class, generates video and audio simultaneously, and runs fast on H100. However, Alibaba Cloud’s official docs don’t confirm model size or recommended local GPU, so this remains unverified.
If it is a 15B-class video model, the local execution outlook is pretty grim.
| Format | Rough Model Size | Runtime Assessment |
|---|---|---|
| FP16/BF16 | ~30GB+ | Model alone might fit, but video latents, VAE, text encoder, and KV/intermediate tensors will blow past limits |
| FP8 | ~15GB+ | Viable on NVIDIA. Apple Silicon tends to choke on FP8 |
| 4-bit quantized | ~8GB+ | Memory gets easier, but video quality and node support become issues |
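The table is just parameter-count arithmetic, and it's worth writing out, because the 15B figure itself is the unverified community number. Quick back-of-envelope in Python:

```python
# Weight-only memory for a hypothetical 15B-parameter model at various
# precisions. Activations, video latents, VAE, and text encoder come on top.
params = 15e9

def weight_gib(bits_per_param: float) -> float:
    return params * bits_per_param / 8 / 1024**3

for label, bits in [("FP16/BF16", 16), ("FP8", 8), ("4-bit (GGUF Q4-ish)", 4.5)]:
    print(f"{label}: ~{weight_gib(bits):.0f} GiB")
# Prints roughly 28 GiB, 14 GiB, and 8 GiB -- weights only, which is why the
# table says "model alone might fit" for FP16 on a 64GB machine.
```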
M1 Max 64GB has unified memory, which is different from a 64GB VRAM card. In my LTX-2 and Wan 2.2 article, I ran Wan 2.2 A14B in GGUF format on M1 Max 64GB, but 832x480 at roughly 2 seconds took 1 hour 22 minutes. LTX-2 worked in some GGUF scenarios, but the official pipeline produced NaN on MPS and quality wasn’t production-ready.
If HappyHorse includes simultaneous audio generation, there’s no reason to assume it’s lighter than Wan 2.2. Even if open weights drop, Mac might manage “it runs” but reaching production-usable speeds is unlikely.
If You Plan to Test on RunPod, Look at 48GB+
When weights become available and you want to try RunPod, don’t default to the 24GB tier. Video generation has heavier intermediate tensors than image generation, and resolution, frame count, and sampler configuration can eat memory fast.
RunPod’s RTX 6000 Ada page lists 48GB VRAM, Secure Cloud $0.77/hr, Community Cloud $0.74/hr. At this price point, RTX 6000 Ada 48GB or L40S 48GB are the realistic starting options.
| GPU | VRAM | Position for HappyHorse Testing |
|---|---|---|
| RTX 4090 | 24GB | Possible with quantized/low-res. Too tight for a first attempt |
| RTX 5090 | 32GB | Better than 24GB, but still tight for 1080p video |
| RTX 6000 Ada / L40S | 48GB | Start here. Best cost-to-headroom ratio |
| A100 80GB / H100 80GB | 80GB | Fastest if benchmarks assume H100. Expensive for testing |
My approach: start with RTX 6000 Ada 48GB right after open weights and ComfyUI/diffusers support arrive. Simple reason: the time wasted hitting OOM on 24GB and rebuilding the environment costs more. Run 720p/5s on 48GB first, check peak VRAM in the logs, then downsize to 5090 or 4090.
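"Check peak VRAM in the logs" doesn't need anything fancy. Here is a minimal sketch using PyTorch's built-in counters, assuming a CUDA pod; the generation function is a dummy stand-in for whatever ComfyUI/diffusers workflow eventually exists.

```python
import torch

def run_720p_5s_generation():
    # Stand-in for the real workflow; allocates ~2 GiB so the script runs end to end.
    _ = torch.empty(1024, 1024, 1024, dtype=torch.float16, device="cuda")

torch.cuda.reset_peak_memory_stats()
run_720p_5s_generation()

peak = torch.cuda.max_memory_allocated() / 1024**3
reserved = torch.cuda.max_memory_reserved() / 1024**3
print(f"peak allocated: {peak:.1f} GiB (reserved: {reserved:.1f} GiB)")
# Rule of thumb: well under 32 GiB and a 5090 is worth trying next;
# under 24 GiB and a 4090 becomes realistic.
```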
Renting an H100 upfront only makes sense if you want speed benchmarks matching official conditions. For hobby-level testing, the first goal is “confirming the workflow doesn’t break,” not peak performance.
Model Studio Still Runs on Wan 2.7
Looking only at Alibaba Cloud’s public docs, the current production video generation line is Wan 2.7/2.6. wan2.7-t2v handles high-quality text-to-video, wan2.7-i2v handles high-quality image-to-video, and wan2.6-i2v-flash is the cheaper image-to-video option. The text-to-video API example uses wan2.7-t2v-2026-04-25.
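To make the API-side picture concrete, here is roughly what a Wan 2.7 text-to-video call would look like through Model Studio. This is a sketch based on the existing DashScope-style async video synthesis API; the exact endpoint path, async header, and payload fields are assumptions and should be checked against the current Model Studio reference.

```python
# Sketch of a Model Studio text-to-video call with the Wan line the docs
# currently recommend. Endpoint path and payload shape are assumptions
# modeled on the existing DashScope async video synthesis API.
import os, time, requests

API_KEY = os.environ["DASHSCOPE_API_KEY"]
BASE = "https://dashscope-intl.aliyuncs.com/api/v1"   # international (Singapore) region

submit = requests.post(
    f"{BASE}/services/aigc/video-generation/video-synthesis",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "X-DashScope-Async": "enable",                # video jobs run asynchronously
    },
    json={
        "model": "wan2.7-t2v",                        # recommended model per the docs above
        "input": {"prompt": "a horse galloping across a beach at sunset"},
        "parameters": {"size": "1280*720"},
    },
)
task_id = submit.json()["output"]["task_id"]

# Poll until the job finishes, then print the result payload (video URL on success).
while True:
    status = requests.get(f"{BASE}/tasks/{task_id}",
                          headers={"Authorization": f"Bearer {API_KEY}"}).json()
    if status["output"]["task_status"] in ("SUCCEEDED", "FAILED"):
        print(status["output"])
        break
    time.sleep(10)
```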
So even if HappyHorse enters Model Studio, it'll likely start as a separate premium or experimental tier rather than replacing Wan, especially since its reported strengths, audio sync and multi-shot composition, overlap with Wan 2.7's "audio synchronization, multi-shot narrative" capabilities.
From my use-case perspective, the existing decision framework hasn’t changed:
- For local testing, use what’s available now: Wan 2.2 and LTX-2
- For high-quality Cloud API, use Model Studio’s Wan 2.7 series
- HappyHorse: evaluate once API invites or open weights land
In my January 2026 video generation AI article, I noted that i2v’s practical limit was “it struggles to produce what wasn’t in the source image” and “start/end frame specification is handy but intermediate frames are AI-decided.” No matter how strong HappyHorse is, without seeing how far it’s solved this controllability problem, leaderboard rankings alone aren’t enough to judge.
If HappyHorse Shows Up in the Console
Once HappyHorse appears in the Model Studio console, the first thing to check is whether you can pin its model ID directly in API calls. Whether it's invitation-only or callable from your own account changes the conversation entirely.
Does it accept only T2V, or also I2V, audio sync, and multiple reference images? For output, check 720p/1080p limits, duration caps, fps, and watermark behavior. Whether inference runs on ap-southeast-1 or mainland China also directly affects latency and data policy.
At the API testing stage, reproducibility matters more than pricing or SLAs. Can you get consistent results from the same prompt? Does seed work? How well does I2V preserve the reference image? For video generation models, the failure mode across 10 runs matters more than one lucky hit.
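When that stage arrives, the reproducibility check can be as unglamorous as a loop: same prompt, fixed seed, ten runs, keep everything. A sketch under obvious assumptions; generate_video() is a stub for whatever API surface actually ships, and the model ID is the placeholder from Runware's listing.

```python
# Reproducibility harness sketch. generate_video() is a stub: wire it to
# whatever API HappyHorse ends up exposing (Model Studio, Runware, ...).
PROMPT = "a horse galloping across a beach at sunset"
SEED = 42
RUNS = 10

def generate_video(model: str, prompt: str, seed: int) -> str:
    raise NotImplementedError("plug in the real API call and return a video URL/path")

results = []
for i in range(RUNS):
    url = generate_video(model="alibaba:happyhorse@1.0",  # placeholder ID from Runware's docs
                         prompt=PROMPT, seed=SEED)
    print(f"run {i}: {url}")
    results.append(url)

# What matters is the spread across the ten runs, not the best single clip:
# do fixed-seed runs actually repeat, and how often does a run come back unusable?
```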
Release Announced but Not Connecting
Right after publishing this article, things moved. Alibaba AI engineer Yuichi Fujikawa announced on X: “HappyHorse, released today. Ahead of schedule, available on Model Studio right now.”
But X is flooded with “can’t see it” and “won’t connect” reports. Either it’s down from traffic or the staged rollout hasn’t reached all users yet.
This timing aligns with the “phased API test from April 27” reporting covered earlier. No weights were released, so nothing changes for the local execution story.