
ACE-Step UI Turns Local Music Generation into a Production App

Ikesan

fspecii/ace-step-ui caught my eye, so I dug into it.
The name suggests another ACE-Step 1.5 frontend, but it’s not a “generate and done” UI.
It bundles library management, a player, AudioMass editing, Demucs stem separation, and video generation into something closer to a production app.

I’ve previously looked at ACE-Step itself in V1.0 notes and V1.5’s overhaul.
This time the focus isn’t model performance—it’s the shell around the model for daily use.

The Production Layer, Not the Model

ACE-Step UI calls ACE-Step 1.5’s Gradio API as its AI engine.
The frontend is React 18 + TypeScript + Tailwind CSS + Vite; the backend is Express.js + SQLite + better-sqlite3.
Generation history and playlists are stored in a local DB, and the browser gets a Spotify-like library with a bottom player bar.

graph LR
    A[Browser UI] --> B[Express API]
    B --> C[SQLite]
    B --> D[ACE-Step 1.5<br/>Gradio API]
    B --> E[FFmpeg<br/>Demucs<br/>AudioMass]
    D --> F[Generated Audio]
    E --> F
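The project's actual schema isn't documented in the README, but the Express + better-sqlite3 pairing implies something like this sketch (the table, columns, and route are my guesses, not the repo's code):

import express from "express";
import Database from "better-sqlite3";

// Hypothetical library table: one row per generated track, keeping the
// prompt and settings so past generations stay searchable and reusable.
const db = new Database("library.db");
db.exec(`CREATE TABLE IF NOT EXISTS tracks (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title TEXT,
  prompt TEXT,
  settings_json TEXT,
  file_path TEXT,
  created_at TEXT DEFAULT (datetime('now'))
)`);

const app = express();
app.use(express.json());

// Save a finished generation into the local library.
app.post("/api/tracks", (req, res) => {
  const { title, prompt, settings, filePath } = req.body;
  const info = db
    .prepare("INSERT INTO tracks (title, prompt, settings_json, file_path) VALUES (?, ?, ?, ?)")
    .run(title, prompt, JSON.stringify(settings), filePath);
  res.json({ id: info.lastInsertRowid });
});

app.listen(3001);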

The official README lists Node.js 18+, Python 3.10+ or the Windows Portable Package, FFmpeg, and uv as requirements on the UI side.
GPUs with 4 GB VRAM are listed as supported, but 12 GB+ is recommended if you want the LLM features too.
Same story as ACE-Step 1.5’s own low-VRAM support—“it runs” and “Thinking Mode runs comfortably” are different things.

Filling the Gaps Gradio Doesn’t Cover

ACE-Step 1.5 ships with a Gradio UI, but it’s focused on model execution and parameter tweaking.
What ACE-Step UI adds is everything after generation—how you store, listen, trim, and reuse tracks.

| Area | What ACE-Step UI adds |
| --- | --- |
| Generation | Simple/Custom Mode, BPM, key, time signature, length, batch generation |
| Prompting | Lyrics editor, structure tags, prompt templates, reuse of past settings |
| Organization | History, search, likes, playlists, local SQLite storage |
| Playback | Bottom player, waveform, progress display |
| Processing | AudioMass editing, Demucs stem separation, FFmpeg processing |
| Extras | Video generation with Pexels backgrounds, gradient jacket art |

The quietly annoying part of local music generation isn’t generation itself—it’s the mess that follows.
A model that spits out tracks in seconds means filenames, settings, lyrics, and variant versions become untraceable fast.
ACE-Step UI pulls that toward “the bare minimum of a music production app.”

Thinking Mode Is Not a Low-VRAM Feature

ACE-Step 1.5 moved to a hybrid architecture: an LM (language model) designs the track, and a DiT (Diffusion Transformer) produces the audio.
ACE-Step UI’s AI Enhance and Thinking Mode use the LM side to flesh out genre tags, BPM, key, time signature, and lyrics structure.
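The intermediate format isn't published, so treat this as a sketch of what the LM pass plausibly hands to the DiT stage (every field name here is hypothetical):

// Hypothetical shape of the LM's planning output, per the features listed above.
interface TrackPlan {
  genreTags: string[];       // e.g. ["synthwave", "retrowave"]
  bpm: number;               // tempo target for the DiT
  key: string;               // e.g. "F# minor"
  timeSignature: string;     // e.g. "4/4"
  lyricsStructure: string[]; // ordered section tags such as "[verse]", "[chorus]"
}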

The UI README recommends disabling Thinking Mode on 4 GB GPUs, switching to the PT backend, using batch size 1, and keeping track length short.
ACE-Step 1.5’s own GPU Compatibility Guide says the same—below 6 GB, disable LM initialization and run DiT-only.
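Condensed into a config object, that 4 GB advice looks roughly like this (field names are mine, not the UI's):

// Hypothetical settings reflecting the UI README's 4 GB recommendations.
const lowVramDefaults = {
  thinkingMode: false,  // skip the LM planning pass entirely
  backend: "pt",        // PT (PyTorch) backend
  batchSize: 1,
  lengthSeconds: 60,    // "keep track length short"; 60 is my placeholder
};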

That gap matters.
“Runs on 4 GB” is appealing if you just want audio out locally.
But if you want prompt understanding and composition planning end-to-end, you need a 12 GB+ GPU or have to accept CPU-inference waits for the LM.

Compatibility with ACE-Step 1.5 XL

ACE-Step 1.5 added the 4B DiT XL variants on April 2, 2026: xl-base, xl-sft, and xl-turbo.
The official README targets 12 GB+ with offloading or 20 GB+ without.

ACE-Step UI is a thin production layer built on ACE-Step 1.5’s API, not just a pretty face.
That means it naturally inherits model-side upgrades.
You start the main Gradio API with --enable-api and point the UI’s ACESTEP_API_URL to http://localhost:8001.

# Start the ACE-Step 1.5 Gradio API with the HTTP API enabled
cd /path/to/ACE-Step-1.5
uv run acestep --port 8001 --enable-api --backend pt --server-name 127.0.0.1

# Start the UI, which reads ACESTEP_API_URL to find the API
cd /path/to/ace-step-ui
./start.sh
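Before layering the UI on top, it's worth confirming the API answers at all. With @gradio/client, Gradio's official JS client, a minimal connectivity check looks like this (the URL default matches the commands above; run as an ES module):

import { Client } from "@gradio/client";

// Connect to the ACE-Step 1.5 Gradio API started with --enable-api,
// then list the endpoints it actually exposes.
const client = await Client.connect(process.env.ACESTEP_API_URL ?? "http://localhost:8001");
console.log(await client.view_api());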

The Windows Portable Package ships a start-all.bat expecting C:\ACE-Step-1.5.
On Linux/macOS, start-all.sh looks for ../ACE-Step-1.5; set ACESTEP_PATH if it’s elsewhere.

Before Calling It a Local Suno Replacement

The README pushes “Open Source Suno Alternative” pretty hard.
But right now, the more useful question is whether ACE-Step 1.5 can be treated as a production tool, not whether it matches Suno’s output.

As of April 27, 2026, ACE-Step UI sits at roughly 1.1k GitHub stars with no formal releases.
56 commits—momentum is there, but there’s no stable release milestone yet.
ACE-Step 1.5 itself has about 9.7k stars, 12 releases, with v0.1.7 from April 24, 2026 being the latest.

Judging it as a cloud service replacement means weighing audio quality, lyrics tracking, copyright risk, ownership of generated output, and project continuity.
Judged as a production aid, though, having local storage, history, editing, and stem separation in one window makes it well worth trying.

Setup Order

Don’t start with ACE-Step UI.
Confirm that ACE-Step 1.5 itself runs stably on your GPU first.
Get one track out via the native Gradio UI or API, then layer the UI on top.
That makes debugging much easier.

| Check | Why |
| --- | --- |
| Can ACE-Step 1.5 generate standalone? | Model/GPU/PyTorch environment issues surface here first |
| Does --enable-api startup work? | ACE-Step UI connects via the Gradio API |
| Is FFmpeg installed? | Needed for track duration display and audio processing |
| Is the LM config right for your VRAM? | Directly affects Thinking Mode and AI Enhance |
| Where does generation history land? | SQLite and audio files grow—backup strategy matters |
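Two of those checks script easily. A minimal Node preflight, assuming only the default API URL from earlier (run as an ES module on Node 18+), might look like:

import { execSync } from "node:child_process";

// Check: FFmpeg on PATH (needed for duration display and processing).
try {
  execSync("ffmpeg -version", { stdio: "ignore" });
  console.log("ffmpeg: ok");
} catch {
  console.error("ffmpeg: not found on PATH");
}

// Check: the --enable-api Gradio server is reachable.
const apiUrl = process.env.ACESTEP_API_URL ?? "http://localhost:8001";
const res = await fetch(apiUrl).catch(() => null);
console.log(`ACE-Step API at ${apiUrl}: ${res && res.ok ? "reachable" : "unreachable"}`);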

Music generation involves more “find that track later,” “compare takes,” and “quick trim” than image generation.
That’s exactly why ACE-Step UI is interesting.
It’s not about model benchmarks—it’s about shelving local output as production material.

Mac Runs MLX + MPS Side by Side

ACE-Step 1.5 works on Mac, but the internals differ from Linux/Windows.
The LM side runs on Apple’s MLX framework; the DiT side uses PyTorch’s MPS backend.
On Linux/Windows both run unified under PyTorch + CUDA, but Mac splits them.

ACE-Step 1.5’s macOS scripts start_gradio_ui_macos.sh and start_api_server_macos.sh auto-set ACESTEP_LM_BACKEND=mlx and --backend mlx.
nano-vllm (a fast LM inference engine) is excluded on macOS arm64; mlx-lm handles LM inference instead.

The catch is ACE-Step UI’s start-all.sh.
It launches ACE-Step via uv run acestep-api --port 8001 without passing --backend mlx.
That means start-all.sh might fall back to the PyTorch LM backend on Mac.
Safer to start ACE-Step separately with start_api_server_macos.sh and only run start.sh for the UI.
Or set ACESTEP_LM_BACKEND=mlx in ACE-Step’s .env.
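Concretely, the split startup mirrors the Linux commands above, swapping in the macOS script (script locations assumed to be the repo roots):

cd /path/to/ACE-Step-1.5
./start_api_server_macos.sh   # sets ACESTEP_LM_BACKEND=mlx and --backend mlx

cd /path/to/ace-step-ui
./start.sh                    # UI only; skip start-all.sh on Mac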

Apple Silicon Memory and Model Sizes

Apple Silicon uses unified memory—there’s no dedicated GPU VRAM.
ACE-Step 1.5’s recommended VRAM table translates directly to “available memory.”

On a 16 GB Mac, the realistic line is the 2B turbo DiT only (no LM), with quantization and offloading.
32 GB fits the 2B turbo/sft + 0.6B or 1.7B LM.
XL variants (4B DiT) are tight on 32 GB—there’s a reported issue (#1081) where Autoscore triggers duplicate LM loading and crashes.
64 GB+ handles XL, but that Autoscore memory leak remains unfixed.

Two environment variables are near-mandatory on macOS.

export PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0   # lift the MPS memory ceiling (see caveat below)
export PYTORCH_ENABLE_MPS_FALLBACK=1          # run MPS-unsupported ops on CPU

HIGH_WATERMARK_RATIO=0.0 removes the MPS memory ceiling.
Without it, OOM errors are frequent.
But it also disables system-level memory protection, so running on insufficient memory can destabilize macOS itself.
MPS_FALLBACK=1 routes MPS-unsupported operations to CPU.
Without it, runtime errors surface for certain operations.

Stem Separation Runs in the Browser

Whether Demucs works on Mac is a non-issue for ACE-Step UI specifically.
Its stem separation uses ONNX Runtime Web for in-browser execution, not Python Demucs.
The htdemucs_embedded.onnx model runs via WebGPU or WASM, so it’s independent of the OS GPU backend.
Safari 18+ supports WebGPU; older browsers fall back to WASM.
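That fallback chain is expressed as an ordered executionProviders list in ONNX Runtime Web. A sketch of the session setup (the model filename is from the project; everything else is illustrative):

import * as ort from "onnxruntime-web";
// Note: some onnxruntime-web versions need `import ... from "onnxruntime-web/webgpu"`
// for the WebGPU execution provider.

// Try WebGPU first; browsers without it fall back to the WASM provider.
const session = await ort.InferenceSession.create("htdemucs_embedded.onnx", {
  executionProviders: ["webgpu", "wasm"],
});
console.log("stem separation ready, inputs:", session.inputNames);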

ACE-Step 1.5’s own Gradio UI does have track separation, but that one runs Python Demucs through PyTorch MPS.
Some MPS operations fall back to CPU, so it’s not fast.

LoRA and Training Are Rough on Mac

If you want to use LoRA through ACE-Step UI eventually, Mac is heavily constrained right now.

LoRA application doesn’t work with MLX DiT (issue #559).
The UI accepts LoRA settings, but output is identical to no-LoRA.
Making LoRA actually apply requires disabling MLX DiT and running PyTorch MPS instead—slower and prone to MPS memory leaks.

LoRA training itself also hits a wall: immediate OOM on M3 Pro 36 GB is reported (issue #282).
Training demands roughly 45 GB, exceeding MPS limits, and HIGH_WATERMARK_RATIO=0.0 doesn’t stabilize it.
Serious LoRA work needs a CUDA environment.

Generation Speed Is Several Times Slower Than CUDA

Official benchmarks put A100 at under 2 seconds per track and RTX 3090 at under 10.
There are no official Mac benchmarks, but community reports suggest 3-10x slower than CUDA.
torch.compile() doesn’t work on MPS, INT8 quantization has limited MPS optimization, and nano-vllm is excluded.

ACE-Step UI’s workflow isn’t “one track and done,” though.
It’s about building a library over time—listening, editing lyrics, organizing.
If generation takes tens of seconds but you’re reviewing the previous track or editing lyrics in the meantime, you don’t need CUDA-level throughput.
Batch generation overnight is a different story, but for interactive use, Mac might be viable enough.