Create explainer videos in code with Remotion + VOICEPEAK

The technical articles on this site are text-heavy, and I had been worried that they were hard to read. Video is easier to digest, but editing video is a hassle. Then, while chatting with Gemini, I learned about a framework called “Remotion.”

Apparently, you can write videos in React. And since VOICEPEAK has a CLI, you can generate speech directly from Node.js. Combining the two might let me automate technical explainer videos, so here is a summary of what I found.

What is Remotion

Remotion is a framework for creating videos with React. You write scenes as React components and control timing in frames, for example through the from and durationInFrames props of a Sequence.

If generative video like Sora is “gacha,” Remotion is “engineering.” The same code always produces the exact same output. Because it’s reproducible, you can put it under version control and even wire it into CI.

The preview supports hot reload: when you save code, it updates immediately. It feels like building videos with the same rhythm as Astro’s dev server.

Recently it has been getting attention thanks to a skill for Claude Code. There is a template that automates everything from writing the dialogue to generating the code: you just tell Claude “Create an intro video for X.”

remotion-voicevox-template - Template for a Zundamon & Metan banter video
Yasuna’s note article - An experiment making Yukkuri-style videos with Remotion

Why VOICEPEAK

Explainer videos need narration. VOICEVOX is well-known too, but I chose VOICEPEAK because it supports a CLI.

Item	VOICEVOX	VOICEPEAK
Price	Free	Paid (from ¥10,000)
Integration	HTTP API	CLI
Quality	Good enough	More natural
Emotion control	Limited	Fine-grained via parameters

VOICEVOX exposes an HTTP API, so you need to run a server and send requests. VOICEPEAK can be invoked directly from the terminal, making it easy to call from shell scripts or Node.js.

You can specify emotion parameters like --emotion happy=100, so you can fully control direction such as “be sad here” or “go high-energy here” in code.
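As a small illustration of driving emotion from code, the sketch below builds the --emotion expression from a plain object. The helper name formatEmotion is hypothetical, and both the comma-separated name=value form and the 0 to 100 range are assumptions; check the CLI's help output for the names and ranges your narrator actually accepts.

```javascript
// Hypothetical helper: turn { happy: 100, sad: 20 } into "happy=100,sad=20".
// Assumes the CLI accepts comma-separated name=value pairs in the 0-100 range.
function formatEmotion(emotions) {
  return Object.entries(emotions)
    .map(([name, value]) => {
      if (!Number.isFinite(value)) {
        throw new TypeError(`emotion "${name}" must be a number`);
      }
      // Round and clamp so stray values never reach the command line.
      const clamped = Math.min(100, Math.max(0, Math.round(value)));
      return `${name}=${clamped}`;
    })
    .join(',');
}
```

With something like this, the direction for each line can live next to its text in the script data instead of being typed by hand.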

I own the “Commercial Use 6 Narrators Set,” so the rest of this article assumes that product.

Licensing (YouTube monetization?)

The main concern is licensing for monetization. From what I found, if you have the commercial-use set, no additional license is required.

According to AHS’s official page, the following activities are allowed with just the normal software purchase:

YouTube ad revenue (AdSense)
Super Chat (tips)
Affiliate links
Paid channels (memberships)

Seeing “no additional license required” across the board is reassuring. The commercial-use set also permits client work, so licensing shouldn’t be a problem.

Setup

These are the steps when using the template above.

# Clone the repository
git clone https://github.com/nyanko3141592/remotion-voicevox-template.git my-video
cd my-video
npm install

# Start VOICEVOX beforehand (the app or Docker)

# Start the preview server
npm start

Open http://localhost:3000 in your browser to see the preview. It includes demo lines and audio, so you can check it right away.

When using VOICEPEAK

If you use VOICEPEAK, first check the narrator names.

# On macOS
/Applications/voicepeak.app/Contents/MacOS/voicepeak --list-narrator

Core code: Call VOICEPEAK from Node.js

The template is designed for VOICEVOX’s HTTP API, but you can modify it to call VOICEPEAK’s CLI instead.

Here is an example of calling it from Node.js.

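const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const VP_PATH = '/Applications/voicepeak.app/Contents/MacOS/voicepeak';

// Generate one WAV file with the VOICEPEAK CLI.
function generateVoice(text, outputPath, narrator, emotion = 'happy=50') {
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  // Pass arguments as an array (no shell), so quotes in the text cannot break the command.
  const args = ['--say', text, '--narrator', narrator, '--emotion', emotion, '--out', outputPath];
  execFileSync(VP_PATH, args);
}

// Usage example (check narrator names with --list-narrator).
// The text means "With Remotion you can create videos in code."
// Written under public/ because Remotion's staticFile() resolves paths from that folder.
generateVoice('Remotionを使えば動画をコードで作れます', './public/voices/001.wav', 'Japanese Female 1', 'happy=100');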

Pre-generate the audio files and have Remotion simply load them; this tends to be more stable than synthesizing speech during the render.
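One way to structure that pre-generation step is sketched below: turn a script array into one CLI job per line, with zero-padded file names so the files sort in playback order. The script shape (text, narrator, emotion), the helper names planVoiceJobs and generateAll, and the output folder are assumptions for illustration; the flags are the ones used in the example above.

```javascript
const { execFileSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const VP_PATH = '/Applications/voicepeak.app/Contents/MacOS/voicepeak';

// Hypothetical script shape: [{ text, narrator, emotion }, ...]
function planVoiceJobs(script, outDir) {
  return script.map((line, index) => {
    // 001.wav, 002.wav, ... so files sort in playback order.
    const fileName = `${String(index + 1).padStart(3, '0')}.wav`;
    const outputPath = path.join(outDir, fileName);
    const args = ['--say', line.text, '--narrator', line.narrator, '--out', outputPath];
    if (line.emotion) args.push('--emotion', line.emotion);
    return { outputPath, args };
  });
}

function generateAll(script, outDir) {
  fs.mkdirSync(outDir, { recursive: true });
  for (const job of planVoiceJobs(script, outDir)) {
    // Skip lines whose audio already exists so re-runs only render new lines.
    if (fs.existsSync(job.outputPath)) continue;
    // Arguments are passed as an array, so the text needs no shell escaping.
    execFileSync(VP_PATH, job.args);
  }
}
```

Because existing files are skipped, a line's WAV file has to be deleted after its text is edited so that it gets regenerated. Keeping the planning separate from the execution also makes it possible to inspect the jobs without VOICEPEAK installed.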

Integration with Remotion

On the Remotion side, get the length of the generated audio and place it on the timeline.

import { Audio, Sequence, staticFile } from 'remotion';

export const MyVideo = () => {
  return (
    <Sequence from={0} durationInFrames={90}>
      <Audio src={staticFile('voices/001.wav')} />
      {/* Character images and subtitles go here */}
    </Sequence>
  );
};

You can obtain the audio duration with libraries like music-metadata. Combine that with your script data (JSON) and generate a Sequence per line; the video then assembles itself automatically.
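To make the per-line assembly concrete, here is a minimal sketch of the timing arithmetic: each line's duration in seconds is rounded up to whole frames, and each Sequence starts where the previous one ended. The function name buildTimeline, the durationSec field, the 30 fps default, and the optional gap between lines are all assumptions; in practice the durations would come from a library such as music-metadata.

```javascript
// Hypothetical helper: convert per-line audio durations into Sequence timings.
// lines: [{ file, durationSec }]; fps: composition frame rate (30 is assumed).
// gapFrames: optional pause inserted between consecutive lines.
function buildTimeline(lines, fps = 30, gapFrames = 0) {
  let cursor = 0; // first frame of the next Sequence
  const items = lines.map((line) => {
    // Round up so the tail of the audio is never cut off.
    const durationInFrames = Math.max(1, Math.ceil(line.durationSec * fps));
    const item = { file: line.file, from: cursor, durationInFrames };
    cursor += durationInFrames + gapFrames;
    return item;
  });
  // Drop the gap after the last line; a composition needs at least one frame.
  const totalFrames = Math.max(1, cursor - (items.length ? gapFrames : 0));
  return { items, totalFrames };
}
```

Each item then maps onto one Sequence (from and durationInFrames) wrapping an Audio whose src is staticFile(item.file), and totalFrames can serve as the length of the whole composition.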

I haven’t fully tried this yet, but moving from “clicking around in an editor” to “writing code and building” is appealing. I think it will be useful when turning technical articles into videos.

I already own VOICEPEAK, so the next step is to set up Remotion and give it a try.