How to Stabilize the Web Speech API on iOS
I ported the voice chat I built in “AI Voice Chat (3): Finally got it talking” to the web. It ran smoothly on a PC, but when I spoke from an iPhone, input arrived in choppy fragments or recognition stopped responding altogether. Safari on macOS was fine even though it is the same Apple ecosystem; only iOS misbehaved. The Web Speech API on iOS has several problems: it stops on its own, its buffer clogs, and the first recognition attempt often fails.
You could solve this by using paid services like the Whisper API, but here are practical, no-cost countermeasures.
Basic Approach
- Singleton instance: don't create it with new every time (prevents the system chime).
- Push-to-talk: more stable than auto-restart with continuous.
- Warm up the mic beforehand: mitigates first-recognition failure.
Implementation Example
// Create a single instance (once, at page load)
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.lang = 'ja-JP';
recognition.interimResults = true;
recognition.continuous = true;
const btn = document.getElementById('micBtn');
// Push-to-talk
btn.addEventListener('touchstart', (e) => {
  e.preventDefault(); // suppress the emulated mouse events that follow a touch
  recognition.start();
});
btn.addEventListener('touchend', () => {
  recognition.stop();
});
// Also stop if the touch is cancelled (for example, interrupted by the system)
btn.addEventListener('touchcancel', () => recognition.stop());
// Desktop (mouse) support
btn.addEventListener('mousedown', () => recognition.start());
btn.addEventListener('mouseup', () => recognition.stop());
btn.addEventListener('mouseleave', () => recognition.stop());
// Handle results
recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      const text = event.results[i][0].transcript;
      console.log('Recognized:', text);
      // Update the UI here
    }
  }
};
recognition.onerror = (event) => {
  console.warn('Recognition error:', event.error);
};
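One thing the handlers above do not cover: calling start() while a session is still running (or still shutting down after stop()) throws an InvalidStateError, which is easy to trigger with quick repeated presses. The following is a minimal sketch of a guard, not a definitive implementation. The name createSafeRecognizer is hypothetical and not part of the Web Speech API; it assumes only that the recognition object fires the standard end event.

```javascript
// Hypothetical helper (not part of the Web Speech API): wraps a recognition
// object so that start() and stop() are safe to call in any state.
function createSafeRecognizer(recognition) {
  let active = false; // true from a successful start() until the 'end' event

  // 'end' fires when the session has fully shut down, including after errors.
  recognition.addEventListener('end', () => { active = false; });

  return {
    start() {
      if (active) return false; // still running or still shutting down
      try {
        recognition.start();
        active = true;
        return true;
      } catch (err) {
        // Typically InvalidStateError when the engine is not ready yet.
        console.warn('start() failed:', err.name);
        return false;
      }
    },
    stop() {
      if (active) recognition.stop(); // the 'end' event resets the flag
    },
    isActive: () => active,
  };
}

// In the browser, with the recognition object defined above:
// const safe = createSafeRecognizer(recognition);
// btn.addEventListener('touchstart', (e) => { e.preventDefault(); safe.start(); });
// btn.addEventListener('touchend', () => safe.stop());
```

With this wrapper, a second press that arrives before the previous session has ended is simply ignored instead of throwing.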
Countermeasures for First-Recognition Failure
Warm up the mic in advance
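async function warmupMic() {
  try {
    // Open the mic once so the permission prompt and audio setup happen early
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Release it right away; only the initialization matters
    stream.getTracks().forEach(track => track.stop());
  } catch (e) {
    console.warn('Microphone permission is required:', e);
  }
}
// Call on the first user gesture
document.body.addEventListener('click', () => {
  warmupMic();
}, { once: true });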
Unlock AudioContext
// Call from a user gesture: plays a one-sample silent buffer to unlock audio
function unlockAudio() {
  const ctx = new (window.AudioContext || window.webkitAudioContext)();
  const buf = ctx.createBuffer(1, 1, 22050);
  const src = ctx.createBufferSource();
  src.buffer = buf;
  src.connect(ctx.destination);
  src.start(0);
  ctx.resume();
}
Prime recognition with an empty run
// Start and stop once so the engine is initialized before the first real use
function preloadRecognition() {
  recognition.start();
  setTimeout(() => recognition.stop(), 100);
}
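The three warm-ups above all need to run inside a user gesture, and one failing should not prevent the others. Here is a minimal sketch of a run-once handler, assuming a hypothetical helper named createWarmupHandler. Whether the three steps can safely overlap on iOS is an assumption that has not been verified here, so treat the ordering as something to test on a real device.

```javascript
// Hypothetical helper: returns a handler that runs every warm-up step once.
function createWarmupHandler(steps) {
  let result = null; // cached promise, so later calls do not rerun the steps

  return function handleFirstGesture() {
    if (result) return result;
    // Kick off every step synchronously, while the user gesture is still active.
    const pending = steps.map((step) => {
      try {
        return Promise.resolve(step());
      } catch (err) {
        return Promise.reject(err); // a throwing step must not block the others
      }
    });
    // Resolves to the number of steps that failed.
    result = Promise.allSettled(pending).then((outcomes) => {
      const failures = outcomes.filter((o) => o.status === 'rejected');
      failures.forEach((f) => console.warn('Warm-up step failed:', f.reason));
      return failures.length;
    });
    return result;
  };
}

// In the browser, with the functions defined above:
// const warmUp = createWarmupHandler([warmupMic, unlockAudio, preloadRecognition]);
// document.body.addEventListener('click', warmUp, { once: true });
```

Starting all steps before awaiting any of them is a deliberate choice: awaiting the microphone prompt first could let the gesture expire before the later steps run.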
Visual Feedback: “You can start talking”
Starting the mic takes a moment, so communicate the wait to the user.
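let pressed = false;
btn.addEventListener('touchstart', async (e) => {
  e.preventDefault();
  pressed = true;
  recognition.start();
  await new Promise(r => setTimeout(r, 300));
  // Skip the indicator if the finger was lifted during the wait
  if (pressed) btn.classList.add('ready'); // show the "go ahead and talk" indicator
});
btn.addEventListener('touchend', () => {
  pressed = false;
  recognition.stop();
  btn.classList.remove('ready');
});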
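The 300 ms wait above is a fixed guess. The Web Speech API also defines an audiostart event that fires when the browser begins capturing audio, so the indicator can follow the real state instead. The sketch below uses that event and keeps a timer as a fallback. The name watchReady and its parameters are hypothetical, and whether iOS Safari fires audiostart reliably is not confirmed here, which is why the fallback remains.

```javascript
// Hypothetical helper: calls setReady(true) once audio capture has started and
// setReady(false) when the session ends.
function watchReady(recognition, setReady, fallbackMs = 300) {
  let timer = null;
  let ready = false;

  const markReady = () => {
    clearTimeout(timer);
    timer = null;
    if (!ready) {
      ready = true;
      setReady(true);
    }
  };

  // 'start' fires when the session begins; arm the fallback timer here.
  recognition.addEventListener('start', () => {
    timer = setTimeout(markReady, fallbackMs);
  });
  // 'audiostart' fires when the browser begins capturing audio.
  recognition.addEventListener('audiostart', markReady);
  // 'end' fires when the session is over; cancel the timer and hide the indicator.
  recognition.addEventListener('end', () => {
    clearTimeout(timer);
    timer = null;
    if (ready) {
      ready = false;
      setReady(false);
    }
  });
}

// In the browser:
// watchReady(recognition, (on) => btn.classList.toggle('ready', on));
```

Because the end handler cancels the pending timer, a very short press cannot leave the indicator stuck on.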
continuous: true vs false
| Mode | Pros | Cons |
|---|---|---|
| continuous: false + auto-restart | Tends to be stable on iOS | Brief gap when restarting |
| continuous: true + singleton | Less choppy, fewer sounds | Risk of buffer clogging on iOS |
With push-to-talk, continuous: true is fine. If you want it to keep listening automatically, use this hybrid:
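// iPadOS 13+ reports a desktop Mac user agent, so also check for touch support
const isIOS = /iPad|iPhone|iPod/.test(navigator.userAgent) ||
  (navigator.platform === 'MacIntel' && navigator.maxTouchPoints > 1);
recognition.continuous = !isIOS;
// Set to true while listening should continue, false when the user stops it
let shouldBeListening = false;
recognition.onend = () => {
  if (isIOS && shouldBeListening) {
    // With continuous = false each session ends by itself; restart after a pause
    setTimeout(() => recognition.start(), 200);
  }
};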
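The onend handler above restarts unconditionally, so if the user has denied microphone access it would loop forever. One way to bound it is a small policy object that stops restarting after a permission error and backs off when sessions keep ending without results. This is a sketch under assumptions: createRestartPolicy, maxEmptyRuns, and baseDelayMs are hypothetical names and values, while not-allowed and service-not-allowed are error codes defined by the Web Speech API.

```javascript
// Error codes from the Web Speech API after which restarting cannot help.
const FATAL_ERRORS = new Set(['not-allowed', 'service-not-allowed']);

// Hypothetical helper: decides whether and when to restart after 'end'.
function createRestartPolicy({ maxEmptyRuns = 5, baseDelayMs = 200 } = {}) {
  let emptyRuns = 0; // consecutive restarts without any result in between
  let fatal = false;

  return {
    // Call from onerror with event.error.
    noteError(code) {
      if (FATAL_ERRORS.has(code)) fatal = true;
    },
    // Call from onresult: the engine is working, so reset the counter.
    noteResult() {
      emptyRuns = 0;
    },
    // Call from onend: returns the delay in ms, or null to give up.
    nextDelay() {
      if (fatal || emptyRuns >= maxEmptyRuns) return null;
      const delay = baseDelayMs * 2 ** emptyRuns; // doubles on each empty run
      emptyRuns += 1;
      return delay;
    },
  };
}

// In the browser, with recognition and shouldBeListening from the code above:
// const policy = createRestartPolicy();
// recognition.onerror = (e) => policy.noteError(e.error);
// recognition.onend = () => {
//   const delay = shouldBeListening ? policy.nextDelay() : null;
//   if (delay !== null) setTimeout(() => recognition.start(), delay);
// };
```

For the counter to reset, policy.noteResult() also needs to be called from the existing onresult handler.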
Background Handling
Recognition tends to die when the page goes to the background, so stop it explicitly when the page is hidden.
document.addEventListener('visibilitychange', () => {
  if (document.hidden) {
    recognition.stop();
  }
});
window.addEventListener('focus', () => {
  // Resume here if needed
});
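The focus handler above is left empty. One possible way to fill it in is to remember whether recognition was active when the page was hidden and act on that when the page returns. The sketch below assumes a hypothetical helper named createVisibilityGuard. It deliberately leaves the resume action to the caller, because whether iOS allows start() outside a user gesture is not confirmed here; showing a "tap to resume" hint may be the safer choice.

```javascript
// Hypothetical helper: stops recognition while the page is hidden and, when the
// page becomes visible again, calls resume() only if it had been listening.
function createVisibilityGuard({ isListening, stop, resume }) {
  let wasListening = false;

  // Pass document.hidden on every 'visibilitychange' event.
  return function onVisibilityChange(hidden) {
    if (hidden) {
      wasListening = isListening();
      if (wasListening) stop();
    } else if (wasListening) {
      wasListening = false;
      resume();
    }
  };
}

// In the browser, with recognition and shouldBeListening from the code above:
// const guard = createVisibilityGuard({
//   isListening: () => shouldBeListening,
//   stop: () => { shouldBeListening = false; recognition.stop(); },
//   resume: () => btn.classList.add('tap-to-resume'), // hypothetical CSS class
// });
// document.addEventListener('visibilitychange', () => guard(document.hidden));
```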
Conclusion
- Perfect reliability is unrealistic: that is just how the Web Speech API on iOS behaves.
- Push-to-talk + singleton + mic warm-up is the pragmatic answer.
- For production use, consider paid services such as the Whisper API.