Japanese OCR on the Web in 2025: Limits and Lessons
Turns out there are more situations in professional work where you need OCR than you’d expect.
You tell a client:
“Please send the manuscript as text data!”
And what arrives is a PDF. Open it up and you find scanned handwritten pages. Adobe Acrobat's AI features aren't available, the volume isn't large enough to justify outsourcing to an OCR vendor, and uploading to a web OCR service is a security risk anyway.
On top of that, I’ve had clients ask whether OCR processing is even possible. It’s not our specialty though, so we don’t have anything production-ready for it.
If we could build this once as a web implementation, we’d never be stuck again.
…So I tried various approaches, and concluded that the options are more limited than expected, and you just have to pick one based on your use case. This article summarizes a comparison of OCR methods and their limitations as of December 2025.
Prerequisites: The Ideal Requirements
- Deployable on any server (static hosting included)
- Run entirely in the browser (no uploading images to a server)
- Support Japanese
- Don’t strain server bandwidth (no large dictionaries or WASM files to serve)
These can't all be satisfied at once. Somewhere you have to compromise.
Comparison Table (◎ = excellent, ○ = good, △ = partial, × = unsupported)
| Library | Environment | Accuracy | Japanese | Layout | Setup | Cost | Privacy |
|---|---|---|---|---|---|---|---|
| Tesseract.js | Browser | △ | ○ | × | Easy | Free | ◎ |
| Transformers.js | Browser | △〜○ | △ (manga only) | × | Medium | Free | ◎ |
| PaddleOCR (JS) | Browser (broken) | - | ○ | - | - | - | - |
| NDLOCR | Docker/Server | ◎ | ◎ | △ | Hard | Free | ◎ |
| Cloud Vision API | Cloud | ◎ | ◎ | ◎ | Easy | Pay-per-use | △ |
| AI (GPT-4V, etc.) | Cloud | ○〜◎ | ◎ | ○ | Easy | Pay-per-use | △ |
Let’s look at each in detail.
Browser OCR
Tesseract.js
Right now, if you want Japanese OCR in the browser, this is the only real option.
```javascript
import { createWorker } from 'tesseract.js';

// Create a worker with the Japanese language data and recognize the image
const worker = await createWorker('jpn');
const result = await worker.recognize(imageFile);
console.log(result.data.text);

// Free the worker when done
await worker.terminate();
```
- Japanese support
- Runs in a Web Worker so it doesn’t block the UI
- Easy setup (just install via npm)
A working demo is available on the Lab page.
Limits
The bandwidth problem is unavoidable.
On first run, it downloads the language data from a CDN (about 14MB for Japanese). The browser caches it so subsequent runs are fast, but the first time is heavy.
Even with GZIP compression, depending on your hosting setup, this can be a real problem. Self-hosting doesn’t help — 14MB is 14MB.
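What self-hosting *does* buy you is control over where the data comes from. A minimal sketch of the worker options involved (`langPath`, `cachePath`, and `gzip` are documented tesseract.js worker options; the origin below is a placeholder for your own server):

```javascript
// Worker options for self-hosting tesseract.js language data instead of
// pulling it from the default CDN. The origin is a placeholder.
function selfHostOptions(origin) {
  return {
    langPath: `${origin}/tessdata`, // expects /tessdata/jpn.traineddata.gz here
    gzip: true,                     // serve the .traineddata files gzipped
  };
}

// Usage (sketch):
// const worker = await createWorker('jpn', 1, selfHostOptions(location.origin));
console.log(selfHostOptions('https://example.com').langPath);
```

The download is the same size either way, but it no longer depends on a third-party CDN staying up.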
Accuracy also falls short compared to server-side OCR. It depends heavily on font and resolution, and struggles with noise and skewed images.
Transformers.js
A library for running Hugging Face Transformers in the browser, including OCR-capable models.
The problem is that there is essentially no Japanese model.
| Model | Japanese | Notes |
|---|---|---|
| MGP-STR | × | Officially supported in v3.1, designed for English scene text |
| Florence-2 | △ | Has OCR capability, requires fine-tuning for Japanese |
| TrOCR | × | English only |
| manga-ocr (ONNX) | ○ | Manga-specific, not suited for general documents |
The only Japanese-capable model, manga-ocr, is specialized for manga fonts and layouts and doesn’t work well on general documents.
It technically works, but there are no models for the job. If you want Japanese document OCR, just use Tesseract.js.
PaddleOCR (JavaScript)
I tried running PaddleOCR's JavaScript port in the browser, but as of December 2025 it doesn't work there:

```
ReferenceError: Module is not defined
```

The Emscripten-generated glue code expects a global `Module` object that is never defined in the browser build.
What I tried:
- `@paddlejs-models/ocr` (npm)
- Via esm.sh CDN
- `@gutenye/ocr-browser` (alternative wrapper)
All failed. The full details are in a separate article.
The JS implementation seems immature, and browser support appears to be a low priority.
Cloud APIs and AI
Google Cloud Vision API
Accuracy is unmatched. Near-perfect. It reads multi-column layouts, vertical text, whatever you throw at it. It handles the layout problems that trip up every other OCR. Nothing else comes close.
But I’m not brave enough to put it on a blog.
There’s a free tier, but it’s pay-per-use after that. If a post goes viral and traffic spikes, you could rack up a huge bill. Too risky for personal projects.
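For reference, the request itself is just one JSON POST. A hedged sketch of building the body (`DOCUMENT_TEXT_DETECTION` and `imageContext.languageHints` are documented Vision API fields; how you handle the key is up to you, and it must stay server-side):

```javascript
// Build a Cloud Vision images:annotate request body for Japanese document OCR.
// base64Image is the image file encoded as base64.
function visionRequestBody(base64Image) {
  return {
    requests: [
      {
        image: { content: base64Image },
        // DOCUMENT_TEXT_DETECTION is the dense-text variant, better suited
        // to scanned pages than plain TEXT_DETECTION
        features: [{ type: 'DOCUMENT_TEXT_DETECTION' }],
        imageContext: { languageHints: ['ja'] },
      },
    ],
  };
}

// POST this to https://vision.googleapis.com/v1/images:annotate?key=API_KEY
// from a server, never from the browser (the key would leak).
```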
AI (GPT-4V, Claude Vision, etc.)
Feed it an image and it can extract text using surrounding context quite intelligently — sometimes smarter than dedicated OCR models.
But two problems:
- Hallucination: It over-infers and can generate text that wasn’t in the original. It becomes “creative writing” instead of “reading.”
- Data leakage: Can’t be used for confidential documents. No telling where the data ends up.
Choose your use case carefully.
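If you do go this route, the prompt is where you fight hallucination. A sketch of a strict-transcription payload (the content shape follows the documented Chat Completions vision format; the model name is a placeholder):

```javascript
// Build an OpenAI-style chat payload that asks for strict transcription
// rather than interpretation. dataUrl is the image as a data: URL.
function transcriptionPayload(dataUrl) {
  return {
    model: 'gpt-4o', // placeholder; use whatever vision model you have
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'text',
            // Pin the model down to reduce "creative writing":
            text: 'Transcribe the text in this image exactly as written. ' +
                  'Do not correct, summarize, or guess; mark illegible parts as [?].',
          },
          { type: 'image_url', image_url: { url: dataUrl } },
        ],
      },
    ],
  };
}
```

This doesn't eliminate hallucination, but an explicit "do not guess" instruction with an escape hatch (`[?]`) noticeably reduces it.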
Server-Side OCR
NDLOCR (National Diet Library OCR)
Built specifically for Japanese documents by the National Diet Library, it delivers the best accuracy of any self-hostable option and blows the other free tools out of the water.
The problem is setup difficulty. It runs in Docker, but building it is a real pain. The setup procedure that worked in my environment is in a separate article.
API Proof of Concept
Since NDLOCR runs in Docker, you can set up a server-side PHP interface to expose it as an API.
The following is a PoC-level code example only. Adapt it to your environment — no guarantees.
```php
<?php
// ndlocr_api.php - minimal version; production use requires auth & validation
header('Content-Type: application/json');

if ($_SERVER['REQUEST_METHOD'] !== 'POST') {
    http_response_code(405);
    echo json_encode(['error' => 'POST only']);
    exit;
}

if (!isset($_FILES['image'])) {
    http_response_code(400);
    echo json_encode(['error' => 'No image uploaded']);
    exit;
}

$tmpFile   = $_FILES['image']['tmp_name'];
$inputDir  = '/path/to/ndlocr/input';
$outputDir = '/path/to/ndlocr/output';
$imageName = uniqid() . '.png';

// Place the image in the input directory (mounted into the container)
if (!move_uploaded_file($tmpFile, "$inputDir/$imageName")) {
    http_response_code(500);
    echo json_encode(['error' => 'Failed to store upload']);
    exit;
}

// Run OCR in the NDLOCR container
$cmd = sprintf(
    'docker exec ndlocr-container python main.py infer %s %s -s s -p 1',
    escapeshellarg("/input/$imageName"),
    escapeshellarg('/output')
);
exec($cmd, $output, $returnCode);

if ($returnCode !== 0) {
    http_response_code(500);
    echo json_encode(['error' => 'OCR failed']);
    exit;
}

// Read the result text (adjust the path to your NDLOCR output layout)
$resultPath = "$outputDir/" . pathinfo($imageName, PATHINFO_FILENAME) . ".txt";
$text = file_exists($resultPath) ? file_get_contents($resultPath) : '';

// Clean up
@unlink("$inputDir/$imageName");
@unlink($resultPath);

echo json_encode(['text' => $text]);
```
For production use, you’d need:
- Authentication (API key, etc.)
- File size and type validation
- Rate limiting
- Better error handling
- Async processing (OCR takes time)
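The matching browser-side call is a single multipart POST. A sketch (the endpoint path and the `image` field name match the PHP example above; error handling kept minimal):

```javascript
// Build the fetch arguments for the ndlocr_api.php endpoint above.
// The 'image' field name must match $_FILES['image'] on the PHP side.
function buildOcrRequest(endpoint, imageBlob, filename = 'page.png') {
  const body = new FormData();
  body.append('image', imageBlob, filename);
  return [endpoint, { method: 'POST', body }];
}

// Usage (sketch):
// const [url, init] = buildOcrRequest('/ndlocr_api.php', fileInput.files[0]);
// const { text } = await (await fetch(url, init)).json();
```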
Column Layout and Layout Recognition
Almost all OCR tools are weak here. There’s no standard solution.
The only exception is Google Cloud Vision API — it handles multi-column and vertical text without issue. But cost is a concern, so it’s not a universal answer.
Even NDLOCR struggles with four-column vertical text books. Combining it with layout analysis tools like Layout Parser doesn’t achieve perfection either.
In the end, you have to brute-force it. In a separate article I tried slicing pages into images with PyMuPDF and cutting columns using histogram analysis.
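The histogram idea is simple enough to sketch in a few lines. This is a toy version on a binarized pixel array (a real page first needs binarization, and the gutter threshold needs tuning for noise):

```javascript
// Naive vertical-projection column splitting: sum ink per x position,
// then treat runs of (near-)empty columns as gutters between text columns.
// pixels is a 2D array of 0/1 ink values; returns [start, end) x-ranges.
function splitColumns(pixels, gutterThreshold = 0) {
  const width = pixels[0].length;

  // Vertical projection histogram: total ink at each x
  const histogram = new Array(width).fill(0);
  for (const row of pixels) {
    for (let x = 0; x < width; x++) histogram[x] += row[x];
  }

  // Collect maximal runs of columns whose ink exceeds the threshold
  const ranges = [];
  let start = -1;
  for (let x = 0; x <= width; x++) {
    const inked = x < width && histogram[x] > gutterThreshold;
    if (inked && start < 0) start = x;
    if (!inked && start >= 0) {
      ranges.push([start, x]);
      start = -1;
    }
  }
  return ranges;
}

// Example: text at x 0..2 and 5..9 with a blank gutter at x 3..4
// splitColumns([[1,1,1,0,0,1,1,1,1,1]]) → [[0, 3], [5, 10]]
```

Each returned range can then be cropped out and fed to the OCR engine as a single-column image.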
Layout recognition is still an unsolved problem in 2025. Assume it won’t be perfect no matter what approach you take.
Conclusion: Choose Based on Use Case
- Easy setup + privacy → Tesseract.js
- Accuracy first → NDLOCR (server required)
- Money to spend and need more accuracy → Cloud Vision API
- Context understanding needed → AI (watch for hallucinations)
High-accuracy browser-based Japanese OCR hasn’t been achieved as of 2025.
Links
- Tesseract.js - GitHub
- Transformers.js - GitHub
- PaddleOCR - GitHub (the JS version doesn’t work, but the Python version is excellent)
- NDLOCR - GitHub (National Diet Library official)
- Google Cloud Vision API - Official site
- OpenAI GPT-4V - Official site
Not directly related to OCR, but I also built a morphological analysis tool that runs in the browser. Check it out if you’re interested.