Three Months with NDLOCR: The Full Journey and What Others Built
My first contact with the National Diet Library’s OCR engine “NDLOCR” was December 2025. Over the next three months I went from Docker version → brute-force multi-column → Lite version → LLM correction, writing four articles along the way. Meanwhile someone else had put Lite in the browser, and the different approaches were interesting enough to document together.
What I Did
Docker Hell (December 2025)
NDLOCR Docker Image Build: What Finally Worked
A colleague asked “does this even work?” and I casually poked at it — that was the beginning. The official Dockerfile was mid-migration to CUDA 12 and completely broken. I ended up using the community syoyo repository, manually swapping in the Python 3.8 get-pip.py, and finally got a build to pass. GPU required, CUDA dependencies, long build times. Just getting it running was a slog.
Multi-column Was Catastrophic, So I Used Brute Force (December 2025)
Solving NDLOCR’s Multi-column Recognition Problem with Histogram Analysis
Even after Docker was working, feeding it a 4-column vertical-text book produced completely scrambled reading order. I tried adding Layout Parser, but Detectron2 wouldn't build on Windows, so I gave that up. In the end I used PyMuPDF to convert the pages to images, detected column boundaries from valleys in the ink-projection histogram, split each page into 4 sections, and fed those to NDLOCR. Brute force, but it worked.
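The column-splitting step can be sketched roughly like this. This is a minimal reconstruction for illustration, not my original script: it assumes the page arrives as a grayscale NumPy array with ink darker than the background, and the function names are made up.

```python
import numpy as np

def find_column_boundaries(gray, n_columns=4, ink_threshold=128):
    """Locate column gaps from valleys in the ink-projection histogram.

    `gray` is a 2-D uint8 array (H x W); dark pixels count as ink.
    Returns x-positions that split the page into `n_columns` strips.
    """
    ink = (gray < ink_threshold).sum(axis=0)      # ink pixels per x-position
    kernel = np.ones(15) / 15                     # smooth out stray marks
    profile = np.convolve(ink, kernel, mode="same")
    w = gray.shape[1]
    cuts = []
    for i in range(1, n_columns):
        # Search near the expected gap (i * W/n) for the deepest valley.
        center = w * i // n_columns
        lo, hi = max(0, center - w // 10), min(w, center + w // 10)
        cuts.append(lo + int(np.argmin(profile[lo:hi])))
    return cuts

def split_columns(gray, cuts):
    """Slice the page image into vertical strips at the cut positions."""
    edges = [0] + list(cuts) + [gray.shape[1]]
    return [gray[:, a:b] for a, b in zip(edges, edges[1:])]
```

For vertical Japanese text the resulting strips then need to be fed to the OCR right-to-left to preserve reading order.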
Lite Version Came Out and Solved Everything (February 2026)
Running NDLOCR-Lite on Windows
NDLOCR-Lite was released in February 2026. No GPU needed, runs in 1GB of DRAM, installs via pip. It just worked — shockingly easy after all the earlier pain. The design routes each text line to one of several models (for short/medium/long character counts) in a MoE-like fashion, and it's fast even CPU-only — a few seconds per image. Multi-column and table recognition are incomparably better than in the Docker version.
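The length-based routing idea can be illustrated with a toy sketch. To be clear, this is not NDLOCR-Lite's actual code — the glyph-size heuristic and the `pick_recognizer` helper are invented for illustration; only the short/medium/long bucket structure comes from the design described above.

```python
def estimate_char_count(line_width_px, glyph_px=32):
    """Rough character count from line length and an assumed glyph size.

    `glyph_px` is a placeholder; a real pipeline would derive it from
    the detected line height.
    """
    return max(1, line_width_px // glyph_px)

def pick_recognizer(line_width_px, recognizers):
    """Route a text line to the model trained for its length bucket.

    `recognizers` maps a maximum character count to a model, mirroring
    the short/medium/long buckets in an MoE-like router.
    """
    n = estimate_char_count(line_width_px)
    for max_chars in sorted(recognizers):
        if n <= max_chars:
            return recognizers[max_chars]
    return recognizers[max(recognizers)]  # longest model as fallback
```

The payoff of this structure is that short lines (the common case in layouts with headings and captions) never pay for the long-sequence model.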
Tested LLM Correction on Mac (February 2026)
OCR Correction on Showa-Era Documents with NDLOCR-Lite and Local LLMs
I set it up on an Apple Silicon Mac and experimented with feeding OCR results to a local LLM for correction, comparing Qwen 3.5 reading the image directly against Swallow correcting the text. On 1963-era documents, Swallow rewrote "一方交通" as "一方通行" (both mean one-way traffic) — updating period vocabulary to modern phrasing rather than fixing OCR errors. I concluded the best approach is three stages — OCR → direct image reading → text correction — with a human confirming the diffs.
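The human-confirmation stage needs no model at all — Python's stdlib `difflib` is enough to surface exactly what the LLM changed. A sketch, assuming the OCR and LLM steps have already produced the two strings:

```python
import difflib

def review_corrections(ocr_text: str, corrected: str):
    """Produce per-line diffs between raw OCR output and LLM-corrected
    text for human review. Lines starting with '-' are the OCR version,
    '+' the LLM's rewrite — exactly where over-corrections like
    一方交通 → 一方通行 show up for a person to accept or reject."""
    return list(difflib.unified_diff(
        ocr_text.splitlines(), corrected.splitlines(),
        fromfile="ocr", tofile="llm", lineterm=""))
```

In practice I'd pipe this into a pager or a review UI; the point is that the diff makes vocabulary "modernization" visually obvious, which a raw corrected text never does.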
What Others Built
Someone Put NDLOCR-Lite in the Browser
ndlocrlite-web (yuta1984)
While I was building CLI pipelines, someone had already put the entire NDLOCR-Lite model set in the browser. It runs via ONNX Runtime Web’s WASM backend with no server communication required.
Technically it’s the same multi-stage pipeline as Lite itself — DEIMv2 for layout detection, PARSeq for character recognition (three models, for 30-, 50-, and 100-character line widths) — all reproduced in the browser. The combined 146MB of models downloads on first use and gets cached in IndexedDB; inference runs in a Web Worker so the UI stays unblocked.
My CLI environment processes one image in a few seconds; WASM will be slower than native, and since I haven’t run the browser version myself I can’t say by how much. That no data leaves the machine is a strong selling point for business use. When I wrote the browser-OCR-limits article, I was despairing over Tesseract.js’s accuracy; NDLOCR-Lite’s ONNX distribution is what makes a browser implementation viable at all.
Someone Made a Go Binding CLI Tool Too
go-ndlocr (mattn)
mattn — well-known in the Go community — built a binding that calls NDLOCR’s ONNX models directly from Go. It wraps layout detection and character recognition as Go libraries and works as a CLI tool. The Go way: distributable as a single binary, no Python required.
Different Approaches to the Same Model
I went in the direction of “CLI + local LLM to intelligently correct OCR output afterward,” while the browser version went toward “eliminate installation entirely.” The same model producing such different things is interesting.
I’d been experimenting with browser OCR myself before, so it’s odd that I didn’t consider a browser implementation this time. The reasons: initial WASM load is heavy, downloading 146MB of models is heavy, and there’s a question of how much you trust the serving origin. Also, honestly, I’m not that desperate to do OCR. I just need something I can run quickly when needed at work — I’m not building a persistent OCR infrastructure.
That said, the browser version suggests different possibilities. Like using a smartphone camera to photograph a book page, OCR-detecting headings and words with NDLOCR-Lite, then using Web Speech API to read them aloud or play audio. Point your phone at an animal in a picture book and hear it make sounds; point at English vocabulary in a textbook and hear native pronunciation — analog meets digital. I wanted to do this with Tesseract.js but gave up when the Japanese accuracy was terrible. If NDLOCR-Lite’s accuracy works at browser speed, it might be practical depending on performance. I haven’t tested it though.
A native app approach also works. Build a camera OCR app in Flutter — ONNX Runtime Mobile + audio playback in one app. Bundle the models for offline operation, no trust-the-serving-origin problem.
The Significance of CC BY 4.0
NDLOCR-Lite’s license is CC BY 4.0 — attribution required, and you can bundle the ONNX models in an app and distribute it. Compatible with Go, Electron, Tauri, Flutter, including App Store distribution. A government agency releasing an OCR model under this license is genuinely generous, and it puts NDLOCR-Lite in a very accessible position among Japanese OCR options.
Raspberry Pi OCR Station
The developer explicitly states NDLOCR-Lite runs in 1GB of DRAM. onnxruntime has official PyPI wheels for arm64, so on a Raspberry Pi 5 it should install with pip and run directly.
For speed: the Windows article showed Ryzen 7 5800HS CPU inference at 3–9 seconds per image. Pi 5’s Cortex-A76 would probably land around 10–30 seconds per image. For batch processing, that’s well within usable range. The simplest architecture is a hot-folder watcher — no FastAPI even needed.
```mermaid
flowchart LR
    A["Scanner"] -->|image output| B["Samba share<br/>scan_inbox/"]
    B -->|watch| C["Pi watch<br/>script"]
    C -->|OCR run| D["NDLOCR-Lite<br/>CLI"]
    D -->|output| E["ocr_output/<br/>TXT / JSON / viz images"]
    E -->|access via Samba| F["PC"]
```
The scanner writes to a Samba share, the Pi watches it and runs OCR, results land in another folder, and the PC reads them back via Samba. Unprocessed files stay in the inbox, so after a Pi restart processing simply resumes where it left off.
Once you include LLM correction, you’d need something like Qwen 3.5, which obviously won’t fit on a Pi. At that point the Pi separation doesn’t make sense — easier to put everything on a GPU-equipped PC. The Raspberry Pi setup only makes sense when OCR alone is sufficient.
Cost is also somewhat marginal. A Pi 5 8GB model runs about 55,000 yen. Add power adapter, case, and SD card and you’re close to 60,000 yen. At that price you could buy a used PC — and NDLOCR-Lite would run more easily there. That said, a Pi draws only a few watts and is small enough to sit next to a scanner, so for always-on unattended operation it wins on electricity and footprint.
Classified Documents and Air Gaps
Where a Raspberry Pi setup really shines is processing sensitive documents. Personal records at local government offices, legal case files, hospital records — documents where sending them to Google Cloud Vision API may be a compliance violation in the first place. Even if a cloud OCR provider says “we don’t use it for training,” sending the data externally at all may be a non-starter.
Options side by side:
- Adobe Acrobat local OCR: Mediocre Japanese, weak on vertical text
- Tesseract: Japanese accuracy is poor
- NDLOCR-Lite: Fully offline, Japanese-specialized, handles vertical text and old kanji, CC BY 4.0
Complete OCR workflow on a single Pi, fully disconnected from the network. Not even an intranet required. In environments where installing software on a PC requires IT approval, a standalone Pi can be easier to bring in.
The browser version appears secure in that “data isn’t sent externally,” but you still have to trust the JavaScript and WASM binary from the serving origin. Running a CLI on a Pi on your own network keeps the trust anchor with yourself — more defensible.
More on OCR in general: Browser OCR in 2025: Limits and Lessons