#OCR

17 articles

Tech · Jan 20, 2026 · 4 min

The rise of VLM-based OCR - DeepSeek-OCR and the potential of hybrid use

Explains the difference between conventional OCR and VLM-based (vision-language model) OCR, introduces DeepSeek-OCR, and explores the possibility of combining both approaches.

AI OCR DeepSeek VLM

Tech · Dec 7, 2025 · 7 min

Japanese OCR on the Web in 2025: Limits and Lessons

From browser OCR and server-side OCR to cloud APIs and AI — a roundup of what I learned trying to implement Japanese OCR on the web, including the limits of each approach.

OCR JavaScript Tesseract.js NDLOCR Transformers.js AI Docker Google Cloud Vision PaddleOCR Japanese OCR Browser Experiment

Tech · Dec 6, 2025 · 2 min

@paddlejs-models/ocr Doesn't Work in Browsers (as of 2025)

A record of my failed attempt to use PaddleOCR's JavaScript implementation in the browser.

JavaScript OCR PaddleOCR Troubleshooting Experiment

Tech · Dec 1, 2025 · 3 min

Solving NDLOCR Column Layout Recognition with Histogram Analysis

When Layout Parser wouldn't install and NDLOCR alone couldn't handle a 4-column vertical text book, I used PyMuPDF and histogram analysis to brute-force split the columns.

NDLOCR OCR Python PyMuPDF Experiment

Tech · Dec 1, 2025 · 4 min

How to Successfully Build the NDLOCR Docker Image

The points where building the NDLOCR Docker image gets stuck, and how to get past them.

Docker NDLOCR OCR Windows AI CUDA Experiment