Explains how conventional OCR differs from OCR based on vision-language models (VLMs), introduces DeepSeek-OCR, and explores the possibility of combining the two approaches.
From browser OCR and server-side OCR to cloud APIs and AI: a roundup of what I learned trying to implement Japanese OCR on the web, including the limits of each approach.
When Layout Parser wouldn't install and NDLOCR alone couldn't handle a 4-column vertical text book, I used PyMuPDF and histogram analysis to brute-force split the columns.
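
The histogram-based split can be sketched roughly as follows. This is a minimal illustration, not the code from the post: it assumes the page has already been rasterized (for example with PyMuPDF's `page.get_pixmap()`) and binarized into a grid of 0/1 ink values. The function names `projection_profile` and `find_bands` and the `min_gap` and `threshold` parameters are hypothetical. Whether the blank gutters show up in the row profile or the column profile depends on how the book lays out its text blocks.

```python
from typing import List, Sequence, Tuple


def projection_profile(bitmap: Sequence[Sequence[int]], axis: str) -> List[int]:
    """Count ink pixels per row (axis="rows") or per column (axis="cols")."""
    if axis == "rows":
        return [sum(row) for row in bitmap]
    if axis == "cols":
        return [sum(col) for col in zip(*bitmap)]
    raise ValueError("axis must be 'rows' or 'cols'")


def find_bands(profile: Sequence[int], min_gap: int = 2,
               threshold: int = 0) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of content, end exclusive.

    A band ends once at least `min_gap` consecutive profile values are
    at or below `threshold`; shorter blank runs stay inside the band.
    """
    bands: List[Tuple[int, int]] = []
    start = None  # index where the current band began, or None
    gap = 0       # length of the current blank run inside a band
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The last inked index is i - gap, so the exclusive end is +1.
                bands.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the final band, trimming any short trailing blank run.
        bands.append((start, len(profile) - gap))
    return bands


# Toy page: ink in columns 0-1 and 5-7, a blank gutter in columns 2-4.
page = [[1, 1, 0, 0, 0, 1, 1, 1]] * 3
print(find_bands(projection_profile(page, "cols")))  # → [(0, 2), (5, 8)]
```

Each returned range can then be turned into a crop rectangle and passed to the OCR engine separately. On real scans, `threshold` and `min_gap` would need tuning to tolerate noise and the narrow gaps between lines.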