#OCR

17 articles

Tech · May 2, 2026 · 14 min

OCR-Memory Lets Agents Recall History as Images

A read-through of OCR-Memory (arXiv:2604.26622). It renders agent execution history into images, uses Set-of-Mark to let a VLM pick relevant segments, then retrieves the verbatim text from the original logs.

AI AI Agents OCR VLM RAG Token Management Paper

Tech · Apr 30, 2026 · 11 min

Using Confidence Scores to Reduce Human Review in Document Extraction

Designing field-level confidence thresholds for human-in-the-loop document extraction, plus the OCR and threshold obstacles I ran into when automating journal entries with freee MCP.

AI OCR VLM MCP AI Agents API

Tech · Mar 23, 2026 · 14 min

Packaging the BERT + Qwen OCR Correction Pipeline as a Python Tool

The three-stage pipeline (BERT perplexity scan → LLM judgment → escalation), packaged as a cross-platform Python tool. The installer automatically downloads llama-server and the GGUF models.

NLP OCR Machine Learning Python BERT LLM llama.cpp Qwen NDLOCR-Lite Gradio Ollama Experiment

Tech · Mar 17, 2026 · 5 min

GLM-OCR (0.9B) sets a new SOTA for document parsing, so I checked columns, vertical text, and math support

Zhipu AI's GLM-OCR reaches 94.62% on OmniDocBench v1.5 despite using only 0.9B parameters. I dug into its layout parsing, vertical text handling, and math recognition.

AI OCR VLM GLM

Tech · Mar 9, 2026 · 7 min

Running NDLOCR-Lite in an iOS Native App for On-Device OCR

Bundling NDLOCR-Lite's DEIMv2 + PARSeq with ONNX Runtime Mobile in an iOS app to run camera capture → perspective correction → layout detection → text recognition → confidence-based correction entirely on device.

OCR NDLOCR-Lite iOS Swift ONNX Runtime Mobile Development Experiment

Tech · Feb 28, 2026 · 15 min

Automated OCR Error Detection and Correction with Encoder Models + Local LLM

Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.

NLP OCR Machine Learning Python BERT LUKE Ollama LLM WSL2 NDLOCR-Lite Experiment

Tech · Feb 27, 2026 · 9 min

Building a Local OCR Hot Folder for Confidential Documents with ScanSnap + NDLOCR-Lite

A folder-watching script that automatically OCRs images scanned by ScanSnap and runs LLM-based correction. Includes air-gap security design.

OCR NDLOCR-Lite ScanSnap Python Mac Local LLM Experiment

Tech · Feb 27, 2026 · 8 min

Three Months with NDLOCR: The Full Journey and What Others Built

From Docker hell to Lite + LLM correction. A retrospective on my own experimentation, plus an introduction to someone else's browser-based NDLOCR-Lite implementation.

OCR NDLOCR NDLOCR-Lite Python Docker Local LLM ONNX WebAssembly Experiment

Tech · Feb 26, 2026 · 13 min

OCR Correction on Showa-Era Documents with NDLOCR-Lite and Local LLMs

Set up the CLI version of NDLOCR-Lite on an Apple Silicon Mac, then tested OCR result correction with Qwen 3.5 and Swallow. Includes experiments with direct image reading and the anchoring effect.

OCR Python NDLOCR-Lite Mac Qwen Swallow Ollama Local LLM Experiment

Tech · Feb 25, 2026 · 8 min

Running the National Diet Library's OCR Engine NDLOCR-Lite on Windows

Tried NDLOCR-Lite, the lightweight OCR tool released by the National Diet Library: installed it on Windows 11 and tested both the CLI and GUI versions.

OCR Python NDLOCR-Lite Experiment

Tech · Feb 1, 2026 · 4 min

PageIndex - tree RAG with LLM reasoning only, no vector search

I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, without chunking or vector databases. I also consider how it fits with layout detection and OCR pipelines.

AI RAG LLM OCR Python

Tech · Jan 30, 2026 · 4 min

PaddleOCR-VL-1.5 - document parsing SOTA with only 0.9B parameters

Baidu's PaddleOCR-VL-1.5 reaches 94.5% accuracy on OmniDocBench v1.5 with just 0.9B parameters, surpassing large models such as GPT-4o and Qwen2.5-VL-72B.

AI OCR VLM PaddlePaddle