Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.
From Docker hell to Lite + LLM correction. A retrospective on my own experimentation, plus an introduction to someone else's browser-based NDLOCR-Lite implementation.
Set up the CLI version of NDLOCR-Lite on Apple Silicon Mac, then tested OCR result correction with Qwen 3.5 and Swallow. Includes experiments with direct image reading and the anchoring effect.
Tried the lightweight OCR tool NDLOCR-Lite released by the National Diet Library — installed it on Windows 11 and tested both the CLI and GUI versions.
This article explains how Gradio 6's new gr.HTML component works. You can write HTML, CSS, and JavaScript directly inside Python and build interactive web apps without a separate build step.
A look at ACE-Step, the 'Stable Diffusion of music,' covering its architecture, features, installation, and expected performance on Apple Silicon before trying it on an M1 Max.
Microsoft released an open-source framework that can optimize almost any AI agent with reinforcement learning, with little to no code changes. It supports arbitrary frameworks such as LangChain, AutoGen, and Claude Agent SDK.
I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, without chunking or vector databases. I also consider how it fits with layout detection and OCR pipelines.
Introducing a Mac mini M4 Pro to build an in-house RAG system. A plan for setting up a LoRA training environment during downtime while waiting for specs to be finalized.