#NLP

4 articles

Tech · Jun 17, 2026 · 9 min

Qwen3.7 Max vs Plus on a Japanese novel: few fixes, a name misread as a typo

Tested Qwen3.7 Max and Plus on proofreading a Japanese novel: both make few corrections, they disagree on quote punctuation and names, and the one 'typo' flagged was actually a character name.

NLP BERT LLM Qwen API Experiment

Tech · May 17, 2026 · 9 min

BERT for search and OCR: MLM mechanics, WordPiece, and encoder successors

Why Google added BERT to search in 2019, how MLM training really works (15% mask, 80/10/10, WordPiece), and where encoder-only models still beat LLMs: reranking, classification, and OCR correction.

AI BERT NLP Search Machine Learning Python

Tech · Mar 23, 2026 · 14 min

Packaging the BERT + Qwen OCR Correction Pipeline as a Python Tool

The three-stage pipeline (BERT perplexity scan → LLM judgment → escalation), packaged as a cross-platform Python tool. The installer automatically downloads llama-server and GGUF models.

NLP OCR Machine Learning Python BERT LLM llama.cpp Qwen NDLOCR-Lite Gradio Ollama Experiment

Tech · Feb 28, 2026 · 15 min

Automated OCR Error Detection and Correction with Encoder Models + Local LLM

Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.

NLP OCR Machine Learning Python BERT LUKE Ollama LLM WSL2 NDLOCR-Lite Experiment