Why Google added BERT to search in 2019, how MLM training really works (15% masking, the 80/10/10 replacement rule, WordPiece), and where encoder-only models still beat LLMs: reranking, classification, and OCR correction.
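To make the 15% and 80/10/10 figures concrete, here is a minimal sketch of the MLM corruption step in plain Python. It simplifies in two ways that are assumptions of this example: each token is selected independently with probability 0.15 (rather than picking a fixed number of positions per sequence), and special tokens are not excluded. The names `mlm_mask`, `select_prob`, and the toy vocabulary are illustrative only.

```python
import random

MASK = "[MASK]"

def mlm_mask(tokens, vocab, rng, select_prob=0.15):
    """Corrupt a token list for masked-language-model training.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model is asked to predict, and None everywhere else.
    """
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: left untouched, excluded from the loss
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return corrupted, labels

if __name__ == "__main__":
    rng = random.Random(0)
    toy_vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # hypothetical vocabulary
    sentence = "the cat sat on the mat".split()
    print(mlm_mask(sentence, toy_vocab, rng))
```

The "keep unchanged" and "random token" cases exist because `[MASK]` never appears at inference time; they force the model to build useful representations for every position, not only the masked ones.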
The three-stage pipeline (BERT perplexity scan → LLM judgment → escalation), packaged as a cross-platform Python tool. The installer automatically downloads llama-server and the GGUF models.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop GPU with 8 GB of VRAM.
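The scan → judgment → escalation flow can be sketched as a small routing function. This is an illustration under stated assumptions, not the tool's actual code: the models are passed in as plain callables, the `threshold` value is hypothetical, and "mismatch" is assumed here to mean that the LLM's correction disagrees with the masked-LM's top candidate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    action: str                       # "keep", "auto_fix", or "escalate"
    correction: Optional[str] = None  # set only when action == "auto_fix"

def route(
    text: str,
    perplexity: Callable[[str], float],   # stage 1 scorer (hypothetical interface)
    mlm_suggest: Callable[[str], str],    # masked-LM's top correction candidate
    llm_judge: Callable[[str, str], str], # LLM's preferred correction
    threshold: float,                     # hypothetical cutoff, tuned per corpus
) -> Verdict:
    # Stage 1: cheap perplexity scan; low-perplexity text is assumed clean.
    if perplexity(text) < threshold:
        return Verdict("keep")
    # Stage 2: ask the LLM to judge the suspicious text and the candidate fix.
    candidate = mlm_suggest(text)
    judged = llm_judge(text, candidate)
    if judged == candidate:
        return Verdict("auto_fix", judged)
    # Stage 3: the two models disagree, so hand the case to a human.
    return Verdict("escalate")
```

Passing the models in as callables keeps the routing logic testable with stubs, without loading any model weights.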