Tested Qwen3.7 Max and Plus proofreading a Japanese novel: both barely fix, split on quote punctuation and names, and the one 'typo' was a character name.
Why Google added BERT to search in 2019, how MLM training really works (15% mask, 80/10/10, WordPiece), and where encoder-only models still beat LLMs — rerank, classification, and OCR correction.
The three-stage pipeline of BERT perplexity scan → LLM judgment → escalation packaged as a cross-platform Python tool. The installer automatically downloads llama-server and GGUF models.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.