The three-stage pipeline of BERT perplexity scan → LLM judgment → escalation packaged as a cross-platform Python tool. The installer automatically downloads llama-server and GGUF models.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.