The three-stage pipeline (BERT perplexity scan → LLM judgment → escalation), packaged as a cross-platform Python tool. The installer automatically downloads llama-server and the GGUF models.
Bundling NDLOCR-Lite's DEIMv2 + PARSeq with ONNX Runtime Mobile in an iOS app to run camera capture → perspective correction → layout detection → text recognition → confidence-based correction entirely on device.
Wan 2.2 image-to-video in ComfyUI on Windows with an RTX 4060 (8GB VRAM). The 5B fp8 model produced rough output across three failed attempts; the 14B Rapid distilled model with --lowvram offloading took 111 seconds per 2-second clip. The working setup and what to avoid.
Using tori29umai’s LoRA to automatically split facial parts: results from batching 28 images, and a log of hitting the limits when attempting finer hair separation.
Local video generation test on M1 Max 64GB: FP8 fails on Metal, GGUF gets Wan 2.2 running at 82 minutes for a 2-second clip, and LTX-2 hits NaN or unusable KSampler output on MPS.
Hands-on test of huihui-ai Qwen 3.5 abliterated models in Ollama: garbage-token failures, GLM-4.7-Flash chat-template breakage, and why the official model with thinking disabled worked better.
After a macOS update, tmux sessions started by cron lost access to the Keychain, causing Claude CLI batch jobs to fail silently. The diagnosis, the fix, and why this is a structural macOS Keychain problem rather than a Claude CLI bug.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5 7B correction judgment with human escalation on mismatch. A complete pipeline running on a single RTX 4060 Laptop with 8GB VRAM.
From Docker hell to Lite + LLM correction. A retrospective on my own experimentation, plus an introduction to someone else's browser-based NDLOCR-Lite implementation.