A three-stage pipeline (BERT perplexity scan → LLM judgment → human escalation) packaged as a cross-platform Python tool. The installer automatically downloads llama-server and the GGUF models.
Experiment log: from LUKE/BERT fill-mask fine-tuning, to perplexity-based error detection, to Qwen2.5-7B correction judgment with human escalation on mismatch. The complete pipeline runs on a single RTX 4060 Laptop GPU with 8 GB of VRAM.
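The stage logic described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual API: the function and class names, the perplexity threshold, and the exact mismatch rule (escalate when the perplexity filter flags a token but the LLM judge defends it unchanged) are all assumptions for the sake of the example.

```python
# Illustrative sketch of the three-stage triage rule. All names here
# (triage, Verdict, the 40.0 threshold) are hypothetical, not the tool's API.
from dataclasses import dataclass

@dataclass
class Verdict:
    token: str
    suspicious: bool       # stage 1: pseudo-perplexity above threshold?
    llm_correction: str    # stage 2: the LLM judge's proposed replacement
    escalate: bool         # stage 3: send to a human reviewer?

def triage(token: str, perplexity: float, llm_correction: str,
           threshold: float = 40.0) -> Verdict:
    """Combine the stage-1 filter with the stage-3 mismatch rule."""
    suspicious = perplexity > threshold
    # Assumed mismatch rule: escalate only when stage 1 flags the token
    # but the LLM judge disagrees, i.e. it keeps the token unchanged.
    escalate = suspicious and llm_correction == token
    return Verdict(token, suspicious, llm_correction, escalate)

# A flagged token the LLM also wants to change is auto-corrected;
# a flagged token the LLM defends goes to a human.
print(triage("teh", 120.0, "the").escalate)   # → False (auto-correct)
print(triage("bank", 95.0, "bank").escalate)  # → True (human review)
```

The design point this captures is that the cheap BERT pass only nominates candidates; the LLM judgment either confirms a fix or triggers a human look, so neither model alone can silently rewrite text.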
A step-by-step guide to building a LoRA training environment on Windows 11 with an RTX 3060 Laptop GPU (6 GB VRAM) using kohya_ss, from caption writing to VRAM-saving settings.
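For context, the kind of VRAM-saving invocation such a guide typically lands on looks like the sketch below. The paths and numeric values are placeholders, and while these flag names come from kohya-ss/sd-scripts' `train_network.py`, they should be verified against the installed version before use.

```shell
# Illustrative low-VRAM LoRA run with kohya-ss/sd-scripts (flags may differ
# by version; all paths and hyperparameters here are placeholders).
accelerate launch train_network.py ^
  --pretrained_model_name_or_path "C:\models\base_model.safetensors" ^
  --train_data_dir "C:\lora\train_data" ^
  --output_dir "C:\lora\output" ^
  --network_module networks.lora ^
  --network_dim 8 --network_alpha 4 ^
  --resolution 512 ^
  --train_batch_size 1 ^
  --gradient_checkpointing ^
  --mixed_precision fp16 ^
  --optimizer_type AdamW8bit ^
  --cache_latents ^
  --xformers
```

The VRAM-relevant choices are batch size 1, gradient checkpointing, fp16 mixed precision, an 8-bit optimizer, cached latents, and memory-efficient attention; together these are what make 6 GB workable.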