H Company's Holotron-12B uses a new memory-efficient design to lift computer-use AI throughput to 8,900 tokens per second. Unsloth has released the beta of 'Studio,' a browser tool for no-code model fine-tuning.
AI Security for Apps reached GA, letting Cloudflare block prompt injection and PII leaks at the WAF layer. On the same day, it also launched RFC 9457-compatible error responses that replace HTML with JSON or Markdown when AI agents hit Cloudflare errors.
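RFC 9457 ("Problem Details for HTTP APIs") standardizes a machine-readable JSON error body, which is what makes errors consumable by agents instead of HTML pages. A minimal sketch of such a body; the type URI, status, and wording here are illustrative, not Cloudflare's actual error output:

```python
import json

# Illustrative RFC 9457 problem-details body; field values are
# hypothetical, not Cloudflare's real error payload.
problem = {
    "type": "https://example.com/errors/origin-unreachable",
    "title": "Origin Unreachable",
    "status": 523,
    "detail": "The origin web server could not be reached.",
    "instance": "/api/v1/orders/12345",
}

# Served with Content-Type: application/problem+json per the RFC.
body = json.dumps(problem)
parsed = json.loads(body)

# An agent can branch on structured fields instead of scraping HTML.
print(parsed["status"], parsed["title"])
```

The point of the format is that `type`, `title`, and `status` are stable fields an agent can key on, while `detail` stays human-readable.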
Anthropic has GA’d a 1M‑token context window: no surcharge applies for long context, and the per‑request image/PDF limit is raised from 100 to 600. It achieved the best frontier‑model score on MRCR v2.
Hugging Face published a comparative analysis of 16 open-source RL training libraries across 7 design axes. In synchronous designs, GPU utilization stalls around 60% because training waits on the generation bottleneck, but an asynchronous separated design can push it above 95%.
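The synchronous-versus-asynchronous split above can be sketched as a toy producer/consumer design: rollout generation and training run in separate threads connected by a queue, so the trainer consumes trajectories as they arrive instead of idling between generations. Everything here (step counts, latencies, batch shape) is illustrative, not taken from the Hugging Face analysis:

```python
import queue
import threading
import time

rollouts = queue.Queue(maxsize=8)  # buffer decoupling generation from training
NUM_ROLLOUTS = 20

def generator():
    # Stands in for the inference engine producing trajectories.
    for step in range(NUM_ROLLOUTS):
        time.sleep(0.001)  # simulated generation latency
        rollouts.put({"step": step, "tokens": [1, 2, 3]})
    rollouts.put(None)  # sentinel: generation is finished

trained = []

def trainer():
    # Stands in for the learner; it never blocks on a single generation,
    # only on the shared queue, which is what keeps GPUs busy.
    while True:
        batch = rollouts.get()
        if batch is None:
            break
        trained.append(batch["step"])

t_gen = threading.Thread(target=generator)
t_train = threading.Thread(target=trainer)
t_gen.start(); t_train.start()
t_gen.join(); t_train.join()

print(len(trained))
```

In a real system the two sides run on separate GPU pools and the queue holds off-policy trajectories, which is where the utilization gain (and the off-policy correction problem) comes from.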
Sarvam AI released 30B and 105B models trained entirely in India—from pretraining through RL—featuring support for 22 constitutionally recognized Indian languages and inference optimizations.
Andrej Karpathy released Autoresearch, a system where an AI agent autonomously runs machine-learning experiments on a GPU and tries 100 variants overnight. The article breaks down the mechanism and design so even readers with zero ML background can follow.
A summary of GPT-5.3 Instant’s hallucination reductions and safety regressions, GPT-5.4’s computer use, Tool Search, and 1M-token context, plus Saguaro’s 5× inference speedups.
AWS has made OpenAI API compatibility for the Bedrock Mantle distributed inference engine generally available, letting existing OpenAI SDK code run against open-weight models such as DeepSeek and Mistral.
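OpenAI API compatibility means the endpoint accepts the same request shape as the OpenAI Chat Completions API, so existing client code only needs a different base URL and credentials. A minimal standard-library sketch of the request such an endpoint expects; the URL, model ID, and key are placeholders, not documented Bedrock values:

```python
import json
import urllib.request

# Placeholder endpoint and model: real Bedrock URLs and IDs will differ.
BASE_URL = "https://bedrock.example.amazonaws.com/v1"

payload = {
    "model": "deepseek-r1",  # illustrative open-weight model ID
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
        "Content-Type": "application/json",
    },
    method="POST",
)

# The request is built but not sent; an OpenAI SDK client pointed at
# the same base_url would issue an identical call.
print(req.full_url)
```

This is why existing OpenAI SDK code can be redirected with a one-line `base_url` change rather than a rewrite.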
Every abliterated variant of huihui-ai's Qwen 3.5 produced garbage tokens, and the abliterated GLM-4.7-Flash shipped with a broken chat template. The official model with thinking disabled turned out to be the right choice.