Compresr's YC-backed Context Gateway is a proxy between AI agents and LLM APIs. Its three pillars - preemptive summarization, tool output compression, and tool discovery - reduce wasted context-window usage.
Sakura Internet's "Sakura AI Engine" is an LLM inference platform compatible with OpenAI API. There is a free limit of 3,000 requests per month, and multiple models such as Kimi-K2.5 and gpt-oss-120b can be used domestically.
Cursor released Composer 2 without disclosing its base model; calling its OpenAI-compatible API revealed it is Kimi K2.5. This escalated into a licensing dispute, but a formal commercial agreement with Moonshot AI was subsequently confirmed.
AttnRes to replace Transformer's fixed residual combination with softmax attention in the depth direction. Demonstration with Kimi Linear 48B improved GPQA-Diamond +7.5pt and HumanEval +3.1pt. Training overhead was kept below 4% and inference below 2%.
H Company's Holotron-12B uses a memory-efficient new design to lift PC-operation AI throughput to 8,900 tokens per second. Unsloth has released the beta of 'Studio,' a browser tool for no-code model fine-tuning.
AI Security for Apps reached GA, letting Cloudflare block prompt injection and PII leaks at the WAF layer. On the same day, it also launched RFC 9457-compatible error responses that replace HTML with JSON or Markdown when AI agents hit Cloudflare errors.
Anthropic has GA’d a 1M‑token context window. No surcharge for long context; image/PDF per‑request limit raised from 100 to 600. Achieved a frontier‑model best score on MRCR v2.
HuggingFace conducts a comparative analysis of 16 open source RL training libraries based on 7 design axes. In the synchronous type, the GPU utilization remains at around 60% due to the generation bottleneck, but with an asynchronous separation design it can be improved to over 95%.
Sarvam AI released 30B and 105B models trained entirely in India—from pretraining through RL—featuring support for 22 constitutionally recognized Indian languages and inference optimizations.
Andrej Karpathy released Autoresearch, a system where an AI agent autonomously runs machine-learning experiments on a GPU and tries 100 variants overnight. The article breaks down the mechanism and design so even readers with zero ML background can follow.