An introduction to Gradience, a tool that quantifies whether a LoRA rank setting is excessive using singular value decomposition. In experiments on Mistral-7B, halving the rank improved accuracy.
An overview of Kimi K2.5’s technical highlights from Moonshot AI: a 1T-parameter MoE architecture, the MoonViT vision encoder, Agent Swarm (PARL), benchmark results, and more.
How should memory be allocated in reasoning models? This paper explains the trade-offs among quantization, KV cache, and test-time compute, based on 1,700 experiments.
Anthropic published official guides on how to use Claude Code effectively and how to build agents with the Agent SDK. This article summarizes the key points from both.
A plan to build an internal help desk RAG system using a Mac mini M4 Pro and Dify. Highlights what's new in Dify circa 2025 and tips for running local LLMs.
Introducing a Mac mini M4 Pro to build an in-house RAG system. A plan for setting up a LoRA training environment during downtime while waiting for specs to be finalized.