Tech · Feb 20, 2026 (updated) · 13 min

Accelerating LLM Inference: CDLM and Attention Matching KV Compaction

Two February 2026 papers on reducing inference cost: Together AI’s Consistency DLM (up to 14.5× faster) and MIT/Harvard’s Attention Matching KV compaction (50× compaction in seconds).

Tags: AI · LLM · Inference optimization · KV cache · Diffusion models