A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B, an omni-modal model that activates only 3B of its 30B parameters and responds with speech to text, image, audio, and video inputs. The article covers the Thinker–Talker architecture, benchmarks, and the broader Qwen3 MoE family.
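To make "activates only 3B of its 30B parameters" concrete, here is a minimal sketch of top-k expert routing, the generic MoE mechanism behind such numbers. The gate and expert shapes are illustrative assumptions, not Qwen3-Omni's actual configuration.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate, experts, top_k=2):
    """Generic top-k MoE routing: each token runs through only top_k experts.

    x:       (tokens, d_model) activations
    gate:    nn.Linear(d_model, n_experts) scoring layer
    experts: list of per-expert feed-forward modules
    """
    scores = gate(x)                               # (tokens, n_experts)
    weights, idx = scores.topk(top_k, dim=-1)      # keep the top_k experts per token
    weights = F.softmax(weights, dim=-1)           # renormalize among the chosen ones
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
        if rows.numel():
            out[rows] += weights[rows, slot].unsqueeze(-1) * expert(x[rows])
    return out
```

Per token, only top_k expert networks execute, which is how the active parameter count stays far below the total.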
A technical look at ByteDance's UI-TARS-1.5-7B, which beats OpenAI's CUA and Claude 3.7 by a wide margin at locating GUI elements in screenshots, and can run locally via a desktop app.
Technical overview of Alibaba's Qwen3-Coder-Next, an ultra-efficient MoE with 80B parameters of which only 3B are activated; it runs even on a single RTX 4090 and brings 70%+ SWE-Bench performance to local use.
A paper argues that two seemingly mysterious Transformer behaviors, heavy attention concentrated on specific tokens (attention sinks) and unusually large activations in specific dimensions (massive activations), are manifestations of a single underlying mechanism.
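Both phenomena are easy to observe yourself (this is my own probe, not the paper's code): with Hugging Face transformers, dump attention maps and hidden states, then check how much attention mass lands on one position and which hidden dimensions spike. gpt2 is just a small stand-in model here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; any decoder-only LM with eager attention works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

inputs = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True, output_hidden_states=True)

# (1) Heavy attention on one token: average mass later queries put on position 0.
for layer, attn in enumerate(out.attentions):         # each: (1, heads, seq, seq)
    print(f"layer {layer:2d}: mass on token 0 = {attn[0, :, 1:, 0].mean().item():.3f}")

# (2) Outsized activations in a few dimensions of the residual stream.
h = out.hidden_states[-1][0]                           # (seq, d_model)
vals, dims = h.abs().max(dim=0).values.topk(5)
print("largest-magnitude dims:", dims.tolist(), vals.tolist())
```

On most decoder LMs both patterns show up together, which is exactly the co-occurrence the paper sets out to explain.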
I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, with no chunking and no vector database. I also considered how it fits alongside layout-detection and OCR pipelines.
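To illustrate the retrieval style (my sketch, not PageIndex's actual API; Node, retrieve, and ask_llm are hypothetical names): instead of embedding chunks, retrieval descends a section tree, letting the LLM choose which branch to follow at each level.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def retrieve(node: Node, query: str, ask_llm) -> str:
    """Walk the document tree, letting the LLM pick the branch at each level."""
    if not node.children:
        return node.text                       # leaf: the section that answers the query
    menu = "\n".join(f"{i}: {c.title} - {c.summary}" for i, c in enumerate(node.children))
    prompt = (f"Question: {query}\nSections:\n{menu}\n"
              "Answer with only the index of the most relevant section.")
    choice = int(ask_llm(prompt).strip())      # ask_llm: any chat-completion wrapper
    return retrieve(node.children[choice], query, ask_llm)
```

No vectors are stored anywhere; the titles and summaries in the tree carry the retrieval signal, which is also why clean layout detection and OCR matter upstream.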
An introduction to Gradience, a tool that uses singular value decomposition to quantify whether a LoRA rank setting is excessive. In experiments on Mistral-7B, halving the rank improved accuracy.
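The underlying diagnostic is easy to reproduce generically (this is not Gradience's API; effective_rank is my own helper): take the LoRA update ΔW = BA, compute its singular values, and count how many are needed to capture most of the energy.

```python
import torch

def effective_rank(A: torch.Tensor, B: torch.Tensor, energy: float = 0.99) -> int:
    """How many singular values of the LoRA update BA carry `energy` of its mass.

    A: (r, d_in) down-projection, B: (d_out, r) up-projection.
    A result far below r suggests the chosen rank is excessive.
    """
    delta = B @ A                                    # the full low-rank update
    s = torch.linalg.svdvals(delta)                  # singular values, descending
    cum = torch.cumsum(s**2, dim=0) / (s**2).sum()
    return int((cum < energy).sum().item()) + 1

# Toy check: a rank-64 adapter whose update is effectively ~rank-8.
r, d = 64, 1024
B = torch.randn(d, 8) @ torch.randn(8, r) * 0.1      # collapse B to ~8 directions
A = torch.randn(r, d)
print(effective_rank(A, B))                          # prints a number near 8
```

If a rank-64 adapter concentrates 99% of its update energy in a handful of directions, most of the rank budget is noise, which is consistent with the article's finding that halving the rank can even help accuracy.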
An overview of the technical highlights of Moonshot AI's Kimi K2.5: the 1T-parameter MoE architecture, the MoonViT vision encoder, Agent Swarm (PARL), benchmark results, and more.
How should memory be allocated in reasoning models? This paper explains the trade-offs among quantization, KV cache, and test-time compute, based on 1,700 experiments.
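One axis of that trade-off is easy to compute yourself: KV cache memory. Below is a back-of-envelope helper (the model shape is an illustrative 7B-class config, not one of the paper's models) showing why quantizing the KV cache frees memory that can instead go to longer reasoning traces.

```python
def kv_cache_gib(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """KV cache = 2 (K and V) x layers x kv_heads x head_dim x seq x batch x bytes."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem / 2**30

# Illustrative 7B-class shape: 32 layers, 8 KV heads (GQA), head_dim 128.
print(kv_cache_gib(32, 8, 128, seq_len=32_768))                    # fp16 -> 4.0 GiB
print(kv_cache_gib(32, 8, 128, seq_len=32_768, bytes_per_elem=1))  # int8 -> 2.0 GiB
```

At a fixed memory budget, the 2 GiB saved by int8 KV in this example could instead roughly double the context available for test-time compute.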
Anthropic published official guides on how to use Claude Code effectively and how to build agents with the Agent SDK. This article summarizes the key points from both.