#LLM

127 articles

Tech · Feb 15, 2026 · 6 min

Setting Up a Local LLM on the GMKtec EVO-X2 (Strix Halo)

Building an NSFW-capable local LLM on the GMKtec EVO-X2 (Strix Halo), reaching ~11 tokens/s with GPU inference using LM Studio and MS3.2-24B-Magnum-Diamond.

AI LLM Local LLM LM Studio AMD Experiment

Tech · Feb 12, 2026 · 7 min

MioTTS - a lightweight LLM-based TTS built from a custom codec

MioTTS from Aratako is a family of 0.1B to 2.6B Japanese-English TTS models built from scratch around the custom MioCodec. Its key feature is that it runs directly in llama.cpp and Ollama.

AI TTS Speech Synthesis Open Source LLM

Tech · Feb 8, 2026 · updated · 6 min

LFM2.5 - a hybrid architecture that's neither Transformer nor Mamba

Liquid AI's LFM2.5 combines short-range convolutions with attention and is optimized for edge devices without using SSMs. This article covers the architecture, benchmarks, and community use cases.

AI LLM Edge AI Architecture

Tech · Feb 7, 2026 · 6 min

Qwen3-TTS — Open-source speech synthesis with a single pip install

A technical overview of Qwen3-TTS from Alibaba's Qwen team: one-line pip install, 3-second voice cloning, natural-language voice design, and support for 10 languages including Japanese. Released under the Apache 2.0 license.

AI TTS Speech Synthesis Open Source LLM

Tech · Feb 6, 2026 · 6 min

Qwen3-Omni: An omni-modal MoE that unifies text, image, speech, and video with 3B active parameters

A technical walkthrough of Alibaba's Qwen3-Omni-30B-A3B, an omni-modal model that activates only 3B of its 30B parameters and can respond with speech to text, image, audio, or video input. The article summarizes the Thinker–Talker architecture, the benchmarks, and the wider Qwen3 MoE family.

AI LLM Open Source Multimodal Voice AI

Tech · Feb 5, 2026 · 4 min

UI-TARS-1.5-7B: a vision AI agent that reached SOTA in GUI grounding

A technical look at ByteDance's UI-TARS-1.5-7B, which beats OpenAI CUA and Claude 3.7 by a wide margin at identifying GUI elements from screenshots and can run locally via a desktop app.

AI LLM Agent Open Source

Tech · Feb 4, 2026 · 5 min

Qwen3-Coder-Next: A Local Coding Agent with 3B Active Parameters

A technical overview of Alibaba's Qwen3-Coder-Next, an ultra-efficient MoE with 80B total parameters but only 3B active that runs even on a single RTX 4090, bringing 70%+ SWE-Bench performance to local use.

AI LLM Open Source Agent

Tech · Feb 4, 2026 · 2 min

A Unified View of Attention Sinks and Residual Sinks: LLM 'Outliers' as a Training-Stability Mechanism

A look at a paper arguing that two seemingly mysterious Transformer behaviors (heavy attention on specific tokens and unusually large activations in specific dimensions) are manifestations of the same mechanism, one that helps stabilize training.

LLM Transformer Research

Tech · Feb 3, 2026 · 3 min

MarkItDown - Microsoft's Python tool for turning documents into Markdown

A Microsoft tool that converts PDFs, Word, Excel, PowerPoint, and more into Markdown. It slots into LLM pipelines and can also run as an MCP server.

Python Markdown LLM MCP Document Conversion

Tech · Feb 3, 2026 · updated · 3 min

OpenRouter free models and Free Router tested: rate limits, tool-calling gotchas, and when they actually work

OpenRouter offers :free models and a Free Router endpoint. I tested both for rate limits (50 requests/day, rising to 1,000/day after a $10 top-up), the tool-calling failures on free models, and which workloads they actually suit.

AI LLM OpenRouter

Tech · Feb 2, 2026 · 4 min

Power Sampling: unlocking LLM reasoning without reinforcement learning

A look at how changing the sampling strategy at inference time can improve LLM reasoning performance without any RL retraining.

LLM Inference Reinforcement Learning Sampling AI

Tech · Feb 1, 2026 · 4 min

PageIndex - tree RAG with LLM reasoning only, no vector search

I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, without chunking or vector databases. I also consider how it fits with layout detection and OCR pipelines.

AI RAG LLM OCR Python