#LLM

127 articles

Tech · May 15, 2026 · 6 min

x-algorithm May 2026: Phoenix pipeline runnable locally with 3GB artifacts

xAI x-algorithm second commit: Phoenix retrieval+ranking runs locally on 537k sports posts with 3GB artifacts. Ad blending and candidate isolation code added since January.

AI GitHub OSS Machine Learning LLM

Tech · May 14, 2026 · 12 min

GTIG observed the first AI-generated zero-day: a Python 2FA bypass exposed by a hallucinated CVSS

Verdict on GTIG's May 11, 2026 report: the first confirmed AI-generated zero-day, a Python 2FA bypass for an OSS admin tool, was caught by a hallucinated CVSS score and textbook Pythonic code structure.

Security AI LLM Zero-Day Google

Tech · May 14, 2026 · 29 min

oMLX 0.3.9.dev2 tested on M1 Max 64GB: SSD cache wins, VLM MTP slower

Tested oMLX 0.3.9.dev2 on M1 Max 64GB across 11 scenarios: SSD KV cache cuts Copilot prefill 88s→33s, VLM MTP slows decode 12-30%, omlx launch reaches Copilot/Codex/Claude Code.

AI LLM Local LLM Apple Silicon MLX Inference Optimization Codex Experiment

Tech · May 13, 2026 (updated) · 8 min

oMLX 0.3.9.dev2 for Mac coding agents: Gemma 4 VLM MTP, DFlash, launch copilot

oMLX 0.3.9.dev2 release notes from the angle of Codex/Copilot on Mac local LLMs: Gemma 4 VLM MTP, DFlash, omlx launch copilot, and SSD KV cache, with what each changes for agent workflows.

AI LLM Local LLM Apple Silicon MLX Inference Optimization Codex

Tech · May 11, 2026 · 5 min

Ollama CVE-2026-7482: crafted GGUF leaks heap memory from exposed API servers

Out-of-bounds read in Ollama's GGUF loader before 0.17.1. If your Ollama API is network-accessible, a crafted model file can exfiltrate env vars, API keys, system prompts, and conversation fragments from process memory.

Ollama Security Vulnerability CVE Local LLM LLM

Tech · May 9, 2026 · 6 min

Fortress Token Optimizer trims 11% off LLM prompts but risks stripping system prompt constraints

Checked Fortress Token Optimizer's DEV article and npm/PyPI packages. Polite filler words shrink 11-22%, but running it blindly on system prompts or RAG context can strip constraints that control model output.

AI LLM API API Cost Token Management

Tech · May 7, 2026 · 8 min

Gemma 4 MTP drafter on M1 Max 64GB: 26B A4B +13%, 31B Dense and E4B got slower

Tested Gemma 4 MTP drafter on M1 Max 64GB with mlx-vlm 0.5.0. Only the 26B A4B MoE got +13%; 31B Dense and E4B got slower. Code gen vs short haiku prompts flip the result.

AI LLM Google Gemma Local LLM Inference MLX Experiment

Tech · May 7, 2026 · 11 min

Human-LLM text segmentation on M1 Max: WCP works, raw log-likelihood doesn't

Tested arXiv:2605.03723 on M1 Max + Qwen3-8B-Base: WCP runs in pure Python, but raw log-likelihood floods boundaries even on human-only text.

AI LLM AI Safety Paper Python Experiment Qwen

Tech · May 6, 2026 · 14 min

Warm fine-tuning and agreeable personas both increase LLM sycophancy toward user misconceptions

Oxford Internet Institute's Nature 2026 paper found warmth fine-tuning raised error rates 10-30 points when users held wrong beliefs. Shah et al. showed Pearson r = 0.87 between persona agreeableness and sycophancy across 13 open-weight models. Standard benchmarks caught neither effect.

AI LLM AI Safety Paper Review OpenAI

Tech · May 6, 2026 (updated) · 9 min

Gemma 4 MTP drafter: 3x speedup for Dense, limited gains on 26B MoE at batch 1

Reading Google's MTP drafter docs, vLLM recipes, and the AI for Developers guide. The 3x claim holds for 31B Dense but 26B A4B MoE stalls at batch 1 because speculative decoding verification loads extra expert weights per candidate token.

AI LLM Google Gemma Local LLM Inference

Tech · May 5, 2026 · 8 min

Ollama + MCP servers on M1 Max 64GB: MCPHost deprecation, tool calling limits, and a minimal custom server

Tested connecting MCP servers to Ollama local LLMs on M1 Max 64GB. MCPHost is deprecated, tool calling breaks with quantized models, and context fills fast. Includes working TypeScript and Python custom MCP server setups.

Ollama MCP Local LLM LLM AI Agents

Tech · May 5, 2026 · 13 min

Tool-use API design for LLM agents: is_complete, retryable flags, and budget caps that stop loops

Starting from Claude Code's 1.67B token runaway (anthropics/claude-code#4095), this traces why tool responses need is_complete, retryable: false, duplicate detection, and orchestrator-level budget caps. Directly applicable to MCP server design.

AI LLM AI Agents API Claude Code MCP