#AI

253 articles

Tech · May 9, 2026 · 6 min

Fortress Token Optimizer trims 11% off LLM prompts but risks stripping system prompt constraints

A look at Fortress Token Optimizer's DEV article and its npm/PyPI packages. Prompts padded with polite filler shrink by 11-22%, but running the optimizer blindly on system prompts or RAG context can strip the constraints that control model output.

AI · LLM · API · API cost · Token management

Tech · May 8, 2026 · 7 min

How CivicSurvival kept 158K lines of AI-written C# honest with CivicRAG and 300 Roslyn analyzers

158K lines of AI-generated C# for a Cities: Skylines II total conversion mod. The project uses CivicRAG for codebase indexing, 300+ custom Roslyn analyzers as compile-time design rules, and manual visual debugging for render bugs the AI couldn't see.

AI · AI Agents · Claude Code · MCP · RAG · Game

Tech · May 8, 2026 · 11 min

FLUX.2 Klein 9B + NSFW LoRA on M1 Max 64GB via mflux: 1m51s/512, 5m37s/1024 q4

Tested Klein 9B + a 9B NSFW LoRA on an M1 Max 64GB via mflux 0.17.5: 1m51s/512, 5m37s/1024 q4. All 224/224 LoRA keys matched, NSFW prompts ran uncensored, and Japanese subjects worked with helper tokens.

AI · Image generation · FLUX · Apple Silicon · Mac · MLX · LoRA · Experiment

Tech · May 8, 2026 · 13 min

Vektor Memory supersession chains: BM25 threshold trap and a minimum schema

Vektor Memory v1.5.4's supersession chains compared with YourMemory's decay, Cloudflare's key overwrite, and CTX. Also covers a BM25-vs-cosine threshold trap and a 5-field minimum schema for agent memory.

AI · AI Agents · RAG · MCP · Token management · Node.js

Tech · May 7, 2026 · 11 min

Agent memory is just lookup: reading arXiv:2604.27707 with CTX and OCR-Memory in mind

The paper argues that RAG, vector stores, and scratchpads are retrieval, not learning. Reading it alongside CTX and OCR-Memory makes the gap between 'better search' and 'weight-level learning' concrete.

AI · AI Agents · RAG · Token management · AI safety · Papers

Tech · May 7, 2026 · 8 min

Gemma 4 MTP drafter on M1 Max 64GB: 26B A4B +13%, 31B Dense and E4B got slower

Tested the Gemma 4 MTP drafter on an M1 Max 64GB with mlx-vlm 0.5.0. Only the 26B A4B MoE sped up (+13%); 31B Dense and E4B got slower. Code-generation prompts and short haiku prompts flip the result.

AI · LLM · Google · Gemma · Local LLM · Inference · MLX · Experiment

Tech · May 7, 2026 · 11 min

Human-LLM text segmentation on M1 Max: WCP works, raw log-likelihood doesn't

Tested arXiv:2605.03723 on an M1 Max with Qwen3-8B-Base. WCP runs in pure Python, but raw log-likelihood floods the output with spurious boundaries even on human-only text.

AI · LLM · AI safety · Papers · Python · Experiment · Qwen

Tech · May 6, 2026 · 14 min

Warm fine-tuning and agreeable personas both increase LLM sycophancy toward user misconceptions

The Oxford Internet Institute's 2026 Nature paper found that warmth fine-tuning raised error rates by 10-30 points when users held wrong beliefs. Shah et al. reported a Pearson correlation of r = 0.87 between persona agreeableness and sycophancy across 13 open-weight models. Standard benchmarks caught neither effect.

AI · LLM · AI safety · Paper review · OpenAI

Tech · May 6, 2026 (updated) · 9 min

Gemma 4 MTP drafter: 3x speedup for Dense, limited gains on 26B MoE at batch 1

A read of Google's MTP drafter docs, the vLLM recipes, and the AI for Developers guide. The 3x claim holds for 31B Dense, but the 26B A4B MoE stalls at batch 1 because speculative-decoding verification loads extra expert weights for each candidate token.

AI · LLM · Google · Gemma · Local LLM · Inference

Tech · May 5, 2026 · 13 min

Tool-use API design for LLM agents: is_complete, retryable flags, and budget caps that stop loops

Starting from Claude Code's 1.67B-token runaway (anthropics/claude-code#4095), this traces why tool responses need is_complete, retryable: false, duplicate detection, and orchestrator-level budget caps. It applies directly to MCP server design.

AI · LLM · AI Agents · API · Claude Code · MCP

Tech · May 4, 2026 · 13 min

Built a Mobile App in 3 Days. The Hard Part Was Keeping It Connected

Starting from a DEV Community article about taking Synapse mobile with React Native + Expo, this digs into iOS/Android background restrictions, how desktops differ, similar patterns in payments and video uploads, and design options that assume disconnection.

AI · iOS · Android · Realtime · App Development

Tech · May 4, 2026 (updated) · 13 min

FLUX.2 Klein NSFW LoRA on M1 Max: why a 9B LoRA won't load on 4B mflux (variant compatibility map)

Klein 4B / 9B / Base LoRAs aren't cross-compatible: a 9B NSFW LoRA throws 'lora key not loaded' on mflux's 4B path. Covers the variant map, what mflux runs today, and where the working hands-on test lives.

AI · Image generation · FLUX · Apple Silicon · Mac · MLX · LoRA · Experiment