GitHub has released the Copilot SDK in technical preview. It exposes the Copilot CLI agent runtime as a programmable interface and supports custom tools as well as MCP server connections.
A case study on building a real-time analytics platform with Cloudflare Workers, Hono.js, and Supabase. It covers practical edge-computing design decisions, including a global P95 event-ingestion latency of 47 ms and cookie-free session management.
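The case study's cookie-free technique isn't reproduced here, but a common approach is deriving a session key by hashing request attributes with a rotating salt. A minimal Python sketch of that general idea (the function name, salt scheme, and daily rotation are illustrative assumptions, not the article's actual implementation):

```python
import hashlib
from datetime import datetime, timezone

def session_id(ip: str, user_agent: str, salt: str = "rotating-salt") -> str:
    """Derive a stable, cookie-free session identifier by hashing
    request attributes. Illustrative only; the article's exact
    method is not shown in this summary."""
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")  # rotate daily
    raw = f"{ip}|{user_agent}|{day}|{salt}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]

sid = session_id("203.0.113.7", "Mozilla/5.0")
```

Daily salt rotation is one way such schemes limit cross-day tracking while keeping same-day requests from one client grouped together.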
Anthropic has released its mid-sized model, Claude Sonnet 4.6. In Claude Code evaluations, 70% of users preferred it over Sonnet 4.5 and 59% preferred it over Opus 4.5, while pricing remains unchanged.
VectorWare has announced the first implementation of Rust's Future trait and async/await running on GPUs by adapting the Embassy executor to a GPU environment.
Alibaba has open-sourced AliSQL 8.0, which combines MySQL-compatible OLTP, DuckDB-based high-speed OLAP, and HNSW vector search in a single database. The article considers what this means amid MySQL's stagnation and PostgreSQL's momentum.
A roundup of four topics around AI agent development and operations in February: a study showing AGENTS.md may be counterproductive, Continue.dev's CI-integrated AI checks, AWS Strands Agents' built-in session persistence, and Docker Shell Sandbox for isolated agent execution.
Google has launched a public preview of the Developer Knowledge API and MCP Server, letting generative AI tools access official documentation for Google Cloud, Android, Firebase, and other Google technologies.
Using Jeff Geerling's article as a starting point, this piece looks at how low-quality AI-generated contributions are increasing the burden on open source maintainers, along with responses from curl and GitHub.
How to configure the VRAM/main-memory split on the GMKtec EVO-X2 (Strix Halo) for local LLM inference. A 29.6 GB model ran fine with only 8 GB of dedicated VRAM.
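This works because Strix Halo is a unified-memory APU: weights beyond the dedicated VRAM carve-out can be served from shared system memory (GTT) rather than failing outright. A quick sketch of the arithmetic using the article's numbers (the helper name is illustrative):

```python
def gb_spilled(model_gb: float, dedicated_vram_gb: float) -> float:
    """How much of the model must live in shared system memory (GTT)
    once dedicated VRAM is exhausted, on a unified-memory APU."""
    return max(0.0, model_gb - dedicated_vram_gb)

# The article's setup: a 29.6 GB model with an 8 GB VRAM carve-out,
# leaving roughly 21.6 GB to be served from shared memory.
spill = gb_spilled(29.6, 8.0)
```

On such hardware the split is largely an accounting boundary, which is why shrinking the dedicated allocation doesn't cap usable model size.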
Building an NSFW-capable local LLM on the GMKtec EVO-X2 (Strix Halo). Getting GPU inference at ~11 tokens/s with LM Studio and MS3.2-24B-Magnum-Diamond.
An explanation of why Qwen-Image-Edit's VAE is so heavy, how HunyuanImage 2.1 chose a 32x high-compression VAE instead, and how Kohya's memory-optimization work fits in.
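For context on why the compression factor matters, here is a back-of-the-envelope comparison of latent sizes under an 8x VAE versus a 32x high-compression VAE (the channel counts are illustrative assumptions, not the models' actual configurations):

```python
def latent_shape(height: int, width: int, downsample: int, channels: int):
    """Shape (C, H, W) of a VAE latent for a given spatial
    compression factor. Channel counts below are assumptions
    for illustration only."""
    return (channels, height // downsample, width // downsample)

# A 1024x1024 image under an 8x VAE vs a 32x high-compression VAE.
lat_8x = latent_shape(1024, 1024, 8, 16)    # (16, 128, 128)
lat_32x = latent_shape(1024, 1024, 32, 64)  # (64, 32, 32)

# 32x spatial compression gives 16x fewer spatial positions for the
# diffusion backbone to process, shifting cost into the VAE itself.
spatial_ratio = (lat_8x[1] * lat_8x[2]) // (lat_32x[1] * lat_32x[2])
```

This trade-off is the crux of the comparison: a heavier, higher-compression VAE buys a much smaller latent for the rest of the pipeline.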