I looked into PageIndex, a RAG system that builds hierarchical document trees using only LLM reasoning, without chunking or vector databases. I also considered how it fits with layout detection and OCR pipelines.
This article rounds up the major video-generation AI updates announced in January 2026 and examines whether i2v (image→video) is practically usable, including with models that run locally.
An overview of the technical highlights of Moonshot AI's Kimi K2.5: a 1T-parameter MoE architecture, the MoonViT vision encoder, Agent Swarm (PARL), and benchmark results.
A comparison of the hook features offered by Gemini CLI, Claude Code, and Codex CLI. The differences in design philosophy are more interesting than I expected.
Baidu's PaddleOCR-VL-1.5 reaches 94.5% accuracy on OmniDocBench v1.5 with just 0.9B parameters, surpassing large models such as GPT-4o and Qwen2.5-VL-72B.
Anthropic published official guides on how to use Claude Code effectively and how to build agents with the Agent SDK. This article summarizes the key points from both.
A deep dive comparing 10 AI-powered E2E testing and browser automation tools, including Shortest, Playwright MCP, Stagehand, Skyvern, and QA Wolf, categorized by use case with a focus on reliability, speed, and cost.
An explanation of the difference between conventional OCR and OCR based on vision-language models (VLMs). Introduces DeepSeek-OCR and explores the possibility of combining the two approaches.
I investigated the source behind the viral claim that a Johns Hopkins study found ChatGPT lies 27% of the time. It turns out that several different studies had been conflated.
An explanation of Shortest, a natural-language E2E testing framework built on the Anthropic Claude API and Playwright, from the perspective of a Playwright user. Includes a comparison with Playwright MCP, caveats, and guidance on when to use each.
I generalized the scripts from the practice and optimization articles into a reusable framework and published it on GitHub. This is a walkthrough of how to use it and of its design philosophy.