An overview of the technical highlights of Moonshot AI's Kimi K2.5: a 1T-parameter MoE architecture, the MoonViT vision encoder, Agent Swarm (PARL), benchmark results, and more.
A comparison of the hook features offered by Gemini CLI, Claude Code, and Codex CLI. The differences in design philosophy are more interesting than I expected.
Baidu's PaddleOCR-VL-1.5 reaches 94.5% accuracy on OmniDocBench v1.5 with just 0.9B parameters, surpassing large models such as GPT-4o and Qwen2.5-VL-72B.
Anthropic published official guides on how to use Claude Code effectively and how to build agents with the Agent SDK. This article summarizes the key points from both.
A deep dive comparing 10 AI-powered E2E testing and browser automation tools, including Shortest, Playwright MCP, Stagehand, Skyvern, and QA Wolf, organized by use case with a focus on reliability, speed, and cost.
An explanation of the difference between conventional OCR and VLM-based (vision-language model) OCR. Introduces DeepSeek-OCR and explores the possibility of combining both approaches.
I traced the source of the viral claim that a Johns Hopkins study found ChatGPT lies 27% of the time, and it turns out that findings from several different studies have been conflated.
An explanation of Shortest, a natural-language E2E testing framework built on the Anthropic Claude API and Playwright, from the perspective of a Playwright user. Includes a comparison with Playwright MCP, caveats, and guidance on when to use each.
I generalized the scripts from the practice and optimization articles into a reusable framework and published it on GitHub. A walkthrough of how to use it and the design philosophy behind it.
The Web Speech API + Gemini + VOICEVOX setup is complete: an AI character you can actually hold a voice conversation with. Key implementation notes and impressions.
A comparison of major AI 3D generation tools such as TRELLIS, Hunyuan 3D, Tripo AI, and Hitem3D, with a focus on the input-image requirements for getting better 3D output.