Open-source "human distillation" tools such as colleague.skill, yourself-skill, and nuwa-skill are surging in popularity, mainly in China. After seeing a tool that distills colleagues, I wondered, "What if I distilled myself?", and looked into how to do it.
UC Berkeley's RDI team demonstrated that major benchmarks, including SWE-bench and WebArena, can be gamed into near-perfect scores without completing a single task. They identified 7 vulnerability patterns and released BenchJack, an automated benchmark-attack tool.
Zhipu AI releases GLM-5.1, a 744B-parameter MoE model (40B active) that sets a state-of-the-art 58.4% on SWE-Bench Pro. Its standout feature is sustained performance across 8-hour sessions with more than 6,000 tool calls and no degradation.
CVE-2026-22812 (CVSS 8.8) and CVE-2026-22813 (CVSS 9.4) were disclosed in the open-source AI coding agent "OpenCode". Together, an unauthenticated HTTP server and an XSS flaw in the Markdown renderer allow attackers to execute shell commands. A PoC has been published, and more than 220,000 instances are exposed online.
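To illustrate the Markdown half of that chain in general terms (this is not OpenCode's code), the sketch below contrasts a toy renderer that passes raw HTML through with one that escapes its input before applying formatting rules. The functions render_unsafe and render_safe and the bold-only rule set are hypothetical simplifications; a real Markdown renderer handles far more syntax and usually needs a dedicated sanitizer.

```python
import html
import re

# Toy rule set (hypothetical): only **bold** is supported.
BOLD = re.compile(r"\*\*(.+?)\*\*")


def render_unsafe(md: str) -> str:
    # Raw HTML in the input passes straight through, so an injected
    # <img onerror=...> tag reaches the browser and can run script.
    return BOLD.sub(r"<strong>\1</strong>", md)


def render_safe(md: str) -> str:
    # Escape everything first, then apply formatting rules, so the only
    # real HTML tags in the output are the ones the renderer emits itself.
    escaped = html.escape(md, quote=True)
    return BOLD.sub(r"<strong>\1</strong>", escaped)


if __name__ == "__main__":
    payload = '**hi** <img src=x onerror="alert(1)">'
    print(render_unsafe(payload))  # the <img> tag survives intact
    print(render_safe(payload))    # the <img> tag is rendered as inert text
```

Escaping before formatting is the simplest ordering that keeps attacker-supplied text from ever being interpreted as markup; it does not by itself address the unauthenticated server, which needs its own access control.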
A GitHub issue claimed that Claude Code was wiping uncommitted changes by running `git reset --hard origin/main` every ten minutes, but the culprit turned out to be a separate tool the reporter had written.
AWS releases "Agent Plugins for AWS" for Claude Code and Cursor, automating everything from infrastructure design to deployment. On the same day, GitHub added AI-powered vulnerability detection to Code Security to cover Shell, Dockerfile, Terraform, and PHP, which CodeQL does not support.
What changed from v1 to v2 of Kana Chat, an AI agent built on wrappers around official CLIs. Covers the dual-model router, Heartbeat memory, planner mode, image input, speech transcription, PWA push notifications, and the lessons learned from a month of daily use.
Composio publishes a security analysis of OpenClaw. About 7.1% of the skills distributed via SkillHub were found to have critical vulnerabilities, leaving the more than 30,000 instances exposed to the internet in the early days at risk of prompt injection and credential theft.
NVIDIA's NemoClaw secures OpenClaw agents with a four-layer sandbox, while Stripe's Machine Payments Protocol lets agents make payments without handing over private keys. The open question is how an agent can pay safely from inside the sandbox.
CLI-Anything, released by HKUDS at the University of Hong Kong, automatically generates a CLI harness from the source code of GUI applications so that AI agents can drive them directly. It passed all 1,720 tests across 16 apps, including GIMP, Blender, and LibreOffice.
Cloudflare's AI Security for Apps reached general availability, blocking prompt injection and PII leaks at the WAF layer. On the same day, Cloudflare also launched RFC 9457-compliant error responses that return JSON or Markdown instead of HTML when AI agents hit Cloudflare errors.
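RFC 9457 defines "problem details": a JSON object served as application/problem+json with standard members such as type, title, status, detail, and instance. The minimal sketch below shows how an agent-side client might recognize such a response instead of scraping an HTML error page. The member names and the about:blank default come from the RFC; the helper names and example values are hypothetical and do not reflect Cloudflare's actual payloads.

```python
import json
from typing import Optional

PROBLEM_MEDIA_TYPE = "application/problem+json"


def make_problem(status: int, title: str, detail: str,
                 type_uri: str = "about:blank",
                 instance: Optional[str] = None) -> str:
    """Serialize a problem-details body (hypothetical helper)."""
    body = {"type": type_uri, "title": title, "status": status, "detail": detail}
    if instance is not None:
        body["instance"] = instance
    return json.dumps(body)


def parse_problem(content_type: str, raw: str) -> Optional[dict]:
    """Return the problem object if the response is problem+json, else None."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != PROBLEM_MEDIA_TYPE:
        return None  # e.g. an HTML error page the agent would have to scrape
    problem = json.loads(raw)
    # RFC 9457: when "type" is absent, its value is assumed to be "about:blank".
    problem.setdefault("type", "about:blank")
    return problem


if __name__ == "__main__":
    raw = make_problem(429, "Too Many Requests", "Rate limit exceeded; retry later.")
    p = parse_problem("application/problem+json; charset=utf-8", raw)
    print(p["status"], p["title"])  # 429 Too Many Requests
```

Checking the media type first lets the client branch cleanly: structured errors are read field by field, and anything else falls back to whatever handling the agent already has.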
GitHub publishes the layered defense design of its agent execution platform, and OpenAI releases IH-Challenge, an instruction-hierarchy training dataset, along with a model. Prompt injection is now being tackled along two axes at once: infrastructure design and model training.
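An instruction hierarchy is a precedence rule: when instructions from different sources conflict, the more privileged source wins, so text injected through a tool output cannot override the system prompt. OpenAI trains this behavior into the model itself; the toy resolver below only makes the precedence rule concrete as explicit code. The Privilege levels, their ordering, and the resolve function are hypothetical illustrations and are not taken from IH-Challenge.

```python
from enum import IntEnum


class Privilege(IntEnum):
    # Hypothetical ordering for illustration: higher value = more trusted.
    TOOL = 0       # tool outputs, web pages, retrieved files
    USER = 1
    DEVELOPER = 2
    SYSTEM = 3


def resolve(messages):
    """Resolve conflicting settings by privilege.

    messages: iterable of (Privilege, key, value) in conversation order.
    A later message overrides an earlier one only if its privilege is
    equal or higher, so injected tool text cannot override the system.
    """
    chosen = {}
    for priv, key, value in messages:
        if key not in chosen or priv >= chosen[key][0]:
            chosen[key] = (priv, value)
    return {key: value for key, (_, value) in chosen.items()}


if __name__ == "__main__":
    conversation = [
        (Privilege.SYSTEM, "share_api_keys", "never"),
        (Privilege.USER, "reply_language", "en"),
        (Privilege.USER, "reply_language", "fr"),     # user changes their mind
        (Privilege.TOOL, "share_api_keys", "always"),  # injected via a web page
    ]
    print(resolve(conversation))
    # {'share_api_keys': 'never', 'reply_language': 'fr'}
```

Allowing equal-privilege overrides keeps ordinary conversation flexible (a user can revise their own request) while still blocking lower-privilege sources from escalating.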