
AI Agent Orchestration: Claws and Cord

AI agents are moving past the phase of “run when called, disappear when done.” In February 2026, two projects appeared almost simultaneously: Andrej Karpathy gave the concept the name “Claws,” and June Kim built a working system in ~500 lines of Python called “Cord.” One defines the vocabulary; the other ships running code. Both deal with the “next layer” above agents.

Claws—The Name Karpathy Coined

Andrej Karpathy posted that he bought a Mac mini and has Claws running. Simon Willison picked it up and wrote an explainer (simonwillison.net/2026/Feb/21/claws/).

Karpathy’s definition:

Claws takes orchestration, scheduling, context, tool invocation, and persistence to the next level

In short, it’s about placing another layer on top of LLM agents. Tool-invoking agents like the Claude Code CLI take a task, execute it, and finish. When the session ends, their state is gone. Claws adds residency to that.

| Aspect | Single Agent | Claws |
|---|---|---|
| Lifecycle | start → run → exit | Resident; survives across sessions |
| Scheduling | None; runs when invoked | Autonomously queues and executes tasks |
| Context | Kept only while running | Persisted; can reference past results |
| Tooling | Direct calls | Indirectly brokered via a messaging protocol |
| Runtime | Cloud API or local | Assumes personal hardware |

From Karpathy’s comments you can picture a resident AI system running on a personal Mac mini. It responds to direct commands and also autonomously handles scheduled tasks. The official emoji is 🦞.

OpenClaw’s Explosive Adoption and Karpathy’s Caution

The spark for this “Claw” category was OpenClaw, an open-source autonomous AI agent by Peter Steinberger. It reached 100K GitHub stars in about two days on January 29–30, 2026—the fastest ever. At peak it was gaining 710 stars/hour.

OpenClaw is Node.js-based, an everything-included setup: voice chat, live canvas, and integrations with messaging platforms. It consumes over 1 GB of memory and the codebase exceeds 400,000 lines.

Karpathy tried Claw on his Mac mini but is clearly wary of OpenClaw:

giving my private data/keys to 400K lines of vibe coded monster that is being actively attacked at scale is not very appealing at all. Already seeing reports of exposed instances, RCE vulnerabilities, supply chain poisoning, malicious or compromised skills


On February 14, Steinberger announced he was joining OpenAI, and OpenClaw is slated to be transferred to an open‑source foundation.

A Proliferation of Claw Implementations

Dissatisfaction with OpenClaw and differing use cases led to a wave of alternative implementations in February.

| Implementation | Language | Memory/Binary | Notes |
|---|---|---|---|
| OpenClaw | Node.js | >1 GB | All-in-one: voice / canvas / companion app |
| ZeroClaw | Rust | 3.4 MB / <5 MB | Providers, tools, and memory are all swappable traits; <10 ms startup |
| PicoClaw | Go | <10 MB | Runs on $10 hardware; 95% AI-generated code; released 2/9, 5k stars in 4 days |
| NullClaw | Zig | 678 KB | Arduino/Raspberry Pi support; 2,000+ tests |
| NanoClaw | TypeScript | ~4,000 lines | Runs in containers; directly wired to the Anthropic Agents SDK |
| IronClaw | Rust | | WebAssembly sandbox; isolates credentials from tools; prompt-injection defenses |
| TinyClaw | | | Multi-agent collaboration (coder/writer/reviewer) |

ZeroClaw runs with 1/194th of OpenClaw’s memory footprint. PicoClaw emphasizes running on a $10 board. IronClaw focuses on security, executing all tools inside a Wasm sandbox.

The common thread is “OpenClaw is too big.” Auditing a 400k‑line codebase for security is unrealistic for individuals, so parallel efforts pursued lighter alternatives. Once Karpathy used “Claw” as the term, the idea outgrew a single project name and became a category—something Simon Willison also notes in his article.


Cord—Stop Hard‑Coding Workflows

Cord by June Kim is a proof of concept implemented with ~500 lines of Python, SQLite, and MCP. It earned 112 points on Hacker News.

The Common Wall in Existing Frameworks

LangGraph, CrewAI, AutoGen, OpenAI Swarm: in all of today's major multi-agent frameworks, developers must predefine how tasks are decomposed. That assumption is the shared bottleneck.

| Framework | What it can and can't do |
|---|---|
| LangGraph | Fixed workflows are powerful, but it cannot adapt when new decomposition is required during execution |
| CrewAI | Role-based and easy to grasp, but structure must be predefined; dynamic team re-formation isn't possible |
| AutoGen | Conversation-centric and flexible, but lacks dependency tracking and permission scoping |
| OpenAI Swarm | Simple but linear; no parallelization or tree branching |
| Claude alone | Constrained by the context window and single-threaded |

All of them follow the model “developers design the workflow; agents just execute it.” Cord flips this: AI agents decide the task structure at runtime.

Spawn vs. Fork—Controlling Context Sharing

Cord exposes five MCP primitives:

  • Spawn: Create a child task with a clean context. The child agent only receives its own prompt and explicit dependencies—like bringing in a specialist with a blank slate.
  • Fork: Create a child that inherits all sibling results. The child agent starts with full prior context—like adding a briefed member to the team.
  • Ask: Query a human and pause execution.
  • Complete: Mark a task as finished with a result.
  • Read_tree: Inspect the current task tree.

The distinction between Spawn and Fork is core to Cord’s design. Both create children, but what the child “knows” at startup is completely different.

Children created with Spawn have an empty context. Only the outputs of their dependency tasks are passed in. This fits independent investigative tasks that run in parallel. Children created with Fork, on the other hand, start out with all sibling results. Use this for tasks that synthesize multiple investigations into a final decision.

By using the two appropriately, you avoid polluting the context window while ensuring the necessary information is passed along. Cord controls what each agent can see at the primitive level.
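To make the distinction concrete, here is a minimal Python sketch of how a Spawn child's context could be assembled from explicit dependencies only, versus a Fork child inheriting every completed sibling. The function name and the `tasks` schema are my illustrative guesses, not Cord's actual code:

```python
import sqlite3

# Hypothetical sketch of how Spawn vs. Fork could assemble a child's starting
# context. The function and schema are illustrative, not Cord's actual code.

def build_context(db: sqlite3.Connection, parent_id: int,
                  deps: list[int], mode: str) -> list[str]:
    """Return the completed results a new child task starts with."""
    if mode == "spawn":
        # Spawn: clean slate -- only explicitly declared dependency outputs.
        placeholders = ",".join("?" * len(deps))
        rows = db.execute(
            f"SELECT result FROM tasks WHERE id IN ({placeholders}) "
            "AND status = 'completed'", deps).fetchall()
    elif mode == "fork":
        # Fork: inherit every completed sibling result under the same parent.
        rows = db.execute(
            "SELECT result FROM tasks WHERE parent_id = ? "
            "AND status = 'completed'", (parent_id,)).fetchall()
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [r[0] for r in rows]
```

The interesting part is that the difference is a single WHERE clause: Spawn filters by an explicit ID list, Fork by parenthood.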

State Management with SQLite

Cord consolidates state management in SQLite.

Tech stack:
- Claude Code CLI (agent runtime)
- SQLite (shared state between agents; persistence of the task tree)
- MCP server (dependency resolution; enforcement of permission scopes)
- ~500 lines of Python (the entire orchestration layer)

The choice of SQLite is straightforward: file‑based with no server, transactional, and supports concurrent reads from multiple processes. Task state transitions (pending → running → completed), the dependency graph, and each task’s output all live in a single SQLite file.

Because the MCP server enforces permission scopes via SQLite, a child agent is denied if it tries to access task results outside its parent’s purview. Without this permission model, the Fork design that passes full context would risk data leakage.
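A minimal sketch of what such a single-file store might look like (column names and constraints are my guesses based on the description above, not the actual ~500-line implementation):

```python
import sqlite3

# Illustrative Cord-style schema: one SQLite file holds the task tree,
# state transitions, and the dependency graph. Names are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES tasks(id),
    prompt    TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'pending'
              CHECK (status IN ('pending', 'running', 'completed')),
    result    TEXT
);
CREATE TABLE IF NOT EXISTS deps (
    task_id    INTEGER NOT NULL REFERENCES tasks(id),
    depends_on INTEGER NOT NULL REFERENCES tasks(id),
    PRIMARY KEY (task_id, depends_on)
);
"""

def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the single SQLite file holding all shared state."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db
```

A CHECK constraint on `status` is a cheap way to make illegal state transitions fail at the database layer rather than in orchestration code.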

Example: A Workflow Emerges Without Writing One

For the query “Should we migrate the API from REST to GraphQL?”, the developer didn’t code any task structure. Claude autonomously designed the following:

#1 Root: GraphQL migration analysis
├── #2 Audit current API [Spawn: runs in parallel, independent context]
├── #3 Research GraphQL [Spawn: runs in parallel, independent context]
├── #4 Confirm requirements [Ask: queries a human; waits on #2]
└── #5 Final recommendation [Fork: integrates all results from #2–#4]
    └── #6 Generate report [Spawn: receives only #5's result]

Because #2 and #3 were created with Spawn, they don’t know about each other and run independently in parallel. #5 is created with Fork, so it starts with the results of #2–#4. #6 only needs #5’s output, so Spawn suffices.

The agent looked at the problem structure and used Spawn and Fork appropriately. The only thing the developer wrote was the root question.
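The parallelism falls out of the dependency edges mechanically. A toy readiness check (dependency sets transcribed from the tree above; the function itself is mine, not Cord's) shows which tasks can run at each step:

```python
# Toy readiness check over the tree above. The dependency sets are
# transcribed from the example; the scheduler itself is illustrative.

def runnable(deps: dict[int, set[int]], done: set[int]) -> set[int]:
    """Tasks not yet finished whose dependencies are all completed."""
    return {t for t, d in deps.items() if t not in done and d <= done}

# #2 and #3 have no deps; #4 waits on #2; #5 needs #2-#4; #6 needs #5.
graph = {2: set(), 3: set(), 4: {2}, 5: {2, 3, 4}, 6: {5}}
```

At the start, `runnable(graph, set())` yields `{2, 3}`: exactly the two Spawn children that run in parallel, with #5's Fork gated until everything it synthesizes is done.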

Validate First, Then Build the Infrastructure

What’s distinctive about June Kim’s approach is validating before building the infrastructure. In fifteen tests they confirmed:

  • Claude spontaneously called read_tree() to inspect overall state
  • It correctly chose between Spawn and Fork without being told
  • When denied out‑of‑scope access, it escalated rather than retrying

By confirming that the model naturally understands the coordination primitives first, the infrastructure could be implemented afterwards. Usually you build infra and then test; here the order is reversed. Validating that the primitives match model behavior likely enabled the ~500‑line implementation.


I couldn't fully lock down, or frankly trust, the security of OpenClaw, so I've been building my own. More on that another time.

References: