
Latest trends in AI agent development tools - AGENTS.md validation, CI integration, session persistence, and sandboxing

Ikesan

This article summarizes four topics that came up in February around AI agent development and operations.

AGENTS.md may be counterproductive - a counterintuitive result

An arXiv paper titled “Evaluating AGENTS.md” tests whether repository-level context files such as AGENTS.md and CLAUDE.md actually help coding agents.

Main findings

  • Task success rate drops: Success tended to be lower when a context file was provided.
  • Costs rise by more than 20%: Token usage increased and so did execution cost.
  • Agents do follow instructions: They do follow the file’s instructions, but the search space expands too much.

The conclusion is “less is more.” Stuffing unnecessary specs into the file makes the task more complicated. Because agents follow instructions so faithfully, extra instructions become noise. A context file is better treated as a place to record the minimum constraints, not as a manual.
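In that “minimum constraints” spirit, a context file might contain only a handful of hard rules. The example below is invented for illustration, not taken from the paper:

```markdown
# AGENTS.md
- Run `npm test` before committing.
- Do not edit files under `generated/`.
- Prefer small, focused diffs over refactors.
```

Anything beyond constraints like these - style guides, architecture tours, wish lists - is exactly the kind of material the study suggests becomes noise.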

Continue.dev - AI checks for PRs, enforced in CI

As AI coding agents became widespread, PR volume grew rapidly and review fatigue became a problem. Continue.dev released a mechanism that automatically runs AI code-quality checks in CI to address that.

How it works

Checks are defined as Markdown files under .continue/checks/. Each file contains the check name, description, and prompt for the AI. When a PR is opened, Continue runs all checks against the diff and reports the results as GitHub status checks.

Examples of what it can inspect:

  • Hard-coded API keys, tokens, or passwords
  • New API endpoints without input validation
  • SQL queries built by string concatenation
  • Sensitive data logged to stdout
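A check definition along these lines might look like the following. The file name and structure here are illustrative assumptions, not Continue.dev's exact schema:

```markdown
<!-- .continue/checks/no-hardcoded-secrets.md -->
# No hardcoded secrets

Flag any diff hunk that introduces a hard-coded API key, token,
or password, including literal values assigned to variables with
names like `apiKey`, `secret`, or `password`.
```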

Because the check definitions are versioned as Markdown, you can also track how the rules have changed over time. It is a practical approach to code-quality management in the AI era, along the lines of eslint or prettier.

AWS Strands Agents SDK - session persistence built in

AWS’s Strands Agents SDK includes a SessionManager that automatically persists the agent’s conversation history and state. You can enable it simply by passing it to the Agent constructor.

Three backends

| Manager | Use case |
| --- | --- |
| FileSessionManager | Local development or single process |
| S3SessionManager | Production environments or distributed systems |
| RepositorySessionManager | Custom backends such as DynamoDB or RDS |

What gets persisted

  • Conversation history: All user and assistant messages, stored as individual JSON files
  • Agent state: JSON-serializable key-value dictionaries
  • Session metadata: Timestamps and session type
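The persisted layout described above can be sketched in plain Python. This is not the Strands SDK's actual code; the directory and file names below are assumptions chosen to mirror the description (one JSON file per message, a state dictionary, and session metadata):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

class FileSession:
    """Minimal sketch of file-based session persistence (illustrative only)."""

    def __init__(self, session_id: str, root: str = "./sessions"):
        self.dir = Path(root) / session_id
        self.dir.mkdir(parents=True, exist_ok=True)
        # Session metadata: timestamps and session type.
        meta = {
            "session_id": session_id,
            "session_type": "agent",
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        (self.dir / "metadata.json").write_text(json.dumps(meta))

    def append_message(self, role: str, content: str) -> None:
        # Each message is stored as its own JSON file.
        index = len(list(self.dir.glob("message_*.json")))
        path = self.dir / f"message_{index:05d}.json"
        path.write_text(json.dumps({"role": role, "content": content}))

    def save_state(self, state: dict) -> None:
        # Agent state: a JSON-serializable key-value dictionary.
        (self.dir / "state.json").write_text(json.dumps(state))

    def load_messages(self) -> list[dict]:
        files = sorted(self.dir.glob("message_*.json"))
        return [json.loads(f.read_text()) for f in files]
```

With the real SDK, none of this bookkeeping is yours to write: per the description above, you pass a session manager such as FileSessionManager to the Agent constructor and the SDK handles persistence.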

Conversation persistence for AI agents is not flashy, but it is unavoidable. If you implement it yourself, the boilerplate balloons quickly. Having the SDK absorb that cost is practical.

Docker Shell Sandbox - isolate AI agents in a microVM

A new shell type was added to Docker Sandboxes. It runs AI agents inside a microVM-based isolated environment.

docker sandbox create --name my-agent shell ~/workspace
docker sandbox run my-agent

It is a general-purpose environment with Ubuntu, Node.js, Python, git, and common development tools preinstalled. It is not tied to any specific agent framework.

Security benefits

  1. Filesystem isolation: Only the mounted workspace is visible.
  2. Credential management: API keys are injected through Docker’s network proxy. Sentinel values let the proxy intercept outbound API calls and swap in real keys, so the keys never exist inside the sandbox.
  3. Clean environment: No conflict with the host environment.
  4. Disposable: docker sandbox rm resets everything completely.
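The sentinel-swap idea in point 2 can be sketched as follows. This is a conceptual illustration, not Docker's implementation; the sentinel string and key table are invented:

```python
# Conceptual sketch: the sandbox only ever sees a sentinel value.
# A proxy on the host rewrites outbound request headers, replacing
# the sentinel with the real key, so the key never enters the VM.
SENTINEL_KEYS = {
    "SANDBOX_SENTINEL_OPENAI": "sk-real-key",  # hypothetical mapping
}

def rewrite_outbound_headers(headers: dict) -> dict:
    """Return a copy of headers with any sentinel replaced by its real key."""
    out = dict(headers)
    auth = out.get("Authorization", "")
    for sentinel, real in SENTINEL_KEYS.items():
        if sentinel in auth:
            out["Authorization"] = auth.replace(sentinel, real)
    return out
```

The design benefit is that a compromised agent can exfiltrate only the sentinel, which is useless outside the proxy.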

Because agents do file operations and shell commands, putting a sandbox in front of them is likely to become a standard security best practice.
