
Latest trends in AI agent development tools - AGENTS.md validation, CI integration, session persistence, and sandboxing

Ikesan

This article summarizes four topics that came up in February around AI agent development and operations.

AGENTS.md may be counterproductive - a counterintuitive result

An arXiv paper titled “Evaluating AGENTS.md” tests whether repository-level context files such as AGENTS.md and CLAUDE.md actually help coding agents.

Main findings

  • Task success rate drops: Success tended to be lower when a context file was provided.
  • Costs rise by more than 20%: Token usage increased and so did execution cost.
  • Agents do follow instructions: They do follow the file’s instructions, but the search space expands too much.

The conclusion is “less is more.” Stuffing unnecessary specs into the file makes the task more complicated. Because agents follow instructions so faithfully, extra instructions become noise. A context file is better treated as a place to record the minimum constraints, not as a manual.
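In that “minimum constraints” spirit, a context file might contain only a handful of hard rules. The example below is invented for illustration, not taken from the paper:

```markdown
# AGENTS.md
- Run `npm test` before committing.
- Do not edit files under `generated/`.
- Prefer small, focused diffs over refactors.
```

Anything beyond constraints like these - style guides, architecture tours, wish lists - is exactly the kind of material the study suggests becomes noise.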

Continue.dev - AI checks for PRs, enforced in CI

As AI coding agents became widespread, PR volume grew rapidly and review fatigue became a problem. Continue.dev released a mechanism that automatically runs AI code-quality checks in CI to address that.

How it works

Checks are defined as Markdown files under .continue/checks/. Each file contains the check name, description, and prompt for the AI. When a PR is opened, Continue runs all checks against the diff and reports the results as GitHub status checks.

Examples of what it can inspect:

  • Hard-coded API keys, tokens, or passwords
  • New API endpoints without input validation
  • SQL queries built by string concatenation
  • Sensitive data logged to stdout
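A check definition along these lines might look like the following. The file name and structure here are illustrative assumptions, not Continue.dev's exact schema:

```markdown
<!-- .continue/checks/no-hardcoded-secrets.md -->
# No hardcoded secrets

Flag any diff hunk that introduces a hard-coded API key, token,
or password, including literal values assigned to variables with
names like `apiKey`, `secret`, or `password`.
```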

Because the check definitions are versioned as Markdown, you can also track how the rules have changed over time. It is a practical approach to code-quality management in the AI era, along the lines of eslint or prettier.

AWS Strands Agents SDK - session persistence built in

AWS’s Strands Agents SDK includes a SessionManager that automatically persists the agent’s conversation history and state. You can enable it simply by passing it to the Agent constructor.

Three backends

| Manager | Use case |
| --- | --- |
| FileSessionManager | Local development or single process |
| S3SessionManager | Production environments or distributed systems |
| RepositorySessionManager | Custom backends such as DynamoDB or RDS |

What gets persisted

  • Conversation history: All user and assistant messages, stored as individual JSON files
  • Agent state: JSON-serializable key-value dictionaries
  • Session metadata: Timestamps and session type
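The persisted layout described above can be sketched in plain Python. This is not the Strands SDK's actual code; the directory and file names below are assumptions chosen to mirror the description (one JSON file per message, a state dictionary, and session metadata):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

class FileSession:
    """Minimal sketch of file-based session persistence (illustrative only)."""

    def __init__(self, session_id: str, root: str = "./sessions"):
        self.dir = Path(root) / session_id
        self.dir.mkdir(parents=True, exist_ok=True)
        # Session metadata: timestamps and session type.
        meta = {
            "session_id": session_id,
            "session_type": "agent",
            "created_at": datetime.now(timezone.utc).isoformat(),
        }
        (self.dir / "metadata.json").write_text(json.dumps(meta))

    def append_message(self, role: str, content: str) -> None:
        # Each message is stored as its own JSON file.
        index = len(list(self.dir.glob("message_*.json")))
        path = self.dir / f"message_{index:05d}.json"
        path.write_text(json.dumps({"role": role, "content": content}))

    def save_state(self, state: dict) -> None:
        # Agent state: a JSON-serializable key-value dictionary.
        (self.dir / "state.json").write_text(json.dumps(state))

    def load_messages(self) -> list[dict]:
        files = sorted(self.dir.glob("message_*.json"))
        return [json.loads(f.read_text()) for f in files]
```

With the real SDK, none of this bookkeeping is yours to write: per the description above, you pass a session manager such as FileSessionManager to the Agent constructor and the SDK handles persistence.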

Conversation persistence for AI agents is not flashy, but it is unavoidable. If you implement it yourself, the boilerplate balloons quickly. Having the SDK absorb that cost is practical.

Docker Shell Sandbox - isolate AI agents in a microVM

A new shell type was added to Docker Sandboxes. It runs AI agents inside a microVM-based isolated environment.

docker sandbox create --name my-agent shell ~/workspace
docker sandbox run my-agent

It is a general-purpose environment with Ubuntu, Node.js, Python, git, and common development tools preinstalled. It is not tied to any specific agent framework.

Security benefits

  1. Filesystem isolation: Only the mounted workspace is visible.
  2. Credential management: API keys are injected through Docker’s network proxy. Sentinel values let the proxy intercept outbound API calls and swap in real keys, so the keys never exist inside the sandbox.
  3. Clean environment: No conflict with the host environment.
  4. Disposable: docker sandbox rm resets everything completely.
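The sentinel-swap idea in point 2 can be sketched as follows. This is a conceptual illustration, not Docker's implementation; the sentinel string and key table are invented:

```python
# Conceptual sketch: the sandbox only ever sees a sentinel value.
# A proxy on the host rewrites outbound request headers, replacing
# the sentinel with the real key, so the key never enters the VM.
SENTINEL_KEYS = {
    "SANDBOX_SENTINEL_OPENAI": "sk-real-key",  # hypothetical mapping
}

def rewrite_outbound_headers(headers: dict) -> dict:
    """Return a copy of headers with any sentinel replaced by its real key."""
    out = dict(headers)
    auth = out.get("Authorization", "")
    for sentinel, real in SENTINEL_KEYS.items():
        if sentinel in auth:
            out["Authorization"] = auth.replace(sentinel, real)
    return out
```

The design benefit is that a compromised agent can exfiltrate only the sentinel, which is useless outside the proxy.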

Because agents do file operations and shell commands, putting a sandbox in front of them is likely to become a standard security best practice.
