Agent Lightning: Microsoft's reinforcement learning framework for AI agents
Microsoft has open-sourced Agent Lightning, a framework for training AI agents with reinforcement learning (RL).
https://github.com/microsoft/agent-lightning
What it can do
The concept is “optimize any AI agent with almost zero code changes.”
Key features:
- Framework agnostic: Works with any agent framework, including LangChain, OpenAI Agent SDK, AutoGen, CrewAI, and Claude Agent SDK. It also works with plain Python + OpenAI, with no framework at all.
- Zero code changes: Just insert lightweight `agl.emit_xxx()` helpers into existing agent code.
- Selective optimization: You can choose specific agents inside a multi-agent system and optimize only those.
- Multiple algorithms: Supports reinforcement learning, automatic prompt optimization, and supervised fine-tuning.
Architecture
During agent execution, the tracer collects prompts, tool calls, and rewards, then stores them as structured spans in LightningStore. On the other side of the store, an algorithm reads those spans, learns from them, and reflects the results back into improved prompt templates and policy weights.
[Agent] → [Tracer] → [LightningStore] → [Algorithm] → [Updated Resources]
The idea is that the existing agent code keeps running as-is while the tracer collects data and the algorithm learns in the background.
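The Agent → Tracer → LightningStore → Algorithm loop above can be sketched in a few lines of pure Python. Everything here is illustrative: the dict standing in for LightningStore, the `trace` helper, and the toy "algorithm" that picks the best-rewarded prompt are assumptions, not the framework's actual interfaces.

```python
# Minimal pure-Python sketch of the loop described above. All names are
# illustrative stand-ins, not the real framework API.

from collections import defaultdict

store = defaultdict(list)  # stand-in for LightningStore: spans keyed by type

def trace(span_type: str, payload: dict) -> None:
    """Tracer: persist a structured span during agent execution."""
    store[span_type].append(payload)

# 1. The agent runs as-is; the tracer records prompts and rewards.
trace("prompt", {"template": "Answer briefly: {q}", "q": "2+2?"})
trace("reward", {"value": 0.0})  # the brief answer scored poorly
trace("prompt", {"template": "Answer step by step: {q}", "q": "2+2?"})
trace("reward", {"value": 1.0})  # the step-by-step answer scored well

# 2. An "algorithm" reads the spans and keeps the best-rewarded prompt.
pairs = zip(store["prompt"], store["reward"])
best_prompt = max(pairs, key=lambda pair: pair[1]["value"])[0]

# 3. The improved resource (the prompt template) flows back to the agent.
print(best_prompt["template"])
```

The toy "algorithm" here is a trivial best-of-N prompt selection; the real framework plugs RL, prompt optimization, or fine-tuning into the same read-spans/write-resources position.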
Installation
pip install agentlightning
Supported environment
The only officially supported platform is Linux.
| Environment | Status |
|---|---|
| Linux + CUDA GPU | Fully supported |
| macOS | Not supported |
| Windows (including WSL2) | Not supported |
| CPU-only | Evaluation and inference only (Linux only) |
Serious RL training requires a CUDA-capable GPU such as an RTX 4090; it does not run on Apple Silicon Macs (M1/M2/M3).
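Given those constraints, a quick preflight check before installing can save time. The sketch below uses only the standard library; the exact status strings and the `nvidia-smi` heuristic for CUDA availability are my own assumptions, not anything the framework ships.

```python
# Preflight sketch: Agent Lightning's training path only supports Linux,
# so a guard like this fails fast on macOS or Windows. The status strings
# and the nvidia-smi heuristic are illustrative, not part of the framework.

import platform
import shutil

def check_platform() -> str:
    system = platform.system()
    if system != "Linux":
        return f"unsupported: {system} (training requires Linux)"
    if shutil.which("nvidia-smi") is None:
        # Linux without a visible NVIDIA driver: matches the CPU-only row.
        return "linux-cpu: evaluation and inference only"
    return "linux-cuda: full training supported"

print(check_platform())
```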
Compared with Power Sampling
The Power Sampling article I covered earlier argued that LLM reasoning could be improved just by changing the sampling strategy, without RL. That view is based on the hypothesis that the base model already has reasoning ability and RL is only reshaping the probability distribution.
Agent Lightning takes the opposite approach: it uses RL aggressively to optimize agents. Which one is better depends on the use case, but it helps to keep both perspectives in mind.