Agent Lightning: Microsoft's reinforcement learning framework for AI agents
Microsoft has open-sourced Agent Lightning, a framework for training AI agents with reinforcement learning (RL).
https://github.com/microsoft/agent-lightning
What it can do
The concept is “optimize any AI agent with almost zero code changes.”
Key features:
- Framework agnostic: Works with any agent framework, including LangChain, OpenAI Agent SDK, AutoGen, CrewAI, and Claude Agent SDK. It also works with plain Python + OpenAI, with no framework at all.
- Zero code changes: Just insert lightweight `agl.emit_xxx()` helpers into existing agent code.
- Selective optimization: You can choose specific agents inside a multi-agent system and optimize only those.
- Multiple algorithms: Supports reinforcement learning, automatic prompt optimization, and supervised fine-tuning.
Architecture
During agent execution, the tracer collects prompts, tool calls, and rewards, then stores them as structured spans in LightningStore. On the other side of the store, an algorithm reads those spans, learns from them, and reflects the results back into improved prompt templates and policy weights.
[Agent] → [Tracer] → [LightningStore] → [Algorithm] → [Updated Resources]
The idea is that the existing agent code keeps running as-is while the tracer collects data and the algorithm learns in the background.
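The Agent → Tracer → LightningStore → Algorithm loop above can be sketched in a few lines of pure Python. Everything here is illustrative: the dict standing in for LightningStore, the `trace` helper, and the toy "algorithm" that picks the best-rewarded prompt are assumptions, not the framework's actual interfaces.

```python
# Minimal pure-Python sketch of the loop described above. All names are
# illustrative stand-ins, not the real framework API.

from collections import defaultdict

store = defaultdict(list)  # stand-in for LightningStore: spans keyed by type

def trace(span_type: str, payload: dict) -> None:
    """Tracer: persist a structured span during agent execution."""
    store[span_type].append(payload)

# 1. The agent runs as-is; the tracer records prompts and rewards.
trace("prompt", {"template": "Answer briefly: {q}", "q": "2+2?"})
trace("reward", {"value": 0.0})  # the brief answer scored poorly
trace("prompt", {"template": "Answer step by step: {q}", "q": "2+2?"})
trace("reward", {"value": 1.0})  # the step-by-step answer scored well

# 2. An "algorithm" reads the spans and keeps the best-rewarded prompt.
pairs = zip(store["prompt"], store["reward"])
best_prompt = max(pairs, key=lambda pair: pair[1]["value"])[0]

# 3. The improved resource (the prompt template) flows back to the agent.
print(best_prompt["template"])
```

The toy "algorithm" here is a trivial best-of-N prompt selection; the real framework plugs RL, prompt optimization, or fine-tuning into the same read-spans/write-resources position.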
Installation
pip install agentlightning
Supported environment
The only officially supported platform is Linux.
| Environment | Status |
|---|---|
| Linux + CUDA GPU | Fully supported |
| macOS | Not supported |
| Windows (including WSL2) | Not supported |
| CPU-only | Evaluation and inference only (Linux only) |
Serious RL training requires a CUDA-capable GPU such as an RTX 4090; it does not run on Apple Silicon Macs (M1/M2/M3).
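Given those constraints, a quick preflight check before installing can save time. The sketch below uses only the standard library; the exact status strings and the `nvidia-smi` heuristic for CUDA availability are my own assumptions, not anything the framework ships.

```python
# Preflight sketch: Agent Lightning's training path only supports Linux,
# so a guard like this fails fast on macOS or Windows. The status strings
# and the nvidia-smi heuristic are illustrative, not part of the framework.

import platform
import shutil

def check_platform() -> str:
    system = platform.system()
    if system != "Linux":
        return f"unsupported: {system} (training requires Linux)"
    if shutil.which("nvidia-smi") is None:
        # Linux without a visible NVIDIA driver: matches the CPU-only row.
        return "linux-cpu: evaluation and inference only"
    return "linux-cuda: full training supported"

print(check_platform())
```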
Compared with Power Sampling
The Power Sampling article I covered earlier argued that LLM reasoning could be improved just by changing the sampling strategy, without RL. That view is based on the hypothesis that the base model already has reasoning ability and RL is only reshaping the probability distribution.
Agent Lightning takes the opposite approach: it uses RL aggressively to optimize agents. Which one is better depends on the use case, but it helps to keep both perspectives in mind.