Merriam-Webster and Britannica sue OpenAI over copyright infringement
On March 16, 2026, Encyclopedia Britannica and Merriam-Webster sued OpenAI for copyright infringement. They claim that OpenAI used roughly 100,000 of their articles without permission to train LLMs.
What the lawsuit says
According to the complaint, OpenAI incorporated encyclopedia and dictionary content that Britannica and Merriam-Webster built up over many years into its model training data, without permission or compensation. The companies are seeking an injunction and damages.
Merriam-Webster was founded in 1828 and Britannica in 1768, and both are among the oldest and most authoritative reference sources in the English-speaking world. This is the first time they have sued OpenAI together.
Why dictionaries and encyclopedias matter
Unlike ordinary books or articles, dictionaries and encyclopedias are built primarily to provide accurate definitions and factual explanations. That makes them especially valuable when an LLM is learning “knowledge.”
Merriam-Webster provides standard English meanings, etymology, and usage, while Britannica covers tens of thousands of topics. The plaintiffs argue that OpenAI understood the value of that material but used it without paying for it.
Where this fits in the wave of AI copyright suits
AI-related copyright litigation has increased sharply in recent years.
| Plaintiff | Defendant | Filed | Main claim |
|---|---|---|---|
| Getty Images | Stability AI | 2023 | Unauthorized use of images for training |
| The New York Times | OpenAI and Microsoft | 2023 | Unauthorized use of articles for training |
| Writers’ group (including John Grisham) | OpenAI | 2023 | Unauthorized use of novels for training |
| Merriam-Webster and Britannica | OpenAI | 2026 | Unauthorized use of dictionary and encyclopedia content for training |
Where the NYT suit focused on the reproduction of reporting, this one is more about the systematic extraction of definitional data. Dictionary and encyclopedia entries differ from ordinary prose, whose meaning depends on tone and context: they consist of precise, factual descriptions. How LLMs internalize that kind of content could matter for the fair-use analysis.
The core of the fair-use dispute
OpenAI will likely argue transformative use: that training transforms the original works into something new and therefore qualifies as fair use rather than infringement.
The plaintiffs disagree. If the model can generate summaries or similar explanations, they argue, then it becomes a market substitute. That is especially true for services like Britannica and Merriam-Webster, where users go specifically for accurate definitions and explanations.
U.S. fair-use analysis uses four factors:
| Factor | Meaning |
|---|---|
| Purpose and character of the use | Whether the use is transformative, and whether it is commercial |
| Nature of the copyrighted work | Whether the work is factual or creative |
| Amount and substantiality of the portion used | How much of the work was used, and how important that portion is |
| Market effect | Whether the use harms the market for the original |
Dictionaries and encyclopedias are factual works, which generally receive less protection than creative works. But systematic use at the scale of roughly 100,000 articles, combined with direct market substitution, could make the fair-use argument difficult.
What it means for OpenAI
OpenAI has consistently argued fair use in prior AI copyright lawsuits and is still fighting those cases in court. The NYT suit is ongoing, and there is not yet a precedent-setting ruling that settles the industry’s training-data policy.
This new lawsuit adds to OpenAI’s already expanding legal exposure. If the plaintiffs win, OpenAI could be required to pay for training data it has already used, and the case could become another precedent shaping how fair use is interpreted.