Judge compels OpenAI to surrender 20 million anonymized ChatGPT logs in copyright showdown
- Marijan Hassan - Tech Journalist
- 3 days ago
Landmark discovery order upheld, forcing AI giant to release user data sample to news publishers suing for infringement.

A U.S. federal judge has dealt OpenAI a significant legal blow, affirming an order that compels the company to produce a sample of 20 million anonymized ChatGPT conversation logs to the news organizations suing it for mass copyright infringement. The decision, issued by District Judge Sidney H. Stein, upholds an earlier magistrate ruling and marks a major victory for publishers like The New York Times and the Chicago Tribune.
The ruling intensifies the ongoing legal battle that is defining the future of intellectual property in the generative AI era, prioritizing the plaintiffs' need for evidence over the AI company's privacy objections.
The core of the dispute: Output evidence
The discovery demand stems from a consolidated multidistrict litigation (MDL) involving over a dozen lawsuits against OpenAI, alleging that the company used vast amounts of copyrighted material from news websites and authors to train its large language models (LLMs) without permission.
Plaintiffs argued that the logs, which represent approximately 0.5% of OpenAI’s preserved chat data, are crucial to proving their claims.
They intend to analyze the conversations to determine if, and how often, ChatGPT outputs reproduce copyrighted material from their publications, countering OpenAI's claims that plaintiffs had to "hack" the chatbot to force infringing results.
OpenAI initially offered the 20 million sample but then attempted to limit production, proposing instead to run targeted searches for conversations referencing the plaintiffs' works.
The judge rejected this approach, stating that the law does not require the court to mandate the "least burdensome" discovery method and emphasizing that the broad logs are also relevant to assessing OpenAI's fair use defense.
Privacy vs. discovery: The legal distinction
OpenAI strongly contested the order, arguing that the release of millions of conversations, many of which are highly sensitive and unrelated to the lawsuit, constituted a massive and unnecessary invasion of user privacy.
Judge Stein dismissed the privacy argument, distinguishing the case from precedents involving illegally wiretapped or surreptitiously recorded data. He noted that ChatGPT users voluntarily submitted their communications to the platform, and OpenAI’s ownership of the logs is undisputed.
The judge also confirmed that user privacy interests are adequately protected by three key safeguards:
- Limiting the production to a sample of 20 million logs.
- OpenAI's mandatory process of de-identifying the logs by removing all personally identifiable information (PII).
- An existing protective order governing all discovery materials in the litigation.
Looking forward
The ruling sets a major precedent, signaling that AI companies cannot easily shield their internal chat logs from discovery, even when invoking user privacy concerns. This legal defeat is expected to encourage plaintiffs in similar AI copyright lawsuits against firms like Anthropic and Meta to pursue equally sweeping discovery demands.