
Why is the Times asking for this trove? As I noted, the newspaper claims it needs the logs to show that OpenAI encourages infringement. But that betrays a fundamental misunderstanding of how large language models work. The question in a copyright case is whether training data contained protected text, not what users later typed into the chat box. Preserving billions of post-training conversations won’t reveal what went into GPT-4’s weights two years ago; it merely warehouses private data that is irrelevant to the alleged wrongdoing.
There is also a blatant ethical hypocrisy here. The Times routinely positions itself as a principled watchdog on technology, lecturing Silicon Valley on surveillance and privacy abuses. Yet it has asked a court to impose one of the most sweeping data-retention orders in tech-policy history. That request does not “elevate the ethics of AI” in any recognizable sense; it weaponizes discovery to score legal points while disregarding the collateral damage to user privacy.
• Core allegation. The Times says the defendants copied “millions” of Times articles to train GPT models, infringing copyright and threatening its subscription business.
• Scope of the order. It sweeps in all data from ChatGPT Free, Plus, Pro, Team, and standard API users; only Enterprise, Edu, and Zero-Data-Retention (ZDR) API customers are excluded.
But no matter what the merits of the case, this ruling is an absolute travesty for AI users. Not just ChatGPT users! It casts a chilling effect across the entire industry. And it’s horrendous that a so-called safeguard of the consumer (the NYT) has demanded it. We need to demand better LLM fluency from the media.
Here’s why the log-hoarding demand is a dead end for the copyright claims: the lawsuit hinges on what went into GPT-4’s weights during training and whether that ingestion was transformative fair use or straight-up infringement.
Those training runs were frozen months before ChatGPT ever launched. Capturing billions of post-training chats tells you nothing about that historical data set.
As I noted above in the facts, in May Judge Wang ordered the company to “preserve and segregate all ChatGPT output logs indefinitely,” a directive prompted by the Times’ discovery demands. Indefinitely. While the case runs on. Possibly for years. It’s nothing less than a torpedo aimed at the heart of AI trust.