
Anthropic built the data it used to train its models from two sources: ebooks it downloaded, and print books it purchased and scanned. A judge found that the scanned books fell under fair use because they were transformative versions of the works and were not shared outside the company. The downloaded ebooks did not count as fair use and will be the subject of a forthcoming jury trial. The judge also found that training a large language model on unlicensed data itself counted as fair use.