Meta admits using pirated books to train AI, but won't pay for it

Nishadil
January 13, 2024

A group of authors filed a lawsuit against Meta, alleging the unlawful use of copyrighted material in developing its Llama 1 and Llama 2 large language models. In response, Facebook addressed writer and comedian Sarah Silverman, author Richard Kadrey, and other rights holders spearheading the legal action, acknowledging that its...

A hot potato: Training advanced AI models on proprietary material has become a controversial issue, and many companies now face lawsuits from authors and media organizations. Meta admitted to using the well-known "pirate" dataset Books3, yet the company is reluctant to compensate writers.

Meta has admitted to using the Books3 dataset, among many other materials, to train its Llama 1 and Llama 2 LLMs. Books3 is a well-known plaintext collection of over 195,000 books totaling nearly 37GB. The archive was created by AI researcher Shawn Presser in 2020 to provide a better data source for improving machine learning algorithms.

Because Books3 is so widely available, many researchers have used it extensively for AI training. Big Tech companies, including Meta, have used Books3 and other contentious datasets for their commercial AI products. In a related dispute, the New York Times has sued OpenAI and Microsoft for allegedly using millions of copyrighted articles to develop the ChatGPT chatbot.

OpenAI has openly declared that training AI models without using copyrighted material is "impossible," arguing that judges and courts should dismiss compensation lawsuits brought by rights holders. Echoing this stance, Meta acknowledged using parts of the Books3 dataset but denied any intentional misconduct, arguing that its use of copyrighted works to train LLMs did not require "consent, credit, or compensation." The company denies infringing the plaintiffs' "alleged" copyrights, contending that any unauthorized copies of copyrighted works in Books3 should be considered fair use.

Furthermore, Meta disputes that the case can be maintained as a class action lawsuit and refuses to provide any monetary "relief" to the suing authors or others involved in the Books3 controversy. The dataset, which includes copyrighted material sourced from the pirate site Bibliotik, was targeted in 2023 by the Danish anti-piracy group Rights Alliance, which demanded that digital archiving of Books3 be banned and is using DMCA notices to enforce takedowns.
