OpenAI to regulators: Training AI models without copyrighted material is "impossible"

Nishadil
January 12, 2024

A hot potato: Artificial intelligence researchers used to work in peace. However, now that companies like OpenAI, Microsoft, Google, and others are commercializing generative AI, the use of copyrighted training material has come under fire. Regulators in the UK are asking for information regarding the issue, and OpenAI recently responded.

OpenAI recently told members of the House of Lords that it is "impossible" to train large language models (LLMs) without using copyrighted material. The claim was in response to the UK's Communications and Digital Select Committee, which is looking into the legal issues involving current AI systems.

Current consumer applications like ChatGPT and DALL-E are based on GPT-3. Since 2018, OpenAI has trained the model on billions of samples of writings, art, and photographs, mostly scraped from the internet. In March 2023, OpenAI released GPT-4, which uses a dataset of text samples measuring about 570GB. The training material includes websites and books, which are without question protected works.

However, copyright law goes far beyond books and websites. "Because copyright today covers virtually every sort of human expression – including blogposts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today's leading AI models without using copyrighted materials," OpenAI's submission to the House of Lords reads.

Indeed, under current copyright law, a work does not even have to be registered to be protected. Any original work is copyrighted the instant its creator fixes it in a permanent medium. It does not matter whether it is a digital file, video, book, blog post, or forum comment. All copyright laws apply.

This issue wasn't much of a problem in years past because machine learning research was strictly academic. Training was largely considered fair use, and nobody bothered researchers. However, now that LLMs are going commercial, they have entered a gray area of the fair use doctrine. On rare occasions, ChatGPT "regurgitates" copyrighted snippets, which is cut-and-dried infringement and a problem that OpenAI is working hard to eliminate.

However, that issue is not directly related to what happens when researchers train an LLM with protected material. In training, the system uses the works, copyrighted or otherwise, to learn how language is structured and used so that it can create original content that humans can understand. Unfortunately, because AI training is a new frontier, copyright law does not yet define how it should be treated.

So, allegedly infringed parties have begun bringing cases to court. Companies like OpenAI and Microsoft are saying, "No. Training falls under fair use like it always has." "Training AI models using publicly available internet materials is fair use, as supported by long-standing and widely accepted precedents," OpenAI wrote in a blog post this week.

"We view this principle as fair to creators, necessary for innovators, and critical for US competitiveness." Despite believing that the fair use doctrine covers LLM training, OpenAI provides a simple opt-out process, which The New York Times used in August last year. OpenAI's tools can no longer access the NYT website, yet the newspaper filed a lawsuit in December.

"We support journalism, partner with news organizations, [but] believe The New York Times lawsuit is without merit," it said. OpenAI faces similar lawsuits from several published authors, including high-profile comedian Sarah Silverman. It's an issue that the courts cannot handle alone.

The US Patent and Trademark Office, along with lawmakers, needs to clearly define the role AI training plays in copyright rules.

Permalink to story: https://www.techspot.com/news/101475-openai-tells-regulators-training-usable-ai-models-without.html

Editorial note: Nishadil may use AI assistance for news drafting and formatting. Readers can report issues from this page, and material corrections are reviewed under our editorial standards.