OpenAI harvested over a million hours of YouTube content for GPT-4: Report

Voltaire Staff
Apr 7, 2024

OpenAI harvested more than a million hours of YouTube content to train GPT-4, its most advanced large language model, a report has claimed.

Artificial intelligence firms have been scrambling to find new sources of training data, having already harvested most traditional repositories of human knowledge, such as books, newspapers, and scientific databases.

Such harvesting has often run afoul of copyright law, with OpenAI and several other AI firms facing lawsuits from writers and publishers.

According to a report by The New York Times, OpenAI used its speech recognition tool Whisper to transcribe the audio from YouTube videos and trained its AI model on the resulting text. The firm's president, Greg Brockman, was personally involved in collecting the videos, NYT wrote.

OpenAI spokesperson Lindsay Held told The Verge in an email that the company curates "unique" datasets for each of its models to "help their understanding of the world."

The spokesperson added that the company is also now looking into generating its own synthetic data.

Google spokesperson Matt Bryant told The Verge that the company has "seen unconfirmed reports" of OpenAI’s activity.

He said, "Both our robots.txt files and Terms of Service prohibit unauthorized scraping or downloading of YouTube content."

YouTube CEO Neal Mohan had also earlier alleged that OpenAI had used the video streamer's content to train its text-to-video AI model Sora.

Both, however, stopped short of saying whether OpenAI's actions merit legal action by the company.
