Google last month upgraded its Bard chatbot with a new machine-learning model that can better understand conversational language and compete with OpenAI’s ChatGPT. As Google develops a sequel to that model, it may hold a trump card: YouTube. From a report: The video site, which Google owns, is the single biggest and richest source of imagery, audio and text transcripts on the internet. And Google’s researchers have been using YouTube to develop the company’s next large language model, Gemini, according to a person with knowledge of the situation. The value of YouTube hasn’t been lost on OpenAI, either: The startup has secretly used data from the site to train some of its artificial intelligence models, said one person with direct knowledge of the effort. AI practitioners who compete with Google say the company may gain an edge from owning YouTube, which gives it more complete access to the video data than rivals that scrape the videos. That’s especially important as AI developers face new obstacles to finding high-quality data on which to train and improve their models. Major website publishers from Reddit to Stack Exchange to DeviantArt are increasingly blocking developers from downloading data for that purpose. Before those walls came up, AI startups used data from such sites to develop AI models, according to the publishers and disclosures from the startups.
The advantage that Google gains in AI from owning YouTube may reinforce concerns among antitrust regulators about Google’s power. On Wednesday, the European Commission issued a complaint over Google’s power in the ad tech world, contending that Google favors its “own online display advertising technology services to the detriment of competing providers.” The U.S. Department of Justice in January sued Google over similar issues.
Google could use audio transcriptions or descriptions of YouTube videos as another source of text for training Gemini, leading to more-sophisticated language understanding and the ability to generate more-realistic conversational responses. It could also integrate video and audio into the model itself, giving it the multimodal capabilities many researchers believe are the next frontier in AI, according to interviews with nearly a dozen people who work on these types of machine-learning models. Google CEO Sundar Pichai told investors earlier this month that Gemini, which is still in development, is exhibiting multimodal capabilities not seen in any other model, though he didn’t elaborate.