Google-Extended: A Tool Shaping the Future of Web Content and AI

A small text file has helped maintain order on the web for decades: robots.txt. It lets website owners tell Google and other tech giants which parts of their sites automated crawlers may access. Most sites have allowed Google to crawl them because the company drives significant traffic. However, with the rise of AI, that same web content has become foundational for training the powerful AI models built by companies like OpenAI, Google, Meta, and others. These models often answer user questions directly, potentially reducing the need for users to visit the underlying websites and disrupting the balance of the web.

In response to this shift, Google has introduced a new control called Google-Extended, which lets websites block the company from using their content to train AI models. The control, launched in September 2023, has seen some adoption.
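To make the mechanism concrete, here is a minimal sketch of how a robots.txt rule for Google-Extended might be checked with Python's standard `urllib.robotparser` module. The sample file contents, the `example.com` URL, and the `can_fetch` helper are illustrative assumptions, not taken from any real site; only the `Google-Extended` token name comes from the article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google-Extended, allow everything else.
SAMPLE_ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/menu"  # placeholder URL for illustration
print(can_fetch(SAMPLE_ROBOTS_TXT, "Google-Extended", url))  # False
print(can_fetch(SAMPLE_ROBOTS_TXT, "Googlebot", url))        # True
```

In this sketch the site stays open to regular search crawling while opting out under the Google-Extended token, which is the trade-off the article describes.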

Data from Originality.ai indicates that about 10% of the top 1,000 websites have implemented the Google-Extended snippet as of late March. The New York Times is one notable example, having enabled the Google-Extended blocker in its robots.txt file. The publication is currently embroiled in an AI copyright dispute with OpenAI and has also blocked OpenAI’s access to its content.

The NYT’s robots.txt page explicitly prohibits the use of any device, tool, or process designed for data mining or scraping content using automated means without prior written permission. This includes the development of software, machine learning, artificial intelligence (AI), and/or large language models (LLMs).

Other websites that have implemented the Google-Extended blocker include CNN, BBC, Yelp, and Business Insider. However, blocking Google-Extended is not yet as widespread as blocking OpenAI’s GPTBot crawler, which around 32% of the top 1,000 websites have done.
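As a rough illustration of how adoption figures like these could be computed, the sketch below tallies the share of sites whose robots.txt blocks each token from the site root. It is not Originality.ai's methodology; the site names, file contents, and the "blocked at the root" definition are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

AI_TOKENS = ("Google-Extended", "GPTBot")


def blocks_token(robots_txt: str, token: str) -> bool:
    """True if the robots.txt text forbids `token` from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(token, "/")


def adoption_rates(robots_by_site: dict[str, str]) -> dict[str, float]:
    """Fraction of sites blocking each AI token (0.0 for an empty sample)."""
    total = len(robots_by_site)
    if total == 0:
        return {token: 0.0 for token in AI_TOKENS}
    return {
        token: sum(blocks_token(txt, token) for txt in robots_by_site.values()) / total
        for token in AI_TOKENS
    }


# Hypothetical sample of four sites and their robots.txt contents.
SAMPLE = {
    "news.example": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: Google-Extended\nDisallow: /\n",
    "reviews.example": "User-agent: GPTBot\nDisallow: /\n",
    "shop.example": "User-agent: *\nAllow: /\n",
    "blog.example": "",
}

print(adoption_rates(SAMPLE))  # {'Google-Extended': 0.25, 'GPTBot': 0.5}
```

A real survey would also need to fetch each site's robots.txt and decide how to treat partial blocks, which this sketch deliberately leaves out.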

Originality.ai CEO Jonathan Gillham suggests that the lower adoption rate of Google-Extended may be due to concerns that blocking Google’s access to training data could result in exclusion from AI-generated search results. For example, a restaurant that stops Google’s AI from using its website data for training may not be included in AI-generated responses to queries about the best deep-dish pizza in Chicago.

Google is currently testing an early version of a generative AI search engine through its Search Generative Experience (SGE). It remains to be seen whether Google will fully launch this new search engine and how much it will differ from traditional Google Search. These decisions could significantly impact the future of the web in the AI era.

Onsa Mustafa

Onsa is a Software Engineer and a tech blogger who focuses on providing the latest information regarding the innovations happening in the IT world. She likes reading, photography, travelling and exploring nature.
