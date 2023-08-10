OpenAI Now Lets You Block Its Web Crawler GPTBot
According to the latest reports, OpenAI will now let you block its web crawler from scraping your site to help train GPT models. The company said that website operators can specifically disallow its GPTBot crawler in their site’s robots.txt file. They can also block its IP addresses.
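As a rough illustration of the robots.txt approach, the sketch below uses Python's standard `urllib.robotparser` module to check how a rule set targeting the `GPTBot` user agent would be read by a compliant crawler. The two-line rule set is the conventional way to disallow a single crawler site-wide; the `is_allowed` helper, the `OtherBot` agent name, and the `example.com` URL are hypothetical and exist only for this example.

```python
from urllib.robotparser import RobotFileParser

# Rules a site operator could place in robots.txt to turn GPTBot away
# from the whole site. No other crawler is named, so others are unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules.

    Hypothetical helper for this sketch; it parses the rules from a string
    instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", page))
```

Under these rules the check should report GPTBot as disallowed and the other agent as allowed. Note that robots.txt only works if the crawler chooses to honor it, which is presumably why blocking by IP address is offered as a second option.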
GPTBot: A Web Crawler by OpenAI
OpenAI stated in the blog post:
“Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site (for sources that don’t fit the excluded criteria) can help AI models become more accurate and improve their general capabilities and safety.”
Notably, blocking GPTBot may be the first step toward OpenAI allowing internet users to choose whether their data is used to train its large language models. It follows some earlier attempts at creating a flag that would exclude content from training, such as the “NoAI” tag created by DeviantArt last year.
The internet reportedly provided much of the training data for large language models such as OpenAI’s GPT models and Google’s Bard. However, OpenAI will not confirm whether it got its data from social media posts or copyrighted works, or which parts of the internet it scraped for information. Sourcing data for AI training has become increasingly controversial. Questions about data privacy and consent also came up in several Senate hearings on AI regulation last month.