The Impact of AI Crawlers on Content Providers

Cloudflare's Content Independence Day: The Impact of AI Crawlers on Content Providers

Full Transcript

00:00
Hello, this is Jim from Augusto. Recently, Cloudflare announced that it’s going to be blocking AI scrapers, or crawlers, by default. This is big news in the industry, and they actually called it Content Independence Day, and I thought that was great. I was in the midst of writing a
00:21
blog post, but thought it’d be great to go ahead and do a video, and let’s talk about this, and the impact, and what it means, and even do a demo of what it looks like. So to jump in, Cloudflare decided that it’s important that content providers be able to control these AI crawlers.
00:40
For those people who don’t know, a crawler, like Google’s, will pull in your data from your website. It will retrieve it and put it in the Google index, and then people will go to Google, search for you, and go to your site. These AI crawlers are now outpacing the search engine crawlers.
00:54
And Cloudflare has built some tools to help us control whether they can get to your content or not. Now, the reason this is important, and they state this out here, is that in today’s world, people get your content by using search engines. Now we’re seeing people start to use AI tools,
01:11
whether it’s ChatGPT or an Anthropic product, right? You’re seeing a different change, and they talk about a great example. It’s worse with AI: it’s 750 times more difficult to get traffic than in the old Google world. Well, what’s happening? What’s happening is increasingly users are consuming
01:29
derivatives. They’re not getting your full content. With Google, you might search for something about Augusto Digital, it takes you to our site, and you’re seeing our content. When you’re inside of an AI tool, they’re blending that content with other things.
01:42
And I’ll demonstrate that later. Cloudflare then goes on with a few other blog posts, which I think are great. They talk about understanding the referral and the impact on these providers. As you work through some of the content here, what they’re getting at is that AI crawler
01:56
traffic is growing. They’re slurping up your data using AI, taking your data and putting it into their content. Basically, the large language models are consuming it and sending fewer users to your website. Now if you’re a content provider,
02:13
that might be great for some people, where you want people to see your content on any platform. But if your platform is subscription-based, or if your platform has content and knowledge that you want people to get directly from you and not see in a different way, I think it’s important that
02:27
you can control it. And that’s what Cloudflare has done. And I’m showing a couple of graphs in here just so you can see, and they do a great job of breaking it down. I will have all the links in the content of this video. So just to outline what these graphs mean,
02:41
it’s basically saying that these new models consume more content, more of your content that you’re giving away free. They’re doing it more frequently, looking for changes, and they’re sending less traffic to the source, your website. Now, if you are a creator and your content’s
02:59
valuable, you might not want that. You might want the user to come to your site. However, the reverse is that you do want your data, in some cases, to be inside of AI. Hence what Cloudflare is putting in place: they’re starting to build a system to allow content providers to block crawlers and then work on
03:16
how to get compensated. And I know that’s going to be a challenge in the near future, but that’s what they’re starting with. Cloudflare has turned on some tools to allow us to see what’s happening and view the traffic that’s coming to your site. So let’s talk a little bit about bots
03:33
for a moment. So what does this mean? Up until recently, and I think Cloudflare does a great job of laying this out, bots fell into two categories: good and bad. Good bots, like Google and other search engines, would bring in your content and search your site. And you wanted that. You wanted them to index, and you did search
03:48
engine optimization to pull that content. Then there’s bad bots, the ones that were looking for flaws, trying to bring in rogue information or lock down a site. They kind of split into two kinds of bots: good and bad. We now have a new version of bot.
04:04
This new version is AI bots. Unlike the malicious bots, the bad ones, they’re not trying to knock your site offline and they’re not trying to steal your sensitive data. What they want is to scan all your public information from your site.
04:19
But unlike the helpful ones, they’re not necessarily driving traffic, like I mentioned, to your site. They’re keeping it inside of their system. So there could be a risk. The risk is that these bots are taking your information, sometimes blending it with other content,
04:37
and they’re not sending traffic to your site. So Cloudflare’s tools help you understand that. So I’m going to walk you through an audit that they’ve turned on. It’s called the AI Audit. And let’s take a look at Augusto. So here you can see, in a short period,
04:49
we’ve had a whole bunch of requests, and it’s broken down by the type of crawler. So the, I’ll say, Googlebot is your typical historical bot, crawling our site, pulling back content. Amazon has started adding their bot out there. Whether or not that’s an AI bot, I’m unsure,
05:05
but what I do see is a huge increase in, for example, ChatGPT-User: people crawling the site through the ChatGPT engine to get information. That means that content is being pulled in and reused. It could be blended with others, or it could be returned wholesale from what we put out there.
05:21
Now I’m going to do an example of this and show you how it’s different and how it could impact you. The next one is Bingbot. And then I think GPTBot on its own is just crawling in the background, as well as Claude. So you can see a rise and fall. Now,
05:35
I’m only looking at the last couple of days, but you can see what’s happening down here. I like the example here of ChatGPT-User. That’s a little bit more specific, versus GPTBot crawling our entire site. These tools inside of Cloudflare are pretty amazing.
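To make the distinction between a user-triggered fetch and a background crawl concrete, here is a minimal Python sketch that buckets user-agent strings the way the audit view groups them. It is an illustration only, not how Cloudflare classifies traffic: the `CRAWLER_CATEGORIES` mapping, the category labels, and the `classify` and `summarize` helpers are all assumptions made for this example, and the sample user-agent strings are simplified.

```python
from collections import Counter

# Assumed mapping from a user-agent substring to a rough category.
# The tokens are the crawler names mentioned in the demo; the labels
# are illustrative and are not Cloudflare's own taxonomy.
CRAWLER_CATEGORIES = {
    "chatgpt-user": "ai-user-fetch",   # fetch triggered by a person asking ChatGPT
    "gptbot": "ai-crawler",            # background crawl of the whole site
    "claudebot": "ai-crawler",
    "perplexitybot": "ai-crawler",
    "googlebot": "search-crawler",
    "bingbot": "search-crawler",
    "amazonbot": "unclear",            # the speaker is unsure how to class this one
}


def classify(user_agent: str) -> str:
    """Return the category of the first known crawler token in the UA string."""
    ua = user_agent.lower()
    for token, category in CRAWLER_CATEGORIES.items():
        if token in ua:
            return category
    return "other"


def summarize(user_agents: list[str]) -> Counter:
    """Count requests per category, like a tiny version of the audit chart."""
    return Counter(classify(ua) for ua in user_agents)


if __name__ == "__main__":
    # Simplified, made-up request log for demonstration.
    sample = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]
    for category, count in summarize(sample).items():
        print(f"{category}: {count}")
```

Substring matching on the user agent is the simplest possible approach and is easy to spoof, so a sketch like this only describes what well-behaved crawlers say about themselves.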
05:54
And then down here, the crawler operators: Google is still the highest ranking, Amazon’s coming up in there, and then you see OpenAI. Perplexity here is on the end. Again, this is showing a dramatic change over time. Now, before I go to my example of what the impact could look like,
06:11
I want to quickly show what we’re doing at Augusto. Right here, you can see I’m on the AI Audit page, under crawlers. It gives me the ability to turn specific crawlers on and off for our site. This is pretty intensive, because I could say, I don’t want Claude to actually track
06:28
our users. So I wouldn’t let Claude go search us, because we’d block them, and the same with Claude’s search bot or their bot itself. I won’t turn these on long-term, because at Augusto we haven’t really made a decision. We’d like our content to get out there, and we’d like to have SEO
06:42
recognition, but if we were charging for our content or giving a blurb away and then offering a paywall, some of these things we lose control of. So let’s actually look at what that means from a technical perspective. I showed you the list where I could turn it all off and I
06:57
can easily, here on the homepage, manage AI bot traffic with the robots.txt. That’s the tooling that allows Cloudflare to turn on and turn off that search option or crawling. Now, what that looks like on Augusto is we haven’t done anything.
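As a rough sketch of what robots.txt rules for AI crawlers can look like, the Python below parses a small example file with the standard library's `urllib.robotparser` and checks which agents may fetch a page. The rules in `EXAMPLE_ROBOTS_TXT`, the `crawl_permissions` helper, and the example.com URL are hypothetical. This is not Augusto's robots.txt, and it is not the file Cloudflare generates.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI crawlers, allow everyone else.
# Not Augusto's actual file and not Cloudflare's managed output.
EXAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def crawl_permissions(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each agent name, whether the rules allow it to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    page = "https://example.com/blog/some-post"  # placeholder URL
    agents = ["GPTBot", "ClaudeBot", "Googlebot"]
    for agent, allowed in crawl_permissions(EXAMPLE_ROBOTS_TXT, agents, page).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that robots.txt is advisory: it states what the site owner wants, and a crawler has to choose to honor it.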
07:11
We would like any agent to go to our site map and crawl our content. And again, we haven’t made a decision whether or not we want to block or allow it. That doesn’t really impact us like it maybe does some other, larger content providers. So let’s follow down this path and see what it looks
07:28
like when data is returned outside of Google. All right, for my demo, I’m going to go ahead and bring up a recent blog post that we put out there around Claude Code, right? From Prototype to Impact: Lessons from an AI News Summary Tool Built with Claude
07:43
Code. It’s a blog post we recently put out. It talks about Claude Code. It kind of gives an opinion of what it does, how it works. And it’s a video demo of what we learned while we did it. It’s a great way for people to see Claude Code in a normal world. Let’s go to Google and let’s say,
07:59
What does Augusto Digital say about Claude Code? I’m just going to search Google and let’s see what our response is. As you can see, we came up number one, obviously: Augusto Digital. It was updated seven days ago, and it pretty much shares a link directly
08:14
to it too. And you can see we’ve got other links in here. Brian must have posted on it. But if I click on this, it takes me right to the site. So it was indexed by Googlebot. Googlebot gave us search results. The search results brought us to this webpage.
08:28
I could read more. That’s been the construct on the web for quite a long time. So let’s now move to what we’ll see from a ChatGPT perspective. So I’m in ChatGPT. I’m going to ask the same question I did on Google. What does
08:41
Augusto Digital say about Claude Code? Now here it’s going out, and this would be the ChatGPT-User agent. And it comes back and says Augusto hasn’t actively explored Claude Code, um, or these other tools. Now, what it does say is that we did do a blog
08:59
post titled From Prototype to Impact. And it talks about and summarizes using Claude Code. Now notice here, it didn’t send me to that link. Now, it showed me where it found it. I could hover over this and it has a link, but this content blended in what it thought, that we haven’t
09:15
done anything with Claude Code, or haven’t actively explored it, and it hasn’t taken us to that page. It’s not like a quick link. I would have read this and said, hey, maybe Augusto did or did not work with Claude Code. It’s not a direct link. I’ve stayed in the system. I may or may not ask more questions,
09:34
but I’m not driving traffic. And Augusto’s information didn’t fully make it in here. It’s not talking about what our perspective is. So let’s do the second thing. That was an example of ChatGPT. Let’s move over to Claude. Now, I already ran the same thing to save us some time.
09:54
It says, What does Augusto say about Claude Code? And it searched for information on the web. And it brought back a link to Augusto Digital and Claude Code. You can see a couple of things. One, it had a couple of links to our blog posts,
10:10
but it didn’t actually get to the one that we recently did with Claude Code. Maybe the top one did, but it did the search and it brought its content back to us. It says, I can’t see that my search results had anything, but it does talk about, right here, a GUI for Claude Code.
10:26
When we did this demo last time, one blog post. So this one actually had, well, it actually brought the right content back, that we did do a blog post about it, but it didn’t actually link it for me. So it wasn’t able to find anything directly about
10:39
Claude Code, even though it knew here that we did an actual blog post around it. So again, you’re seeing that your content’s being drawn in, and it’s not sending me back to the website. There’s not really a big link in here that says, go check it out,
10:53
or maybe send me to a result. So what does this mean? It means that we, as content providers, developers, or people in this age, need to think about how these tools are impacting our content choices and what we do on our site. So I hope this is informative.
11:13
Take a chance, read the blog post. I will share all this content as well.
