The Web’s Closing Doors – And AI’s Feeling the Lock
By Asaf Shamly | August 21, 2025
Recently, Cloudflare took the unusual step of publicly calling out Perplexity AI for scraping publisher content without permission.
And they didn’t hold back – accusing the company of disguising its crawler, bypassing robots.txt rules, and consuming massive bandwidth without attribution or consent.
Perplexity pushed back.
They said they’re playing by the rules.
That they’re helping the internet, not hurting it.
That they’re being misunderstood.
Cue the familiar debate: who owns online content? Should AI assistants be allowed to “learn” from everything on the web? Is scraping theft – or fair use?
These questions are important. But they’re also a distraction. Because the real tension here isn’t moral. It’s strategic.
This is a fight over visibility. And more precisely: who gets to see what – and what that seeing is worth.
Everyone’s chasing a data edge. Some are running out of road.
What makes tools like Perplexity valuable isn’t their UI. It’s the illusion of omniscience. The ability to summarize any topic, any site, any angle, without needing to click, browse, or read.
But that illusion depends on access – to vast amounts of content, refreshed constantly, without paywalls, slowdown, or blocks.
And that kind of access is starting to evaporate.
Cloudflare’s crackdown is just the most public signal. Publishers have been tightening their controls for months. Robots.txt is being revisited. IPs are being blacklisted. Some are exploring licensing models. Others are demanding compensation.
The message is simple:
If you want visibility, you’re going to have to earn it.
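In practice, that tightening often starts small. A publisher revisiting robots.txt might, for example, single out AI crawlers by their published user-agent tokens. The sketch below is purely illustrative; the tokens shown are commonly documented crawler names, and the blanket rules are one possible policy, not anyone's actual file:

User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /

Of course, robots.txt is a request, not an enforcement mechanism, which is exactly why publishers pair it with IP blocking, bot detection, and commercial terms.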
The web is fragmenting into gated intelligence zones
There was a time when AI startups could learn by crawling. But those days are numbered.
As content creators reclaim their walls, platforms are being forced to rethink how and where they train their models.
It’s not a technical debate; it’s a competitive one.
Because as scraping becomes harder – and legal pressure mounts – the companies that will thrive will be the ones with the strongest data partnerships, the clearest usage rights, and the richest exposure to real-world behavior.
Scraping tells you what’s on a page.
Intelligence comes from knowing what users do with it.
Visibility isn’t free (and it never really was)
There’s a growing myth that AI needs “the open web” to function.
But much of what’s out there isn’t open. It’s optimized. Stylized. Rewritten for performance. And increasingly, it’s not even human – it’s content written for other machines.
Training AI on scraped outputs isn’t learning. It’s recycling.
What’s actually scarce now is clean, signal-rich, human-centered behavioral data – the kind that shows how people scroll, react, ignore, pause. The kind that requires consent, cooperation, and context to collect.
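To make that concrete, here is a minimal sketch, in TypeScript, of what consented, first-party engagement collection can look like in a browser. It is illustrative only: the section selector, the __userConsented flag, and the /signals endpoint are hypothetical placeholders, not anyone's actual implementation.

// Hypothetical sketch: consented, first-party engagement signals.
// Selectors, the consent flag, and "/signals" are placeholders.

type EngagementSignal = {
  elementId: string;       // which section was measured
  visibleMs: number;       // total time the section stayed in view
  maxScrollDepth: number;  // deepest scroll reached, as a fraction of page height
};

const visibleSince = new Map<string, number>();
const signals = new Map<string, EngagementSignal>();
let maxScrollDepth = 0;

// Track how far the reader actually scrolls.
window.addEventListener("scroll", () => {
  const depth = (window.scrollY + window.innerHeight) / document.body.scrollHeight;
  maxScrollDepth = Math.max(maxScrollDepth, depth);
}, { passive: true });

// Track how long each article section stays at least half on screen.
const observer = new IntersectionObserver((entries) => {
  const now = performance.now();
  for (const entry of entries) {
    const id = entry.target.id;
    if (entry.isIntersecting) {
      visibleSince.set(id, now);
    } else if (visibleSince.has(id)) {
      const prior = signals.get(id)?.visibleMs ?? 0;
      signals.set(id, {
        elementId: id,
        visibleMs: prior + (now - visibleSince.get(id)!),
        maxScrollDepth,
      });
      visibleSince.delete(id);
    }
  }
}, { threshold: 0.5 });

document.querySelectorAll("article section[id]").forEach((el) => observer.observe(el));

// Report only if the user has consented (consent check is a placeholder).
window.addEventListener("pagehide", () => {
  if (!(window as any).__userConsented) return;
  navigator.sendBeacon("/signals", JSON.stringify([...signals.values()]));
});

None of this is exotic. The point is that it only works with the publisher's cooperation and the user's consent, which is precisely what scraping can never capture.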
That’s what platforms like Perplexity are struggling to access – and that’s why the fight is getting louder.
Don’t mistake the noise for the story
It’s easy to frame this as a debate about fairness, but it’s really a reshuffling of power.
The internet is dissolving into ecosystems where visibility is earned, not inherited. Where systems trained on borrowed content lose out to those trained on direct interaction. Where permission, not reach, drives performance.
AI needs grounding.
And grounding comes from access to what’s real.