
The Web’s Closing Doors – And AI’s Feeling the Lock

By Asaf Shamly | August 21, 2025

Recently, Cloudflare took the unusual step of publicly calling out Perplexity AI for scraping publisher content without permission. 

And they didn’t hold back – accusing the company of disguising its crawler, bypassing robots.txt rules, and consuming massive bandwidth without attribution or consent.

Perplexity pushed back.
They said they’re playing by the rules.
That they’re helping the internet, not hurting it.
That they’re being misunderstood.

Cue the familiar debate: who owns online content? Should AI assistants be allowed to “learn” from everything on the web? Is scraping theft – or fair use?

These questions are important. But they’re also a distraction. Because the real tension here isn’t moral. It’s strategic.

This is a fight over visibility. And more precisely: who gets to see what – and what that seeing is worth.

Everyone’s chasing a data edge. Some are running out of road.

What makes tools like Perplexity valuable isn’t their UI. It’s the illusion of omniscience. The ability to summarize any topic, any site, any angle, without needing to click, browse, or read.

But that illusion depends on access – to vast amounts of content, refreshed constantly, without paywalls, slowdown, or blocks.

And that kind of access is starting to evaporate.

Cloudflare’s crackdown is just the most public signal. Publishers have been tightening their controls for months. Robots.txt is being revisited. IPs are being blacklisted. Some are exploring licensing models. Others are demanding compensation.
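To make that concrete, the first line of defense most publishers reach for is robots.txt. A minimal, purely illustrative file that asks named AI crawlers to stay out while leaving the site open to everyone else might look like the sketch below (the user-agent tokens are examples of the kind crawlers publish, and, as Cloudflare's complaint underlines, the file only matters if crawlers choose to honor it):

    # robots.txt (illustrative sketch, not a recommendation)
    # Ask specific AI crawlers to stay out of the entire site
    User-agent: GPTBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /

    # Leave everything open for other crawlers
    User-agent: *
    Disallow:

Which is exactly why publishers are layering IP blocks, rate limits, and licensing terms on top of it.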

The message is simple:
If you want visibility, you’re going to have to earn it.

The web is fragmenting into gated intelligence zones

There was a time when AI startups could learn by crawling. But those days are numbered. 

As content creators reclaim their walls, platforms are being forced to rethink how and where they train their models.

It’s not a technical debate; it’s a competitive one.

Because as scraping becomes harder – and legal pressure mounts – the companies that thrive will be the ones with the strongest data partnerships, the clearest usage rights, and the richest exposure to real-world behavior.

Scraping tells you what’s on a page.
Intelligence comes from knowing what users do with it.

Visibility isn’t free (and it never really was)

There’s a growing myth that AI needs “the open web” to function. 

But much of what’s out there isn’t open. It’s optimized. Stylized. Rewritten for performance. And increasingly, it’s not even human – it’s content written for other machines.

Training AI on scraped outputs isn’t learning. It’s recycling.

What’s actually scarce now is clean, signal-rich, human-centered behavioral data – the kind that shows how people scroll, react, ignore, pause. The kind that requires consent, cooperation, and context to collect.

That’s what platforms like Perplexity are struggling to access, and that’s why the fight is getting louder.

Don’t mistake the noise for the story

It’s easy to frame this as a debate about fairness, but it’s really a reshuffling of power.

The internet is dissolving into ecosystems where visibility is earned, not inherited. Where systems trained on borrowed content are outperformed by those trained on direct interaction. Where permission, not reach, drives performance.

AI needs grounding.
And grounding comes from access to what’s real.

 

 
