
Bot Traffic Analytics: The Case of the Lying Dashboard

Your Analytics Are Lying to You (and Half Your Users Can't Tell Why)

Bot traffic analytics have a dirty secret -- and a browser game just made it impossible to ignore.

Last week I went down a rabbit hole that started with a browser game and ended with me rethinking how I talk to clients about web traffic. The game is called "Bot or Not" -- built by Surfshark in collaboration with master's students from Malmö University -- and it drops you into a simulated social media comment section where you have 120 seconds to tell AI-generated comments apart from real ones. I played it twice. My accuracy was not as good as I expected.

That was the point.

Think of it like a noir detective story where the clues keep changing. You are scanning a room full of suspects, and half of them have learned to walk, talk, and comment like the real thing. The fingerprints do not match anything on file. The gut instinct that used to work -- "that feels off" -- is no longer reliable. And the case file sitting in your analytics dashboard? It was written by some of the suspects.

The experiment behind the game ran during Milan Design Week in May 2026, with 710 participants. The headline number: 47% of players failed to correctly distinguish AI-generated comments from human ones. The average bot-detection rate across all participants was 58% -- meaning more than four in ten bots went completely undetected. And this was in a controlled environment, with people actively trying to spot them.

Now consider what happens when nobody is trying at all.

The 51% Problem -- Now 57.5%

Here is the number that should be in every strategy conversation you have this year: automated bots accounted for 51% of all web traffic in 2024. That figure comes from Imperva's 2025 Bad Bot Report, cited in Surfshark's research. A separate Adalytics investigation found that at least 40% of web traffic across more than two million websites consisted of fake users or automated bots -- and that leading fraud detection systems routinely missed them, even when bots openly identified themselves.

Update: That number has already moved. Cloudflare CEO Matthew Prince posted on X just this week that automated bot traffic has crossed a threshold nobody expected this soon: by Cloudflare's measure, and for the first time in the internet's history, machines now generate more web traffic than people. Cloudflare's Radar dashboard puts bots at 57.5% of all HTTP requests to HTML content, humans at 42.5%. Prince had predicted the crossover by the end of 2027. We got there eighteen months early.

Worth noting: these two numbers measure slightly different things. The 51% figure comes from the 2025 Bad Bot Report and reflects web traffic broadly across 2024. Cloudflare's 57.5% is more recent and narrower: it tracks the bot share of HTTP requests for HTML content. They are snapshots of the same trend at two points in time, taken with different yardsticks, and both point the same way. The web is getting less human by the month.
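To see why the denominator matters, here is a toy calculation. It is not how Imperva or Cloudflare compute their figures; it only shows that the same hypothetical log can produce very different bot shares depending on whether you count every request or only requests for HTML pages. The `Request` type and the sample counts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    is_bot: bool
    content_type: str  # e.g. "text/html" or "image/png"


def bot_share(requests, html_only=False):
    """Fraction of requests made by bots, optionally counting HTML page requests only."""
    pool = [r for r in requests if not html_only or r.content_type == "text/html"]
    if not pool:
        return 0.0
    return sum(r.is_bot for r in pool) / len(pool)


# Hypothetical log: humans load pages plus images, crawlers mostly fetch HTML.
sample = (
    [Request(is_bot=False, content_type="text/html")] * 4
    + [Request(is_bot=False, content_type="image/png")] * 8
    + [Request(is_bot=True, content_type="text/html")] * 6
    + [Request(is_bot=True, content_type="image/png")] * 2
)

print(f"bot share, all requests:  {bot_share(sample):.0%}")
print(f"bot share, HTML requests: {bot_share(sample, html_only=True):.0%}")
```

With these invented numbers, bots are 40% of all requests but 60% of HTML requests, because the humans in the sample also pull down images while the crawlers mostly fetch pages.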

More than half of your traffic is not human. And the tools designed to filter it are not catching all of it.

For anyone doing SEO work, this is not just an ad fraud problem. It is a data integrity problem. If your analytics are counting bot sessions as user sessions, your bounce rate, time-on-page, conversion rate, and click-through data are all contaminated. You are optimizing for an audience that does not exist.
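A quick way to see the contamination is to compute the same metrics twice: once over every session, once over the sessions your filter considers human. This sketch assumes a hypothetical `Session` record carrying an `is_bot` flag from whatever bot filter you use; the counts are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    is_bot: bool        # label from your bot filter (assumed to exist)
    pages_viewed: int
    converted: bool


def bounce_rate(sessions):
    """Share of sessions that viewed exactly one page."""
    return sum(s.pages_viewed == 1 for s in sessions) / len(sessions)


def conversion_rate(sessions):
    """Share of sessions that converted."""
    return sum(s.converted for s in sessions) / len(sessions)


# Invented counts: 100 human sessions plus 50 single-page bot sessions.
everyone = (
    [Session(False, 3, True)] * 5
    + [Session(False, 2, False)] * 55
    + [Session(False, 1, False)] * 40
    + [Session(True, 1, False)] * 50
)
humans_only = [s for s in everyone if not s.is_bot]

for label, group in [("all sessions", everyone), ("humans only", humans_only)]:
    print(f"{label:13} bounce {bounce_rate(group):.1%}  conversion {conversion_rate(group):.1%}")
```

Here 50 single-page bot sessions push the bounce rate from 40% to 60% and drag the conversion rate from 5% to about 3.3%, without a single human behaving any differently.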

What This Means for GEO

Generative Engine Optimization is built on a different premise than traditional SEO. You are not trying to rank for a keyword in a list -- you are trying to become the source an AI model cites when someone asks a relevant question. The crawlers doing that work are bots. Intentional, legitimate bots that your robots.txt and llms.txt files are designed to welcome.

But here is the tension: the same web infrastructure that lets ClaudeBot and GPTBot index your content also carries the bots that inflate your traffic numbers and contaminate your engagement data. The web does not cleanly separate legitimate AI crawlers from bad actors. Your job is to make that distinction deliberately, in your own configuration.
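One way to start drawing that line is to label requests by the User-Agent they declare. Treat this as a heuristic, not verification. The token list is illustrative and incomplete (check each vendor's crawler documentation for current names), the generic markers are my own assumption, and any client can put "GPTBot" in its header. Anything that does not announce itself falls into "unclassified", which is exactly where a browser-mimicking bot ends up.

```python
# Illustrative and incomplete: User-Agent tokens some AI crawlers announce.
# Check each vendor's crawler documentation for the current list.
DECLARED_AI_CRAWLERS = {
    "gptbot": "OpenAI",
    "claudebot": "Anthropic",
    "perplexitybot": "Perplexity",
    "ccbot": "Common Crawl",
}

# Generic markers that often indicate automation (an assumption; tune to your logs).
GENERIC_BOT_MARKERS = ("bot", "crawler", "spider", "python-requests", "curl/")


def classify_user_agent(user_agent: str) -> str:
    """Label a request by what its User-Agent claims.

    The header is self-reported and easy to spoof, so treat the label as a
    claim to verify (for example against a vendor's published IP ranges).
    """
    ua = user_agent.lower()
    for token, vendor in DECLARED_AI_CRAWLERS.items():
        if token in ua:
            return f"ai-crawler:{vendor}"
    if any(marker in ua for marker in GENERIC_BOT_MARKERS):
        return "other-bot"
    # Deliberately not "human": bots that mimic browsers land here too.
    return "unclassified"


# Simplified sample strings, not real log lines.
for ua in [
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "python-requests/2.31",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]:
    print(f"{classify_user_agent(ua):24} {ua}")
```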

This is why the structural work matters -- proper JSON-LD schema, explicit crawler permissions, clean structured content -- not just as ranking signals, but as signals of intent. When you configure your site to explicitly welcome legitimate AI crawlers while blocking scrapers and path traversal scanners, you are making a statement about what kind of traffic you want and why. That clarity can carry through to how AI systems assess the quality and trustworthiness of your content.
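On the JSON-LD side, structured data is usually embedded as a script tag in the page. Here is a minimal sketch that builds a schema.org Article block. The function name and the author, date, and URL values are hypothetical placeholders, and a real page would likely carry more properties than these.

```python
import json


def article_jsonld(headline: str, author: str, date_published: str, url: str) -> str:
    """Return a schema.org Article as a <script type="application/ld+json"> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,  # ISO 8601 date string
        "mainEntityOfPage": url,
    }
    # Escape "</" so text like "</script>" inside a value cannot close the tag early.
    body = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(article_jsonld(
    headline="Bot Traffic Analytics: The Case of the Lying Dashboard",
    author="Jane Example",                            # hypothetical
    date_published="2026-05-01",                      # hypothetical
    url="https://example.com/bot-traffic-analytics",  # hypothetical
))
```

The `"<\/"` escape is valid JSON, so parsers still read the original string while the browser never sees a premature closing tag.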

The Platform Divide Is a Proxy for Something Bigger

The Surfshark study broke down bot-detection performance by platform. Reddit and X users hit 68% detection rates. Facebook users landed at 47% -- no better than chance. The researchers attribute it to community culture: Reddit users are trained to be skeptical of low-effort posts; Facebook users are not.

What that really tells you is that context shapes perception. Audiences on different platforms have developed entirely different filters for evaluating whether content is real or synthetic. That has direct implications for where you publish and how you frame content for different distribution channels.

There was also a finding about emotional topics that I keep coming back to. On a technical subject like data centers, players achieved 71% bot-detection and 76% accuracy. On immigration and women's rights, those numbers dropped to 49% and 61%. Emotional engagement actively degrades the ability to evaluate content quality.
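It helps to keep the two metrics apart. Assuming "bot-detection" means the share of AI-generated comments a player flagged and "accuracy" means the share of all comments labelled correctly (the study may define them differently), a hypothetical round shows how accuracy can look respectable while many bots slip through:

```python
def detection_rate(bots_flagged: int, total_bots: int) -> float:
    """Share of AI-generated comments flagged as AI (recall on bots)."""
    return bots_flagged / total_bots


def miss_rate(bots_flagged: int, total_bots: int) -> float:
    """Share of AI-generated comments that slipped through unflagged."""
    return 1 - detection_rate(bots_flagged, total_bots)


def accuracy(bots_flagged: int, total_bots: int,
             humans_passed: int, total_humans: int) -> float:
    """Share of all comments labelled correctly, bot and human together."""
    return (bots_flagged + humans_passed) / (total_bots + total_humans)


# Hypothetical round: 10 AI comments and 10 human comments.
print(f"detection {detection_rate(5, 10):.0%}")
print(f"accuracy  {accuracy(5, 10, 9, 10):.0%}")
print(f"missed    {miss_rate(5, 10):.0%}")
```

A player who flags 5 of 10 bot comments and correctly passes 9 of 10 human ones scores 70% accuracy with only a 50% detection rate. The same arithmetic explains the headline figure from the study: a 58% detection rate leaves 42% of bots unflagged.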

If your content strategy leans into polarizing or emotionally charged topics as an engagement driver, you are publishing into the exact conditions where bot content is hardest to distinguish from real engagement. Your comments, your social signals, your sentiment data -- all of it gets noisier in precisely the spaces where you are most invested in the signal.

What to Actually Do About It

A few concrete things worth doing now, regardless of your platform or stack:

Check Microsoft Clarity's bot dashboard. It became generally available in January 2026 and gives you property-level visibility into AI bot traffic -- something that was not easy to get before. If you have not looked at it, do that before your next analytics review.

Review what your robots.txt is actually allowing. Most sites still have default configurations that were written before AI crawlers existed as a category. If you have not explicitly addressed which bots you want indexing your content and which you do not, you are making that decision by omission.
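Python's standard library can answer "who is allowed where" for a given robots.txt, which makes a quick audit easy. The file below is a hypothetical example, not a recommendation; swap in your own text, or use `RobotFileParser.set_url()` and `read()` to fetch your live file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: welcome two AI crawlers, block one named scraper,
# and keep everyone else out of /admin/.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: BadScraper
Disallow: /

User-agent: *
Disallow: /admin/
"""


def audit(robots_txt, agents, paths):
    """Map each (agent, path) pair to whether robots.txt permits the fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(agent, path): parser.can_fetch(agent, path)
            for agent in agents for path in paths}


results = audit(ROBOTS_TXT, ["GPTBot", "ClaudeBot", "BadScraper", "SomeOtherBot"], ["/", "/admin/"])
for (agent, path), allowed in results.items():
    print(f"{agent:14} {path:8} {'allowed' if allowed else 'blocked'}")
```

Look at GPTBot and ClaudeBot on /admin/: both come back allowed. A crawler follows only the group that names it, so the `User-agent: *` rules do not stack on top of a named group. That is exactly the kind of decision-by-omission worth catching. Also keep in mind that robots.txt is a request, not a lock; scrapers and scanners that ignore it have to be handled at the server or firewall.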

Separate your traffic sources before drawing conclusions. If you are making content decisions based on aggregate session data, you are likely optimizing off a number that includes substantial bot activity. Segment by source before you analyze behavior.
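In practice, segmenting means grouping sessions by source and bot label before computing anything. This sketch assumes a hypothetical export of (source, bot label, pages viewed, converted) rows; the layout and the labels stand in for whatever your analytics tool and bot filter actually produce.

```python
from collections import defaultdict

# Hypothetical export rows: (traffic_source, bot_label, pages_viewed, converted).
SESSIONS = [
    ("organic", "human", 3, True),
    ("organic", "human", 1, False),
    ("organic", "human", 2, False),
    ("organic", "human", 4, True),
    ("direct", "human", 1, False),
    ("direct", "human", 2, True),
    ("referral", "suspected-bot", 1, False),
    ("referral", "suspected-bot", 1, False),
    ("referral", "suspected-bot", 1, False),
]


def segment_metrics(rows):
    """Group sessions by (source, bot label) and compute metrics per segment."""
    groups = defaultdict(list)
    for source, bot_label, pages, converted in rows:
        groups[(source, bot_label)].append((pages, converted))

    report = {}
    for segment, items in groups.items():
        n = len(items)
        report[segment] = {
            "sessions": n,
            "bounce_rate": sum(pages == 1 for pages, _ in items) / n,
            "conversion_rate": sum(converted for _, converted in items) / n,
        }
    return report


for (source, label), m in sorted(segment_metrics(SESSIONS).items()):
    print(f"{source:9} {label:14} n={m['sessions']:<3} "
          f"bounce={m['bounce_rate']:.0%} conversion={m['conversion_rate']:.0%}")
```

Reading the segments side by side makes it obvious when one source is all single-page, zero-conversion sessions, something a blended average would quietly absorb.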

Think about what you are actually optimizing for. Traditional SEO chases rankings that humans (and now AI models) use to find content. GEO chases citations in AI-generated responses. Bot traffic contamination distorts how you measure success in each case, and in different ways. Know which one you are measuring and adjust accordingly.

The web crossed a threshold in 2024 when non-human traffic became the majority. That is not a temporary anomaly -- it is the baseline we are working from now. The strategy question is not how to get back to a human-majority web. It is how to build digital presence that is legible, trustworthy, and useful to both the humans and the AI systems that are now jointly determining what gets found.

Services

The Case Is Open. Is Your Site on the Right Side of the Evidence?

Search has changed. A growing share of the answers people get online never come from a results page -- they come directly from AI. Whether your site shows up in those answers depends on things most web teams have never touched. I offer a focused service that covers both sides: making sure search engines can find you the traditional way, and making sure AI systems can read, understand, and cite you the new way. No jargon, no bloated retainer. Just the work that actually matters in 2026.

Let's talk about your site →


Want to talk through what this means for your site's configuration? That is exactly the kind of problem I work on. Start here.
