Fix your robots.txt or your site disappears from Google

WmWsjA6B29B4nfk · 2026-01-19T19:52:23 1768852343

Google docs are pretty clear (https://developers.google.com/crawling/docs/robots-txt/robot...):

> Google's crawlers treat all 4xx errors, except 429, as if a valid robots.txt file didn't exist. This means that Google assumes that there are no crawl restrictions.

This is a better source than a random SEO dude with a channel full of AI-generated videos.
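The rule in that quote, plus the "unreachable" case the thread keeps returning to, fits in a few lines. Below is a minimal Python sketch of the status handling Google's robots.txt documentation describes: 2xx is parsed, any 4xx except 429 means no restrictions, and 429 or 5xx is temporarily treated as a full disallow. `RobotsPolicy` and `robots_policy_for_status` are hypothetical names, not a Google API.

```python
from enum import Enum


class RobotsPolicy(Enum):
    PARSE_FILE = "obey the rules in the returned file"
    NO_RESTRICTIONS = "behave as if no robots.txt exists"
    TEMPORARILY_DISALLOW_ALL = "crawl nothing until a definite answer arrives"


def robots_policy_for_status(status: int) -> RobotsPolicy:
    """Map the final HTTP status of a robots.txt fetch to a crawl policy.

    Simplified: assumes redirects were already followed, and ignores the
    cached-copy fallback Google describes for server errors that persist.
    """
    if 200 <= status < 300:
        return RobotsPolicy.PARSE_FILE
    if status == 429 or 500 <= status < 600:
        # "Unreachable": the server gave no definite answer.
        return RobotsPolicy.TEMPORARILY_DISALLOW_ALL
    if 400 <= status < 500:
        # 401, 403, 404, 410, ...: treated like a missing file.
        return RobotsPolicy.NO_RESTRICTIONS
    raise ValueError(f"unhandled robots.txt status: {status}")


for code in (200, 403, 404, 429, 503):
    print(code, robots_policy_for_status(code).name)
```

Under that reading, a 403 on robots.txt lands in the same bucket as a 404. It is 5xx and rate-limit responses that pause crawling.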

marginalia_nu · 2026-01-19T20:14:09 1768853649

It's not entirely unlikely that this is just a bug on Google's end.

It's fairly common for there to be a very long and circuitous route between cause and effect in search, so a bug like this can sometimes be difficult to identify until people start making blog posts about it.

efilife · 2026-01-19T20:26:28 1768854388

It seems that this is not happening and even the guy who wrote the article mentions it:

> I don't have a robots.txt right now. It hasn't been there in a long time. Google still shows two results when I search for files on my site though:

The source that he links to is another Indian spam channel we've seen a thousand times on YouTube.

xnx · 2026-01-20T10:15:17 1768904117

It does seem unlikely that Google would have a bug in basic behavior of its crawler after ~27 years.

conradfr · 2026-01-19T20:40:08 1768855208

Google AdSense docs say that ads.txt is not mandatory, and yet I remember having no ads displayed on my website until I added one.

jamesfinlayson · 2026-01-20T03:44:48 1768880688

Yeah I thought I got a notification saying to add it for an existing site but it still seemed optional the last time I created a new site?

edwinjm · 2026-01-19T23:34:24 1768865664

Indeed. “Unreachable” is very different than “not found”.

franze · 2026-01-19T19:50:04 1768852204

Fake or just misinformed.

this is the support page https://support.google.com/webmasters/community-video/360202...

this is the creator's LinkedIn https://www.linkedin.com/in/iskgti/

he does not work for Google; he's just an SEO somewhere who creates videos and posts his hypotheses in forums

this is his youtube account https://m.youtube.com/@saket_gupta

nice, high-quality (probably AI-created) videos, but still no relationship to reality

jimberlage · 2026-01-19T20:19:53 1768853993

I remember back in the day, when SEO was a more viable channel, being surprised at how much of the game was convincing Google to crawl you at all.

I naively assumed that they would be happy to take in any and all data, but they had a fairly sophisticated algorithm for deciding "we've seen enough, we know what the next page in the sequence is going to look like." They value their bandwidth.

It led to a lot of gaming of how you optimally split content across high-value pages for search terms (the 5 most relevant reviews should go on pages targeting the New York metro, the next 5 most relevant for LA, etc.)

I'm surprised again, honestly. I kind of assumed the AI race meant that Google would go back to hoovering all data at the cost of extra bandwidth, but my assumption clearly doesn't hold. I can't believe I knew all that about Google and still made the same assumption twice.

HWR_14 · 2026-01-19T21:31:13 1768858273

Google may be aggressively crawling for AI and only making a small subset visible to the search database.

jimberlage · 2026-01-19T20:23:31 1768854211

And from the comments below, sounds like they might be aggressively crawling still, but unidentified or with a different crawler identity. So perhaps they are hoovering up everything in the AI era.

efilife · 2026-01-19T20:21:56 1768854116

"Here's the video from Google Support that covers it:"

This "Google Support" is another Indian spammer who generates dozens of nonsense videos and uploads them to YouTube: https://www.youtube.com/watch?v=2LJKNiQJ8LA

This guy is not affiliated with Google in any way, other than spamming their help forums like Indian people tend to do.

https://www.iskgti.com/

His own website scores 92 for SEO in Lighthouse, despite his claim to be an "SEO expert".

From the article:

> I don't have a robots.txt right now. It hasn't been there in a long time. Google still shows two results when I search for files on my site though:

guess why

skybrian · 2026-01-19T19:17:32 1768850252

Not sure if this is reliable.

- What does "unreachable" mean, exactly? A 404 or some more serious error?

- What is a "Diamond Product Expert" and do they speak for the company?

larrymcp · 2026-01-19T19:34:01 1768851241

I agree; I'm calling "incorrect" on this for now, pending corroborating sources. I run a few sites that don't contain a robots.txt file, and they are showing on Google just fine. I see links to the home page and several interior pages; all good.

dazc · 2026-01-19T20:25:53 1768854353

Just because you can see pages that aren't affected doesn't guarantee they will stay that way.

wackget · 2026-01-20T02:06:28 1768874788

A diamond product expert is a poor sap who freely gave up their time to act as an unpaid stand-in customer support for the richest company on earth.

antisol · 2026-01-20T06:58:57 1768892337

and we have a winner for the coveted "best comment I'm going to read all week" award!

cj · 2026-01-19T19:49:31 1768852171

Not having a robots.txt is fine as long as it's a 404. If it's a 403, you'll be de-indexed.

I have a feeling there's more to the story than what's in the blog post.

xp84 · 2026-01-20T00:23:37 1768868617

If there's one thing I know about Google search, it's that there's never one behavior you can rely on. De-indexed? It's been decades since Google started drawing a complete distinction between allowing the Googlebot to crawl and presence in the index. Last time I needed to make a page disappear from the index, I learned that crawl permission had nothing to do with keeping a page in the index or not. In fact, disallowing it in robots was actually the worst thing I could do, since it wouldn't let the bot show up to find my new "noindex" metatags, which are now the only way to make your page drop out of the index.

Having a shortcut like 403ing the robots would actually be useful. LOL
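The point about Disallow hiding the noindex tag can be checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch. The robots.txt content, the URL and `can_read_noindex` are hypothetical, and it only models a crawler that honours robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt added in an attempt to get /old-page out of the index.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-page
"""


def can_read_noindex(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """A compliant crawler only sees <meta name="robots" content="noindex">
    on a page that robots.txt lets it fetch in the first place."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_read_noindex(ROBOTS_TXT, "https://example.com/old-page"))  # False
```

So the URL can stay known from external links while the noindex tag goes unread. Removing the Disallow line lets the crawler fetch the page and see the tag.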

ttoinou · 2026-01-20T00:26:33 1768868793

I’d say you’re right because I have a custom 404 page returned by the robots.txt route and I’m well indexed by google

Aardwolf · 2026-01-19T19:50:55 1768852255

If true, this would mean more websites with genuine content from the "old" internet won't show up (since many personal websites won't have this), while more SEO-optimized content farms that of course do put up a robots.txt will...

shevy-java · 2026-01-19T20:02:11 1768852931

It also fits Google's plan to create a surrogate web.

- AI was the first step (or actually, among the first five steps or so). CHECK.
- Google search has already been ruined. CHECK.
- Now robots.txt is used to weed out "old" websites. CHECK.

They do too much evil. But it is also our fault, because we became WAY too dependent on these mega-corporations.

senko · 2026-01-19T19:30:50 1768851050

> Your robots.txt file is the very first thing Googlebot looks for. If it can not reach this file, it will stop and won't crawl the rest of your site. Meaning your pages will remain invisible (on Google).

This implication (stopped crawl means your pages are invisible) directly contradicts Google's own documentation[0] that states:

> If other pages point to your page with descriptive text, Google could still index the URL without visiting the page. If you want to block your page from search results, use another method such as password protection or noindex.

What I get from the article is that the big change is that Google now treats a missing robots.txt as if it disallowed crawling, meaning you can still get indexed but not crawled (as per above).

My cynical take is that this is preparation for a future AI-related lawsuit. Everyone explicitly allowing Google (and/or other crawlers) is proof they're doing it with the website's permission.

Oh, you'd want to appear in Google search results without appearing in Gemini? Tough luck, bro.

[0] https://developers.google.com/search/docs/crawling-indexing/...

edwinjm · 2026-01-19T23:39:34 1768865974

It does not contradict. In their second case, there’s no crawling.

bflesch · 2026-01-19T19:50:40 1768852240

Don't invest any second of your time into the US tech monopoly. That time is much better spent deploying non-US alternatives and backing up your data from US clouds, which could be blocked for us at any moment.

Google is a rent-seeking parasitic middleman leeching off productive businesses, let them hang out with their best friends at the US administration.

forinti · 2026-01-19T19:00:22 1768849222

My logs tell me that Google ignores my robots.txt.

CDRdude · 2026-01-19T19:27:39 1768850859

Isn’t it somewhat likely that a lot of shady crawlers pretend to be a google bot with their user agent?

ipaddr · 2026-01-19T19:33:46 1768851226

If you check the IPs, they belong to Google.

Plus, if you run AdSense, Google will ignore crawler rules and visit the page from Google IPs and from some shady IPs. I wonder if it's the same for sites using Analytics.

bflesch · 2026-01-19T19:53:52 1768852432

Why do you still have the idea in your head that they play by the rules? With the current administration they have been empowered to extract maximum value from us.

In the early days of smartphone use, Google and Facebook uploaded contact lists of every single smartphone user to their servers.

cookiengineer · 2026-01-20T00:12:46 1768867966

Gotta dig -x to find out if they really are from Google
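`dig -x` covers the reverse half. The forward lookup matters as well, because whoever controls an IP's reverse zone can point its PTR record at any name. Below is a minimal Python sketch of the reverse-then-forward check Google describes for verifying its crawlers. The function names are hypothetical, and the domain suffix list is an assumption to check against Google's current documentation.

```python
import ipaddress
import socket

# Reverse-DNS suffixes for Google crawlers (assumption: taken from Google's
# "verify Googlebot" guidance; check the current list before relying on it).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_google_hostname(hostname: str) -> bool:
    """True if the PTR name sits under one of Google's crawler domains."""
    return hostname.rstrip(".").lower().endswith(GOOGLE_SUFFIXES)


def verify_googlebot(ip: str) -> bool:
    """Reverse lookup (what `dig -x` does), then forward-confirm the name."""
    try:
        claimed = ipaddress.ip_address(ip)
    except ValueError:
        return False
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_google_hostname(hostname):
        return False
    try:
        # The name must resolve back to the same IP, or the PTR is forged.
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    return claimed in {ipaddress.ip_address(info[4][0]) for info in infos}
```

A spoofed User-Agent string alone passes neither step, which is why log analysis usually runs this check on the IP rather than trusting the UA.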

0x1ch · 2026-01-19T19:52:25 1768852345

I was going to say: if true, this is an absolute win for everyone with a personal site who has been trying to keep Google from crawling it.

nikanj · 2026-01-19T20:50:03 1768855803

I remember how religiously people used to care about their Google ranking. It's almost shocking to realize how fast that has changed. People used to spend tons of effort gaming site load speed, optimizing sitemaps and writing blog content.

All of that is fast becoming completely irrelevant: people see ads on their favourite TikReels app, find their holiday presents on Temu and ask ChatGPT their questions.

AznHisoka · 2026-01-19T21:31:16 1768858276

Some of it has rebranded as “GEO optimization” (generative AI optimization), and half of that battle is ranking higher in Google, since that is where most AI tools search anyway.

estimator7292 · 2026-01-19T19:33:23 1768851203

What I'm hearing is that if I tweak robots.txt I can exclude my site from google? Excellent news!

sofixa · 2026-01-19T19:44:22 1768851862

This is literally the point of robots.txt. It was created to allow site owners to configure how and which parts of their website can be scraped by what bot, and all the "decent" ones (Google, Bing) respect it.

bflesch · 2026-01-19T19:54:19 1768852459

Spoiler: They don't.

edwinjm · 2026-01-19T23:40:33 1768866033

Google and Bing do.

aendruk · 2026-01-19T19:40:57 1768851657

Was that not possible before?

dazc · 2026-01-19T20:30:18 1768854618

no-index is a thing.

tremon · 2026-01-19T20:43:35 1768855415

no-index is per individual page, not for an entire domain, IIUC?

snowwrestler · 2026-01-20T03:43:11 1768880591

You can pretty easily send no-index for an entire site by configuring it as a site-wide HTTP header.
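A site-wide `X-Robots-Tag: noindex` is a response header on every URL, not a robots.txt rule. Below is a minimal sketch of it as Python WSGI middleware. `noindex_everywhere` and `hello_app` are hypothetical names, and the header only has an effect on pages crawlers are allowed to fetch.

```python
from typing import Callable, Iterable

StartResponse = Callable[..., object]


def noindex_everywhere(app: Callable) -> Callable:
    """Hypothetical WSGI middleware: add X-Robots-Tag: noindex to every response."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        def start_with_noindex(status, headers, exc_info=None):
            # Drop any per-page value so the site-wide one applies everywhere.
            headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
            headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, start_with_noindex)

    return wrapped


def hello_app(environ, start_response):
    """Stand-in application for the example."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


site = noindex_everywhere(hello_app)
```

For a quick local check it can be served with `wsgiref.simple_server.make_server(host, port, site)`. In production the same header is usually one line of web-server or CDN configuration.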

dazc · 2026-01-19T20:23:23 1768854203

I've witnessed a few catastrophes resulting from mistakes made in robots.txt, especially when 'disallow' was used in an attempt to prevent pages from being indexed.

I don't know if the claims made here are true but there really isn't any reason not to have a valid robots.txt available. One could argue that if you want Google to respect robots.txt then not having one should result in Googlebot not crawling any further.
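A valid robots.txt that restricts nothing is only two lines. Below is a minimal sketch that checks two common allow-everything forms with Python's `urllib.robotparser`. The empty `Disallow:` form is from the original 1994 convention, while `Allow` is defined in RFC 9309. The URL is a placeholder, and this only shows how a standard parser reads the files, not how Googlebot ranks anything.

```python
from urllib.robotparser import RobotFileParser

# Two common "allow everything" files.
ALLOW_ALL_CLASSIC = "User-agent: *\nDisallow:\n"
ALLOW_ALL_MODERN = "User-agent: *\nAllow: /\n"


def allows(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for name, body in [("classic", ALLOW_ALL_CLASSIC), ("modern", ALLOW_ALL_MODERN)]:
    print(name, allows(body, "https://example.com/any/page"))
```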

nextlevelwizard · 2026-01-20T05:53:50 1768888430

Who cares? Top results are paid ads anyway.

sidnarsipur · 2026-01-21T04:04:55 1768968295

I've made a nice tool (https://botblock-puce.vercel.app/) to generate custom robots.txt easily if anyone is interested.

josefritzishere · 2026-01-19T18:56:10 1768848970

The irony is that their AI bots still hoover up all your site content regardless.

sixtyj · 2026-01-19T19:09:50 1768849790

Two different teams, I suppose. It is quite common in such a big company.

bflesch · 2026-01-19T19:57:08 1768852628

No need to simp for a megacorp, no matter how much these extremely well-paid US tech workers blame the "organization" for their unethical behavior.

TurdF3rguson · 2026-01-19T20:00:09 1768852809

According to Gemini, it uses the Googlebot cache, but it will fetch on demand when a page is missing and the user asks for a summary. There are separate UAs you would need to block for those: Googlebot (search) and Google-Extended (AI summaries).
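Rules in robots.txt are grouped by user-agent token, so one file can give Googlebot and Google-Extended different answers. Below is a minimal sketch with Python's `urllib.robotparser`. The policy is hypothetical. Google documents Google-Extended as a robots.txt product token rather than a separate crawler, so check Google's crawler documentation for exactly which features each token governs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: stay in Search, opt out of the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each token is matched against its own group of rules.
for token in ("Googlebot", "Google-Extended"):
    print(token, parser.can_fetch(token, "https://example.com/article"))
```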

geldedus · 2026-01-20T22:15:50 1768947350

I don't even bother with Google indexing anymore. They massively de-index perfectly valid and useful pages. And this happens to thousands of sites. Google becomes less and less relevant.

gmiller123456 · 2026-01-19T19:17:39 1768850259

Sounds like great news. Users will eventually figure out other search engines produce more relevant results and Google's dominance will fade. Hopefully they never "fix" it.

hackyhacky · 2026-01-19T19:35:07 1768851307

Users are not aware that other search engines exist.

dazc · 2026-01-19T20:28:54 1768854534

Good luck with that one.

vicpara · 2026-01-19T20:00:16 1768852816

A lot of websites have robots.txt and sitemap.xml protected by Cloudflare, if you can imagine that. That's crazy.

ArcHound · 2026-01-19T19:03:46 1768849426

To reach my site, users need to get through the AI summary first. Spoilers: they don't get through more often than not. This is based on the drop of views since AI summary started.

And honestly, I don't blame them. If the summary has the info, why risk going to a possibly ad-filled site?

CodesInChaos · 2026-01-19T19:24:00 1768850640

> If the summary has the info, why risk going to a possibly ad-filled site?

I can usually tell if the information on a website was written by somebody who knows what they're talking about. (And ads are blocked)

The AI summary, on the other hand, looks exactly the same to me regardless of whether it's correct. So it's only useful if I can verify its correctness with minimal effort.

promiseofbeans · 2026-01-19T19:46:04 1768851964

Kagi has an optional AI summary users can trigger on demand, which feels a lot more useful than Google's. Most of the time I want the actual websites, but sometimes I just want an overview of the top results, and it's really useful for that.

alex1138 · 2026-01-19T20:00:51 1768852851

It's the '?', right? I think it might use FastGPT

cocoto · 2026-01-19T19:19:30 1768850370

And what if your website is ad-free and the AI is full of advertising? At least the users get the information and the AI saves on your bandwidth (in theory!).

pcdevils · 2026-01-19T19:41:21 1768851681

You lose any real attribution and people following other links on your site... Essentially Google took the value and left you with nothing.

TeMPOraL · 2026-01-19T21:40:32 1768858832

That's assuming one cares about "attribution" and "people following other links on your site". I.e. that's still being a salesman, maybe with extra steps.

In the alternative case, no value is being taken, you're left exactly with what you had before - nothing gained, nothing lost - but some user somewhere gains a little. Apparently even in 2026, the concept of positive-sum exchange, is unfathomable to so many.

account42 · 2026-01-20T13:31:16 1768915876

> That's assuming one cares about "attribution" and "people following other links on your site". I.e. that's still being a salesman, maybe with extra steps.

No, it's called being part of a community.

Soup kitchens provide free food without requiring anything in return. That doesn't make it OK for you to take as much as you can get and resell it.

> In the alternative case, no value is being taken, you're left exactly with what you had before - nothing gained, nothing lost - but some user somewhere gains a little. Apparently even in 2026, the concept of positive-sum exchange, is unfathomable to so many.

It's not a positive sum exchange. The community is what is lost.

TeMPOraL · 2026-01-20T15:45:37 1768923937

> Soup kitchens provide free food without requiring anything in return. That doesn't make it OK for you to take as much as you can get and resell it.

It would be if the soup kitchen had infinite soup available.

Whatever volume of soup you take from the soup kitchen, it's gone from the kitchen. This is not the case with information - you consuming or collecting it does not mean there's less of it at the source.

> No, it's called being part of a community.

Soup kitchens are a bad example. They're not there to build a community of poor people. They're there to feed them. The only reason they mind people taking in excess is because supply of soup is finite - take too much, and there won't be enough for someone else. Beyond that, they don't really care what people do with it.

> It's not a positive sum exchange. The community is what is lost.

Nobody other than salesmen and marketers want a community around everything. Especially not when they're looking for facts, or providing a helping hand.

Pay-it-forward is not affected by introduction of an intermediary (AI or otherwise), because it's about giving, not trading.

That's another way of putting this concept that so many don't seem to get: not everything has to be an exchange.

Animats · 2026-01-19T20:18:43 1768853923

So Google Search is now opt-in? Good.

dazc · 2026-01-19T20:28:23 1768854503

You can use x-robots noindex on any page and it will not be indexed. This has been the case for at least the past decade.

Animats · 2026-01-19T20:51:18 1768855878

That's opt-out, not opt-in.

mrroryflint · 2026-01-20T09:06:35 1768899995

Thanks for sharing - I took a look at my own robots.txt and realised the sitemap was broken.
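If the sitemap is declared in robots.txt, a quick check can pull out the `Sitemap:` lines and see whether each one actually loads. Below is a minimal sketch using `RobotFileParser.site_maps()` (Python 3.8+). `broken_sitemaps` is a hypothetical helper, and "broken" here only means a network or HTTP error or a non-200 status, not invalid sitemap XML.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def broken_sitemaps(robots_txt: str, timeout: float = 10.0) -> list[str]:
    """Return Sitemap: URLs from robots.txt that fail to fetch with HTTP 200."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    broken = []
    for url in parser.site_maps() or []:  # site_maps() returns None if there are none
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status != 200:
                    broken.append(url)
        except (OSError, ValueError):
            # OSError covers HTTPError/URLError and timeouts; ValueError a malformed URL.
            broken.append(url)
    return broken
```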

ThinkBeat · 2026-01-19T22:54:12 1768863252

Mildly odd, given the desperate bots leeching anything they can in an attempt to mop up the entire internet.

Onavo · 2026-01-19T19:16:20 1768850180

Is this a compliance issue? I can't imagine why they would willingly not scrape.

Bengalilol · 2026-01-19T20:40:37 1768855237

...and not a single link to any Google dev page...

crazygringo · 2026-01-19T19:49:39 1768852179

This is interesting and unexpected if true.

My only thought is that virtually all "serious" sites tend to have robots.txt, and so not having it indicates a high likelihood of spam.

franze · 2026-01-19T19:50:16 1768852216

not true

mwkaufma · 2026-01-19T19:35:43 1768851343

Yes it's _our_ fault Google search was enshittified.

wackget · 2026-01-20T02:05:04 1768874704

> The top Stack Overflow answer on robots.txt has a discussion about Allow: / not being valid according to the spec. The only date for the comments is "Over a year ago" but given that the question is from 2010 the comments are probably from around that time.

Firstly, I detest that stupid "feature" of showing only relative dates. It makes screenshots impossible to date, and it's frankly useless for humans as proven by OP's article.

Secondly, you can hover over the relative date string to see the actual date. But don't let that stop you from hating it.

account42 · 2026-01-20T13:42:20 1768916540

It wouldn't be so bad if it were in addition to absolute dates and times, but that doesn't look as pretty. There is some value in highlighting that something happened within a few seconds/minutes/hours/days, although the switchover points should be chosen carefully so as not to have huge relative differences between the start and end of each range.

shevy-java · 2026-01-19T20:00:47 1768852847

We need to fix Google.

linolevan · 2026-01-19T19:00:57 1768849257

This is a crazy change. I wonder if part of the reasoning is that sites without a robots.txt tend to be very low-quality. Search is a very hard problem and in a world of LLM-generated internet, it's become way harder.

pwg · 2026-01-19T19:44:46 1768851886

My take: Google marketing found a ploy to make "Google" look like a better netizen than the AI companies that hammer away on sites to the level of a DDoS attack.

franze · 2026-01-19T19:50:55 1768852255

it's not true, just an SEO posting stuff and another person thinking it was legit

dangus · 2026-01-19T20:37:07 1768855027

Honestly, not crazy. This should have been how it always was. Why should search engine crawling be opt-out rather than opt-in?

Igor_Wiwi · 2026-01-19T21:35:39 1768858539

Thanks for the heads up. I release 10 projects every month, so it's really easy to miss some of the SEO fundamentals. To fix that, I created a Chrome extension to verify the basics: https://chromewebstore.google.com/detail/becgiilhpcpakkecdho...