
The issue with paywalls is the bait and switch - you push the whole content for SEO then force users to pay money.

Nobody cares if you want to sell the secrets of the universe for $5000/mo. We care about the bait and switch.

By setting up this bait and switch, you hurt your own paywall - if the search engine can get a full copy, so can I. Want these extensions to stop working? Stop the bait and switch.



The fact that you can do something that you are not authorized to do doesn't mitigate the fact that you don't have consent to do so. Just because we expect criminals to break the internet without consent doesn't mean everyone is morally absolved to do the same.


Sites that serve content to Google's crawlers, but then slap up a paywall are spamming users, and unfairly drowning out content producers that are playing by the rules. Last time I checked, they were doing this without Google's consent.

As for the read path, the sites consented to Google indexing their stuff, and Google consents to letting people read the crawler cache. I don't see the issue.


Couldn't agree more


What's the bait and switch involved?


The content is available to search engines, and the pages are highly optimized for SEO, so they show up high in search results. Users therefore expect that clicking the link in the search result will show them the rest of the content. Then they get hit with a paywall.


A third party that indexes the content for searchability and shows too much of it to its users is not bait and switch. Come on.


That is the deal of being indexed. That which you share to the indexer is shared to the world.


The news sites return different content to search engines than they do to browsers. Which means you get a different site than what is shown in the search preview, specifically a paywall page.

I would be a lot less annoyed with these paywalls if they didn't rank so highly on search results, or at the very least indicated they were paywalled on the search page and were easy to filter out.
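
For anyone unsure what "return different content to search engines than they do to browsers" means mechanically, here is a minimal sketch. It assumes the site decides purely on the User-Agent header; real sites may also verify crawler IP ranges or use other signals. The names `CRAWLER_TOKENS`, `select_body` and the teaser length are hypothetical, not taken from any particular publisher.

```python
# Hypothetical list of substrings that mark a request as coming from a crawler.
CRAWLER_TOKENS = ("googlebot", "bingbot")

# Hypothetical teaser length and paywall marker, chosen for illustration only.
TEASER_CHARS = 80
PAYWALL_MARKER = "... [Subscribe to continue reading]"


def select_body(user_agent: str, full_article: str, is_subscriber: bool = False) -> str:
    """Return the body a cloaking site would serve for this request."""
    ua = user_agent.lower()
    is_crawler = any(token in ua for token in CRAWLER_TOKENS)
    if is_crawler or is_subscriber:
        # Crawlers and paying users get the whole article.
        return full_article
    # Everyone else gets a truncated teaser plus the paywall prompt.
    return full_article[:TEASER_CHARS] + PAYWALL_MARKER


if __name__ == "__main__":
    article = "Full text of the article. " * 10
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser_ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"
    print(len(select_body(bot_ua, article)))      # full length
    print(len(select_body(browser_ua, article)))  # teaser plus marker
```

The mismatch people are complaining about is exactly the gap between those two return paths: the search preview is built from the first one, and the click lands on the second.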


Why is that relevant? They've created their content - if they want to share it with a search engine that's none of your business.

The technology does not allow them to do this without sharing it with you as well. But that is ethically irrelevant.

(Of course, we also deserve a good search engine that lets us remove paywalled sites and SEO spam from our results.)


Publishers are subverting users' reasonable expectations that the information they search and receive hits for on Google is publicly accessible—why else would it appear in search?

It's actually everyone's business because this creates a web that is less usable for everyone. If publishers were willing to commit and make their paid content server-side for customers only, they would have a stronger case against infringers.


That's just your definition of "reasonable".

They provide users with an expectation of what information they will receive if they pay for the content.

The fact that this "glimpse" of the content pollutes your web searches is a search engine problem.

It would be trivial to filter sites with paywalled content. But Google refuses to let you do that. Hope someone else will come along and help with that.


It's the search engine's definition of reasonable. Don't like it? Don't let the search engine get to the content.


Since when are Google results 100% clearnet? This has never been the case, ever. Search "login" in Google and you'll get a lot of content that is not publicly accessible.


They are serving different content to Google than users, and are then upset the version they served Google is available to users.


What? Those login pages are publicly accessible in their entirety.


"FREE VACATIONS! (you have to sit through a timeshare presentation and there are catches)"

We widely regard the above as unethical.

Adding content to a search engine literally says "you can come read this content!". That is the purpose of search engines. Even Google penalizes paywall behavior and will downrank them - which forces the paywall people to get more clever.

Sorry but don't abuse search engines and users to sell your content. Buy ads like a big boy.


They're doing something bad - so it's okay to steal their stuff. Did I get that right?


They are being gamed by their own game. Nobody took their content - they submitted it to search engines and are upset the search engine provides a cached copy.

If they don't like it, they can simply not submit to search engines.


> nobody took their content

According to the law, someone did.

Just because something's available on request from a server does not mean it's up for grabs.


> Just because something's available on request from a server does not mean it's up for grabs.

When you give the content out for free to the indexer, you have given a copy to the world. Google search is not your advertising machine - you can pay for that privilege if you care for it.


I wouldn't say it is necessarily immature to use a paywall as opposed to ads. I for one sometimes prefer paying for a site just so I don't have to see ads.

Though I do agree, the bait and switch aspect of finessing SEO and such leaves a bad taste in your mouth. Glad Google penalizes such behavior.


> By setting up this bait and switch, you hurt your own paywall - if the search engine can get a full copy, so can I.

I don’t get this. They do whatever they want with their content. Editors give free book copies to journalists; that doesn’t give you the right to have one as well.


...and if they post their content to the public, the public can read it. Giving content to journalists is not the same as giving your content to Google to index in their search engine and caches. The latter is not private.


Journalists post quotes and snippets, just as Google does. Not that it matters. Just because I gave something to someone else for free, doesn't mean I have to give it to you.

The idea that a single public exhibition of a work is enough to invalidate any future sale of that work is totally silly. Are you gonna bust out the window of a Barnes & Noble because libraries exist?


Google reproduces the entire page it searches. This is not some oversight or hidden secret - it is how it works.


> Giving content to journalists is not the same as giving your content to Google to index in their search engine and caches.

It’s exactly the same. You give access to an entity that serves to promote your content. The fact that Google’s cache is accessible is a technical implementation detail; Google could remove access to that cache tomorrow and that wouldn’t change anything.


> The fact that Google’s cache is accessible is a technical implementation detail;

It's intentional to fix the kind of abusive behavior you are engaging in. If you serve content to a search engine, that search engine will reproduce the content.

Search engines are for publicly available content.

If you want to advertise, pay for it.


Publishers give books to journalists with the expectation that the journalists will immediately put an electronic copy of the book in a publicly-available CDN and then spend nearly-unlimited resources promoting the free copy?


You can't give a free copy to the local library to drive sales and be upset when people borrow the book and read it from the local library.


And you cannot take a book from someone who asks for money, refuse to pay, and say "but it's free at the library!"

The difference is consent.


But we are literally getting the copy here from the library because the author gave the book to the library.

Google is obeying robots.txt. Google caches a copy of what you serve and gives that out. That is part of the deal.
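
To make the robots.txt point concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/premium/` path, `example.com` and the helper name `crawler_may_fetch` are all made up for illustration. The point is only that a publisher who does not want a crawler to have the paid pages can say so, and a well-behaved crawler will check before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher could serve to keep a crawler
# away from its paid section while leaving everything else open.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /premium/

User-agent: *
Allow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/premium/story", "/free/story"):
        url = "https://example.com" + path
        print(path, crawler_may_fetch(ROBOTS_TXT, "Googlebot", url))
```

With a rule like this the paid pages would not be crawled at all, which is the trade-off being argued about: no copy handed to the indexer, but no ranking from the full text either.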


> You can't give a free copy to the local library to drive sales

Nobody does that. The comparison doesn’t hold.


You are doing that when you let Google past the paywall.



