
Disagree. The web is clearly architected such that publishing a webpage makes it public and crawlable. You don’t “block Google”; you declare in robots.txt, following a well-known standard, that the site is not to be crawled. This is all basically the contract of the internet and it shouldn’t be surprising to anyone.
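To make the robots.txt point concrete, here is a minimal sketch of how a well-behaved crawler might check the file before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt contents, the bot names, and the `is_allowed` helper are made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (an assumption, not a real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/public"))
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/private/x"))
    print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/public"))
```

Nothing here enforces anything: the file only states the site's wishes, and it is up to the crawler to run a check like this and honor the result.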

Google specifically does not publish their API for free consumption by other companies, yet that’s what’s happening here anyway. The company is also using specific tricks to circumvent detection of the behavior.

In your analogy, this would be like a crawler ignoring robots.txt and then scraping the content for their own website with zero attribution to the source, which is nothing like Google indexing your site with full attribution and driving traffic to it for you.

Regardless, “turnabout is fair play” is unequivocally not a legally or even ethically acceptable standard, so that argument wouldn’t actually hold up anywhere anyway.



I don’t understand your argument. There is no actual “publishing” of web sites or APIs on the web. You simply make something available at a URL, and it’s up to anyone else to discover that URL. In this regard, your personal web site is no different than this Google Translate web API.


> this would be like a crawler ignoring robots.txt

Google ignores the noindex directive in robots.txt now. You're supposed to put it in your HTTP response headers or HTML meta tags...
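For anyone unsure what the two supported places look like, here is a rough sketch of how a crawler could detect `noindex` in either the `X-Robots-Tag` response header or a `<meta name="robots">` tag. The `has_noindex` helper and the sample inputs are hypothetical, and the parsing is simplified: it ignores user-agent-scoped forms such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag lists noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = (attr_map.get("name") or "").lower()
        content = (attr_map.get("content") or "").lower()
        directives = [part.strip() for part in content.split(",")]
        if name == "robots" and "noindex" in directives:
            self.noindex = True


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the header or the HTML meta tag asks not to be indexed."""
    # Header names are case-insensitive, so compare in lowercase.
    for key, value in headers.items():
        directives = [part.strip() for part in value.lower().split(",")]
        if key.lower() == "x-robots-tag" and "noindex" in directives:
            return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
    print(has_noindex({}, page))
```

Note that a crawler has to fetch the page to see either signal, which is the practical difference from a robots.txt rule.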


`noindex` in robots.txt was a Google-specific rule that was never officially documented or supported. I think they were perfectly entitled to withdraw support for it, especially considering there are alternatives.

https://developers.google.com/search/blog/2019/07/a-note-on-...


"driving traffic to it for you."

I did mention rich snippets.



