13 September 2010

Comment author: Calorion 08 July 2017 05:28:07PM 0 points

The Patri Friedman links are dead, and blocked from archive.org. Anyone have access to another archive, so I can see what he's talking about? There has got to be a better way to link. Has no one come up with a distributed archive of linked material yet?

Comment author: arundelo 09 July 2017 03:01:31AM *  0 points

archive.is has both things from Patri's LiveJournal:

(Unlike archive.org, archive.is does not, IIRC, respect robots.txt.)

Gwern Branwen has a page on link rot and URL archiving.

Comment author: arundelo 09 July 2017 06:24:27PM *  0 points

Why does archive.is not obey robots.txt?

Because it is not a free-walking crawler, it saves only one page acting as a direct agent of the human user.

--archive.is faq

A few months ago we stopped referring to robots.txt files on U.S. government and military web sites [...] As we have moved towards broader access it has not caused problems, which we take as a good sign. We are now looking to do this more broadly.

--archive.org blog, 2017-04-17
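
For anyone unsure what "obeying robots.txt" means in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the `ExampleCrawler` user-agent string, and the example.com URL are all made up for illustration; this is not how archive.org or archive.is actually implement anything. A crawler that respects robots.txt would run a check like this before fetching, while a tool acting "as a direct agent of the human user" would simply skip it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical robots.txt that only blocks one directory.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/some-post.html"  # placeholder URL
blocked = crawler_may_fetch(BLOCK_ALL, "ExampleCrawler", url)
partly = crawler_may_fetch(BLOCK_PRIVATE, "ExampleCrawler", url)
print(blocked, partly)
```

With the first file the crawler is refused everywhere; with the second it is refused only under `/private/`, so the same URL becomes fetchable.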