If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday and end on Sunday.
From the linked Wired article:
Gwern's comment in the Reddit thread:
These comments seem to refer partly to the mass archive of Google Reader made in 2013, just before the service was discontinued. For others who want to examine the data: the relevant WARC records for gse-compliance.blogspot.com are on lines 110789824 to 110796183 of greader_20130604001315.megawarc.warc, about three-quarters of the way into the file. I haven't checked the directory and stats grabs and don't plan to, as I don't want to spend any more time on this.

NB: As with any other large compressed archive, if you plan on saving the data, I suggest decompressing the stream as you download it and recompressing it into a seekable structure. Btrfs with compression works well, but block-based compression tools like bgzip should also work in a pinch. If you leave the archive as a single compressed stream, you'll pull your hair out every time you try to look through the data.
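For anyone pulling out that line range, here is a minimal Python sketch that grabs lines M to N (1-indexed, inclusive) in a single sequential pass. It assumes you have a local copy of the file, either decompressed or gzip-compressed with a .gz suffix, and that "line" means a newline-terminated line counted from 1. The helper names (open_maybe_gzip, line_range, extract_lines) are hypothetical, not part of any archive tooling.

```python
import gzip
import itertools
from typing import IO, Iterator


def open_maybe_gzip(path: str) -> IO[bytes]:
    """Open `path` for binary reading, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def line_range(stream: IO[bytes], first: int, last: int) -> Iterator[bytes]:
    """Yield lines first..last (1-indexed, inclusive) from a binary stream.

    Earlier lines are read and discarded, so each call is one sequential pass
    over the data up to `last`: fine for a one-off grab, slow if repeated.
    """
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    # islice takes 0-based, half-open bounds: [first - 1, last).
    return itertools.islice(stream, first - 1, last)


def extract_lines(path: str, first: int, last: int, out: IO[bytes]) -> int:
    """Copy lines first..last of `path` to `out` and return how many were written."""
    count = 0
    with open_maybe_gzip(path) as f:
        for line in line_range(f, first, last):
            out.write(line)
            count += 1
    return count
```

Something like extract_lines("greader_20130604001315.megawarc.warc", 110789824, 110796183, sys.stdout.buffer) would work, but note that it still has to read through roughly three-quarters of the file to get there, which is exactly the pain a seekable layout avoids.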
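To make the "seekable structure" idea concrete, here is a minimal Python sketch of block compression in the spirit of bgzip. The data is cut into fixed-size chunks, each chunk becomes an independent gzip member, and an index maps uncompressed offsets to compressed ones, so a read only decompresses the blocks it touches. This is a simplification under stated assumptions, not the BGZF format bgzip actually writes (BGZF records each block's size in a gzip header field instead of keeping a separate index). The function names write_blocked and read_at and the 64 KiB block size are illustrative choices.

```python
import bisect
import gzip
from typing import IO, List, Tuple

BLOCK_SIZE = 64 * 1024  # uncompressed bytes per block (illustrative choice)

Index = List[Tuple[int, int]]  # (uncompressed_offset, compressed_offset) per block


def write_blocked(src: IO[bytes], dst: IO[bytes], block_size: int = BLOCK_SIZE) -> Index:
    """Recompress `src` into `dst` as independent gzip members, one per block.

    `dst` must support tell() (e.g. a regular file). The output is still a
    valid multi-member gzip stream, so ordinary gzip tools can decompress it.
    """
    index: Index = []
    u_off = 0
    while True:
        chunk = src.read(block_size)
        if not chunk:
            break
        index.append((u_off, dst.tell()))
        dst.write(gzip.compress(chunk))
        u_off += len(chunk)
    return index


def read_at(blocked: IO[bytes], index: Index, offset: int, length: int) -> bytes:
    """Return up to `length` uncompressed bytes starting at `offset` (offset >= 0),
    decompressing only the blocks that overlap the requested range."""
    if length <= 0 or not index:
        return b""
    starts = [u for u, _ in index]
    # Last block whose uncompressed start is <= offset.
    i = max(bisect.bisect_right(starts, offset) - 1, 0)
    first_start = index[i][0]
    buf = bytearray()
    while i < len(index) and first_start + len(buf) < offset + length:
        c_start = index[i][1]
        c_end = index[i + 1][1] if i + 1 < len(index) else None
        blocked.seek(c_start)
        raw = blocked.read() if c_end is None else blocked.read(c_end - c_start)
        buf += gzip.decompress(raw)
        i += 1
    skip = offset - first_start
    return bytes(buf[skip:skip + length])
```

In practice you would also save the index next to the compressed file so later sessions can seek without rescanning; smaller blocks make random reads cheaper at some cost in compression ratio.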