Comment author: Houshalter 14 September 2016 03:04:16PM 0 points

Ah, data hoarding. This is a subject that interests me for multiple reasons. I think preserving humanity's knowledge is important to start with. But I also like to have local copies of things in case of emergency or just a regular internet outage.

You mentioned Wikipedia. I found that it takes a long time to download, and viewing it is difficult.

I am working on a scraper for LessWrong. I have already downloaded the HTML of every post, but I still need to parse it into a machine-readable format; then I will publish it as a torrent.

All Reddit comments ever posted are available. I don't really know what the utility of this is; I'm mostly interested in this stuff for machine learning. But I have found that Reddit comments are fantastic for answering questions that Wikipedia might not be able to answer, not to mention providing multiple lifetimes of reading material. I once had an IRC bot that answered questions by searching AskReddit, and it was fairly effective for many types of questions. Similarly, it might be worth scraping other social media sites such as Hacker News.
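The comment doesn't say how that IRC bot searched AskReddit, so here is a swapped-in technique: a minimal sketch, assuming a local dump of thread titles and top answers, that picks the thread whose title shares the most non-trivial words with the question. `THREADS`, `STOPWORDS`, `keywords`, and `answer` are hypothetical names, and the three threads are made-up placeholders rather than real data.

```python
import re
from collections import Counter

# Hypothetical stand-in for a local AskReddit dump: title plus top answer.
THREADS = [
    {"title": "What's the best way to learn to cook?",
     "top_answer": "Pick three simple recipes and make each one until it's easy."},
    {"title": "What book changed your life?",
     "top_answer": "Too many to list."},
    {"title": "How do you fall asleep faster?",
     "top_answer": "Keep the same bedtime every night."},
]

# Very common words carry little signal, so they are ignored when matching.
STOPWORDS = {"a", "an", "the", "to", "do", "you", "i", "how", "what",
             "what's", "is", "can", "your", "of"}

def keywords(text):
    """Lowercase the text and count its non-stopword words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def answer(question, threads=THREADS):
    """Return the top answer of the thread whose title shares the most keywords."""
    q = keywords(question)
    best = max(threads, key=lambda t: sum((q & keywords(t["title"])).values()))
    return best["top_answer"]

print(answer("How can I learn to cook?"))
```

Word overlap is about the crudest ranking that works; on ties (including no overlap at all) `max` simply returns the first thread, so a real bot over millions of comments would want a proper full-text index and some vote-based weighting.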

I found a torrent for "reddit's favorite books", which contains hundreds of books people recommended on Reddit. It may be worth downloading, say, every book that has ever appeared on a best-seller list. But one would need such a list, and would need to know how to scrape libgen, which I haven't looked into yet.

Various textbooks are available through torrent sites or Library Genesis. These present knowledge in a better format than Wikipedia, I think. Scientific papers are also available.

The problem is that many books, and especially papers and textbooks, are distributed in weird formats like PDF or even PostScript. These formats are awful and don't compress well.

The fantastic thing about text data is that it's so small, compared to images or video. And it compresses super well. You can store multiple libraries worth of text in a cheapish hard drive.
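A quick way to see the difference is to compress some text and some data that is already incompressible. This is a minimal sketch using Python's standard `zlib` module; the sample text is a single hypothetical sentence repeated 200 times, so its ratio is an optimistic upper bound rather than a figure for real books. Random bytes stand in for already-compressed content: PDFs commonly store their contents in Flate-compressed streams and embed images, which is one plausible reason they shrink so little further.

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 9) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))

# Hypothetical sample: one sentence repeated 200 times. Real prose is far less
# repetitive, so this ratio overstates what a real library would achieve.
text = ("The fantastic thing about text data is that it's so small, "
        "compared to images or video. ") * 200
text_bytes = text.encode("utf-8")

# Stand-in for already-compressed data: random bytes of the same length.
already_compressed = os.urandom(len(text_bytes))

print(f"text:   {compression_ratio(text_bytes):.1f}:1")
print(f"random: {compression_ratio(already_compressed):.2f}:1")
```

The random bytes come out at roughly 1:1 (slightly worse, because of zlib's framing overhead), while the text shrinks dramatically.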

But PDFs store tons of data as overhead. Just converting them to text might be possible, but that fails terribly on math or anything that isn't English text, and especially on graphs, which I think are important. OCR has tons of errors. I'd love to someday have a local archive of all of humanity's knowledge, with almost every book and paper ever published, but it would require solving this problem.

Then perhaps it would be possible to store the data on nickel plates that will last up to 10,000 years. One website is doing that with all of its data, which is crazy because most of it is images. There is no information on the total storage space, but they do say "Ten thousand standard letter-sized sheets of text or more could fit onto a 2.2-inch diameter nickel plate", which seems like a lot.
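To get a feel for whether that is a lot, here is a back-of-the-envelope sketch that converts the quoted figure into plain-text bytes. Only the 10,000-sheet figure comes from the quote; the characters per sheet and bytes per character are assumptions, and the result is a rough text equivalent, not a storage spec published by the site.

```python
import math

# Figure quoted from the website: 10,000 letter-sized sheets per 2.2-inch plate.
SHEETS_PER_PLATE = 10_000
# Assumption: a full page of plain English text holds roughly 3,000 characters.
CHARS_PER_SHEET = 3_000
# Assumption: plain English text in ASCII/UTF-8 takes one byte per character.
BYTES_PER_CHAR = 1

bytes_per_plate = SHEETS_PER_PLATE * CHARS_PER_SHEET * BYTES_PER_CHAR
print(f"about {bytes_per_plate / 1_000_000:.0f} MB of plain text per plate")
# about 30 MB of plain text per plate

# How many plates one gigabyte of uncompressed text would need.
GIGABYTE = 1_000_000_000
plates_per_gb = math.ceil(GIGABYTE / bytes_per_plate)
print(plates_per_gb)  # 34
```

Under these assumptions one plate holds about 30 MB of text, a few dozen books' worth, so archiving a large library would take many plates.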

Comment author: DataPacRat 14 September 2016 05:58:30PM *  0 points

I am working on a scraper for lesswrong. I already downloaded all the html of every post, but I need to parse it into a machine readable format, and then I will publish it as a torrent.

I think that'll be worth at least a Discussion post when you publish it, for those of us who don't keep track of every comment. :)

(Will you be including OvercomingBias?)

But I also like to have local copies of things in case of emergency or just a regular internet outage.

I've found a torrent of public-domain "survival books", at least some of which may interest you. Unfortunately, LW doesn't seem to want to let me embed the magnet URL, so I'll try just pasting it: magnet:?xt=urn:btih:57963b66246379aa3c10d84a5de92c0ab5173faf&dn=SurvivalLibrary&tr=http%3a%2f%2ftracker.tfile.me%3a80%2fannounce&tr=http%3a%2f%2fpow7.com%3a80%2fannounce&tr=http%3a%2f%2ftracker.pow7.com%2fannounce&tr=http%3a%2f%2ftorrent.gresille.org%3a80%2fannounce&tr=http%3a%2f%2fp4p.arenabg.ch%3a1337%2fannounce&tr=http%3a%2f%2fretracker.krs-ix.ru%2fannounce&tr=http%3a%2f%2fmgtracker.org%3a2710%2fannounce&tr=http%3a%2f%2ftracker.dutchtracking.nl%3a80%2fannounce&tr=http%3a%2f%2fshare.camoe.cn%3a8080%2fannounce&tr=http%3a%2f%2ftracker.dutchtracking.com%3a80%2fannounce&tr=http%3a%2f%2fexplodie.org%3a6969%2fannounce&tr=http%3a%2f%2ftorrent.gresille.org%2fannounce&tr=http%3a%2f%2fretracker.krs-ix.ru%3a80%2fannounce&tr=http%3a%2f%2ft1.pow7.com%2fannounce&tr=http%3a%2f%2fpow7.com%2fannounce&tr=http%3a%2f%2fsecure.pow7.com%2fannounce&tr=http%3a%2f%2ftracker.tfile.me%2fannounce&tr=http%3a%2f%2fatrack.pow7.com%3a80%2fannounce&tr=http%3a%2f%2fextremlymtorrents.me%2fannounce.php&tr=http%3a%2f%2finferno.demonoid.me%3a3414%2fannounce&tr=http%3a%2f%2ftorrentsmd.com%3a8080%2fannounce&tr=udp%3a%2f%2fopen.facedatabg.net%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337&tr=udp%3a%2f%2fthetracker.org%3a80&tr=udp%3a%2f%2f9.rarbg.to%3a2710&tr=udp%3a%2f%2f9.rarbg.me%3a2710%2fannounce&tr=udp%3a%2f%2f9.rarbg.to%3a2710%2fannounce&tr=udp%3a%2f%2f9.rarbg.me%3a2710&tr=udp%3a%2f%2fopen.facedatabg.net%3a6969&tr=udp%3a%2f%2ftracker.ex.ua%3a80%2fannounce&tr=udp%3a%2f%2finferno.demonoid.com%3a3411%2fannounce&tr=udp%3a%2f%2finferno.demonoid.ph%3a3389%2fannounce&tr=udp%3a%2f%2f9.rarbg.com%3a2710%2fannounce&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.coppersurfer.tk%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.ilibr.org%3a6969%2fannounce&tr=udp%3a%2f%2fzer0day.ch%3a1337%2fannounce&tr=udp%3a%2f%2fwww.eddie4.nl%3a6969%2fannounce&tr=udp%3a%2f%2ftorrent.gresille.org%3a80%2fannounce&tr=udp%3a%2f%2fp4p.arenabg.ch%3a1337%2fannounce&tr=udp%3a%2f%2fp4p.arenabg.com%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.leechers-paradise.org%3a6969&tr=udp%3a%2f%2ftracker.kicks-ass.net%3a80%2fannounce&tr=udp%3a%2f%2ftracker.tiny-vps.com%3a6969%2fannounce&tr=udp%3a%2f%2f91.218.230.81%3a6969%2fannounce&tr=udp%3a%2f%2f168.235.67.63%3a6969%2fannounce&tr=udp%3a%2f%2fexplodie.org%3a6969%2fannounce&tr=udp%3a%2f%2feddie4.nl%3a6969%2fannounce&tr=udp%3a%2f%2ftracker.coppersurfer.tk%3a6969&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.aletorrenty.pl%3a2710%2fannounce&tr=http%3a%2f%2ftracker.dler.org%3a6969%2fannounce

Comment author: Lumifer 14 September 2016 04:41:52PM 3 points

If the world does collapse access to wikipedia could be enormously useful.

What makes you think you'll have electricity in a TEOTWAWKI scenario? I'll still take beans & ammo (and maybe a paper survivalist book).

On a more general level, if you desire to prepare for the civilization collapse, downloading Wikipedia to your local hard drive is probably not the right place to start.

Comment author: DataPacRat 14 September 2016 05:53:50PM 4 points

not the right place to start

Who says that's where I'm starting? :)

I already have my short-term physical supplies, including water, food, camping gear, and an AA-battery-powered handheld ham radio. I also have a highly portable solar panel capable of keeping my phone, and the offline copy of Wikipedia I keep on its SD card, functioning regardless of the power grid; and I have enough battery-backup stuff at home to run my laptop long enough to copy the latest Wikipedia dump (and whatever emergency-survival ebooks I've collected by then) onto that SD card.

Comment author: ChristianKl 13 September 2016 10:01:06AM 4 points

I don't see much value in having a recent copy of Wikipedia or Project Gutenberg on my computer. In both cases, the availability of the information is secured by other parties. It's more valuable to make sure that I store information that isn't already protected by other people.

Comment author: DataPacRat 14 September 2016 12:01:01AM 2 points

"Someone Is Learning How to Take Down the Internet." What will you do when the only data you have access to is whatever you have stored locally?

Comment author: philh 12 September 2016 10:44:49AM *  3 points

Low confidence that this will help, but my approach: I mentally move the right-hand matrix up, so that the space "in between" them (right of the first, below the second) is the right shape for the result. Each value of the result is the dot product of the vectors to the left and above it. (I don't have a trick for dot products, I just know how to calculate them.)

. . . . g h i
a b c * j k l
d e f . m n o

"becomes"

. . . g h i
. . . j k l
. . . m n o
. . . -----
a b c|S T W
d e f|X Y Z

and e.g. S is (a b c) dot (g j m), Y is (d e f) dot (h k n).
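The same rule, each result cell being the dot product of the row to its left and the column above it, can be written out directly. This is a minimal sketch in plain Python with nested lists; `dot` and `matmul` are hypothetical helper names, and the matrices `A` and `B` are arbitrary example numbers standing in for a..f and g..o.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))

def matmul(A, B):
    """Multiply A (n x m) by B (m x p).

    result[i][j] is the dot product of row i of A (the vector to the left
    of the cell) with column j of B (the vector above the cell).
    """
    m = len(B)
    if any(len(row) != m for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_B = [list(col) for col in zip(*B)]
    return [[dot(row, col) for col in columns_of_B] for row in A]

A = [[1, 2, 3],        # (a b c)
     [4, 5, 6]]        # (d e f)
B = [[7, 8, 9],        # (g h i)
     [10, 11, 12],     # (j k l)
     [13, 14, 15]]     # (m n o)

print(matmul(A, B))  # [[66, 72, 78], [156, 171, 186]]
```

Here `result[0][0]` plays the role of S, i.e. (1 2 3) dot (7 10 13) = 66, and `result[1][1]` plays the role of Y, i.e. (4 5 6) dot (8 11 14) = 171. The shape check mirrors the picture: the rows of A must be exactly as long as the columns of B for the dot products to line up.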

Comment author: DataPacRat 12 September 2016 10:19:31PM 2 points

By Juniper, I think this is the one. It's simple enough that I can actually remember this, and I can solve arbitrary matrixes with this approach.

Feel free to have an internet cookie in thanks. :)

Comment author: DataPacRat 12 September 2016 10:09:14PM 4 points

Time to rebuild a library

My 5-terabyte hard drive went poof this morning, and silly me hadn't bought data-recovery insurance. Fortunately, I still have other copies of all my important data; it'll just take a while to re-download everything else I'd been collecting.

Which brings up the question: What info do you feel it's important to have offline copies of, gathered from the whole gosh-dang internet? A recent copy of Wikipedia and the Project Gutenberg DVD are the obvious starting places... which other info do you think pays the rent of its storage space?

Comment author: Elo 09 September 2016 06:46:20AM -2 points

Consider couch to 5k. It's a good basic place to start.

Expect at least 2 months before you are feeling fit. Each new run, you get to feel progress in the sense of "could run a bit further today". The two most important things:

  1. You will get hurt. You will injure yourself. If you think you won't, you definitely will, and you will have to rest because of it. It might set you back days or weeks, but it's better to rest.
  2. You actually gain muscle and strength on your days off, when the muscles repair and grow back. Because of this, most of the pages on the fitness subreddits will suggest a 3-4 days a week routine with rest days in between. Rest days are important.

Comment author: DataPacRat 10 September 2016 05:39:46PM 0 points

FYI, my current plan is 3 days a week of bodyweight exercises (working up to /r/bodyweight's recommended routine), and 3 days a week of jogging starting with a pre-C25k program. How well that plan succeeds, well, we'll just have to see. :)

Comment author: DataPacRat 10 September 2016 02:30:07AM 3 points

Matrix multiplication

Could somebody explain to me, in a way I'd actually understand, how to (remember how to) go about multiplying a pair of matrixes? I've looked at Wikipedia, I've read linear algebra books up to where they supposedly explain matrixes, and I keep bouncing up against a mental wall where I can't seem to remember how to figure out how to get the answer.

Comment author: DataPacRat 09 September 2016 02:23:46PM 0 points

I appreciate the suggestion, and have added a few bookmarks on C25k to my to-read pile.

Given my experience today attempting to add some of the warm-up routines from /r/bodyweight's Recommended Routine, I'm afraid I have a ways to go before I'm even at the level of 'couch'. Ah well; I knew what I was getting into when I started this, and am picking up as much theory as I can to adapt pre-existing routines to my circumstances.

Comment author: cousin_it 08 September 2016 12:26:23PM *  0 points

I did burpees for a while. Now I'm not sure what the point is. Sure, you get tired quickly, but you don't feel strong or fast while doing it. Lifting moderately heavy stuff for 10-15 reps, or running 100m dashes with short breaks, is much more fun for me because I can go all out and push against my limit of power, not just my fatigue.

Comment author: DataPacRat 08 September 2016 03:20:15PM 0 points

I am, close to literally, starting from scratch, exercise-wise. I've started a thread in the bodyweight subreddit about a better exercise regimen, and am entirely literally in a mall right now looking for some exercise bands to let me do more types of movements in the area I have to exercise in. You could think of the burpees as a placeholder while I work out something better.

Comment author: root 06 September 2016 04:22:00AM 0 points

i.e. for 17, do 7, 6, 3, 2, 1

That adds up to 19, not 17.

Comment author: DataPacRat 06 September 2016 12:52:09PM 0 points

<forehead slap> So it does. Obviously, for 17, I should have written 6, 5, 3, 2, 1. (And for 19, it would be 6, 5, 4, 3, 1.)
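The arithmetic in this exchange is easy to double-check mechanically. This minimal sketch just sums each breakdown quoted in the thread; the dictionary labels are illustrative names, not part of the original comments.

```python
# Set breakdowns quoted in the exchange; the labels are illustrative.
breakdowns = {
    "first attempt at 17": [7, 6, 3, 2, 1],
    "corrected 17": [6, 5, 3, 2, 1],
    "19": [6, 5, 4, 3, 1],
}

# Sum each breakdown to check whether it hits the intended total.
totals = {label: sum(sets) for label, sets in breakdowns.items()}
for label, total in totals.items():
    print(f"{label}: {total}")
# first attempt at 17: 19
# corrected 17: 17
# 19: 19
```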
