gwern comments on Frequentist Statistics are Frequently Subjective - Less Wrong
Scientists cannot even get their papers published under distributable terms or Free terms. The Open Access people have issues precisely because researchers don't want to take the time to learn about all that and work through it, and copyright law, by defaulting to all-rights-reserved, doesn't help in the least. (This is one reason why such people try to get federally supported papers mandated to be open access; defaults are incredibly important.) In many cases, researchers don't have permission, or it's too difficult to figure out who does. And publishers occasionally do nasty things like fire off DMCA takedowns in all directions. (The ACM being just the latest example.)
It should be possible. It often is. It's also possible to run a marathon backwards.
The standard recommendation, incidentally. But this is not a cure-all, because data requires interpretation, and the entire computing world never has switched and never will switch entirely to textual formats. And as long as old binary or hard-to-read data is around, the costs will be paid. The Daily WTF also furnishes evidence that even textual formats can require reverse-engineering (to say nothing of the reputation scientists have for bad coding).
I have a problem, someone says; I know, I'll use a Globally Unique ID... An ID is perhaps the simplest possible solution, but it only works if you never need to do anything besides answer the question 'is this dataset I'm looking at the one whose ID I know?' You don't get search, you don't get history, or descriptions, or locations, or anything. One could be in the position of the would-be BitTorrent leech: one has the hashes and .torrent one needs, but there don't seem to be any seeds...
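To make the gap concrete, here is a minimal sketch, assuming the ID is simply a SHA-256 hash of the dataset's bytes (one common choice, not the only one). The `catalog` dictionary, the `register` and `search` helpers, and the example.org URL are all hypothetical, invented for illustration; the point is only that everything beyond the identity check has to be built and maintained separately from the ID.

```python
import hashlib

def dataset_id(data: bytes) -> str:
    """Content-derived ID: the SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()

def is_same_dataset(data: bytes, known_id: str) -> bool:
    """The one question a bare ID can answer."""
    return dataset_id(data) == known_id

# Everything below is extra infrastructure the ID does not provide.
# Hypothetical in-memory catalog mapping ID -> descriptive metadata.
catalog: dict[str, dict[str, str]] = {}

def register(data: bytes, description: str, location: str) -> str:
    """Record a description and a location alongside the ID."""
    ident = dataset_id(data)
    catalog[ident] = {"description": description, "location": location}
    return ident

def search(term: str) -> list[str]:
    """Case-insensitive substring search over descriptions."""
    term = term.lower()
    return [ident for ident, record in catalog.items()
            if term in record["description"].lower()]

# Usage: the ID verifies the bytes; only the catalog says what or where they are.
data = b"subject,rt_ms\n1,412\n2,388\n"
ident = register(data, "Reaction-time trial data", "https://example.org/rt.csv")
```

Holding only `ident`, one can confirm a copy once it is in hand, but without the catalog (and a host that still serves the file) there is no way to find that copy in the first place.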
I didn't mean accessibility in the sense of catering to the blind (although that is an issue, and textual formats alleviate it). I meant something more along the lines of community issues: it needs to be publicly online, well-known, well-used, easily searched or found, and have zero friction for use. It cannot be Citizendium; it must be Wikipedia. It cannot be like the obscure Open Access databases libraries try to maintain; it must be like ArXiv. There are scads of archive sites and libraries and whatnot; no one uses them because it's too hard to remember which one to use when. Archive services benefit heavily from network effects.