Long-time readers may have noticed that spam on the wiki has been a very persistent problem for the past 2 years or so; I've been dealing with it so far by hand, but I recently reached a breaking point and asked Trike to resolve it or find a new wiki administrator. (Speaking of which, is anyone interested?)
So Trike has enabled a MediaWiki extension called the edit filter: a small functional programming language which lets you define predicates over edits which trigger one of a set of actions, like banning a user, deleting an edit/page, or stopping an edit from going through. I have so far defined one rule: page creation is forbidden to accounts younger than 24 hours. This seems to have worked well: spam pages have fallen from 5-10 per day to ~5 over the past 2 weeks. That is much more manageable, and I am hopeful that this new anti-spam measure will remain effective longer than the previous measures did (but if it doesn't, I'll look into adding more rules dealing with images and external links, and perhaps also into banning users whose names end in a numeric digit, as almost all spam accounts' names do).
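In the AbuseFilter syntax such a rule might look something like the following - a sketch using the extension's standard "action", "page_id" & "user_age" variables, not the exact rule deployed:

```
/* Disallow page creation by accounts less than 24 hours old.
   A page creation is an edit to a page that has no ID yet. */
action == "edit" &
page_id == 0 &
user_age < 86400   /* 24 * 60 * 60 seconds since registration */
```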
If you've run into this edit filter before by making a page and seeing the submission rejected with an error message, fret not: merely wait 24 hours. (If your account is more than a day old and you're still getting errors, please contact me or Trike.)
“But where shall wisdom be found? / And where is the place of understanding? / Man knoweth not the price thereof; neither is it found in the land of the living…for the price of wisdom is above rubies.”
As before, here is material I’ve worked on in the 477 days since my last update which LWers may find interesting. In roughly chronological & topical order, the major additions to gwern.net:
- interviewed translator Michael House about his work in Japan
- finished data collection for my hafu anime statistics page and began analysis. (I’ve achieved good coverage of characters, found an astonishingly consistent absence of Korean characters, and confirmed the blond-haired/blue-eyed stereotype; but my original thesis doesn’t seem to work and the data is too unevenly distributed to identify time trends.)
- judged the 2011 & 2012 Haskell Summer of Code results and the accuracy of my predictions
- did a meta-analysis on whether dual n-back increases IQ, examining possible biases and various claims about what makes the training work or not work
- did another meta-analysis on whether iodine increases IQ, etc
- tried kratom
- did a nicotine gum/n-back experiment
- did 2 potassium experiments; neither improved my mood/productivity, and one damaged my sleep
- my Silk Road page has been expanded with a BBC interview, putting SR in a historical cypherpunk context, an updated account of all arrests & law enforcement actions, and application of basic statistics to ordering
- ran 2 sleep experiments on the timing of taking a vitamin D supplement: I found that taking vitamin D before bed substantially damaged my sleep, while taking vitamin D after waking up did not hurt & somewhat helped
- checked whether a walking desk (treadmill) damaged typing speed or accuracy
- I have run 3 Wikipedia experiments establishing that: Talk page edits are ignored by editors; random link deletions (and their restoration) are also ignored by editors; and external link suggestions on Talk pages are also ignored by readers. (I take the former 2 as indicative of the decline in edit activity and rise of deletionist beliefs on Wikipedia.)
- tried some economic/historical analysis: “Reasons of State: Why Didn’t Denmark Sell Greenland to the USA?”
- Defending sunk costs essay (LW discussion)
- “Slowing Moore’s Law: Why You Might Want To and How You Would Do It”
- “The Hyperbolic Time Chamber as Brain Emulation Analogy”
- tried estimating the bandwidth of a Death Note
- compiled predictions for Harry Potter and the Methods of Rationality
- looked into Conscientiousness and online education; studies so far are useless from a meta-analytic standpoint
- tripled length of appendix dealing with the reliability of mainstream science (methodological flaws, replication rates, etc)
- finished meta-ethics essay, “The Narrowing Circle”
- explained the philosophy saying “one man’s modus ponens is another man’s modus tollens”
- speculation about a restoration of the British monarchy
- cleanup & exploratory data analysis of SDr’s lucid dreaming data
- Who wrote the Death Note script? (LW discussion)
- 2012 US election predictions: statistical comparison
- Turing-completeness in surprising places (inventory of particularly “weird machines”; relevant to computer and AI security)
Transcribed or translated:
- Nash’s letters on cryptography
- Douglas Hofstadter’s superrationality columns (from Metamagical Themas, 1985)
- “The Iron Law Of Evaluation And Other Metallic Rules”, Rossi 1987 (lessons from the large RCTs evaluating social & welfare interventions)
- “The Ups and Downs of the Hope Function In a Fruitless Search”, Falk et al 1994
- Gene Wolfe on writing
- “Shiny balls of Mud: William Gibson Looks at Japanese Pursuits of Perfection” (2002)
- “Otaku Talk”, Okada et al 2004
- “Earth in My Window”, Murakami 2005
- “On The Battlefield of ‘Superflat’”
- “Ero-Anime: Manga Comes Alive”, Sarrazin 2010
- 1996 NewType interview with Hideaki Anno (translated by me, with the help of an EGFer)
- 1997 Animeland interview with Hideaki Anno (bought, transcribed, and translated by me with the help of other LWers)
- 1997 Utena interviews
- added edit history statistics/visualization for gwern.net
- site traffic updates: July-December 2011, January 2012-July 2012, July 2012-Jan 2013
- There’s also been a lot of backend changes: switching to Amazon S3+Cloudflare, adding error pages, metadata like tags, A/B testing, but no need to go into detail.
Excerpts from the literature on robotic/self-driving/autonomous cars, with a focus on legal issues; lengthy and often tedious. Some more SI work. See also Notes on Psychopathy.
Having read through all this material, my general feeling is: the near-term future (1 decade) for autonomous cars is not that great. What's been accomplished, legally speaking, is great but more limited than most people appreciate. And there are many serious problems with penetrating the elaborate ingrown rent-seeking tangle of law & politics & insurance. I expect the mid-future (+2 decades) to look more like autonomous cars completely taking over many odd niches and applications where the user can afford to ignore those issues (eg. on private land or in warehouses or factories), with highways and regular roads continuing to see many human drivers with some level of automated assistance. However, none of these problems seem fatal and all of them seem amenable to gradual accommodation and pressure, so I am now more confident that in the long run we will see autonomous cars become the norm and human driving ever more niche (and possibly lower-class). On none of these am I sure how to formulate a precise prediction, though, since I expect lots of boundary-crossing and tertium quids. We'll see.
I do an informal experiment testing whether LessWrong karma scores are susceptible to a form of anchoring based on the first comment posted; a medium-large effect size is found, although the data does not fit the assumed normal distribution, so there may or may not be an anchoring effect. Full writeup on gwern.net.
It has been suggested that the top-scoring articles tend to benefit from an initially positive reaction in comments. Such an anchoring or social proof effect resulting in a first-mover advantage seems quite plausible to me.
Thereafter, whenever I wrote an Article or Discussion, after making it public I flipped a coin: if heads, I posted a comment as Rhwawn saying “Upvoted”; if tails, a comment saying “Downvoted”. (Grognor suggested that the comments come with reasons, but unfortunately, if I came up with reasons for either comment, some criticisms or praise would be better than others and this would be another source of variability; I settled on adding some generic comments - see the full writeup.) Needless to say, no actual vote was made. I then made a number of quality comments and votes on other Articles/Discussions to camouflage the experimental intervention. (In no case did I upvote or downvote someone I had already replied to or voted on with my Gwern account.) Finally, I scheduled a reminder on my calendar for 30 days later to record the karma on that Article/Discussion. I don’t post that often, so I decided to stop after 1 year, on 27 February 2013. I wound up breaking this decision: by September I had ceased to find it an interesting question, it was an unfinished task burdening my mind, and the necessity of making genuine contributions as Rhwawn to cloak each anchoring comment was a not-so-trivial inconvenience that was stopping me from posting.
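The eventual comparison of the two arms can be sketched as a permutation test, which avoids the normality assumption that the karma data turned out to violate; the karma scores below are hypothetical placeholders, not the experiment’s actual data:

```python
import random
import statistics

# Hypothetical 30-day karma scores for the two arms (not the real data):
upvoted   = [14, 3, 22, 7, 9, 31, 5, 12]   # posts given an "Upvoted" comment
downvoted = [6, 2, 15, 4, 8, 19, 3, 10]    # posts given a "Downvoted" comment

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test on the difference of means.

    Makes no normality assumption: it asks how often a random
    relabeling of the pooled scores yields a mean gap at least
    as large as the observed one."""
    rng = random.Random(seed)
    observed = statistics.mean(a) - statistics.mean(b)
    pooled = a + b
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_iter

obs, p = permutation_test(upvoted, downvoted)
print(f"observed difference: {obs:.2f}, permutation p-value: {p:.3f}")
```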
From pg812-1020 of Chapter 8 “Sufficiency, Ancillarity, And All That” of Probability Theory: The Logic of Science by E.T. Jaynes:
The classical example showing the error of this kind of reasoning is the fable about the height of the Emperor of China. Supposing that each person in China surely knows the height of the Emperor to an accuracy of at least ±1 meter, if there are N = 1,000,000,000 inhabitants, then it seems that we could determine his height to an accuracy at least as good as

1 m/√N ≈ 3 × 10⁻⁵ m = 0.03 mm

merely by asking each person’s opinion and averaging the results.
The absurdity of the conclusion tells us rather forcefully that the rule is not always valid, even when the separate data values are causally independent; it requires them to be logically independent. In this case, we know that the vast majority of the inhabitants of China have never seen the Emperor; yet they have been discussing the Emperor among themselves and some kind of mental image of him has evolved as folklore. Then knowledge of the answer given by one does tell us something about the answer likely to be given by another, so they are not logically independent. Indeed, folklore has almost surely generated a systematic error, which survives the averaging; thus the above estimate would tell us something about the folklore, but almost nothing about the Emperor.
We could put it roughly as follows:
error in estimate = S ± R/√N    (8-50)
where S is the common systematic error in each datum, R is the RMS ‘random’ error in the individual data values. Uninformed opinions, even though they may agree well among themselves, are nearly worthless as evidence. Therefore sound scientific inference demands that, when this is a possibility, we use a form of probability theory (i.e. a probabilistic model) which is sophisticated enough to detect this situation and make allowances for it.
As a start on this, equation (8-50) gives us a crude but useful rule of thumb; it shows that, unless we know that the systematic error is less than about one-third of the random error, we cannot be sure that the average of a million data values is any more accurate or reliable than the average of ten¹. As Henri Poincaré put it: “The physicist is persuaded that one good measurement is worth many bad ones.” This has been well recognized by experimental physicists for generations; but warnings about it are conspicuously missing in the “soft” sciences whose practitioners are educated from those textbooks.
Or pg1019-1020 Chapter 10 “Physics of ‘Random Experiments’”:
…Nevertheless, the existence of such a strong connection is clearly only an ideal limiting case unlikely to be realized in any real application. For this reason, the law of large numbers and limit theorems of probability theory can be grossly misleading to a scientist or engineer who naively supposes them to be experimental facts, and tries to interpret them literally in his problems. Here are two simple examples:
- Suppose there is some random experiment in which you assign a probability p for some particular outcome A. It is important to estimate accurately the fraction f of times A will be true in the next million trials. If you try to use the law of large numbers, it will tell you various things about f; for example, that it is quite likely to differ from p by less than a tenth of one percent, and enormously unlikely to differ from p by more than one percent. But now, imagine that in the first hundred trials, the observed frequency of A turned out to be entirely different from p. Would this lead you to suspect that something was wrong, and revise your probability assignment for the 101st trial? If it would, then your state of knowledge is different from that required for the validity of the law of large numbers. You are not sure of the independence of different trials, and/or you are not sure of the correctness of the numerical value of p. Your prediction of f for a million trials is probably no more reliable than for a hundred.
- The common sense of a good experimental scientist tells him the same thing without any probability theory. Suppose someone is measuring the velocity of light. After making allowances for the known systematic errors, he could calculate a probability distribution for the various other errors, based on the noise level in his electronics, vibration amplitudes, etc. At this point, a naive application of the law of large numbers might lead him to think that he can add three significant figures to his measurement merely by repeating it a million times and averaging the results. But, of course, what he would actually do is to repeat some unknown systematic error a million times. It is idle to repeat a physical measurement an enormous number of times in the hope that “good statistics” will average out your errors, because we cannot know the full systematic error. This is the old “Emperor of China” fallacy…
Indeed, unless we know that all sources of systematic error - recognized or unrecognized - contribute less than about one-third the total error, we cannot be sure that the average of a million measurements is any more reliable than the average of ten. Our time is much better spent in designing a new experiment which will give a lower probable error per trial. As Poincare put it, “The physicist is persuaded that one good measurement is worth many bad ones.”² In other words, the common sense of a scientist tells him that the probabilities he assigns to various errors do not have a strong connection with frequencies, and that methods of inference which presuppose such a connection could be disastrously misleading in his problems.
I excerpted & typed up these quotes for use in my DNB FAQ appendix on systematic problems; the applicability of Jaynes’s observations to things like publication bias is obvious. See also http://lesswrong.com/lw/g13/against_nhst/
If I am understanding this right, Jaynes’s point here is that the random error shrinks towards zero as N increases, but this error is added onto the “common systematic error” S, so the total error approaches S no matter how many observations you make, and the ± means variability can force the total error up as well as down (variability, in this case, actually being helpful for once). So for example, with S = 1/3 and R = 1, the total error S + R/√N with N=100 is 0.43; with N=1,000,000 it is 0.334; and with N=1,000,000,000 it equals 0.333365, etc., never going below the original systematic error of 1/3. This leads to the unfortunate consequence that the likely error for N=10 is the range 0.017 < x < 0.64956, while for N=1,000,000 it is the much narrower 0.33233 < x < 0.33433 - so it is possible that the estimate could be exactly as good (or bad) for the tiny sample as for the enormous sample, since the tiny sample’s range entirely covers the enormous one’s!↩
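The footnote’s arithmetic is easy to check mechanically, assuming the reading of eq. (8-50) as total error = S ± R/√N with S = 1/3 and R = 1:

```python
from math import sqrt

S, R = 1 / 3, 1.0  # systematic & RMS random error, as in the footnote's example

def error_bounds(n):
    """Range of the total error S ± R/sqrt(n) per eq. (8-50)."""
    return S - R / sqrt(n), S + R / sqrt(n)

for n in (10, 100, 1_000_000, 1_000_000_000):
    lo, hi = error_bounds(n)
    # The random term shrinks, but the total error never escapes S = 1/3:
    print(f"N={n:>13,}: total error between {lo:.5f} and {hi:.5f}")
```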
Possibly this is what Lord Rutherford meant when he said, “If your experiment needs statistics you ought to have done a better experiment”.↩
I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.
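A step-by-step analysis of this kind boils down to multiplying prior odds by a likelihood ratio for each piece of evidence; the sketch below uses invented likelihood ratios purely for illustration, not the figures from the actual analysis:

```python
# Odds-form Bayesian updating: multiply prior odds by each likelihood ratio.
# The prior and likelihood ratios are illustrative placeholders, not the
# numbers from the real Death Note script analysis.
prior_odds = 1.0          # 50/50 prior on the leaked script being authentic

likelihood_ratios = {
    "matches writer's style (stylometrics)": 3.0,
    "plot details later confirmed":          5.0,
    "no studio denial":                      1.5,
}

odds = prior_odds
for evidence, lr in likelihood_ratios.items():
    odds *= lr
    print(f"after '{evidence}': odds = {odds:.1f}:1")

probability = odds / (1 + odds)   # convert odds back to a probability
print(f"posterior probability of authenticity: {probability:.1%}")
```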
A summary of standard non-Bayesian criticisms of common frequentist statistical practices, with pointers into the academic literature.
This is some old work I did for SI. See also Notes on the Psychology of Power.
Deviant but not necessarily diseased or dysfunctional minds can demonstrate resistance to all treatment and attempts to change their minds (think No Universally Compelling Arguments). The premier example is probably psychopaths: no drug treatments are at all useful, nor are there any therapies with solid evidence of even marginal effectiveness (one widely cited chapter, “Treatment of psychopathy: A review of empirical findings”, concludes that some attempted therapies merely made them more effective manipulators! We’ll look at that later.) While some psychopathic traits resemble general characteristics of the powerful, psychopaths are still a pretty unique group and worth looking at.
The main focus of my excerpts is on whether psychopaths are treatable, how effective they are, possible evolutionary bases, and what other issues they do or don’t have which might lead one to not simply write them off as “broken” and of no relevance to AI.
(For example, if we were to discover that psychopaths are healthy human beings who are not universally mentally retarded or ineffective in gaining wealth/power, and yet are destructive and amoral, despite being completely human and often socialized normally, then what does this say about the fragility of human values and how likely it is that an AI will just be nice to us?)
The unprecedented gap in Methods of Rationality updates prompts musing about whether readership is increasing enough & what statistics one would use; I write code to download FF.net reviews, clean & parse them, load them into R, summarize & graph the data, run linear regressions on a subset & on all reviews, note the poor fit, develop a quadratic fit instead, and use it to predict future review quantities.
Then I run a similar analysis on a competing fanfiction to find out when the two will have equal total review-counts. An attempt at logarithmic fits fails; fitting a linear model to the previous 100 days of _MoR_ and the competitor works much better, and the two models predict convergence in <5 years.
Master version: http://www.gwern.net/hpmor#analysis
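The convergence estimate works by fitting a straight line to each fic’s recent cumulative review counts and solving for the crossing point; a minimal sketch with synthetic numbers, not the real FF.net data:

```python
# Fit a straight line to each fic's recent cumulative review counts and
# solve for the day the totals are equal. Data points are synthetic.

def linfit(xs, ys):
    """Ordinary least squares for y = m*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return m, my - m * mx

days = list(range(100))                      # last 100 days of observations
mor        = [18000 + 12 * d for d in days]  # MoR: larger total, slower growth
competitor = [15000 + 19 * d for d in days]  # rival: smaller total, faster growth

m1, b1 = linfit(days, mor)
m2, b2 = linfit(days, competitor)
crossing_day = (b1 - b2) / (m2 - m1)         # day when the two totals are equal
print(f"predicted convergence in ~{crossing_day:.0f} days "
      f"(~{crossing_day / 365:.1f} years)")
```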
A time dilation tool from an anime is discussed for its practical use on Earth; there seem to be surprisingly few uses and none that will change the world, due to the severe penalties humans would incur while using it, and basic constraints like Amdahl's law limit the scientific uses. A comparison with the position of an Artificial Intelligence such as an emulated human brain seems fair, except most of the time dilation disadvantages do not apply or can be ameliorated, and hence any speedups could be quite effectively exploited. I suggest that skeptics of the idea that speedups give advantages are implicitly working off the crippled time dilation tool and not making allowance for the disanalogies.
Master version on gwern.net
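The Amdahl's-law constraint mentioned above is easy to make concrete: if only a fraction p of a task benefits from time dilation, the overall speedup is capped at 1/(1-p) no matter how large the dilation factor. A minimal illustration:

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is accelerated
    s-fold (Amdahl's law)."""
    return 1 / ((1 - p) + p / s)

# Even an enormous dilation factor is capped by the serial remainder:
for p in (0.5, 0.9, 0.99):
    print(f"{p:.0%} accelerable, 1000x dilation -> "
          f"{amdahl_speedup(p, 1000):.1f}x overall (cap: {1 / (1 - p):.0f}x)")
```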