
Bragging Thread March 2015

5 Morendil 08 March 2015 01:50PM

Your job, should you choose to accept it, is to comment on this thread explaining the most awesome thing you've done this month. You may be as blatantly proud of yourself as you feel. You may unabashedly consider yourself the coolest freaking person ever because of that awesome thing you're dying to tell everyone about. This is the place to do just that.

Remember, however, that this isn't any kind of progress thread. Nor is it any kind of proposal thread. This thread is solely for people to talk about the awesome things they have done. Not "will do". Not "are working on". Have already done. This is to cultivate an environment of object-level productivity rather than meta-productivity methods.

So, what's the coolest thing you've done this month?

(Previous Bragging Thread)

December 2014 Bragging Thread

3 Morendil 30 November 2014 11:54PM

Your job, should you choose to accept it, is to comment on this thread explaining the most awesome thing you've done this month. You may be as blatantly proud of yourself as you feel. You may unabashedly consider yourself the coolest freaking person ever because of that awesome thing you're dying to tell everyone about. This is the place to do just that.

Remember, however, that this isn't any kind of progress thread. Nor is it any kind of proposal thread. This thread is solely for people to talk about the awesome things they have done. Not "will do". Not "are working on". Have already done. This is to cultivate an environment of object-level productivity rather than meta-productivity methods.

So, what's the coolest thing you've done this month?

Rocket science and big money - a cautionary tale of math gone wrong

14 Morendil 24 April 2013 03:11PM


The 2006 report from NASA's "Independent Verification and Validation Facility" makes some interesting claims. Turning to page 6, we learn that thanks to IV&V, "NASA realized a software rework risk reduction benefit of $1.6 Billion in Fiscal Year 2006 alone". This is close to 10% of NASA's overall annual budget, roughly equal to the entire annual budget of the International Space Station!

If the numbers check out, this is an impressive feat for IV&V (the more formal big brother of "testing" or "quality assurance" departments that most software development efforts include). Do they?


Flaubert and the math of ROI

Back in 1841, to tease his sister, Gustave Flaubert invented the "age of the captain problem", which ran like this:

A ship sails the ocean. It left Boston with a cargo of wool. It grosses 200 tons. [...] There are 12 passengers aboard, the wind is blowing East-North-East, the clock points to a quarter past three in the afternoon. It is the month of May. How old is the captain?

Flaubert was pointing out one common way people fail at math: you can only get sensible results from a calculation if the numbers you put in are related in the right ways. (Unfortunately, math education tends to be excessively heavy on the "manipulate numbers" part and to skimp on the "make sense of the question" part, a trend dissected by French mathematician Stella Baruk who titled one of her books after Flaubert's little joke on his sister.)

On inspection, NASA's math turns out to be "age-of-the-captain" math. (This strikes me as a big embarrassment for an organization literally composed mainly of rocket scientists.)


[Link] How to Win at Forecasting - a conversation with Philip Tetlock

3 Morendil 06 December 2012 06:13PM

Over at Edge, Tetlock discusses "expert political judgment", the controversy surrounding Nate Silver's presidential predictions, overconfidence, motivated cognition, black swans, the IARPA forecasting contest, and much else. A few choice bits:

There's a question that I've been asking myself for nearly three decades now and trying to get a research handle on, and that is why is the quality of public debate so low and why is it that the quality often seems to deteriorate the more important the stakes get?


Is world politics like a poker game? This is what, in a sense, we are exploring in the IARPA forecasting tournament. You can make a good case that history is different and it poses unique challenges. This is an empirical question of whether people can learn to become better at these types of tasks. We now have a significant amount of evidence on this, and the evidence is that people can learn to become better. It's a slow process. It requires a lot of hard work, but some of our forecasters have really risen to the challenge in a remarkable way and are generating forecasts that are far more accurate than I would have ever supposed possible from past research in this area.


One of the things I've discovered in my work on assessing the accuracy of probability judgment is that there is much more eagerness in participating in these exercises among people who are younger and lower in status in organizations than there is among people who are older and higher in status in organizations.


From a sociological point of view, it's a minor miracle that this forecasting tournament is even occurring. Government agencies are not supposed to sponsor exercises that have the potential to embarrass them.



If you think that the Eurozone is going to collapse–if you think it was a really bad idea to put into common currency economies at very different levels of competitiveness, like Greece and Germany (that was a fundamentally unsound macroeconomic thing to do and the Eurozone is doomed), that's a nice example of an emphatic but untestable hedgehog kind of statement. It may be true, but it's not very useful for our forecasting tournament.

To make a forecasting tournament work we have to translate that hedgehog like hunch into a testable proposition like will Greece leave the Eurozone or formally withdraw from the Eurozone by May 2013? Or will Portugal? You need to translate the abstract interesting issue into testable propositions and then you need to get lots of thoughtful people to make probability judgments in response to those testable proposition questions. You need to do that over, and over, and over again.


In our tournament, we've skimmed off the very best forecasters in the first year, the top two percent. We call them "super forecasters." They're working together in five teams of 12 each and they're doing very impressive work.


Another amazing and wonderful thing about this tournament is how many really smart, thoughtful people are willing to volunteer, essentially enormous amounts of time to make this successful. We offer them a token honorarium. We're paying them right now $150 or $250 a year for their participation. The ones who are really taking it seriously–it's way less than the minimum wage. And they're some very thoughtful professionals who are participating in this. Some political scientists I know have had some disparaging things to say about the people who might participate in something like this and one phrase that comes to mind is "unemployed news junkies." I don't think that's a fair characterization of our forecasters. Certainly the most actively engaged of our forecasters are really pretty awesome. They're very skillful at finding information, synthesizing it, and applying it, and then updating the response to new information. And they're very rapid updaters.

(I confess to some feelings of pride, possibly unearned, on reading this last paragraph - as the top forecaster of a middle-ranked team.)

But actually, go read the whole thing.

Raising the forecasting waterline (part 2)

14 Morendil 12 October 2012 03:56PM

Previously: part 1

The three tactics I described in part 1 are most suited to making an initial forecast. I will now turn to a question that was raised in comments on part 1 - that of updating when new evidence arrives. But first, I'd like to discuss the notion of a "well-specified forecast".

Well-specified forecasts

It is often surprisingly hard to frame a question in terms that make a forecast reasonably easy to verify and score. Questions can be ambiguous (consider "X will win the U.S. presidential election" - do we mean win the popular vote, or win a majority in the electoral college?). They can fail to cover all possible outcomes (so "which of the candidates will win the election" needs a catch-all "Other").[1]
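To make "verify and score" concrete, here is a minimal Python sketch. The post doesn't say which scoring rule it has in mind; this assumes the multi-category Brier score (squared error summed over all listed options, where 0 is a perfect score), a standard choice for this kind of forecast. The function name and option labels are illustrative only. It also shows why the catch-all matters: a forecast whose options don't cover what actually happened cannot be scored at all.

```python
def brier_score(forecast: dict, outcome: str) -> float:
    """Multi-category Brier score: sum over options of (p - o)^2, where o is 1
    for the option that occurred and 0 otherwise. 0 is perfect, 2 is worst."""
    total = sum(forecast.values())
    if abs(total - 1.0) > 1e-9:
        # Probabilities over mutually exclusive, exhaustive options must sum to 1.
        raise ValueError(f"probabilities sum to {total}, not 1")
    if outcome not in forecast:
        # The question failed to cover this outcome, so the forecast is unscorable.
        raise ValueError(f"outcome {outcome!r} was not among the listed options")
    return sum(
        (p - (1.0 if option == outcome else 0.0)) ** 2
        for option, p in forecast.items()
    )


if __name__ == "__main__":
    with_catch_all = {"Candidate A": 0.6, "Candidate B": 0.35, "Other": 0.05}
    print(round(brier_score(with_catch_all, "Candidate B"), 3))  # about 0.785
    try:
        brier_score({"Candidate A": 0.6, "Candidate B": 0.4}, "Candidate C")
    except ValueError as err:
        print("unscorable:", err)
```

With the catch-all in place, a surprise candidate simply counts as "Other" and the forecast can still be scored.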


Raising the forecasting waterline (part 1)

31 Morendil 09 October 2012 03:49PM

Previously: Raising the waterline, see also: 1001 PredictionBook Nights (LW copy), Techniques for probability estimates

Low waterlines imply that it's relatively easy for a novice to outperform the competition. (In poker, as discussed in Nate Silver's book, the "fish" are those who can't master basic techniques such as folding when they have a poor hand, or calculating even roughly the expected value of a pot.) Does this apply to the domain of making predictions? It's early days, but it looks as if a smallish set of tools - a conscious status quo bias, respecting probability axioms when considering alternatives, considering reference classes, leaving yourself a line of retreat, detaching from sunk costs, and a few more - can at least place you in a good position. 
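As a rough illustration of the kind of "basic technique" the fish lack, the sketch below computes the expected value of calling a bet, relative to folding. It rests on simplifying assumptions of my own: no further betting rounds, a win collects the whole pot, and `win_probability` is taken as given rather than estimated from the cards. The function names are hypothetical.

```python
def call_expected_value(pot: float, call_cost: float, win_probability: float) -> float:
    """Expected chip gain from calling, relative to folding (which gains 0).

    Assumes no further betting and that a win collects the whole `pot`
    (the chips already in the middle, including the opponent's bet).
    """
    return win_probability * pot - (1 - win_probability) * call_cost


def break_even_probability(pot: float, call_cost: float) -> float:
    """Win probability at which calling and folding have equal expected value."""
    return call_cost / (pot + call_cost)


if __name__ == "__main__":
    pot, call = 100, 20
    print(break_even_probability(pot, call))     # 1/6: call only if you win more often than that
    print(call_expected_value(pot, call, 0.25))  # 10.0, a profitable call
    print(call_expected_value(pot, call, 0.10))  # roughly -8, folding is better
```

Even this crude calculation separates clearly good calls from clearly bad ones, which is the point: a rough estimate already beats not estimating at all.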


Raising the waterline

30 Morendil 07 October 2012 04:23PM

Among the goals of Less Wrong is to "raise the sanity waterline" of humanity. We've also talked about "raising the rationality waterline": the phrase is somewhat popular around these parts, which suggests that the metaphor is catchy. But is that all there is to it, a catchy metaphor? Or can the phrase be more usefully cashed out?

While reading Nate Silver's The Signal and the Noise, I came across a discussion of "raising the waterline" which fleshes out the metaphor with a more substantial model. This model preserves some of the salient aspects of the metaphor as discussed on LW, for instance the perception that the current waterline (as regards sanity and rationality) is "ridiculously low". More interestingly, it spells out some of the specific ways that a "waterline" belief should constrain our future sensory experiences, perhaps even to the point of quantifying what should result from low (or rising) waterlines.

This is intended as a short series:

  • "Raising the waterline", this introductory post, will summarize Nate Silver's "waterline" model, within its original context of playing poker, which Silver frames as a game of prediction under uncertainty. Poker therefore serves as a "toy model" for a much more general class of problems.
  • "Raising the forecasting waterline" will extend the discussion to the kind of forecasts studied by Philip Tetlock's Good Judgment Project, a prediction game somewhat similar to PredictionBook and related to prediction markets; I will leverage the waterline model to extract useful insights from my participation in GJP.
  • "Raising the discussion waterline", a shamelessly speculative coda, will relate the previous two posts to the question of "how do Internet discussions reliably lead to correct inferences from true beliefs, or fail to do so"; I will argue that the waterline model brings some hope that a few basic tactics could nevertheless provide large wins, and raise the more general question of what other low waterlines we could aim to exploit.


Fallacies of reification - the placebo effect

20 Morendil 13 September 2012 07:03AM

TL;DR: I align with the minority position that "there is a lot less to the so-called placebo effect than people tend to think there is (and the name is horribly misleading)", a strong opinion weakly held.

The following post is an off-the-cuff reply to a G+ post of gwern's, but I've been thinking about this on and off for quite a while. Were I to expand this for posting to Main, I would: a) go into more detail about the published research, b) introduce a second fallacy of reification for comparison, the so-called "10X variance in programmer productivity".

My agenda is to have this join my short series of articles on "software engineering as a diseased discipline". I view that series as my modest attempt at "using Less Wrong ideas in your secret identity", and the topic is covered at greater length in my book-in-progress.

I would therefore appreciate your feedback and probing at weak points.

Most of the time, talk of placebo effects (or, worse, of "the" placebo effect) falls victim to the reification fallacy.

My position is roughly "there is a lot less to the so-called placebo effect than people think there is (and the name is horribly misleading)".

More precisely: the term "placebo" in the context of "placebo controlled trial" has some usefulness, when used to mean a particular way of distinguishing between the null and test hypotheses in a trial: namely, that the test and control group receive exactly the same treatment, except that you substitute, in the control group, an inert substance (or inoperative procedure) for the putatively active substance being tested.

Whatever outcome measures are used, they will generally improve somewhat even in the control group: this can be due to many things, including regression to the mean, the disease running its course, increased compliance with medical instructions due to being in a study, and expectancy effects leading to biased verbal self-reports.

None of these is properly speaking an "effect" causally linked to the inert substance (the "placebo pill"). The reification fallacy consists of thinking that because we give something a name ("the placebo effect") then there must be a corresponding reality. The false inference is "the people who improved in the control group were healed by the power of the placebo effect".

The further false inference is "there are ailments of which I could be cured by ingesting small sugar pills appropriately labeled". Some of my friends actually leverage this into justification for buying sugar in pharmacies at a ridiculous markup. I confess to being aghast whenever this happens in my presence.

A better name has been suggested: the "control response". This is experiment-specific, and encompasses all of the various mechanisms which make it look like "the control group improves when given a sugar pill / saline solution / sham treatment". Moreover it avoids hinting at mysterious healing powers of the mind.

Meta-analyses of the few studies that were designed to find an actual "placebo effect" (i.e. studies with a no-treatment arm, or studies comparing objective outcome measures for different placebos) have not confirmed it; the few individual studies that find a positive effect are inconclusive for a variety of reasons.
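The difference between a "control response" and a "placebo effect" can be illustrated with a toy simulation. The model is entirely an assumption of mine, not taken from any study: a patient's measured symptom score is a stable severity plus fresh measurement noise at each visit (lower is better), patients are enrolled only if they score badly at screening, and they are then randomized to a placebo arm or a no-treatment arm. Regression to the mean alone makes both arms "improve"; only the comparison against the no-treatment arm reveals whether the pill itself did anything. The function and its parameters (including `true_placebo_effect`) are hypothetical.

```python
import random
import statistics


def three_arm_trial(n_screened=40_000, threshold=70.0, true_placebo_effect=0.0, seed=1):
    """Return (placebo-arm improvement, no-treatment-arm improvement, difference).

    Toy model: measured score = stable severity ~ N(50, 10) plus fresh
    N(0, 10) noise at each visit; lower is better. Only patients scoring above
    `threshold` at screening are enrolled. `true_placebo_effect` lowers
    follow-up scores in the placebo arm only; by default the pill is inert.
    """
    rng = random.Random(seed)
    placebo_gain, untreated_gain = [], []
    for _ in range(n_screened):
        severity = rng.gauss(50, 10)
        baseline = severity + rng.gauss(0, 10)
        if baseline <= threshold:
            continue  # not sick enough at screening to be enrolled
        follow_up = severity + rng.gauss(0, 10)  # underlying severity is unchanged
        if rng.random() < 0.5:
            placebo_gain.append(baseline - (follow_up - true_placebo_effect))
        else:
            untreated_gain.append(baseline - follow_up)
    placebo_arm = statistics.mean(placebo_gain)
    untreated_arm = statistics.mean(untreated_gain)
    return placebo_arm, untreated_arm, placebo_arm - untreated_arm


if __name__ == "__main__":
    placebo_arm, untreated_arm, difference = three_arm_trial()
    print(f"control response (placebo arm): {placebo_arm:.1f} points")
    print(f"no-treatment arm:               {untreated_arm:.1f} points")
    print(f"placebo minus no-treatment:     {difference:.1f} points")
```

In this model both arms improve substantially even though nothing acted on anyone, so a before/after look at the placebo arm alone would wrongly credit the pill; the placebo-minus-no-treatment difference stays near zero unless `true_placebo_effect` is set.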

Doubting the existence of the placebo effect will expose you to immediate contradiction from your educated peers. One explanation seems to be that the "placebo effect" is a necessary argumentative prop in the arsenal of two opposed "camps". On the one hand, proponents of CAM (Complementary and Alternative Medicine) will argue that "even if a herbal remedy is a placebo, who cares as long as it actually works", and must therefore assume that the placebo effect is real. On the other hand, opponents of CAM will say "homeopathy or herbal remedies only seem to work because of the placebo effect, so we can dismiss all positive reports from people treating themselves with such remedies".

I don't have a proper list of references yet, but see the following:








[Link] Failed replications of the "elderly walking" priming effect

19 Morendil 13 March 2012 07:55AM
Recently a controversy broke out over the replicability of a study John Bargh et al. published in 1996. The study reported that unconsciously priming a stereotype of elderly people caused subjects to walk more slowly. A recent replication attempt by Stéphane Doyen et al., published in PLoS ONE, was unable to reproduce the results. (Less publicized, but surely relevant, is another non-replication by Hal Pashler et al.) (source)


This is interesting, if only because the study in question is one of the more famous examples of priming effects - it's the one I tend to use when I introduce people to the idea of priming. (Ironically, the failed replication study also mentions a further experimental manipulation that does show priming effects - affecting the experimenters rather than the subjects.) Bargh's reply is also unusual in that it focuses significantly on extra-scientific arguments, such as attacks on the open access business model of PLoS ONE.

I was instantly reminded of The Golem, which "debunks the view that scientific knowledge is a straightforward outcome of competent theorization, observation, and experimentation". The examples on relativity and solar neutrinos are particularly engaging - it's not just psychology where experimentation is problematic, but all of science.

The linked blog also contributes useful observations of its own, such as the "rhetorical function" of the additional experiment in Doyen's study, how online publication makes a difference in how easily experimental setups can be replicated, or a subtle point about our favorite villain, p-values.


EDIT: added link to source. Heartfelt thanks to the two readers who upvoted the version without the link. :)

Causal diagrams and software engineering

32 Morendil 07 March 2012 06:23PM

Fake explanations don't feel fake. That's what makes them dangerous. -- EY

Let's look at "A Handbook of Software and Systems Engineering", which purports to examine the insights from software engineering that are solidly grounded in empirical evidence. Published by the prestigious Fraunhofer Institut, this book's subtitle is in fact "Empirical Observations, Laws and Theories".

Now "law" is a strong word to use: it denotes the highest level to which an explanation can aspire, as it were. Sometimes it's used in a jokey manner, as in "Hofstadter's Law" (which certainly seems often to apply to software projects). But this definitely isn't a jokey kind of book, that much we get from the appeal to "empirical observations" and the "handbook" denomination.

Here is the very first "law" listed in the Handbook:

Requirement deficiencies are the prime source of project failures.

Previously, we observed that in the field of software engineering, a last name followed by a year, surrounded by parentheses, seems to be a magic formula for suspending critical judgment in readers.

Another such formula, it seems, is the invocation of statistical results. Brandish the word "percentage", assert that you have surveyed a largish population, and whatever it is you claim, some people will start believing. Do it often enough and some will start repeating your claim - without bothering to check it - starting a potentially viral cycle.

As a case in point, one of the most often cited pieces of "evidence" in support of the above "law" is the well-known Chaos Report, according to which the leading cause of project failure is "Incomplete Requirements". (The Chaos Report isn't cited as evidence by the Handbook, but it's representative enough to serve in the following discussion. A Google search readily attests to how widely the verbatim claim from the Chaos Report has spread; various derivatives of the claim are harder to track, but easily verified to be quite pervasive.)

Some elementary reasoning about causal inference is enough to show that the same evidence offered in support of the above "law" supports this alternative conclusion equally well:

Project failures are the primary source of requirements deficiencies.
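A toy calculation makes the point concrete. For two yes/no variables, any joint distribution produced by the causal diagram "deficiency -> failure" can be rewritten exactly as one produced by "failure -> deficiency", so observational data such as survey percentages cannot tell the two diagrams apart. The probabilities below are made up for illustration and are not from the Chaos Report; the function names are hypothetical.

```python
from fractions import Fraction as Fr

# Joint distributions are dicts keyed by (deficient_requirements, project_failed).


def joint_from_deficiency_causes_failure(p_d, p_f_given_d, p_f_given_not_d):
    """Diagram D -> F: requirements deficiency D influences project failure F."""
    return {
        (True, True): p_d * p_f_given_d,
        (True, False): p_d * (1 - p_f_given_d),
        (False, True): (1 - p_d) * p_f_given_not_d,
        (False, False): (1 - p_d) * (1 - p_f_given_not_d),
    }


def joint_from_failure_causes_deficiency(p_f, p_d_given_f, p_d_given_not_f):
    """Diagram F -> D: project failure F influences requirements deficiency D."""
    return {
        (True, True): p_f * p_d_given_f,
        (True, False): (1 - p_f) * p_d_given_not_f,
        (False, True): p_f * (1 - p_d_given_f),
        (False, False): (1 - p_f) * (1 - p_d_given_not_f),
    }


def refactor_as_failure_causes_deficiency(joint):
    """Find F -> D parameters that reproduce a given joint distribution exactly."""
    p_f = joint[(True, True)] + joint[(False, True)]
    p_d_given_f = joint[(True, True)] / p_f
    p_d_given_not_f = joint[(True, False)] / (1 - p_f)
    return p_f, p_d_given_f, p_d_given_not_f


if __name__ == "__main__":
    # Made-up numbers: 30% of projects have deficient requirements; those fail
    # 80% of the time, the others 20% of the time.
    joint_a = joint_from_deficiency_causes_failure(Fr(3, 10), Fr(8, 10), Fr(2, 10))
    params_b = refactor_as_failure_causes_deficiency(joint_a)
    joint_b = joint_from_failure_causes_deficiency(*params_b)
    print(joint_a == joint_b)  # True: same data under either diagram
    print(params_b[1])         # 12/19: share of failed projects with deficient requirements
```

A survey of failed projects reporting that most of them had deficient requirements is exactly what both diagrams predict, which is why such a percentage cannot by itself establish the direction of causation.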

