
Comment author: whales 01 June 2017 12:35:48AM *  0 points [-]

I've started cleaning up and posting some old drafts on my blog. I've drifted away, but some of them may be of interest to people still here. Most directly up this alley so far would be this post recommending people read Trial By Mathematics.

Comment author: whales 11 April 2015 12:47:06AM *  0 points [-]

I like this post. I lean towards skepticism about the usefulness of calibration or even accuracy, but I'm glad to find myself mostly in agreement here.

For lots of practical (to me) situations, a little bit of uncertainty goes a long way in how I actually decide what to do. It doesn't really matter how much uncertainty there is, or how well I can estimate it. It's better for me to just be generally humble and make contingency plans. It's also easy to imagine that being well-calibrated (or knowing that you are) could demolish biases that actually protect against bad outcomes, if you're not careful. If you are careful, sure, there are possible benefits, but they seem modest.

But making and testing predictions seems more than modestly useful, whether or not you get better (or better calibrated) over time. I find I learn better (testing effect!) and I'm more likely to notice surprising things. And it's an easy way to lampshade certain thoughts/decisions so that I put more effort into them. Basically, this:

Or in other other words: the biggest problem with your predictions right now is that they don't exist.

To be more concrete, a while back I actually ran a self-experiment on quantitative calibration for time-tracking/planning (your point #1). The idea was to get a baseline by making and resolving predictions without any feedback for a few weeks (i.e. I didn't know how well I was doing--I also made predictions in batches, so I usually couldn't remember them and thus couldn't target my prediction "deadlines"). Then I'd start looking at calibration curves and so on to see if feedback might improve predictions (in general or in particular domains). It turned out after the first stage that I was already well-calibrated enough that I wouldn't be able to measure any interesting changes without an impractical number of predictions, but while it lasted I got a moderate boost in productivity just from knowing I had a clock ticking, plus more effective planning from the way predictions forced me to think about contingencies. (I stopped the experiment because it was tedious, but I upped the frequency of predictions I make habitually.)
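For anyone curious what the bookkeeping for this kind of self-experiment might look like, here is a minimal sketch, not whales's actual setup: the records, probabilities, and bin structure are illustrative assumptions. It groups resolved deadline predictions by stated confidence and prints a crude calibration table.

```python
from collections import defaultdict

# Each record: (stated probability that a task would finish by its predicted
# deadline, whether it actually did). These are made-up placeholders, not
# data from the experiment described above.
predictions = [
    (0.9, True), (0.9, True), (0.9, False),
    (0.7, True), (0.7, False),
    (0.5, True), (0.5, False), (0.5, False),
]

bins = defaultdict(lambda: [0, 0])  # stated probability -> [hits, total]
for p, came_true in predictions:
    bins[p][1] += 1
    if came_true:
        bins[p][0] += 1

for p in sorted(bins):
    hits, total = bins[p]
    print(f"stated {p:.0%}: observed {hits / total:.0%} over {total} predictions")
```

With enough resolved predictions per bin, the gap between stated and observed frequencies is the calibration curve whales describes checking after the baseline stage.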

Comment author: whales 08 March 2015 07:33:55PM *  5 points [-]

If I can introduce a problem domain that doesn't get a lot of play in these communities but (I think) should:

End-of-life healthcare in the US seems like a huge problem (in terms of cost, honored preferences, and quality of life for many people) that's relatively tractable for its size. The balance probably falls in favor of making things happen rather than researching technical questions, but I'm hoping it still belongs here.

There's a recent IOM report that covers the presently bleak state of affairs and potential ways forward pretty thoroughly. One major problem is that doctors don't know their patients' care preferences, resulting in a bias towards acute care over palliative care, which in turn leads to unpleasant (and expensive) final years. There are a lot of different levers in actual care practices, advance care planning, professional education/development, insurance policies, and public education. I might start with the key findings and recommendations (PDF) and think about where to go from there. There's also Atul Gawande's recent book Being Mortal, which I've yet to read but people seem excited about. Maybe look at what organizations like MyDirectives and Goals of Care are doing.

This domain probably has a relative advantage in belief- or value-alignment for people who think widely available anti-aging is far in the future or undesirable, although I'm tempted to argue that in a world with normalized life extension, the norms surrounding end-of-life care become even more important. The problem might also be unusually salient from some utilitarian perspectives. And while I've never been sure what civilizational inadequacy means, people interested in it might be easier to sell on fixing end-of-life care.

Comment author: D_Malik 23 February 2015 10:05:05PM 6 points [-]

People use PredictionBook to make predictions about many heterogeneous questions, in order to train calibration. Couldn't we train calibration more efficiently by making a very large number of predictions about a fairly small, homogeneous group of questions?

For instance, at the moment people are producing a single probability for each of n questions about e.g. what will happen in HPMOR's final arc. This has a high per-question cost (people must think up individual questions, formalize them, judge edge cases, etc.) and you only get one piece of data from each question (the probability assigned to the correct outcome).

Suppose instead we get some repeatable, homogeneous question-template with a numerical answer, e.g. "what time is it?", "how many dots are in this randomly-generated picture?", or "how long is the Wikipedia article named _?". Then instead of producing only one probability for each question, you give your {1,5,10,20,...,90,95,99}-percentile estimates. Possible advantages of this approach:

  • Questions are mass-produced. We can write a program to ask the same question over and over, for different times / pictures / Wikipedia articles. Each question gives more data, since you're producing several percentile estimates for each rather than just a single probability.
  • The task is simpler; more time is spent converting intuition into numbers. There's less system II thinking to do, more just e.g. guessing how many dots are in a picture. You only need to mentally construct a single probability distribution, over a 1-dimensional answer-space, then read off some numbers, rather than constructing a distribution over some high-dimensional answer-space (e.g. what will happen in the final HPMOR arc), deciding which outcomes count as "true" vs "false" for your question, then summing up all the mass counted as "true".

Possible disadvantages of this approach:

  • Less entertaining - speculating about HPMOR's final arc is more fun than speculating about the number of dots in a randomly-generated picture. IMO purchasing fun and skill separately is better than trying to purchase both at the same time.
  • If calibration training doesn't generalize, then you'll only get well-calibrated about numbers of dots, not about something actually important like HPMOR. I'm pretty sure that calibration generalizes, though.
  • Making predictions trains not only calibration but also discrimination, i.e. it decreases the entropy of your probability distribution. Discrimination doesn't generalize much. Improved discrimination about numbers of dots is less useful than about other things.
  • Heterogeneous "messy" questions are probably more representative of the sorts of questions we actually care about, e.g. "when will AGI come?", or "how useful would it be to know more math?". So insofar as calibration and discrimination do not generalize, messy questions are better.
  • The estimates you mass-produce will tend to be correlated, so would provide less information about how well-calibrated you are than the same number of estimates produced more independently, I think.

Overall, I'd guess:

  • The homogeneous mass-prediction approach is better at training calibration.
  • You should use domain-specific training in the domain you want to predict things in, to develop discrimination.
  • It's inefficient to make heterogeneous predictions in domains you don't care very much about.

An alternative, roughly between the two groups discussed above, would be to find some repeatable way of generating questions that are at least slightly interesting. For instance, play online Mafia and privately make lots of predictions about which players have which roles, who will be lynched or murdered, etc. Or predict chess or poker. Or predict karma scores of LW/Reddit comments. Or use a spaced repetition system, but before showing the answer estimate the probability that you got the answer right. Any better ideas?
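To make the mass-produced question template above concrete, here is a rough sketch of one possible drill, assuming a console workflow where a text grid of asterisks stands in for the randomly generated dot pictures; the percentile set, grid size, and round count are arbitrary illustrative choices, not anything specified in the comment.

```python
import random

# Percentiles to elicit for each question (an illustrative choice).
PERCENTILES = [1, 5, 10, 25, 50, 75, 90, 95, 99]

def make_question(rows=20, cols=60):
    """Print a text 'picture' with a random number of asterisks; return the true count."""
    density = random.uniform(0.05, 0.5)
    grid = [['*' if random.random() < density else ' ' for _ in range(cols)]
            for _ in range(rows)]
    print('\n'.join(''.join(row) for row in grid))
    return sum(row.count('*') for row in grid)

def ask_estimates():
    """Collect one numeric estimate per percentile from the user."""
    return {p: float(input(f"{p}th percentile estimate: ")) for p in PERCENTILES}

def report(results):
    """For each percentile, how often did the true count fall at or below the estimate?"""
    for p in PERCENTILES:
        hits = sum(1 for true_count, est in results if true_count <= est[p])
        print(f"{p:>2}th percentile: true count was at or below your estimate "
              f"{100 * hits / len(results):.0f}% of the time (ideal: {p}%)")

if __name__ == "__main__":
    results = []
    for _ in range(5):  # a real session would use many more rounds
        true_count = make_question()
        results.append((true_count, ask_estimates()))
    report(results)
```

A well-calibrated player's true counts fall below their 90th-percentile estimate about 90% of the time, and so on down the list; the same scoring works for any other cheap question template (Wikipedia article lengths, clock times) swapped into make_question.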

Comment author: whales 27 February 2015 05:34:13AM *  2 points [-]

You can predict how long tasks/projects will take you (stopwatch and/or calendar time). Even if calibration doesn't generalize, it's potentially useful on its own there. And while you can't quite mass-produce questions/predictions, it's not such a hassle to rack up a lot if you do them in batches. Malcolm Ocean wrote about doing this with a spreadsheet, and I threw together an Android todo-with-predictions app for a similar self-experiment.

Comment author: PhilGoetz 14 October 2014 02:29:47AM *  14 points [-]

Bostrom flies by an issue that's very important:

Suppose that a scientific genius of the caliber of a Newton or an Einstein arises at least once for every 10 billion people: then on MegaEarth there would be 700,000 such geniuses living contemporaneously, alongside proportionally vast multitudes of slightly lesser talents. New ideas and technologies would be developed at a furious pace,

Back up. The population of Europe was under 200 million in 1700, less than a sixth of what it is today. The number of intellectuals was a tiny fraction of the number it is today. And the number of intellectuals in Athens in the 4th century BC was probably a few hundred. Yet we had Newton and Aristotle. Similarly, the greatest composers of the 18th and 19th centuries were trained in Vienna, one city. Today we may have 1000 or 10,000 times as many composers, with much better musical training than people could have in the days before recorded music, yet we do not have 1000 Mozarts or 1000 Beethovens.

Unless you believe human intelligence has been steadily declining, there is one Einstein per generation, regardless of population. The limiting factor is not the number of geniuses. The number of geniuses, and the amount of effort put into science, are nearly irrelevant to the amount of genius-level work accomplished and disseminated.

The limiting factor is organizational. Scientific activity can scale; recognition or propagation of it doesn't. If you graphed scientific output over the years in terms of "important things discovered and adopted by the community" / (scientists * dollars per scientist), you'd see an astonishing exponential decay toward zero. I measured science and technology output per scientist using four different lists of significant advances, and found that significant advances per scientist declined by 3 to 4 orders of magnitude from 1800 to 2000. During that time, the number of scientific journals has increased by 3 to 4 orders of magnitude, and a reasonable guess is that so did the number of scientists. Total recognized "significant" scientific output is independent of the number of scientists working!

You can't just add scientists and money and get anything like proportional output. The scientific community can't absorb or even be aware of most of the information produced. Nor can it allocate funds or research areas efficiently.

So a critical question when thinking about super-intelligences is, How does the efficiency of intelligence scale with resources? Not linearly. To a first approximation, adding more scientists at this point accomplishes nothing.

On the other hand, merely recognizing and solving the organizational problems of science that we currently have would produce results similar to a fast singularity.

Comment author: whales 14 October 2014 07:23:33AM 5 points [-]

I measured science and technology output per scientist using four different lists of significant advances, and found that significant advances per scientist declined by 3 to 4 orders of magnitude from 1800 to 2000. During that time, the number of scientific journals has increased by 3 to 4 orders of magnitude, and a reasonable guess is that so did the number of scientists.

I'd be really interested in reading more about this.

Comment author: [deleted] 17 September 2014 06:14:22PM *  2 points [-]

There's something wrong with the first link (I guess you typed the URL on a smartphone autocorrecting keyboard or similar).

EDIT: I think this is the correct link.

In response to comment by [deleted] on What are your contrarian views?
Comment author: whales 17 September 2014 06:27:23PM 2 points [-]

Yeah, that happened when I edited a different part from my phone. Thanks, fixed.

Comment author: VAuroch 17 September 2014 04:14:11AM *  3 points [-]

Like a few others, I agree with the first two but emphatically disagree with the last. And if you were right about it, I'd expect Ozy to have taken Scott to task about it, and him to have admitted to being somewhat wrong and updated on it.

EDIT: This has, in fact, happened.

Comment author: whales 17 September 2014 09:20:52AM *  5 points [-]

See this tumblr post for an example of Ozy expressing dissatisfaction with Scott's lack of charity in his analysis of SJ (specifically in the "Words, Words, Words" post). My impression is that this is a fairly regular occurrence.

You might be right about him not having updated. If anything it seems that his updates on the earlier superweapons discussion have been reverted. I'm not sure I've seen anything comparably charitable from him on the subject since. I don't follow his thoughts on feminism particularly closely, so I could easily be wrong (and would be glad to find I'm wrong here).

Comment author: Thecommexokid 23 August 2014 06:02:04PM 1 point [-]

At first, I didn't seem to exercise this skill on days where I wasn't doing cognitively demanding work, or when most of my work was not in an academic context (typically weekends). Over time, I began doing so more, although still less than on demanding academic days.

I know quite a bit of time has passed since you posted this, but do you recall any specific instances of non-cognitively-demanding weekend-type confusions you could share?

Comment author: whales 24 August 2014 07:55:27PM 0 points [-]

I wrote down a handful as I was doing this, but not all of them. There were a couple about navigation (where rather than say "well, I don't know where I am, I'll just trust the group" I figured out how I was confused about different positions of landmarks). I avoided overbaking my cookies when the recipe had the wrong time written down. Analytics for a site I run pointed to a recent change causing problems for some people, and I saw the (slight) pattern right away but ignored it until it got caught on my confusion hook. It's also a nice hook for asking questions in casual conversations. People are happy to explain why they like author X but not the superficially similar author Y I've heard them complain about before, for example.

Comment author: BrienneYudkowsky 22 August 2014 06:04:51PM 2 points [-]

This is one of the most valuable things I've read in months. Thank you!

Comment author: whales 23 August 2014 03:55:45PM 1 point [-]

Thanks, I'm glad you liked it!

Did someone link this recently? It seems to have gotten a new burst of votes.

Comment author: Metus 11 August 2014 11:24:02AM *  8 points [-]

In the last open thread Lumifer linked to a list by the American Statistical Association with points that need to be understood to be considered statistically literate. In the same open thread, in another comment, sixes_and_sevens asked for statements we know are true but the average layperson gets wrong. In response he mainly got examples from the natural sciences and mathematics. Which makes me wonder: can we make a general test of education in all of these fields of knowledge that can be automatically graded? This test would serve as a benchmark for traditional educational methods and for autodidacts checking themselves.

I imagine having simple calculations for some things and multiple-choice tests for other scenarios where intuition suffices.

Edit: Please don't just upvote, try to point to similar ideas in your respective field or critique the idea.

Comment author: whales 11 August 2014 06:20:48PM *  2 points [-]

There are concept inventories in a lot of fields, but these vary in quality and usefulness. The most well-known of these is the Force Concept Inventory for first semester mechanics, which basically aims to test how Aristotelian/Newtonian a student's thinking is. Any physicist can point out a dozen problems with it, but it seems to very roughly measure what it claims to measure.

Russ Roberts (host of the podcast EconTalk) likes to talk about the "economic way of thinking" and has written and gathered links about ten key ideas like incentives, markets, externalities, etc. But he's relatively libertarian, so the ideas he chose and his exposition will probably not provide a very complete picture. Anyway, EconTalk has started asking discussion questions after each podcast, some of which aim to test basic understanding along these lines.
