A toy model of the treacherous turn
Jaan Tallinn has suggested creating a toy model of the various common AI arguments, so that they can be analysed without loaded concepts like "autonomy", "consciousness", or "intentionality". Here is a simple attempt for the "treacherous turn"; posted here for comments and suggestions.
Meet agent L. This agent is a reinforcement-based agent, rewarded/motivated by hearts (with a small time penalty for each turn it doesn't get a heart).

When the uncertainty about the model is higher than the uncertainty in the model
Most models that attempt to estimate or predict some element of the world come with their own estimates of uncertainty. It could be the Standard Model of physics predicting the mass of the Z boson as 91.1874 ± 0.0021 GeV, or the rather wider uncertainty ranges of economic predictions.
In many cases, though, the uncertainties in or about the model dwarf the estimated uncertainty in the model itself - especially for low probability events. This is a problem, because people working with models often try to use the in-model uncertainty and adjust it to get an estimate of the true uncertainty. They often realise the model is unreliable, but don't have a better one, and they have a measure of uncertainty already, so surely doubling and tripling this should do the trick? Surely...
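To make the gap concrete, here is a minimal sketch of how the total probability of a rare event splits into an in-model part and a model-error part. All the numbers are illustrative assumptions, not figures from the post: the event is taken to be a 5-sigma deviation under a normal model, the model is assumed to have a 1% chance of being wrong, and the event is assumed to have a 1% chance of occurring if it is.

```python
import math


def upper_tail(k_sigma: float) -> float:
    """P(Z > k_sigma) for a standard normal variable Z."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))


def total_probability(p_in_model: float, p_model_wrong: float, p_if_wrong: float) -> float:
    """Law of total probability over 'model is right' and 'model is wrong'."""
    return (1 - p_model_wrong) * p_in_model + p_model_wrong * p_if_wrong


# Illustrative assumptions (hypothetical values, not from the post).
P_MODEL_WRONG = 0.01      # chance the model itself is wrong
P_EVENT_IF_WRONG = 0.01   # chance of the event given the model is wrong

p_in_model = upper_tail(5.0)  # what the model alone says about the event
p_total = total_probability(p_in_model, P_MODEL_WRONG, P_EVENT_IF_WRONG)

# Fraction of the total probability that comes from model error.
model_error_share = (P_MODEL_WRONG * P_EVENT_IF_WRONG) / p_total

print(f"in-model probability: {p_in_model:.3e}")
print(f"total probability:    {p_total:.3e}")
print(f"share from model error: {model_error_share:.4f}")
```

Under these assumptions almost all of the probability comes from the chance that the model is wrong, so rescaling the in-model error bar says little about the quantity that actually matters.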
The following three cases are going to be my go-to examples for showing what a mistake this can be; they cover three situations: extreme error, being in the domain of a hard science, and extreme negative impact.
Mapping our maps: types of knowledge
Related to: Map and Territory.
This post is based on ideas that came to me during my second-year nursing Research Methods class. The fact that I did terribly in this class maybe indicates that I shouldn’t be trying to explain it to anyone, but it also has a lot to do with the way I zoned out for most of every class, mulling over the material that would later become this post.
Types of map: the level of abstraction, or ‘how many steps away from reality’?
Probably in the third or fourth Research Methods class, we learned that any given research proposal could be divided into one of the following four categories:
- Descriptive
- Exploratory
- Explanatory
- Predictive
Updating, part 1: When can you change your mind? The binary model
I was recently disturbed by my perception that, despite years of studying and debating probability problems, the LessWrong community as a whole has not markedly improved its ability to get the right answer on them.
I had expected that people would read posts and comments by other people, and take special note of comments by people who had a prior history of being right, and thereby improve their own accuracy.
But can that possibly work? How can someone who isn't already highly accurate identify other people who are highly accurate?
Aumann's agreement theorem (allegedly) says that Bayesians with the same priors agree. But it doesn't say that doing so helps. Under what circumstances does revising your opinions, by updating in response to people you consider reliable, actually improve your accuracy?
To find out, I built a model of updating in response to the opinions of others. It did, eventually, show that Bayesians improve their collective opinions by updating in response to the opinions of other Bayesians. But this turns out not to depend on them satisfying the conditions of Aumann's theorem, or on doing Bayesian updating. It depends only on a very simple condition, established at the start of the simulation. Can you guess what it is?
I'll write another post describing and explaining the results if this post receives a karma score over 10.
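For readers who want to experiment in the meantime, here is a minimal sketch of what a binary-opinion simulation of this kind might look like. It is not the author's model: the update rule (everyone adopts the majority opinion), the function name `simulate`, and the parameters `n_agents`, `p_correct`, and `n_trials` are all assumptions made for illustration.

```python
import random


def simulate(n_agents: int, p_correct: float, n_trials: int, seed: int = 0):
    """Return (accuracy before updating, accuracy after updating).

    Each agent independently holds the correct answer to a binary question
    with probability p_correct. In the update step every agent adopts the
    majority opinion of the group (an assumed rule, chosen for simplicity).
    Use an odd n_agents so that ties cannot occur.
    """
    rng = random.Random(seed)
    correct_before = 0
    correct_after = 0
    for _ in range(n_trials):
        # True means the agent currently holds the correct answer.
        opinions = [rng.random() < p_correct for _ in range(n_agents)]
        n_right = sum(opinions)
        correct_before += n_right
        # After updating, everyone shares the majority view.
        if 2 * n_right > n_agents:
            correct_after += n_agents
    total = n_agents * n_trials
    return correct_before / total, correct_after / total


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6):
        before, after = simulate(n_agents=101, p_correct=p, n_trials=2000)
        print(f"p_correct={p:.1f}  before={before:.3f}  after={after:.3f}")
```

Varying `p_correct` is a quick way to explore when updating toward other people's opinions helps and when it hurts in this simplified setting.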