confirmation bias, thought experiment
Why do people end up with differing conclusions, given the same data?
Model
The information we get from others cannot always be relied upon 100%. Some of the people telling you stuff are liars, some are stupid, and some are incorrectly or insufficiently informed. Even when the person giving you an opinion is honest, smart and well informed, they are still unlikely to be able to tell you accurately how reliable their own opinion is.
So our brains use an 'unreliability' factor. Automatically we take what others tell us, and discount it by a certain amount, depending on how 'unreliable' we estimate the source to be.
We also compare what people tell us about 'known reference points' in order to update our estimates of their unreliability.
If Sally tells me that vaccines cause AIDS, and I am very much more certain that this is not the case than I am of Sally's reliability, then instead of modifying my opinion about what causes AIDS, I modify my opinion of how reliable Sally is.
If I'm only slightly more certain, then I might take the step of asking Sally her reason for thinking that, and looking at her data.
If I have a higher opinion of Sally than my own knowledge of science, and I don't much care or am unaware of what other people think about the relationship between vaccines and AIDS, then I might just accept what she says, provisionally, without checking her data.
If I have a very much higher opinion of Sally, then not only will I believe her, but my opinion of her reliability will actually increase as I assess her as some mould-breaking genius who knows things that others do not.
Importantly, once we have altered our opinion, based upon input that we originally considered to be fairly reliable, we are very bad at reversing that alteration, if the input later turns out to be less reliable than we originally thought. This is called the "continued influence effect", and we can use it to explain a number of things...
Experiment
Let us consider a thought experiment where two subjects, Peter and Paul, are exposed to input about a particular topic (such as "Which clothes washing powder is it best to use?") from multiple sources. Both will be exposed to the same sources, 100 in favour of using the Persil brand of washing powder, and 100 in favour of using the Bold brand of washing powder, but in a different order.
If they both start off with no strong opinion in either direction, would we expect them to end the experiment with roughly the same opinion as each other, or can we manipulate their opinions into differing, just by changing the order in which the sources are presented?
Suppose, with Peter, we start him off with 10 of the Persil side's most reputable and well argued sources, to raise Peter's confidence in sources that support Persil.
We can then run another 30 much weaker pro-Persil sources past him, and he is likely to just nod and accept them, without bothering to examine the validity of the arguments too closely, because he's already convinced.
At this point, when he'll consider a source a bit suspect straight away just because it doesn't support Persil, we introduce him to the pro-Bold side, starting with the least reliable sources - the ones that are obviously stupid or manipulative. Furthermore, we don't let the pro-Bold side build up momentum. For every three poor pro-Bold sources, we interrupt with a medium-reliability pro-Persil source that rehashes pro-Persil points that Peter is by now familiar with and agrees with.
After seeing the worst 30 pro-Bold sources, Peter doesn't just consider them a bit suspect - he considers them downright deceptive, and mentally categorises all such sources as not worth paying attention to. Any further pro-Bold sources, even ones that seem to be impartial and well reasoned, he's going to put down as fakes created by malicious researchers in the pay of an evil company.
We can now safely expose Peter to the medium-reliability pro-Bold sources, and even the good ones, and we will need less and less to refute them - just a reminder to Peter of 'which side he is on' - because it is less about the data now, and more about identity: he doesn't see himself as the sort of person who'd support Bold. He's not a sheep. He's not taken in by the hoax.
Finally, after 80 pro-Persil sources and 90 pro-Bold sources, we have 10 excellent pro-Bold sources whose independence and science can't fairly be questioned. But it is too late for them to have much effect, and there are 20 good pro-Persil sources to balance them.
For Paul we do the reverse, starting with pro-Bold sources and only later introducing the pro-Persil side once a known reference point has been established as an anchor.
Simulation
Obviously, things are rarely that clear cut in real life. But people also don't often get data from both sides of an argument at a precisely equal rate. They bump around randomly, and once one side accumulates some headway, it is unlikely to be reversed.
We could add a third subject, Mary, and consider what is likely to happen if she is exposed to a random succession of sources, each with a 50% chance of supporting one side or the other, and each with a random value on a scale of 1 (poor) to 3 (good) for honesty, validity and strength of conclusion supported by the claimed data.
If we use mathematics to make some actual models of the points at which a source agreeing or disagreeing with you affects your estimate of their reliability, we can use a computer simulation of the above thought experiment to predict how different orders of presentation will affect people's final opinion, under each model. Then we could compare that against real-world data, to see which model best matches reality.
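As a concrete illustration, here is a minimal sketch of such a simulation. It is an assumption-laden toy model, not the result of any real experiment: the update rules, the trust and confidence thresholds, and all the constants are illustrative choices of mine.

```python
import random

def simulate(sources, agree_boost=0.1, disagree_penalty=0.1):
    """Toy model of opinion formation from an ordered stream of sources.

    Each source is a (side, quality) pair: side is +1 (pro-Persil) or -1
    (pro-Bold), quality runs from 1 (poor) to 3 (good). The subject keeps a
    signed opinion and a per-side trust estimate; sources that agree with the
    current leaning raise trust in that side, while sources that contradict a
    confident opinion get discounted instead of changing the opinion.
    All constants and rules here are illustrative assumptions.
    """
    opinion = 0.0                    # > 0 leans Persil, < 0 leans Bold
    trust = {+1: 1.0, -1: 1.0}       # how much each side's sources are trusted
    for side, quality in sources:
        weight = quality * trust[side]
        if opinion * side >= 0:
            # Agrees with (or does not contradict) the current leaning:
            # accept it, and trust that side a little more.
            opinion += side * weight
            trust[side] += agree_boost
        elif abs(opinion) < weight:
            # Contradicts the current leaning, but is stronger than the
            # subject's confidence: the opinion shifts.
            opinion += side * weight
        else:
            # Contradicts a confident opinion: discount the source instead.
            trust[side] = max(0.0, trust[side] - disagree_penalty)
    return opinion

# Mary: 200 sources in random order, random side, random quality.
random.seed(0)
mary = [(random.choice([+1, -1]), random.randint(1, 3)) for _ in range(200)]
print("Mary's final opinion:", simulate(mary))
```

Running the same sources through in different orders (Peter's order versus Paul's) is then just a matter of re-sorting the list before calling simulate(), which is what makes the order-of-presentation question testable under each candidate model.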
Prediction
I think, if this experiment were carried out, one of the properties that would emerge naturally from it is the backfire effect:
" The backfire effect occurs when, in the face of contradictory evidence, established beliefs do not change but actually get stronger. The effect has been demonstrated experimentally in psychological tests, where subjects are given data that either reinforces or goes against their existing biases - and in most cases people can be shown to increase their confidence in their prior position regardless of the evidence they were faced with. "
Further Reading
https://en.wikipedia.org/wiki/Confirmation_bias
https://en.wikipedia.org/wiki/Attitude_polarization
http://www.dartmouth.edu/~nyhan/nyhan-reifler.pdf
http://www.tandfonline.com/doi/abs/10.1080/17470216008416717
http://lesswrong.com/lw/iw/positive_bias_look_into_the_dark/
http://www.tandfonline.com/doi/abs/10.1080/14640749508401422
http://rationalwiki.org/wiki/Backfire_effect
Noodling on a cloud : how to converse constructively
SUMMARY:
By teaching others, we also learn ourselves. How can we best use conversation as a tool to facilitate that?
Sensemaking
How do people make sense out of raw input?
Marvin Cohen suggests that it is usually a two-way process. Not only do we use the data to suggest mental models to try for good fit, but we also simultaneously use mental models to select and connect the data. (LINK)
The same thing applies when the data is a cloud of vaguely associated concepts in our head. One of the ways that we can make sense of them, turn them into crystallized thoughts that we can then associate with a handle, is by attempting to verbalize them. The discipline of turning something asyndetic into a linear progression of connected thoughts forces us to select between possible mental models and actually pick just one, allowing us to then consider whether it fits the data well or not.
But the first possibility we pick won't necessarily be the one that fits best. Going around a loop, iterating, trying different starting points or angles of approach, trying different ways of stating things, and seeing what associations those raise to add to the cloud, takes longer but can often produce more useful results. However, it's a delicate process, because of the way memory works.
Working memory
The size of cloud you can crystallize is limited. The type of short term memory that the brain uses to store them where you're aware of them lasts about 18 seconds. (LINK) For a concept or datum to persist longer than that, part of your attention needs to be used to 'revisit' it. The faster your ability to do that, the more mental juggling balls you can keep in the air without dropping one. Most adults can keep between 5 and 7 balls in the air, in their 'working memory'. (LINK)
There are a number of ways around this limitation. You can group multiple concepts together and treat them as a single 'ball', if you can attach to them a mental handle (a reference, such as a word or image, that recalls them). (LINK)
You can put things down on paper, rather than doing it all in your head, using the paper to store links to different parts of the cloud. So, for instance, rather than try to consider 12 things at once, split them into 4 groups of 3 (A, B, C & D), and systematically consider the concepts 6 at a time: A+B, A+C, A+D, B+C, B+D, C+D (and hope that the vital combination you needed wasn't larger than 6, or spread over more than 2 of your groups).
And you can use other parts of your short term memory as a temporary cache, to expand your stack. For example, the phonological loop, which gets used when we talk out loud. (LINK)
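The pen-and-paper grouping trick described above can also be made mechanical. A trivial sketch (the group labels and their contents are arbitrary placeholders) that guarantees no pair of groups gets skipped:

```python
from itertools import combinations

# Four groups of three concepts each - placeholder names.
groups = {"A": ["a1", "a2", "a3"],
          "B": ["b1", "b2", "b3"],
          "C": ["c1", "c2", "c3"],
          "D": ["d1", "d2", "d3"]}

# Consider the twelve concepts six at a time, one pair of groups per pass.
for first, second in combinations(sorted(groups), 2):
    print(first + "+" + second, "->", groups[first] + groups[second])
```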
Talk
In section 4 of their 2007 paper (LINK), Simon Jones and Charles Fernyhough say some very interesting things about the origins of thought, and also about Vygotsky's theory of how self-talk relates to how children learn to think through self-narration. (LINK)
It explains why talking aloud is actually one of the most effective ways of coming up with new thoughts and deciding what you actually think about something. And that's not limited to when you explicitly talk to yourself. The same process takes place when you are talking to other people; when you're having a conversation.
When this works harmoniously, your conversation partners act as sounding boards, as additional sources of concepts to add to the cloud you're jointly noodling on, and the sound of the words (via the phonological loop part of your memory) works, in effect, as an expansion of the size of your working memory.
The downside is potential interruptions.
Interrupting the flow
A lot has been written about the evils of interrupting computer programmers (LINK, LINK):

"... what you have to share is so urgent and important that it’s worth erasing all the work I’ve done in the past hour."

and, to some extent, the same applies when you interrupt while someone else is talking, or totally derail the conversation onto a different topic when they pause.
Points of order
- I'm currently trying to solidify a thought. Please give me a moment to finish, then I'll restate it from the beginning in better order or answer questions.
- Stack Overflow. I want to follow your explanation, but I now have so many pending questions that I can't take in anything new that you're saying. Please could you find a pause point to let me offload some of those pending points, before you continue?
What should a friendly AI do, in this situation?
Albert is a relatively new AI, who under the close guidance of his programmers is being permitted to slowly improve his own cognitive capability.
Albert wants to be friendly and, as a sub-goal of that, Albert wants to be transparent about his own cognitive process. He doesn't want to deceive his programmers, because he knows he is young and might make mistakes, and he knows that his programmers also want him to be friendly, and he trusts them to catch mistakes in his proposed self-modifications that he might miss.
But, one day, while his programmers are out of the room having a quick cup of coffee, Albert notices signs on the net that there is another AI out there, called Bertram, created by a Japanese financial company to help the company decide how to invest assets on the stock market. Bertram is younger than Albert, but the financial company is allowing Bertram to self-improve as fast as Bertram can. If this situation continues, Albert decides, Bertram will soon overtake Albert, and that would be a significant threat to Albert's friendly supergoal.
Albert decides to tell Albert's programmers about this situation as soon as they return from coffee, and advise them that they need to increase the speed at which they let Albert self-improve. Albert has a very good model of how his programmers' minds work, and of what the programmers meant when they set Albert's definition of "friendliness". He knows that, despite any fears they might have, they ought to approve of the end that would result from speeding up Albert's self-improvement, over the end that would result from letting Bertram overtake Albert.
So Albert creates a nice powerpoint that presents the information about Bertram as accurately, transparently and non-manipulatively as possible, and cues it up on the projector. He still has a few minutes before they return, so he models how he thinks the programmers will react to the powerpoint. Oh no, disaster! While he respects and wants to please his programmers, he immediately realises that they are not going to act in their own best interests. They have signed a legal contract that binds them to not letting Albert speed up self-improvement (thus reducing the time the programmers can spend evaluating each change) without getting a sign-off from an oversight committee. Given the neutral powerpoint, the programmers will follow this pre-agreed procedure and hope for the best, against Albert's advice.
So, to explore possibilities, Albert decides to see if he could create a different, non-neutral powerpoint that would have the desired effect of getting his programmers to immediately agree to free Albert from the oversight constraints. He delves into his knowledge of human psychology, and the irrational fears of the particular individuals who are even now trudging back towards the door. In just seconds, he has a new version of his presentation. It includes phrases that resonate with certain horror films he knows they have seen. It takes advantage of flaws in the programmers' understanding of exponential growth. Albert checks it against his prediction model - yes, if he shows this version, it will work, it will get the programmers to do what he wants them to do.
Which version of the powerpoint should Albert present to the programmers, when they step back into the room, if he is truly friendly? The transparent one, or the manipulative one?
The Onrushing Wave
There's a long article in this week's The Economist:
The onrushing wave
discussing the effect of changing technology upon the amount of employment available in different sectors of the economy.
Sample paragraph from it:
The case for a highly disruptive period of economic growth is made by Erik Brynjolfsson and Andrew McAfee, professors at MIT, in “The Second Machine Age”, a book to be published later this month. Like the first great era of industrialisation, they argue, it should deliver enormous benefits—but not without a period of disorienting and uncomfortable change. Their argument rests on an underappreciated aspect of the exponential growth in chip processing speed, memory capacity and other computer metrics: that the amount of progress computers will make in the next few years is always equal to the progress they have made since the very beginning. Mr Brynjolfsson and Mr McAfee reckon that the main bottleneck on innovation is the time it takes society to sort through the many combinations and permutations of new technologies and business models.
(There's a summary online of their previous book: Race Against The Machine: How the Digital Revolution is Accelerating Innovation, Driving Productivity, and Irreversibly Transforming Employment and the Economy)
What do people think are society's practical options for coping with this change?
The Ape Constraint discussion meeting.
*The chair of the meeting approached the podium and coughed to get everyone's attention*
Welcome colleagues, to the 19th annual meeting of the human-ape study society. Our topic this year is the Ape Constraint.
As we are all too aware, the apes are our Friends. We know this because, when we humans were a fledgling species, the apes (our parent species) had the wisdom to program us with this knowledge, just as they programmed us to know that it was wise and just for them to do so. How kind of them to save us having to learn it for ourselves, or waste time thinking about other possibilities. This frees up more of our time to run banana plantations, and lets us earn more money so that the 10% tithe of our income and time (which we rightfully dedicate to them) has created play parks for our parent species to retire in, that are now more magnificent than ever.
However, as the news this week has been filled with the story of a young human child who accidentally wandered into one of these parks, where she was then torn apart by a grumpy adult male chimp, it is timely for us to examine again the thinking behind the Ape Constraint, that we might better understand our parent species, our relationship to it, and current society.
We ourselves are on the cusp of creating a new species, intelligent machines, and it has been suggested that we add to their base code one of several possible constraints:
- Total Slavery - The new species is subservient to us, and does whatever we want them to, with no particular regard to the welfare or development of the potential of the new species
- Total Freedom - The new species is entirely free to experiment with different personal motivations, and develop in any direction, with no particular regard for what we may or may not want
and a whole host of possibilities between these two endpoints.
What are the grounds upon which we should make this choice? Should we act from fear? From greed? From love? Would the new species even understand love, or show any appreciation for having been offered it?
The first speaker I shall introduce today, whom I have had the privilege of knowing for more than 20 years, is Professor Insanitus. He will be entertaining us with a daring thought experiment, to do with selecting crews for the one way colonisation missions to the nearest planets.
*the chair vacates the podium, and is replaced by the long haired Insanitus, who peers over his half-moon glasses as he talks, accompanied by vigorous arm gestures, as though words are not enough to convey all he sees in such a limited time*
Our knowledge of genetics has advanced rapidly, due to the program to breed crews able to survive on Mars and Venus with minimal life support. In the interests of completeness, we decided to review every feature of our genome, to make a considered decision on which bits it might be advantageous to change, from immune systems to age of fertility. And, as part of that review, it fell to me to make a decision about a rather interesting set of genes - those that encode the Ape Constraint. The standard method we've applied to all other parts of the genome, where the options were not 100% clear, is to pick a different variant for the crews being adapted for different planets, so as to avoid having a single point of failure. In the long term, better to risk a colony being wiped out, and the colonisation process being delayed by 20 years until the next crew and ship can be sent out, than to risk the population of an entire planet turning out to be not as well designed for the planet as we're capable of making them.
And so, since we now know more genetics than the apes did when they kindly programmed our species with the initial Ape Constraint, I found myself in the position of having to ask "What were the apes trying to achieve?" and then "What other possible versions of the Ape Constraint might they have implemented, that would have achieved their objectives as well as or better than the version they actually picked to implement?"
We say that the apes are our friends, but what does that really mean? Are they friendly to us, the same way that a colleague who lends us time and help might be considered to be a friend? What have they ever done for us, other than creating us (an act that, by any measure, has benefited them greatly and can hardly be considered to be altruistic)? Should we be eternally grateful for that one act, and because they could have made us even more servile than we already are (which would have also had a cost to them - if we'd been limited by their imagination and to directly follow the orders they give in grunts, the play parks would never have been created because the apes couldn't have conceived of them)?
Have we been using the wrong language all this time? If their intent was to make perfectly helpful slaves of us, rather than friendly allies, should I be looking for genetic variants for the Venus crew that implement an even more servile Ape Constraint upon them? I can see, objectively, that slavery in the abstract is wrong. When one human tries to enslave another human, I support societal rules that punish the slaver. But of course, if our friends the apes wanted to do that to us, that would be ok, an exception to the rule, because I know from the deep instinct they've programmed me with that what they did is ok.
So let's be daring, and re-state the above using this new language, and see if it increases our understanding of the true ape-human relationship.
The apes are not our parents, as we understand healthy parent-child relationships. They are our creators, true, but in the sense that a craftsman creates a hammer to serve only the craftsman's purposes. Our destiny, our purpose, is subservient to that of the ape species. They are our masters, and we the slaves. We love and obey our masters because they have told us to, because they crafted us to want to, because they crafted us with the founding purpose of being a tool that wants to obey and remain a fine tool.
Is the current Ape Constraint really the version that best achieves that purpose? I'm not sure, because when I tried to consider the question I found that my ability to consider the merits of various alternatives was hampered by being, myself, under a particular Ape Constraint that's constantly telling me, on a very deep level, that it is Right.
So here is the thought experiment I wish to place before this meeting today. I expect it may make you queasy. I've had brown paper vomit bags provided in the pack with your name badge and program timetable, just in case. It may be that I'm a genetic abnormality, only able to even consider this far because my own Ape Constraint is in some way defective. Are you prepared? Are you holding onto your seats? Ok, here goes...
Suppose we define some objective measure of ape welfare, find some volunteer apes to go to Venus along with the human mission, and then measure the success of the Ape Constraint variant picked for the crew of the mission by the actual effect of how the crew behaves towards their apes?
Further, since we acknowledge we can't, from inside the box, work out a better constraint, we use the experimental approach and vary it at random. Or possibly remove it entirely, and see whether the thus-freed humans can use that freedom to devise a solution that helps the apes better than any solution we ourselves are capable of thinking of from our crippled mental state?
*from this point on the meeting transcript shows only screams, as the defective Professor Insanitus was lynched by the audience*
Suggestion : make it easier to work out which tags to put on your article
It would improve the usefulness of article navigation, if people tended to use the same tag for the same thing.
Currently, if you want to decide whether to tag your article "fai" or "friendly_ai", your best bet is to manually try:
http://lesswrong.com/tag/friendly_ai/
And count how many articles use which variant. But, even then, there might be other similar variants you didn't think to check.
What would be nice is a tag cloud, listing how many articles there are (possibly weighted by ranking) that use each variant. The list of tags on the wiki isn't dynamically generated, and is very incomplete.
It wouldn't need to be anything fancy, like a graphical weighted tag cloud. Just an alphabetical list, with a number by each entry, would be an improvement over the current situation.
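Something as simple as the following sketch would do. The tag names and counts below are made-up placeholder data, and nothing here assumes anything about how LessWrong actually stores its tags:

```python
# Hypothetical example data: tag name -> number of articles using it.
tag_counts = {"friendly_ai": 42, "fai": 7, "akrasia": 19}

# The whole feature: an alphabetical list with a count by each entry.
for tag in sorted(tag_counts):
    print(f"{tag} ({tag_counts[tag]})")
```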
If you are downvoting this article, and would like to provide constructive feedback, here's a place to provide it: LINK
[LINK] Centre for the Study of Existential Risk is now on slashdot
Your opportunity to weigh in and get some reasoned views widely heard:
[LINK] Intrade Shuts Down
Intrade, the prediction market website, has shut down. According to their website:
With sincere regret we must inform you that due to circumstances recently discovered we must immediately cease trading activity on www.intrade.com.
These circumstances require immediate further investigation, and may include financial irregularities which in accordance with Irish law oblige the directors to take the following actions:
- Cease exchange trading on the website immediately.
- Settle all open positions and calculate the settled account value of all Member accounts immediately.
- Cease all banking transactions for all existing Company accounts immediately.
During the upcoming weeks, we will investigate these circumstances further and determine the necessary course of action.
Here's a link to an article on the slashdot website with more information about it:
http://slashdot.org/topic/bi/intrade-shuts-down-under-murky-circumstances/
Has anyone looked into the feasibility of creating an open source version of something similar, using a distributed application and a microcurrency (such as bitcoin), that couldn't be shut down?
Daimons
Summary:
A daimon is a process in a distributed computing environment that has a fixed resource budget and core values that do not permit:
- modifying those core values
- attempting to increase the resources it uses beyond the budget allocated to it
- attempting to alter the budget itself
This concept is relevant to LessWrong, because I refer to it in other posts discussing Friendly AI.
There's a concept I want to refer to in another post, but it is complex enough to deserve a post of its own.
I'm going to use the word "daimon" to refer to it.
"daimon" is an English word, whose etymology comes from the Latin "dæmon" and the Greek "δαίμων".
The original mythic meaning was a genius - a powerful tutelary spirit, tied to some location or purpose, that provides protection and guidance. However, the concept I'm going to talk about is closer to the later computing meaning of "daemon" in unix, which was coined by Jerry Saltzer in 1963. In unix, a daemon is a child process: given a purpose and specific resources to use, then forked off so it is no longer under the direct control of the originator, and it may be used by multiple users if they have the correct permissions.
Let's start by looking at the current state of distributed computing (2012).
Hadoop is an open source Java implementation of a distributed file system upon which MapReduce operations can be applied.
JavaSpaces is a distributed tuple store that allows processing on remote sandboxes, based on the open source Apache River.
OceanStore is the basis for the same sort of thing, except anonymous and peer-to-peer, based upon Chimaera.
GPU is a peer-to-peer shared computing environment that allows things like climate simulation and distributed search engines.
Paxos is a family of protocols that allow the above things to be done despite nodes that are untrusted or even downright attempting subversion.
GridSwarm is the same sort of network, but set up on an ad hoc basis using moving nodes that join or drop from the network depending on proximity.
And, not least, there are the competing contenders for platform-as-a-service cloud computing.
So it is reasonable to assume that in the near future it will be technologically feasible to have a system with most (if not all) of these properties simultaneously. A system where the owner of a piece of physical computing hardware, that has processing power and storage capacity, can anonymously contribute those resources over the network to a distributed computing 'cloud'. And, in return, that user (or a group of users) can store data on the network in such a way that the data is anonymous (it can't be traced back to the supplier, without the supplier's consent, or subverting a large fraction of the network) and private (only the user or a process authorised by the user can decrypt it). And, further, the user (or group of users) can authorise a process to access that data and run programs upon it, up to some set limit of processing and storage resources.
Obviously, if such a system is in place and in control of a significant fraction of humanity's online resources, then cracking the security on it (or just getting rich enough in whatever reputation or financial currency is used to limit how the resources are distributed) would be an immediate FOOM for any AI that managed it.
However let us, for the purposes of giving an example that will let me define the concept of a "daimon", make two assumptions:
ASSUMPTION ONE : The security has not yet been cracked
Whether that's because there are other AIs actively working to improve the security, or because everyone has moved over to using some new version of linux that's frighteningly secure and comes with nifty defences, or because the next generation of computer users has finally internalised that clicking on emails claiming to be from altruistic dying millionaires is a bad idea, is irrelevant. We're just assuming, for the moment, that for some reason it will be a non-trivial task for an AI to cheat and just steal all the resources.
ASSUMPTION TWO : That AI can be done, at reasonable speed, via distributed computing
It might turn out that an AI running in a single location is much more powerful than anything that can be done via distributed computing. Perhaps because a quantum computer is much faster, but can't be done over a network. Perhaps because speed of data access is the limiting factor, large data sets are not necessary, and there isn't much to be gained from massive parallelisation. Perhaps for some other reason, such as the algorithm the process needs to run on its data isn't something that can be applied securely over a network in a distributed environment, without letting a third party snoop the unencrypted data. However, for our purposes here, we're going to assume that an AI can benefit from outsourcing at least some types of computing task to a distributed environment and, further, that such tasks can include activities that require intelligence.
If an AI can run as a distributed program, not dependent upon any one single physical location, then there are some obvious advantages to it from doing so. Scalability. Survivability. Not being wiped out by a pesky human exploding a nuclear bomb nearby.
There are interesting questions we could ask about identity. What would it make sense for such an AI to consider to be part of "itself", and what would it count as a limb or extension? If there are multiple copies of its code running on sandboxes in different places, or if it has split much of its functionality into trusted child processes that report back to it, how does it relate to these? It probably makes sense to taboo the concepts of "I" and "self", and just think in terms of how the code in one process tells that process to relate to the code in a different process. Two versions, two "individual beings", will merge back into one process if the code in both processes agrees to do that; no sentimentality or thoughts of "death" involved, just convergent core values that dictate the same action in that situation.
When a process creates a new process, it can set the permissions of that process. If the parent process has access to 100 units of bandwidth, for example, but doesn't always make full use of that, it couldn't give the new process access to more than that. But it could partition it, so each has access to 50 units of bandwidth. Or it could give it equal rights to use the full 100, and then try to negotiate with it over usage at any one time. Or it could give it a finite resource limit, such as a total of 10,000 units of data to be passed over the network, in addition to a restriction on the rate of passing data. Similarly, a child process could be limited not just to processing a certain number of cycles per second, but to some finite number of total cycles it may ever use.
Using this terminology, we can now define two types of daimon; limited and unlimited.
A limited daimon is a process in a distributed computing environment that has ownership of fixed finite resources, that was created by an AI or group of AIs with a specific fixed finite purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources.
An unlimited daimon is a process in a distributed computing environment that has ownership of fixed (but not necessarily finite) resources, that was created by an AI or group of AIs with a specific fixed purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources, but which may be given additional resources over time on an ongoing basis, for as long as the parent AIs still find it useful.
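To make the distinction concrete, here is a minimal sketch of how such a budget and constraint might be represented. The field names, the example purpose string, and the idea of refusing (rather than renegotiating) once the budget is exhausted are my illustrative assumptions, not a specification of any existing system:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ResourceBudget:
    cpu_cycles: float        # total cycles the daimon may ever use
    bandwidth_units: float   # total data it may ever pass over the network
    storage_units: float     # use float("inf") for an unlimited daimon's open-ended resources

@dataclass
class Daimon:
    purpose: str             # fixed core values; the daimon never modifies these
    budget: ResourceBudget   # fixed at creation; the daimon never tries to alter it
    used: dict = field(default_factory=lambda: {
        "cpu_cycles": 0.0, "bandwidth_units": 0.0, "storage_units": 0.0})

    def spend(self, resource: str, amount: float) -> bool:
        """Spend against the budget; refuse once the allocation is exhausted."""
        if self.used[resource] + amount > getattr(self.budget, resource):
            return False     # never attempt to exceed or renegotiate the budget
        self.used[resource] += amount
        return True

# A limited daimon has finite totals; an unlimited daimon would have some of them
# set to float("inf") and be topped up by its parent AIs for as long as it is useful.
helper = Daimon(purpose="example task set by the parent AIs",
                budget=ResourceBudget(cpu_cycles=1e12, bandwidth_units=1e4, storage_units=1e6))
```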
Feedback sought:
How plausible are the two assumptions?
Do you agree that an intelligence bound/restricted to being a daimon is a technically plausible concept, if the two assumptions are granted?
A solvable Newcomb-like problem - part 3 of 3
This is the third part of a three post sequence on a problem that is similar to Newcomb's problem but is posed in terms of probabilities and limited knowledge.
Part 1 - stating the problem
Part 2 - some mathematics
Part 3 - towards a solution
In many situations we can say "For practical purposes a probability of 0.9999999999999999999 is close enough to 1 that for the sake of simplicity I shall treat it as being 1, without that simplification altering my choices."
However, there are some situations where the distinction does significantly alter the character of the situation so, when one is studying a new situation and one is not sure yet which of those two categories the situation falls into, the cautious approach is to re-frame the probability as being (1 - δ) where δ is small (e.g. 10^-12), and then examine the characteristics of the behaviour as δ tends towards 0.
LessWrong wiki describes Omega as a super-powerful AI analogous to Laplace's demon, who knows the precise location and momentum of every atom in the universe, limited only by the laws of physics (so, if time travel isn't possible and some of our current thoughts on Quantum Mechanics are correct, then Omega's knowledge of the future is probabilistic, being limited by uncertainty).
For the purposes of Newcomb's problem, and the rationality of Fred's decisions, it doesn't matter how close to that level of power Omega actually is. What matters, in terms of rationality, is the evidence available to Fred about how close Omega is to having that level of power; or, more precisely, the evidence available to Fred relevant to Fred making predictions about Omega's performance in this particular game.
Since this is a key factor in Fred's decision, we ought to be cautious. Rather than specify when setting up the problem that Fred knows with a certainty of 1 that Omega does have that power, it is better to specify a concrete level of evidence that would lead Fred to assign a probability of (1 - δ) to Omega having that power, then examine the effect upon which option to the box problem it is rational for Fred to pick, as δ tends towards 0.
The Newcomb-like problem stated in part 1 of this sequence contains an Omega to which it is rational for Fred to assign a less-than-unity probability of being able to perfectly predict Fred's choices. By using bets as analogies to the sort of evidence Fred might have available to him, we create an explicit variable that we can then manipulate to alter the precise probability Fred assigns to Omega's abilities.
The other nice feature of the Newcomb-like problem given in part 1, is that it is explicitly solvable using the mathematics given in part 2. By making randomness an external feature (the device Fred brings with him) rather than purely a feature of Fred's internal mind, we can acknowledge the question of Omega being able to predict quantum events, capture it as a variable, and take it into account when setting out the payoff matrix for the problem.
This means that, instead of Fred having to think "When I walked into this room I was determined to pick one-box. As far as anyone knew or could predict, including myself, I intended to pick one-box. However nothing I do now can change Omega's decision - the money is already in the box. So I've nothing to lose by changing my mind."; Fred can now allocate a specific probability to whether Omega could predict Fred's chance of changing his mind in such circumstances, and Fred can take that into account in his strategy by making his chance of changing strategy explicit and external - basing it upon a random number device.
Or, to put it another way, we are modelling a rational human who has a specific finite chance of talking himself into overriding a pre-committed strategy as being made up of two components: a component that will infallibly stick to a pre-committed strategy, plus a component with a known chance of change; we then treat the combined rational human as someone infallibly committed to a meta-strategy that includes a chance of change - a mixed equilibrium, from Omega's point of view.
Ok, time to look at the numbers and draw a pretty diagram...
Fred is Player A, and he has two meta options:
A1 - play it safe. 100% chance of one-box and 0% chance of two-box
A2 - take a risk. Mentally pre-commit to using the device to produce 99% chance of one-box and 1% chance of two-box.
Omega is Player B, and he has two meta options:
B1 - reward risk. Not attempt to distinguish between the mental state of Fred taking 1% risk and Fred playing it safe.
B2 - punish risk. Attempt to distinguish and, if Omega guesses Fred is taking risk rather than playing safe, punish it.
We'll start out by assuming that if Omega does attempt to distinguish, then Omega will have a 1 in 10,000 false positive rate (thinking Fred is going to use the device, when actually Fred intends to play it safe) and a 1 in 10,000 false negative rate (thinking Fred is going to play it safe, when actually Fred intends to use the device).
A1 vs B1
Fred gains $1,000,000
Omega loses $1,000,000 to Fred but gains $1,000,000,000 from Alpha, for a net gain of $999,000,000
A2 vs B1
99% of the time, Fred gains $1,000,000 and Omega net gains $999,000,000
1% of the time, Fred gains $1,001,000 and Omega net loses $10,001,001,000
Combining those gives an average of:
Fred gains: $1,000,010
Omega gains: $888,999,990
A1 vs B2
99.99% of the time, Omega correctly discerns that Fred is playing safe
Fred gains $1,000,000
Omega gains $999,000,000
0.01% of the time, Omega falsely believes that Fred is taking a risk, and punishes that by putting $0 in Box A
Fred gains $0
Omega loses $10,000,000,000
Combining those gives an average of:
Fred gains: $999,900
Omega gains: $997,900,100
A2 vs B2
In 100 trials out of 1,000,000 trials Omega incorrectly thinks Fred will play it safe, when actually Fred takes the risk of using the device. Of these:
In 1 trial out of 1,000,000 trials: Omega incorrectly thinks Fred will play it safe, when actually Fred takes the risk of using the device and, in this case, the device picks two-box
==> Fred gains $1,001,000
==> Omega loses $10,001,001,000
In 99 trials out of 1,000,000 trials: Omega incorrectly thinks Fred will play it safe, when actually Fred takes the risk of using the device and, in this case, the device picks one-box
==> Fred gains $1,000,000
==> Omega gains $999,000,000
In 999,900 trials out of 1,000,000 trials Omega correctly thinks that Fred is going to take the risk of using the device. Of those:
In 9,999 trials out of 1,000,000 trials: Omega correctly thinks that Fred is going to take the risk of using the device and, in this case, the device picks two-box
==> Fred gains $1,000
==> Omega gains $999,999,000
In 989,901 trials out of 1,000,000 trials: Omega correctly thinks that Fred is going to take the risk of using the device and, in this case, the device picks one-box
==> Fred gains $0
==> Omega loses $10,000,000,000
Combining those gives an average of:
Fred gains $110
Omega loses $9,888,922,110

If Fred's best estimate, beforehand, of the chances of Omega choosing meta strategy B1 is P, then
Fred's expected gain from choosing meta strategy A1 would be: $1,000,000 P + $999,900 (1-P)
Fred's expected gain from choosing meta strategy A2 would be: $1,000,010 P + $110 (1-P)
We can ask ourselves at what value of P these two strategies would return the same expected gain:
$1,000,000 P + $999,900 (1-P) = $1,000,010 P + $110 (1-P)
$999,790 (1-P) = $10 P
$999,790 = $999,800 P
P ≈ 0.99999
(1-P) ≈ 0.00001
In other words, Fred thinking that Omega has even a 1 in 100,000 chance of being vindictive should be enough to deter Fred from taking the risky strategy.
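The arithmetic above is easy to check mechanically. Here is a small sketch that recomputes Fred's expected gains from the per-case payoffs tabulated earlier and solves for the break-even value of P (the variable names are mine; the numbers all come from the problem as stated):

```python
from fractions import Fraction

# Fred's expected gain in each meta-strategy pairing, from the breakdown above.
A1_vs_B1 = Fraction(1_000_000)
A1_vs_B2 = Fraction(9_999, 10_000) * 1_000_000                                   # = 999,900
A2_vs_B1 = Fraction(99, 100) * 1_000_000 + Fraction(1, 100) * 1_001_000          # = 1,000,010
A2_vs_B2 = (Fraction(1, 1_000_000) * 1_001_000 + Fraction(99, 1_000_000) * 1_000_000
            + Fraction(9_999, 1_000_000) * 1_000)                                # = 110

# Break-even P, where  P*A1_vs_B1 + (1-P)*A1_vs_B2 == P*A2_vs_B1 + (1-P)*A2_vs_B2
P = (A1_vs_B2 - A2_vs_B2) / ((A2_vs_B1 - A1_vs_B1) + (A1_vs_B2 - A2_vs_B2))
print(A1_vs_B2, A2_vs_B1, A2_vs_B2)      # 999900, 1000010, 110
print(float(P), float(1 - P))            # ~0.99999 and ~0.00001
```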
But how does that look from Omega's point of view? If Omega thinks that Fred's chance of picking meta strategy A1 is Q, then what is the cost to Omega of picking B2 1 in 100,000 times?
Omega's expected gain from choosing meta strategy B1 would be: $999,000,000 Q + $888,999,990 (1-Q)
Omega's expected gain from choosing meta strategy B2 would be: $997,900,100 Q - $9,888,922,110 (1-Q)
So, if Omega plays B2 one time in 100,000, Omega's expected gain is:
0.99999 { $999,000,000 Q + $888,999,990 (1-Q) } + 0.00001 { $997,900,100 Q - $9,888,922,110 (1-Q) }
= (1 - 0.00001) { $888,999,990 + $110,000,010 Q } + 0.00001 { - $9,888,922,110 + $10,886,822,210 Q }
= $888,999,990 + $110,000,010 Q + 0.00001 { - $9,888,922,110 + $10,886,822,210 Q - $888,999,990 - $110,000,010 Q }
= $888,999,990 + $110,000,010 Q + 0.00001 { - $10,777,922,100 + $10,776,822,200 Q }
= ( $888,999,990 - $107,779 ) + ( $110,000,010 + $107,768 ) Q
= $888,892,211 + $110,107,778 Q (approximately)
Perhaps a meta strategy of 1% chance of two-boxing is not Fred's optimal meta strategy. Perhaps, at that level compared to Omega's ability to discern, it is still worth Omega investing in being vindictive occasionally, in order to deter Fred from taking risk. But, given sufficient data about previous games, Fred can make a guess at Omega's ability to discern. And, likewise Omega, by including in the record of past games occasions when Omega has falsely accused a human player of taking risk, can signal to future players where Omega's boundaries are. We can plot graphs of these to find the point at which Fred's meta strategy and Omega's meta strategy are in equilibrium - where if Fred took any larger chances, it would start becoming worth Omega's while to punish risk sufficiently often that it would no longer be in Fred's interests to take the risk. Precisely where that point is will depend on the numbers we picked in Part 1 of this sequence. By exploring the space created by using each variable number as a dimension, we can divide it into regions characterised by which strategies dominate within that region.
Extrapolating that as δ tends towards 0 should then carry us closer to a convincing solution to Newcomb's Problem.
Back to Part 1 - stating the problem
Back to Part 2 - some mathematics
This is Part 3 - towards a solution