Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
In case you haven't seen it yet, Quora hosted an interesting discussion of different strategies for containing / mitigating AI risk, boosted by a $500 prize for the best answer. It attracted sci-fi author David Brin, U. Michigan professor Igor Markov, and several people with PhDs in machine learning, neuroscience, or artificial intelligence. Most people from LessWrong will disagree with most of the answers, but I think the article is useful as a quick overview of the variety of opinions that ordinary smart people have about AI risk.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
In the Mary's room thought experiment, Mary is a brilliant scientist in a black-and-white room who has never seen any colour. She can investigate the outside world through a black-and-white television, and has piles of textbooks on physics, optics, the eye, and the brain (and everything else of relevance to her condition). Through this she knows everything intellectually there is to know about colours and how humans react to them, but she hasn't seen any colours at all.
After that, when she steps out of the room and sees red (or blue), does she learn anything? It seems that she does. Even if she doesn't technically learn something, she experiences things she hadn't ever before, and her brain certainly changes in new ways.
The argument was intended as a defence of qualia against certain forms of materialism. It's interesting, and I don't intend to solve it fully here. But just like I extended Searle's Chinese room argument from the perspective of an AI, it seems this argument can also be considered from an AI's perspective.
Consider an RL agent with a reward channel, but which currently receives nothing from that channel. The agent can know everything there is to know about itself and the world. It can know about all sorts of other RL agents, and their reward channels. It can observe them getting their own rewards. Maybe it could even interrupt or increase their rewards. But all this knowledge will not get it any reward. As long as its own channel doesn't send it the signal, knowledge of other agents' rewards - even of identical agents getting rewards - does not give this agent any reward. Ceci n'est pas une récompense.
This seems to mirror Mary's situation quite well - knowing everything about the world is no substitute for actually getting the reward/seeing red. Now, an RL agent's reward seems closer to pleasure than qualia - this would correspond to a Mary brought up in a puritanical, pleasure-hating environment.
Closer to the original experiment, we could imagine the AI is programmed to enter into certain specific subroutines when presented with certain stimuli. The only way for the AI to start these subroutines is for the stimulus to be presented to it. Then, upon seeing red, the AI enters a completely new mental state, with new subroutines. The AI could know everything about its programming, and about the stimulus, and, intellectually, what would change about itself if it saw red. But until it did, it would not enter that mental state.
If we use ⬜ to (informally) denote "knowing all about", then ⬜(X→Y) does not imply Y. Here X and Y could be "seeing red" and "the mental experience of seeing red". I could have simplified that by saying that ⬜Y does not imply Y. Knowing about a mental state, even perfectly, does not put you in that mental state.
This closely resembles the original Mary's room experiment. And it seems that if anyone insists that certain features are necessary to the intuition behind Mary's room, then these features could be added to this model as well.
Mary's room is fascinating, but it doesn't seem to be talking about humans exclusively, or even about conscious entities.
I'm about a third of the way through Stanovich's Decision Making and Rationality in the Modern World. Basically, I've gotten through some of the more basic axioms of decision theory (Dominance, Transitivity, etc).
As I went through the material, I noted that there were a lot of these:
Decision 5. Which of the following options do you prefer (choose one)?
A. A sure gain of $240
B. 25% chance to gain $1,000 and 75% chance to gain nothing
The text goes on to show how most people tend to make irrational choices when confronted with decisions like this; most striking was how often irrelevant contexts and framing affected people's decisions.
But I understand the decision theory bit; my question is a little more complicated.
When I was choosing these options myself, I did what I've been taught by the rationalist community to do in situations where I am given nice, concrete numbers: I shut up and I multiplied, and at each decision chose the option with the highest expected utility.
Granted, I equated dollars to utility, which Stanovich does mention that humans don't do well (see Prospect Theory).
In the above decision, option B clearly has the higher expected utility, so I chose it. But there was still a nagging doubt in my mind, some part of me that thought, if I was really given this option, in real life, I'd choose A.
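For concreteness, here is a minimal sketch of the "shut up and multiply" step for Decision 5, assuming (as I do above) that dollars equal utility. The names option_a, option_b and expected_value are just illustrative labels of mine.

```python
from fractions import Fraction

# Each option is a list of (probability, dollar payoff) pairs.
# Assumption: utility is taken to be equal to dollars.
option_a = [(Fraction(1), 240)]
option_b = [(Fraction(1, 4), 1000), (Fraction(3, 4), 0)]

def expected_value(option):
    """Probability-weighted average payoff of a gamble."""
    return sum(p * payoff for p, payoff in option)

print(expected_value(option_a))  # 240
print(expected_value(option_b))  # 250
```

So B's edge is $10 per play in expectation: real, but small compared with the spread between its two outcomes.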
So I asked myself: why would I choose A? Is this an emotion that isn't well-calibrated? Am I being risk-averse for gains but risk-taking for losses?
What exactly is going on?
And then I remembered the Prisoner's Dilemma.
A Tangent That Led Me to an Idea
Now, I'll assume that anyone reading this has a basic understanding of the concept, so I'll get straight to the point.
In classical decision theory, the choice to defect (rat the other guy out) is strictly superior to the choice to cooperate (keep your mouth shut). No matter what your partner in crime does, you get a better deal if you defect.
Now, I haven't studied the higher branches of decision theory yet (I have a feeling that Eliezer, for example, would find a way to cooperate and make his partner in crime cooperate as well; after all, rationalists should win).
Where I've seen the Prisoner's Dilemma resolved is, oddly enough, in Dawkins's The Selfish Gene, which is where I was first introduced to the idea of an Iterated Prisoner's Dilemma.
The interesting idea here is that, if you know you'll be in the Prisoner's Dilemma with the same person multiple times, certain kinds of strategies become available that weren't possible in a single instance of the Dilemma. Partners in crime can be punished for defecting by future defections on your own part.
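To make the difference between the one-shot and iterated versions concrete, here is a small sketch. The payoff numbers (5, 3, 1, 0) are a commonly used illustrative choice and are my assumption, not something from Stanovich; the strategy names and the play function are likewise hypothetical.

```python
# Assumed payoffs (higher is better): PAYOFF[(my_move, their_move)] -> my score.
# "C" = cooperate (keep quiet), "D" = defect (rat the other guy out).
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def always_defect(my_history, their_history):
    return "D"

def tit_for_tat(my_history, their_history):
    # Cooperate on the first round, then copy the partner's previous move.
    return their_history[-1] if their_history else "C"

def play(strategy_a, strategy_b, rounds):
    """Return the total scores of both players after the given number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# One shot: defecting beats cooperating whatever the partner does.
assert all(PAYOFF[("D", other)] > PAYOFF[("C", other)] for other in "CD")

print(play(always_defect, tit_for_tat, 1))   # (5, 0)
print(play(always_defect, tit_for_tat, 10))  # (14, 9)
print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30)
```

Under these assumed payoffs, the defector wins the single round, but over ten rounds two retaliating cooperators each end up with more than the defector does.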
The key idea here is that I might have a different response to the gamble if I knew I could take it again.
Let's put on our probability hats and actually crunch the numbers:
Format for option B - Probability: $Amount of Money | Probability: $Amount of Money
Each row assumes one picks A over and over again, or B over and over again, for the given number of rounds.
Round 1. A: $240. B: 1/4: $1,000 | 3/4: $0
Round 2. A: $480. B: 1/16: $2,000 | 6/16: $1,000 | 9/16: $0
Round 3. A: $720. B: 1/64: $3,000 | 9/64: $2,000 | 27/64: $1,000 | 27/64: $0
Round 4. A: $960. B: 1/256: $4,000 | 12/256: $3,000 | 54/256: $2,000 | 108/256: $1,000 | 81/256: $0
Round 5. A: $1,200. B: 1/1024: $5,000 | 15/1024: $4,000 | 90/1024: $3,000 | 270/1024: $2,000 | 405/1024: $1,000 | 243/1024: $0
And so on. (If I've made a mistake, please let me know.)
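The rows above are just binomial distributions with a 1/4 chance of winning each round, so they can be checked mechanically. Here is a minimal sketch using exact fractions; the names P_WIN and option_b_distribution are my own.

```python
from fractions import Fraction
from math import comb

P_WIN = Fraction(1, 4)  # chance that a single play of option B pays $1,000

def option_b_distribution(n):
    """Map total winnings -> probability after n independent plays of option B."""
    return {
        1000 * wins: comb(n, wins) * P_WIN**wins * (1 - P_WIN)**(n - wins)
        for wins in range(n, -1, -1)
    }

for n in range(1, 6):
    denom = 4**n
    cells = " | ".join(
        f"{int(p * denom)}/{denom}: ${amount:,}"
        for amount, p in option_b_distribution(n).items()
    )
    print(f"{n}  A: ${240 * n:,}  B: {cells}")
```

Each row's probabilities sum to 1, and for five rounds the probability of ending with exactly $3,000 comes out to 90/1024.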
It is certainly true that, in terms of expected money, option B outperforms option A no matter how many times one takes the gamble, but instead, let's think in terms of anticipated experience - what we actually expect to happen should we take each bet.
The first time we take option B, we note that there is a 75% chance that we walk away disappointed. That is, if one person chooses option A, and four people choose option B, on average three out of those four people will underperform the person who chose option A. And it probably won't come as much consolation to the three losers that the winner won significantly bigger than the person who chose A.
And since nothing unusual ever happens, we should think that, more often than not, having taken option B, we'd wind up underperforming option A.
Now let's look at further iterations. In the second iteration, having taken option B twice, we're more likely to have nothing (9/16, about 56%) than we are to have anything.
In the third iteration, there's about a 57.8% chance that we'll have outperformed the person who chose option A the whole time, and a 42.2% chance that we'll have nothing.
In the fourth iteration, there's a 73.8% chance that we'll have matched or done worse than the person who has chosen option A four times (I'm rounding a bit, $1,000 isn't that much better than $960).
In the fifth iteration, the above percentage drops to 63.3%.
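These percentages can be recomputed directly. The sketch below is mine, and the slack parameter is a hypothetical knob that encodes my rounding in the fourth iteration (treating $1,000 as "no better than" $960); with slack=0 the comparison is strict.

```python
from fractions import Fraction
from math import comb

def prob_b_at_most_a(n, slack=0):
    """P(total from n plays of B <= total from n plays of A, plus slack dollars)."""
    p = Fraction(1, 4)
    a_total = 240 * n
    return sum(
        comb(n, wins) * p**wins * (1 - p)**(n - wins)
        for wins in range(n + 1)
        if 1000 * wins <= a_total + slack
    )

for n in range(1, 6):
    # slack=40 is an assumption: it counts $1,000 vs $960 as a tie.
    print(n, float(prob_b_at_most_a(n, slack=40)))
```

With slack=40 this prints 0.75, 0.5625, 0.421875, 0.73828125 and 0.6328125 for iterations one through five. With slack=0, the fourth iteration drops to 81/256 (about 31.6%), since $1,000 does strictly beat $960.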
Now, without doing a longer analysis, I can tell that option B will eventually win. That was obvious from the beginning.
But in four of the first five iterations (all but the third), there's still a better than even chance you'll wind up with no more, picking option B, than by picking option A.
If we act to maximize expected utility, we should choose option B, at least so long as I hold that dollars=utility. And yet it seems that one would have to take option B a fair number of times before it becomes likely that any given person, taking the iterated gamble, will outperform a different person repeatedly taking option A.
In other words, of the 1025 people taking the iterated gamble:
we expect 1 to walk away with $1,200 (from taking option A five times),
we expect 376 to walk away with more than $1,200, casting smug glances at the scaredy-cat who took option A the whole time,
and we expect 648 to walk away muttering to themselves about how the whole thing was rigged, casting dirty glances at the other 377 people.
After all the calculations, I still think that, if this gamble was really offered to me, I'd take option A, unless I knew for a fact that I could retake the gamble quite a few times. How do I interpret this in terms of expected utility?
Am I not really treating dollars as equal to utility, and discounting the marginal utility of the additional thousands of dollars that the 376 win?
What mistakes am I making?
Also, a quick trip to Google confirms my intuition that there is plenty of work on iterated decisions; does anyone know a good primer on them?
I'd like to leave you with this:
If you were actually offered this gamble in real life, which option would you take?
Irregularly scheduled Less Wrong meetups are taking place in:
- Cologne meetup May: 21 May 2016 05:00PM
- European Community Weekend: 02 September 2016 03:35PM
- San Antonio Meetup - Double Crux Game: 22 May 2016 02:00PM
- Sao Paulo - Meetup de maio: 21 May 2016 02:00PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- [Moscow] Games in Kocherga club: FallacyMania, Zendo, Tower of Chaos, Training game: 25 May 2016 07:40PM
- Sydney Rationality Dojo - June: 05 June 2016 04:00PM
- Sydney Rationality Dojo - July: 03 July 2016 04:00PM
Locations with regularly scheduled meetups: Austin, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, Kraków, London, Madison WI, Melbourne, Moscow, New Hampshire, New York, Philadelphia, Research Triangle NC, San Francisco Bay Area, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.
Related to: 37 ways that words can be wrong.
Consider the following sentence (from Internet; but I have heard it before): 'Lichens consist of fungi and algae, but they are more than the sum of their constituents.'
It is supposed to say something like 'the fungus and the alga don't just live very close to each other, they influence each other's habitat(s) and can be considered, for most purposes, to form a physiologically integrated body'. It never actually says that, although people gradually come to this conclusion if they look at illustrations or read long enough. And I don't think the phrase is sufficiently catchy to explain its popularity; rather, that it is a tenuous introduction to the much-later-explained term 'synergism'. A noble (in principle) preparation of the mind.
Yet how is a lichen 'more than the sum of fungus and alga'? I suppose one could speak of a 'sum' if the lichen was pulverized and consumed as medicine, and then its effect on the patient was compared to that of the mixture of similarly treated fungus (grown how exactly?) and alga (same here). It doesn't exist in the wild. It shouldn't exist in the literature.
A child is not bothered by its lack of sense. When she encounters 'synergism', she'll remember having been told of something like it, and be reassured by the unity of science. It flies under the radar of 'established biological myths', because it doesn't have enough meaning to be one.
I picked a dictionary of zoological terms and tried to recall how the notions were put before me for the first time, but of course I failed. (I guess it should be high-level things, like 'variability', or colloquial expressions - 'bold as a lion', etc., that distort and get distorted the most.) They seem to 'have always been there'. Then, I looked at the definitions and tried to imagine them misapplied (intuitively, a simpler task). No luck. Yet someday, something else truly unknown to me will appear familiar and simple.
We can weed out improper concepts from textbooks, but there are too many sources which are written far more engagingly and 'clearly', and which propagate not-even-wrong ideas. Explained like I'm five.
And never named.