That is among the reasons why I keep telling SIAI people to never reply to "AI risks are small" with "but a small probability is still worth addressing". Reason 2: It sounds like an attempt to shut down debate over probability. Reason 3: It sounds like the sort of thing people are more likely to say when defending a weak argument than a strong one. Listeners may instinctively recognize that as well.
Existential risks from AI are not under 5%. If anyone claims they are, that is, in emotional practice, an instant-win knockdown argument unless countered; it should be countered directly and aggressively, not weakly deflected.
To deal with people making that claim more easily, I'd like to see a post by you or someone else involved with SIAI summarizing the evidence for existential risks from AI, including the arguments for a hard takeoff and for why the AI's goals must hit a narrow target of Friendliness.
Existential risks from AI are not under 5%. If anyone claims they are, that is, in emotional practice, an instant-win knockdown argument unless countered; it should be countered directly and aggressively, not weakly deflected.
If you talk about the probability of a coin coming up heads, that is a question that well-informed people can be expected to agree on - since it can be experimentally determined.
However, the probability of civilisation being terminally obliterated isn't a probability that can easily be measured by us. Either all earth-sentients will be obliterated, or they won't be. But we can't assign probabilities and check them afterwards using frequency analysis. We can't have a betting market on the probability - since one side never pays out. From the perspective of a human, the probability is just not meaningful - there's no way for a human to measure it.
Possibly our distant descendants will figure out a reasonable estimate of what the chances of oblivion are (to a sufficiently well-informed agent) - e.g. by recreating the Earth many times and repeatedly running the experiment. I think that claims to know what the results of that experiment would be represent overconfidence. The fraction of Earths obliterated by disasters at the hands of machines could be very low, very high, or somewhere in between - we just don't know with very much confidence.
Well, and of course "we don't know with very much confidence" is a statement about the standard deviation, not about the mean. The standard deviation may impact a legal decision or human argument, but not the probability estimate itself.
The issue is not really about standard deviations, it is that probability is subjective. Humans are in a very bad position to determine this probability - we have little relevant experience, we can't usefully bet on it, and if there are differences or disagreement, it is very difficult to tell who is right. The "human probability" seems practically worthless - a reflection of our ignorance, not anything with much to do with the event. We need that probability to guide our actions - but we can hardly expect two people to agree on it.
The nearest thing I can think of which is well defined is the probability that our descendants put on the event retrospectively. A probability estimate by wiser and better informed creatures of the chances of a world like our own making it. That estimate could - quite plausibly - be very low or very high.
Given a certain chunk of information, the evidence in it isn't subjective. Priors may be subjective, although there is a class of cases where they're objective too. "It is difficult to tell who is right" is an informative statement about the human decision process, but not really informative about probability.
Given a certain chunk of information, the evidence in it isn't subjective. Priors may be subjective, although there is a class of cases where they're objective too.
Well, two agents with the same priors can easily come to different conclusions as a result of observing the same evidence. Different cognitive limitations can result in that happening.
If the fear of thinking different really is stronger than the fear of death, is it possible that people just aren't that bothered about the end of the world, whether the probability is high or low?
On the emotional level, end of the world doesn't bother me that much because everyone dies with me. Furthermore, there is nobody left to mourn. Losing half the Earth's population, on the other hand, feels a lot scarier.
I enjoyed reading this comment rather a lot, since it allowed me to find myself in the not-too-common circumstance of noticing that I disagree with Eliezer to a significant (for me) degree.
Insofar as I'm able to put a number on my estimation of existential risks from AI, I also think that they're not under 5%. But I'm not really in the habit of getting into debates on this matter with anyone. The case that I make for myself (or others) for supporting SIAI is rather of the following kind:
If there are any noticeable existential risks, it's extremely important to spend resources on addressing them.
When looking at the various existential risks, most are somewhat simple to understand (at least after one has expended some effort on it), and are either already receiving a somewhat satisfactory amount of attention, or are likely to receive such attention before too long. (This doesn't necessarily mean that they would be of a small probability, but rather that what can be done already seems like it's mostly gonna get done.)
AI risks stand out as a special case, that seems really difficult to understand. There's an exceptionally high degree of uncertainty in estimates I'm able to make of their probability; in fact I find it very difficult to make any satisfactorily rigorous estimations at all. Such lack of understanding is a potentially very dangerous thing. I want to support more research into this.
The key point in my attitude that I would emphasize, is the interest in existential risks in general. I wouldn't try to seriously talk about AI risks to anyone who couldn't first be stimulated to find within themselves such a more general serious interest. And then, if people have that general interest, they're interested in going over the various existential risks there are, and it seems to me that sufficiently smart ones realize that the AI risks are a more difficult topic than others (at least after reading e.g. SIAI stuff; things might seem deceptively simple before one has a minimum threshold level of understanding).
So, my disagreement is that I indeed would, to a degree, avoid debates over probability. Once a general interest in existential risks is present, I would argue not about probabilities but about the difficulty of the AI topic, and about how such a lack of understanding is a very dangerous thing.
(I'm not really expressing a view on whether my approach is better or worse, though. Haven't reflected on the matter sufficiently to form a real opinion on that, though for the time being I do continue to cling to my view instead of what Eliezer advocated.)
If the fear of thinking different really is stronger than the fear of death, is it possible that people just aren't that bothered about the end of the world, whether the probability is high or low?
People shouldn't neglect small probabilities. The math works for small probabilities.
People should discount large utilities. Utilities are not additive. Nothing in economic theory suggests that they are additive.
It is well understood that the utility of two million dollars is not necessarily twice the utility of one million dollars. Yet it is taken here as axiomatic that the utility of two saved lives is twice the utility of one saved life. Two people tortured is taken to be twice the disutility of one person tortured. Why? As far as I can tell, the only answer given here is that our moral intuitions (against simple additivity) are wrong and will be corrected by sufficient reflection.
That is my take on the issue.
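To make the contrast concrete, here is a minimal numeric sketch with made-up utility functions - logarithmic for money, a bounded exponential with an invented scale for lives. Neither is offered as anyone's actual valuation; it only shows what diminishing returns and non-additivity look like side by side.

```python
import math

def money_utility(dollars):
    # A standard toy model: logarithmic utility of wealth. Purely illustrative.
    return math.log(1 + dollars)

print(money_utility(2_000_000) / money_utility(1_000_000))  # ~1.05: nowhere near twice as good

def lives_utility(lives_saved, scale=1e6):
    # A hypothetical bounded valuation of saved lives, for contrast with the linear
    # assumption U(n) = n * U(1). The scale parameter is invented, not a real claim.
    return -math.expm1(-lives_saved / scale)  # = 1 - exp(-lives_saved / scale)

print(lives_utility(2) / lives_utility(1))      # ~2.0: effectively additive at small numbers
print(lives_utility(2e6) / lives_utility(1e6))  # ~1.37: far from additive at large numbers
```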
Does your utility function treat "a life saved by Perplexed" differently from just "a life"? I could understand an egoist who does not terminally value other lives at all (as opposed to instrumentally valuing saving lives as a way to obtain positive emotions or other benefits for oneself), but a utility function that treats "a life saved by me" differently from just "a life" seems counterintuitive. If the utility of a life saved by Perplexed is not different from the utility of another life, then unless your utility function just happens to have a sharp bend at the current world population level, the utility of two saved lives can't be much less than twice the utility of one saved life. (See Eliezer's version of this argument, and more along this vein, here.)
Does your utility function treat "a life saved by Perplexed" differently from just "a life"?
I'm torn between responding with "Good question!" versus "What difference does it make?". Since I can't decide, I'll make both responses.
Good question! You are correct in surmising that the root justification for much of the value that I attach to other lives is essentially instrumental (via channels of reciprocity). But not all of the justification. Evolution has instilled in me the instinct of valuing the welfare (fitness) of kin at a significant fraction of the value of my own personal welfare. And then there are cases where kinship and reciprocity become connected in serial chains. So the answer is that I discount based on 'remoteness' where remoteness is a distance metric reflecting both genetic and social-interactive inverse connectedness.
What difference does it make? This is my utility function we are talking about, and it is only operational in deciding my own actions. So, even if my utility function attached huge value to lives saved by other people, it is not clear how this would change my behavior. The question seems to be whether people ought to have multiple utility functions - one for directing their own rational choices; the others for some other purpose.
I am currently reading Binmore's two-volume opus Game Theory and the Social Contract. I strongly recommend it to everyone here who is interested in decision theory and ethics. Although Binmore doesn't put it in these terms, his system does involve two different sets of values, which are used in two different ways. One is the set of values used in the Game of Life - a set of values which may be as egoistic as the agent wishes (or as altruistic). However, although the agent is conceptually free in the Game of Life, as a practical matter, he is coerced by everyone else to adhere to a Social Contract. Due to this coercion, he mostly behaves morally.
But how does the Social Contract arise? In Binmore's normative fiction, it arises by negotiated consensus of all agents. The negotiation takes place in a Rawlsian Original Position under a Veil of Ignorance. Since the agent-while-negotiating has different self-knowledge than does the agent-while-living, he manifests different values in the two situations - particularly with regard to utilities which accrue indexically. So, according to Binmore, even an agent who is inherently egoistic in the Game of Life will be egalitarian in the Game of Morals where the Social Contract is negotiated. Different values for a different purpose.
That is the concise summary of the ethical system that Binmore is constructing in the two volumes. But he does a marvelously thorough job of ground-clearing - addressing mistakes made by Kant, Rawls, Nozick, Parfit, and others regarding the Prisoner's Dilemma, Newcomb's 'paradox', whether it is rational to vote (probably wasted), etc. And in the course of doing so, he pretty thoroughly demolishes what I understand to be the orthodox position on these topics here at Less Wrong.
Really, really recommended.
Thanks for pointing me to Binmore's work. It does sound very interesting.
Evolution has instilled in me the instinct of valuing the welfare (fitness) of kin at a significant fraction of the value of my own personal welfare.
This is tangential to your point, but what would you say to a utilitarian who says:
"Evolution (memetic evolution, that is) has instilled in me the idea of that I should linearly value the welfare of others regardless of kinship, regardless of what instincts I got from my genes."
And in the course of doing so, he pretty thoroughly demolishes what I understand to be the orthodox position on these topics here at Less Wrong.
By "orthodox position" are you referring to TDT-related ideas? I've made the point several times that I doubt they apply to humans. (I don't vote myself, actually.) I don't see how Binmore could have "demolished" those ideas as they relate to AIs since he couldn't have learned about them when he wrote his books.
what would you say to a utilitarian who says: "Evolution (memetic evolution, that is) has instilled in me the idea that I should linearly value the welfare of others regardless of kinship, regardless of what instincts I got from my genes."
There are two separate issues here. I assume that by "linearly" you are referring to the subject that started this conversation: my claim that utilities "are not additive", an idea also expressed as "diminishing returns", or diminishing marginal utility of additional people. I probably would not dispute the memetic evolution claim if it focused on "linearity".
The second issue is a kind of universality - all people valued equally regardless of kinship or close connectedness in a network of reciprocity. I would probably express skepticism at this claim. I would probe the claim to determine whether the selection operates at the level of the meme, the individual, or the society. And then I would ask how that meme contributes to its own propagation at that level.
By "orthodox position" are you referring to TDT-related ideas?
Mostly, I am referring to views expressed by EY in the sequences and frequently echoed by LW regulars in comments. Some of those ideas were apparently repeated in the TDT writeup (though I may be wrong about that - the write-up was pretty incoherent.)
I would probe the claim to determine whether the selection operates at the level of the meme, the individual, or the society.
I'm guessing mostly at the meme level.
And then I would ask how that meme contributes to its own propagation at that level.
It seems pretty obvious, doesn't it? Utilitarianism makes a carrier believe that they should act to maximize social welfare and that more people believing utilitarianism would help toward that goal, so carriers think they should try to propagate the meme. Also, many egoists may believe that utilitarians would be more willing to contribute to the production of public goods, which they can free ride upon, so they would tend to not argue publicly against utilitarianism, which further contributes to its propagation.
Your just-so story is more complicated than you seem to think. It involves an equilibrium of at least two memes: an evangelical utilitarianism which damages the host but propagates the meme, plus a cryptic egoism which presumably benefits the host but can't successfully propagate (it repeatedly arises by spontaneous generation, presumably).
I could critique your story on grounds of plausibility (which strategy do crypto-egoists suggest to their own children?) but instead I will ask why someone infected by the evangelical utilitarianism meme would argue as you suggested in the great-grandparent:
"Evolution (memetic evolution, that is) has instilled in me the idea of that I should linearly value the welfare of others regardless of kinship, regardless of what instincts I got from my genes."
Isn't it more likely that someone realizing that they have been subverted by a selfish meme would be trying to self-modify?
Isn't it more likely that someone realizing that they have been subverted by a selfish meme would be trying to self-modify?
What does "subverted" mean in this context? For example I devote a lot of resources into thinking about philosophical problems which does not seem to contribute to my genetic fitness. Have I been "subverted" by a selfish meme (i.e., the one that says "the unexamined life is not worth living")? If so, I don't feel any urge to try to self-modify away from this. Couldn't a utilitarian feel the same?
I devote a lot of resources into thinking about philosophical problems which does not seem to contribute to my genetic fitness. Have I been "subverted" by a selfish meme (i.e., the one that says "the unexamined life is not worth living")?
Possibly. It depends on why you do that. The other main hypotheses are that your genetic program may just be malfunctioning in an unfamiliar environment, or that the philosophical problems do - in fact - have some chance of turning out to be adaptive.
If so, I don't feel any urge to try to self-modify away from this.
Right. So: that could be a result of the strategy of the meme to evade your memetic immune system - or the result of reduced memetic immunity as a result of immune system attacks by other memes you have previously been exposed to.
Any meme that makes a human more meme-friendly benefits itself - as well as all the other memes in the ideosphere. Consequently it tends to become popular - since every other meme wants to be linked to it.
A utilitarian might well be indifferent to the self-serving nature of the meme. But, as I recall, you brought up the question in response to my suggestion that my own (genetic) instincts derive a kind of nobility from their origin in the biological process of natural selection for organism fitness. Would our hypothetical utilitarian be proud of the origin of his meme in the cultural process of selection for meme self-promotion?
I don't think you mentioned "nobility" before. What you wrote was just:
Evolution has instilled in me the instinct of valuing the welfare (fitness) of kin at a significant fraction of the value of my own personal welfare.
which seemed to me to be a kind of claim that a utilitarian could make with equal credibility. If you're now saying that you feel noble and proud that your values come from biological instead of cultural evolution... well I've never seen that expressed anywhere else before, so I'm going to guess that most people do not have that kind of feeling.
...seemed to me to be a kind of claim that a utilitarian could make with equal credibility.
Well, he could credibly make that claim if he could credibly assert that the ancestral environment was remarkably favorable for group selection.
... you're now saying that you feel noble and proud that your values come from biological instead of cultural evolution...
What I actually said was "my own (genetic) instincts derive a kind of nobility from their origin ...". The value itself claims a noble genealogy, not a noble essence. If I am proud on its behalf, it is because that instinct has been helping to keep my ancestral line alive for many generations. I could say something similar for a meme which became common by way of selection at the individual or societal level. But what do I say about a selfish meme? That I am not the only person it fooled and exploited? I'm going to guess that most people do have that kind of feeling.
I think you misinterpreted the context. I endorsed kin selection, together with discounting the welfare of non-kin. Someone (not me!) wishing to be a straight utilitarian and wishing to treat kin and non-kin equally needs to endorse group selection in order to give their ethical intuitions a basis in evolutionary psychology. Because it is clear that humans engage in kin recognition.
Now I see how you are reading the "kind of claim that a utilitarian could make" bit.
As you previously observed, the actual answer to this involves cultural evolution - not group selection.
The "evolutionary psychology" explanation is that humans developed sophisticated culture which was - on average - beneficial, but which allowed all kinds of deleterious memes in with the beneficial ones.
A utilitarian could claim:
Evolution has produced in me the tendency to value the welfare of non-kin at a significant fraction of the value of my own personal welfare.
...on the grounds that their evolution involved gene-meme coevolution - and that inevitably involves a certain amount of memetic hijacking by deleterious memes - such as utilitarianism.
Isn't it more likely that someone realizing that they have been subverted by a selfish meme would be trying to self-modify?
I struggle to understand what is going on there as well. I think some of these folk have simultaneously embraced a kind of "genes=bad, memes=good" memeplex. This says something like: nature red in tooth and claw is evil, while memes turn brutish cavemen into civilized humans. The memes are the future, and they are good. That is a meme other memes want to associate with. Obviously if you buy into such an idea, then that promotes the interests of all of your memes, often at the expense of your genes.
The longer we argue, and the more we ponder, the more we empower the memes. I don't have a problem with that.
Utilitarianism makes a carrier believe that they should act to maximize social welfare and that more people believing utilitarianism would help toward that goal, so carriers think they should try to propagate the meme. Also, many egoists may believe that utilitarians would be more willing to contribute to the production of public goods, which they can free ride upon, so they would tend to not argue publicly against utilitarianism, which further contributes to its propagation.
My hypothesis seems a teensy bit different:
Utilitarianism is a means of signalling what an unselfish goody-two-shoes you are - and many like to send that signal, even if they don't walk the walk. Utilitarianism seems to have hooked some moral philosophers - whose job description required them to send that message.
Also, utilitarianism is a tool used by charities and causes to manipulate people into giving away their worldly goods. So: there are some financial forces that lead to its marketing and promotion.
I am sceptical about your story about egoists regarding utilitarians positively. Give me an egoist any day. The utilitarian is probably lying to others and to themselves, battling their nature and experiencing inner conflict. Their brain has been hijacked by a sterilising meme. At any moment, I don't know if their utilitarian side will be dominant, or whether their natural programming will be. That makes it hard for me to model and deal with them.
You can always trust a dishonest man, said the famous philosopher. But you couldn't trust him, after all; he wasn't as dishonest as he claimed.
This is tangential to your point, but what would you say to a utilitarian who says:
"Evolution (memetic evolution, that is) has instilled in me the idea of that I should linearly value the welfare of others regardless of kinship, regardless of what instincts I got from my genes."
Such a belief is potentially pretty disastrous for the genes - so how come it got through the memetic immune system? Perhaps this is a case of meme evolution outstripping gene evolution, resulting in virulent memes that can memetically hijack people's brains. However, many seem to have working immune systems - and can resist this meme. Do the utilitarians have weakened memetic immunity? What can have led to that? Were they not taught about the risks of memetic hijacking in school - or by their families?
Does your utility function treat "a life saved by Perplexed" differently from just "a life"? I could understand an egoist who does not terminally value other lives at all (as opposed to instrumentally valuing saving lives as a way to obtain positive emotions or other benefits for oneself), but a utility function that treats "a life saved by me" differently from just "a life" seems counterintuitive.
Surely we expect natural selection to build organisms that value the lives of their relatives. If you save a life, it is surely more likely to be that of a relative than a randomly-selected life - so organisms valuing "local" lives seems only natural to me.
The value of saved vs. new vs. cloned lives is a worthwhile question to introspect on (and yes, it's only one example).
I'd gain more satisfaction from saving a group of people by defeating the cause directly - safely killing or capturing the kidnappers rather than paying the ransom. I'd rather save all those at risk by defeating the entire threat, permanently. If I can only save a small fraction of the group threatened by a single cause, that's less satisfying. But maybe in what you'd think would be a nearly-linear region (you can certainly save a few people from starvation today), I'd be more than half as satisfied by helping one identifiable person and being able to monitor the consequences as I would by helping two (out of an ocean of a billion). Further, in those "drop in a bucket" cases, I'd expect some desire to save people from diverse threats, as long as the loss of efficiency wasn't too great for the thrill of novelty to justify. This desire would be in tension with conserving research/decision effort (just save one more life in the way already researched, prepared, and tested), with consistency, and with a desire for complete victory (though I postulated that my maximal impact was too small - becoming part of an alliance that achieves complete victory would still be nice).
Part of the value of saving existing lives is that I feel a sense of security knowing that I and people like me are fighting such threats as might someday affect me - a reflexive feeling of having allies in the world who might help me - not as a result of anonymous charity (which would be irrational), but as a result of my being the type of person who, when having resources to spare, helps where it's needed more.
But I'm convinced by mathematical arguments that utility should be additive. If the value of N things in the real world is not N times the value of 1 thing, then I handle that in how I assign utility to world states. I want to use additive utility, and as far as I can tell I'm immune to arguments about nonlinearity of objects.
I'm convinced by mathematical arguments that utility should be additive. If the value of N things in the real world is not N times the value of 1 thing, then I handle that in how I assign utility to world states.
I don't disagree. My choice of slogan wording - "utility is not additive" - doesn't capture what I mean. I meant only to deny that the value of something happening N times is (N x U) where U is the value of it happening once.
Correct. In fact, I probably confused things here by using the word "discount" for what I am suggesting here. Let me try to summarize the situation with regard to "discounting".
Time discounting means counting distant future utility as less important than near future utility. EY, in the cited posting, argues against time discounting. (I disagree with EY, for what it is worth.)
"Space discounting" is a locally well-understood idea that utility accruing to people distant from the focal agent is less important than utility accruing to the focal agent's friends, family, and neighbors. EY presumably disapproves of space discounting. (My position is a bit complicated. Distance in space is not the relevant parameter, but I do approve of discounting using a similar 'remoteness' parameter.)
The kind of 'discounting' of large utilities that I recommended in the great-grandparent probably shouldn't be called 'discounting'. I would sloganize it as "utilities are not additive." The parent used the phrase 'diminishing returns'. That is not right either, though it is probably better than 'discounting'. Another phrase that approximates what I was suggesting is 'bounded utility'. (I'm pretty sure I disagree with EY on this one too.)
The fact that I disagree with EY on discounting says absolutely nothing about whether I agree with EY on AI risk, reductionism, exercise, and who writes the best SciFi. That shouldn't need to be said, but sometimes it seems to be necessary in your (XiXiDu's) case.
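To keep the terminology straight, here is a toy sketch - my own invented formulas, not anything EY or Perplexed has endorsed - of the difference between time discounting and the "utilities are not additive" / bounded-utility idea. Space (or remoteness) discounting would just be a different weighting across people and isn't sketched.

```python
def time_discounted_value(stream, rate=0.03):
    # Time discounting: a unit of utility t years out is weighted by 1/(1+rate)**t.
    return sum(u / (1 + rate)**t for t, u in enumerate(stream))

def non_additive_value(n, unit=1.0, cap=1000.0):
    # "Utilities are not additive": n occurrences of the same good are worth less
    # than n * unit, saturating toward a cap. Toy formula only.
    return cap * (1 - (1 - unit / cap)**n)

print(time_discounted_value([1, 1, 1]))                  # ~2.91, not 3
print(non_additive_value(5), non_additive_value(10**6))  # ~5.0, then ~1000.0 (the cap)
```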
I would sloganize it as "utilities are not additive."
How about: "Large utilities are not additive for humans".
That was about time discounting, not diminishing returns.
I see, thanks. But I am clueless about the important difference between time discounting and diminishing returns. If I can save people stretched over time or space, why is it wrong to discount people over time but rational to apply diminishing returns to the number of people you save at a certain time? I mean, you would value having 5 sports cars for your own use but wouldn't care to have thousands. Likewise you would care to have 5 sports cars in a year but wouldn't care to have them in a hundred years. You discount the number of cars for your own use because you can't drive a thousand cars. You discount the time until you get to drive the cars because you don't expect to value cars in a hundred years.
The main argument put forth in the article I linked to is that you shouldn't time-discount, because doing so risks being inconsistent. But the same can be said about diminishing returns, i.e. space discounting. Because the only reason why you don't care about even more of the same is that you are only able to value so much at the same time. You don't care about another friend if you already have a thousand friends, only because friend number 1001 doesn't fit into your memory space. That is, if you knew friend 1001 you wouldn't want to miss him. But you can't imagine having yet another friend, just like you can't imagine driving sports cars in a hundred years. But if you change the context, either by learning about friend 1001, or by traveling into the future and learning that vintage sports cars are highly valued, you also change your preferences. That is, time discounting and diminishing returns will lead to the same kind of preference reversal if you view yourself stretched over time or space as one and the same agent. Such preference reversals seem to be context-dependent rather than related to either the number of items or the number of years.
So this is the explanation of my confusion that caused me to believe that the arguments put forth in the article also apply to diminishing returns, not just time discounting.
2) Just to amplify point 1) a bit: you shouldn’t always maximize expected utility if you only live once. Expected values — in other words, averages — are very important when you make the same small bet over and over again. When the stakes get higher and you aren’t in a position to repeat the bet over and over, it may be wise to be risk averse.
Expected utilities do not work like that. If you're risk averse you embody that in the utility function by assigning diminishing returns (and this can indeed lead to a situation where you would take a bet 1000 times but would not take it once); you do not stop maximising expected utility.
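Here is a minimal sketch of that claim, using a made-up concave utility in which losses are weighted 2.5 times as heavily as gains. Under it, a single 50/50 bet of "win $200 / lose $100" has negative expected utility and is declined, while a bundle of 1000 such bets is accepted, because aggregation makes an overall loss vanishingly unlikely. The numbers are purely illustrative; the point is only that risk aversion lives inside the utility function, not outside expected-utility maximization.

```python
from math import comb

def u(wealth_change, loss_weight=2.5):
    # A toy concave (diminishing-marginal-utility) function: gains count at face
    # value, losses count 2.5x. Not anyone's actual utility function.
    return wealth_change if wealth_change >= 0 else loss_weight * wealth_change

def expected_utility_of_n_bets(n, win=200, lose=100, p=0.5):
    # Exact expected utility of n independent 50/50 bets: win $200 or lose $100 each.
    total = 0.0
    for k in range(n + 1):  # k = number of bets won
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        total += prob * u(k * win - (n - k) * lose)
    return total

print(expected_utility_of_n_bets(1))     # -25.0: worse than not betting at all (0), so decline
print(expected_utility_of_n_bets(1000))  # ~+50000: far better than 0, so accept the bundle
```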
Expected utilities do not work like that.
If a mathematician like John Baez can be that wrong, doesn't that mean the topic needs further attention? Not necessarily in the sense of research but that people are given specific resources to read up on so that they don't make similar mistakes in the future.
I suspect that John Baez and the people from GiveWell are capable of understanding what you understand about this topic. All of them have read a lot of LW and interviewed the SIAI. Given their intelligence and their knowledge of the positions held by the SIAI, is there a way to figure out what went wrong and what we can improve so that those and other people understand how they are wrong?
I am just trying to locate the problem. What do you think is the cause of their disagreement?
Baez: ... you shouldn’t always maximize expected utility if you only live once.
BenElliot: [Baez is wrong] Expected utilities do not work like that.
XiXiDu: If a mathematician like John Baez can be that wrong ...
A mathematician like Baez can indeed be that wrong, when he discusses technical topics that he is insufficiently familiar with. I'm sure Baez is quite capable of understanding the standard position of economists on this topic (the position echoed by BenElliot). But, as it apparently turns out, Baez has not yet done so. No big deal. Treat Baez as an authority on mathematical physics, category theory, and perhaps saving the environment. He is not necessarily an authority on the foundations of microeconomics.
A mathematician like Baez can indeed be that wrong, when he discusses technical topics that he is insufficiently familiar with.
What about Robin Hanson? See for example his post here and here. What is it that he is insufficiently familiar with? Or what about Katja Grace who has been a visiting fellow of the SIAI? See her post here (there are many other posts by her).
And the people from GiveWell even knew about Pascal's Mugging, what is it that they are insufficiently familiar with?
I mean, those people might disagree for different reasons. But I think that too often the argument is used that people just don't know what they are talking about, rather than trying to find out why else they might disagree. As I said in the OP, none of them doubts that there are risks from AI, just that we don't know enough to take them too seriously at this moment. Whereas the SIAI says that the utility associated with AI related matters outweighs those doubts. So if we were going to pinpoint the exact nature of disagreement, would it maybe all come down to how seriously we should take vague possibilities?
And if you are right that the whole problem is that they are insufficiently familiar with the economics of existential risks, then isn't that something that should be improved by putting some effort into raising the awareness of why it is rational not to disregard risks from AI even if one believes that they are very unlikely?
For the record, I never said I disagreed with the people from Givewell. I don't, my charity of choice is currently Village Reach. I merely disagree with Baez when he says we should not maximise expected utility. I would be very surprised to find Robin Hanson making the same mistake (if I did I would seriously re-think my own position, and possibly lower my respect for Hanson significantly).
Please stop trying to view the world as just two sides. Hanson's arguments are arguments that the probability of a singularity (as Eliezer sees it) is low enough that an expected utility maximiser would not spend much time worrying about it (at least, I think that's his point; all he explicitly argues is that the probability is low). His point is not, even slightly, an argument against utility maximisation.
What benelliot said.
Sheesh! Please don't assume that everyone who disagrees with one point you made is doing so because he disagrees with the whole thrust of your thinking.
A mathematician like Baez can indeed be that wrong, when he discusses technical topics that he is insufficiently familiar with.
What about Robin Hanson? See for example his post here and here.
Doesn't seem to agree with Baez on the subject of utility maximisation. Baez was making no sense - he does seem to be "that wrong" on the topic.
He isn't wrong, he's just used to using different language than you are. And I might add that the language he is using is, as far as I can tell, the far more commonly accepted notion of utility, rather than VNM utility, which is what I assume you are talking about. By "commonly accepted" I mean that the average technical person who uses the word utility probably is not thinking about VNM utility. So if you want to write Baez's views off, you should at least first agree on the same definition and then ask the same question.
See my other comment here. I originally misattributed the Baez quote to XiXiDu, so the reply was addressed to him directly.
This focuses a bit heavily on Pascal's mugging and not on existential risks, but since you may have given me an entirely new idea about it and since it also goes into the idea of really good arguments, I think it seems reasonable to put it here.
Previously, I have been thinking of Pascal's Mugging in terms of a spam filter. Pascal's Mugging resembles spam, so it should be discarded. However, I've thought of an entirely different way to approach Pascal's Mugging after reading your post and I wanted to post it here for thoughts.
Let's say someone who looks relatively harmless walks up from out of an alley and says that they will cause a lifetime of torture to ONE person if you don't give him some small amount of money.
Many people would think "He's referring to me! Eek, I'm being mugged, and not just mugged but mugged by a crazy guy!" Rational people might run some quick calculations in their head and think giving him the money is usually the rational thing to do. Or maybe they'd think the rational thing to do is to walk away.
Of course, there's nothing particularly Pascallian about that mugging. That's basically just a mugging. Let's call it Mugging 0.
So now consider Mugging 1.
Let's say someone who looks relatively harmless walks up from out of an alley and says that they will cause a lifetime of torture to THREE people if you don't give him some small amount of money.
In general, it seems safe to say that you are making slightly different calculations than in Mugging 0. Maybe you're more likely to give him the money. Maybe you're less likely to give him the money, but basically, your attempt to calculate the utility changes.
Now, you can take this out to a large number of powers of three: http://www.quadibloc.com/crypto/t3.htm - and get Mugging 2 (9 people), Mugging 3 (27 people), Mugging 4 (81 people), etcetera.
Which leads into what I'm thinking as a possible new approach.
For most rationalists who are bothered by Pascal's mugging, for some ranges of numbers you will give money, and for some ranges you won't give money. (If you will give money regardless of the number or not give money regardless of the number, you probably aren't the type of person bothered by Pascal's Mugging.)
As an example on one hand, let's say that you personally can't stand the thought of being responsible for more than 100 lifetimes of torture. You don't want there to even be a 1 in 1 quadrillion chance of that happening. You might then rationally switch your behavior at Mugging 5 (where he threatens 243 people) from "Don't give money" to "Give money" because of your utility calculations.
As an example on the other hand, at Mugging 21 there might be a rather large boost in skepticism. That's 10,460,353,203, which is more people than on earth currently. How is he going to torture that many people? Let's say 3^21 is a point where it's rational to flip your behavior from "Give money" to "Don't give money."
But what I'm getting at is, if the idea that you should give in to Pascal's Mugging is rational, it seems like there must be a point or area where it is rational to flip from "Don't give money" to "Give money", and there must also not be any future points or areas where you switch from "Give money" to "Don't give money." Which makes my question: at approximately what order of magnitude is this final point or area which represents the smallest rational Pascal's Mugging? It doesn't have to be perfectly accurate, and will possibly vary from person to person anyway, which is why I am expecting something along the lines of a range of orders of magnitude rather than any individual number - an answer like "Well, possibly somewhere between Mugging 40 and Mugging 50: it would be rational to switch to giving money at around there and not to switch back after that. Even if someone threatened a Mugging 100 with 3^100 lifetimes of torture, there aren't any new physically expressible rational reasons that apply."
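One way to see what the answer depends on: whether a final switch point exists at all hinges on how fast your credence in the threat falls as the threatened number grows. Below is a toy sketch with every number invented (p0, the "feasible" scale, and the decay law are not claims about real probabilities). In this particular model credence falls faster than the threat grows, so the expected harm of refusing peaks at a modest Mugging and then shrinks, and no final switch to paying ever occurs; with a credence that falls more slowly than 1/N, the opposite would happen.

```python
def expected_victims_if_refused(k, p0=1e-6, feasible=1e2):
    # Mugging k threatens 3**k lifetimes of torture. Assume (for illustration only)
    # credence that the threat is genuine stays near p0 for human-scale numbers and
    # falls off faster than 1/N once N exceeds what one person could plausibly do.
    n = 3**k
    credence = p0 / (1 + (n / feasible)**2)
    return credence * n

for k in (0, 2, 4, 6, 8, 10, 21, 100):
    print(k, expected_victims_if_refused(k))
# Under this made-up decay law the expected harm peaks around Mugging 4 and then
# falls toward zero, so "never switch to paying" is the stable answer here.
```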
2) Just to amplify point 1) a bit: you shouldn’t always maximize expected utility if you only live once. Expected values — in other words, averages — are very important when you make the same small bet over and over again. When the stakes get higher and you aren’t in a position to repeat the bet over and over, it may be wise to be risk averse.
You need to be a bit careful with your language here. Utility is by definition the thing whose expected value you are maximizing (which probably doesn't exist for humans). Your observation correctly shows that we should care about expected lives saved if the probabilities in question are large enough that we should expect the actual number of lives saved to be close to the expected number. And this is an argument for why utility scales linearly in number of lives on small scales, and why it does not on large scales.
So you reached the right conclusion here for the right reasons, but using slightly incorrect language (which is pretty understandable given how perversely the word utility often gets conflated on this site). You may want to edit your post though, to avoid triggering the reflex where people ignore you because you got a definition wrong.
Also, the answer to Pascal's mugging is that your utility function is bounded. This has been discussed before; while different people have offered different solutions, this is the one that feels right to me on a gut level. It is also the only solution that allows you to uniformly ignore small probabilities without making your utility function depend on your beliefs.
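A minimal sketch of that bounded-utility answer, with invented numbers (U_MAX, SCALE and the probability below are all hypothetical): the function is effectively linear for small numbers of lives, saturates for astronomical ones, and therefore caps the expected loss from any low-probability claim at probability times U_MAX.

```python
import math

U_MAX = 1.0   # hypothetical upper bound on the utility function
SCALE = 1e9   # hypothetical number of lives at which returns visibly saturate

def bounded_utility(lives):
    # Approximately linear for lives << SCALE, capped at U_MAX for huge numbers.
    return -U_MAX * math.expm1(-lives / SCALE)

print(bounded_utility(2) / bounded_utility(1))  # ~2.0: linear on small scales

# However large the threatened number, a claim made with tiny probability can cost
# at most probability * U_MAX in expectation:
p_claim = 1e-20
print(p_claim * bounded_utility(3**100))        # <= 1e-20, uniformly ignorable
```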
I don't see why people can't just bite the bullet about it and accept their intuitions are wrong like they do a myriad other highly counterintuitive things in math and science.
I don't see why people can't just bite the bullet about it and accept their intuitions are wrong...
I think that it is not clear enough how they are wrong in this case. That is why I wrote the OP: to hint at the possibility that the problem is not risks from AI in and of themselves but something to do with risk aversion and the discounting of low-probability events.
What do you think is the underlying reason for the disagreement of organisations like GiveWell or people like John Baez, Robin Hanson, Greg Egan, Douglas Hofstadter etc.?
Eliezer Yudkowsky wrote:
Where should you go in life? I don’t know exactly, but I think I’ll go ahead and say “not environmentalism”. There’s just no way that the product of scope, marginal impact, and John Baez’s comparative advantage is going to end up being maximal at that point. (...) Maybe if there were ten people working on environmentalism and millions of people working on Friendly AI, I could see sending the next marginal dollar to environmentalism. But with millions of people working on environmentalism, and major existential risks that are completely ignored…
Why don't they accept this line of reasoning? There must be a reason other than the existence of existential risks, because all of them agree that existential risks do exist.
Because they are irrational, or haven't been exposed to it?
If I remember correctly, even Eliezer himself had a hard time biting the bullet on the St. Petersburg'd version. Actually, come to think of it I'm not sure if he ever did...
Because they are irrational, or haven't been exposed to it?
They all have been exposed to it. John Baez, GiveWell, Robin Hanson, Katja Grace, Greg Egan, Douglas Hofstadter and many others. John Baez has interviewed Eliezer Yudkowsky (part 1, 2, 3). Greg Egan wrote a book where he disses the SIAI. GiveWell interviewed the SIAI. Katja Grace has been a visiting fellow. Robin Hanson started Overcoming Bias with Eliezer. And Douglas Hofstadter talked at the Singularity Summit. None of them believes that risks from AI are terribly important. And there are many other people. And those are just the few that even care to comment on it.
Are they all irrational? If so, how can we fix that?
All this shows that there seems to be a fundamental problem with the formalized version of rationality.
Only one? (emphasis added to quotation)
My take on it is that at least one of the axioms, in each formalized utility theory of the usual stripe, is rationally non-binding. See Wei_Dai2's response to the Allais Paradox thread, for an example of one suspect, the Axiom of Independence in Von Neumann-Morgenstern utility theory.
I think a lot of people (including very smart people) are kind of hazy about what it means to be risk averse in the context of existential risk.
On one hand there are risks which could cause very sudden human extinction, things like large asteroid strikes, supervolcanoes, encountering an alien civilization, gamma ray bursts, or unfriendly AI. These are always conjectural because if they had actually happened, we wouldn't be around to worry about them. On the other hand there are risks with more direct evidence but which probably won't completely wipe out civilization, like climate change or a pandemic.
If you're risk averse about the human species, you will disproportionately focus on the first group, the things that could not only harm humanity but completely eliminate it. If you're risk seeking about the human species, you will focus on the second group, hoping to get lucky on the smaller risks of complete extinction.
However, if you're risk averse about spending your money on something that could be unnecessary, you disproportionately focus on the second group. This seems to be the higher status option, more immune to criticism.
And I’d have to say:
1) Yes, there probably are such places, but it would take me a while to find the one that I trusted, and I haven’t put in the work. When you’re risk-averse and limited in the time you have to make decisions, you tend to put off weighing options that have a very low chance of success but a very high return if they succeed. This is sensible so I don’t feel bad about it.
2) Just to amplify point 1) a bit: you shouldn’t always maximize expected utility if you only live once. Expected values — in other words, averages — are very important when you make the same small bet over and over again. When the stakes get higher and you aren’t in a position to repeat the bet over and over, it may be wise to be risk averse.
Another issue is that people often use money for expected value calculations, when utility doesn't linearly scale with money.
For instance, me losing $20,000 now would be more than 20,000 times worse than losing a dollar. In the same way that a cut 8 inches deep through my chest is more than 64 times worse than a cut 1/8 of an inch deep.
The problem with "risks from AI" is nothing to do with probabilities. The problem is that by "raising awareness of" imaginary risks, you are poisoning the meme pool and thereby greatly exacerbating real risks.
You claim we should pay more attention to small probabilities? Okay. What's your assessment of the probability that I'm right? That's the probability that you are taking direct, focused action to bring about the extinction of humanity and snuff out the future of intelligent life. (I know that isn't your intent, but while human moral judgment cares about intent, at the end of the day the universe unfortunately doesn't.) What do you propose to do about that?
You claim we should pay more attention to small probabilities?
Not really, but I stopped talking about that openly because I simply have no reason besides my intuition not to take small probabilities into account. If you asked me what I believe to be the right thing to do, the way you would ask me what sort of ice cream I like most, I would answer that risks from AI are something that one should keep at the back of one's mind until further notice. I would say that someone who tries to mitigate risks from AI now is like someone who would have tried to stop global warming back in the 16th century. But that's not a position I could support with evidence or arguments other than the existence of problems like 'The Infinitarian Challenge to Aggregative Ethics' or 'Pascal's Mugging'. Those problems hint at the possibility that something is very wrong with the whole business of low-probability risks. But is that enough? I have no idea.
In most of my submissions on LW I am trying to provoke feedback to learn about the underlying reasons and thought processes that led people to accept the framework of beliefs that are being supported here and why others, who are not associated with this community, think it is bogus. Sadly it always results in both sides calling each other "idiots".
2) Just to amplify point 1) a bit: you shouldn’t always maximize expected utility if you only live once.
I assume this is another way of saying that he rejects the Von Neumann-Morgenstern axioms?
No, his definition of utility is different from yours. Or rather, he is not using "utility" in any technical sense, just as an abstracted "amount of goodness". I am not sure of this, since I cannot read John Baez's mind, but my experience talking to people who are not regulars on LW is that this is what is generally meant when people say the word utility.
His definition of utility may well be different, as you say. But denial of the vN-M axioms is implied, even if not equivalent.
Hmm. On further reflection, I now think I was wrong. Maybe Baez translates "utility" as something like "disability-adjusted life years saved." And then he has a nonlinear function from DALYs saved to utility-in-the-technical-sense. Voila, risk aversion (for gambles on DALYs) makes sense.
This is roughly what I was thinking, though you have expressed it much more clearly than I did.
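For concreteness, a toy version of that reading, with an invented concave mapping from DALYs saved to utility: "risk aversion over DALYs" is then just expected-utility maximization with respect to that mapping, so no vN-M axiom needs to be denied.

```python
import math

def utility_from_dalys(dalys_saved):
    # A hypothetical concave mapping from DALYs saved to utility in the vN-M sense.
    return math.log1p(dalys_saved)

sure_thing = utility_from_dalys(100)    # save 100 DALYs with certainty
gamble = 0.5 * utility_from_dalys(250)  # 50% chance of saving 250 DALYs, else nothing

print(sure_thing, gamble)  # ~4.62 vs ~2.76: the expected-utility maximizer takes the sure
                           # thing, though the gamble saves more DALYs in expectation (125 > 100)
```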
Over at overcomingbias Robin Hanson wrote:
The title of the paper is 'Moral Impossibility in the Petersburg Paradox : A Literature Survey and Experimental Evidence' (PDF):
I think that people who are interested in raising awareness of risks from AI need to focus more strongly on this problem. Most discussions about how likely risks from AI are, or how seriously they should be taken, won't lead anywhere if the underlying reason for most of the superficial disagreement about risks from AI is that people discount anything under a certain threshold. There seems to be a point where things become vague enough that they get discounted completely.
The problem often doesn't seem to be that people doubt the possibility of artificial general intelligence. But most people would sooner question their grasp of “rationality” than give five dollars to a charity that tries to mitigate risks from AI because their calculations claim it was “rational” (those who have read the article by Eliezer Yudkowsky on 'Pascal's Mugging' know that I used a statement from that post and slightly rephrased it). The disagreement all comes down to a general aversion to options that have a low probability of turning out to be real, even given that the stakes are high.
Nobody has so far been able to beat arguments that bear a resemblance to Pascal’s Mugging, at least not by showing that it is irrational to give in from the perspective of a utility maximizer. One can only reject them based on a strong gut feeling that something is wrong. And I think that is what many people are unknowingly doing when they argue against the SIAI or against risks from AI. They are signaling that they are unable to take such risks into account. When most people doubt the reputation of those who claim that risks from AI need to be taken seriously, or say that AGI might be far off, what they mean is that risks from AI are too vague to be taken into account at this point, that nobody knows enough to make predictions about the topic right now.
When GiveWell, a charity evaluation service, interviewed the SIAI (PDF), they hinted at the possibility that one could consider the SIAI to be a sort of Pascal’s Mugging:
This shows that a lot of people do not doubt the possibility of risks from AI but are simply not sure if they should really concentrate their efforts on such vague possibilities.
Technically, from the standpoint of maximizing expected utility, given the absence of other existential risks, the answer might very well be yes. But even though we believe we understand this technical viewpoint of rationality very well in principle, it also leads to problems such as Pascal’s Mugging. And it doesn’t take a true Pascal’s Mugging scenario to make people feel deeply uncomfortable with what Bayes’ Theorem, the expected utility formula, and Solomonoff induction seem to suggest one should do.
Again, we currently have no rational way to reject arguments that are framed as predictions of worst case scenarios that need to be taken seriously even given a low probability of their occurrence due to the scale of negative consequences associated with them. Many people are nonetheless reluctant to accept this line of reasoning without further evidence supporting the strong claims and requests for money made by organisations such as the SIAI.
Here is for example what mathematician and climate activist John Baez has to say:
All this shows that there seems to be a fundamental problem with the formalized version of rationality. The problem might be human nature itself, that some people are unable to accept what they should do if they want to maximize their expected utility. Or we are missing something else and our theories are flawed. Either way, to solve this problem we need to research those issues and thereby increase the confidence in the very methods used to decide what to do about risks from AI, or increase the confidence in risks from AI directly, enough to make it look like a sensible option, a concrete and discernible problem that needs to be solved.
Many people perceive the whole world to be at stake already, either due to climate change, war or engineered pathogens. Telling them about something like risks from AI, even though nobody seems to have any idea about the nature of intelligence, let alone general intelligence or the possibility of recursive self-improvement, seems like just another problem, one that is too vague to outweigh all the other risks. Most people feel as if they already have a gun pointed at their heads; telling them about superhuman monsters that might turn them into paperclips therefore needs some really good arguments to outweigh the combined risk of all the other problems.
(Note: I am not making a claim about the possibility of risks from AI in and of itself, but rather putting forth some ideas about the underlying reasons why some people seem to neglect existential risks even though they know all the arguments.)