Mahatma Armstrong: CEVed to death.
My main objection to Coherent Extrapolated Volition (CEV) is the "Extrapolated" part. I don't see any reason to trust the extrapolated volition of humanity - but this isn't just for self-centred reasons. I don't see any reason to trust my own extrapolated volition. I think it's perfectly possible that my extrapolated volition would follow some scenario like this:
- It starts with me, Armstrong 1. I want to be more altruistic at the next level, valuing other humans more.
- The altruistic Armstrong 2 wants to be even more altruistic. He makes himself into a perfectly altruistic utilitarian towards humans, and increases his altruism towards animals.
- Armstrong 3 wonders about the difference between animals and humans, and why he should value one more than the other. He decides to increase his altruism equally towards all sentient creatures.
- Armstrong 4 is worried about the fact that sentience isn't clearly defined, and seems arbitrary anyway. He increases his altruism towards all living things.
- Armstrong 5's problem is that the barrier between living and non-living things isn't clear either (e.g. viruses). He decides that he should solve this by valuing all worthwhile things - are not art and beauty worth something as well?
- But what makes a thing worthwhile? Is there not art in everything, beauty in the eye of the right beholder? Armstrong 6 will make himself value everything.
- Armstrong 7 is in turmoil: so many animals prey upon other animals, or destroy valuable rocks! To avoid this, he decides the most moral thing he can do is to try and destroy all life, and then create a world of stasis for the objects that remain.
There are many other ways this could go, maybe ending up as a negative utilitarian or completely indifferent, but that's enough to give the flavour. You might trust the person you want to be, to do the right things. But you can't trust them to want to be the right person - especially several levels in (compare with the argument in this post, and my very old chaining god idea). I'm not claiming that such a value drift is inevitable, just that it's possible - and so I'd want my initial values to dominate when there is a large conflict.
Nor do I give Armstrong 7's values any credit for having originated from mine. Under torture, I'm pretty sure I could be made to accept any system of values whatsoever; there are other ways that would provably alter my values, so I don't see any reason to privilege Armstrong 7's values in this way.
"But," says the objecting strawman, "this is completely different! Armstrong 7's values are the ones that you would reach by following the path you would want to follow anyway! That's where you would get to, if you started out wanting to be more altruistic, had control over you own motivational structure, and grew and learnt and knew more!"
"Thanks for pointing that out," I respond, "now that I know where that ends up, I must make sure to change the path I would want to follow! I'm not sure whether I shouldn't be more altruistic, or avoid touching my motivational structure, or not want to grow or learn or know more. Those all sound pretty good, but if they end up at Armstrong 7, something's going to have to give."
Earning to Give vs. Altruistic Career Choice Revisited
A commonly voiced sentiment in the effective altruist community is that the best way to do the most good is generally to make as much money as possible, with a view toward donating to the most cost-effective charities. This is often referred to as “earning to give.” In the article To save the world, don’t get a job at a charity; go work on Wall Street, William MacAskill wrote:
Top undergraduates who want to “make a difference” are encouraged to forgo the allure of Wall Street and work in the charity sector ... while researching ethical career choice, I concluded that it’s in fact better to earn a lot of money and donate a good chunk of it to the most cost-effective charities, a path that I call “earning to give.” ... In general, the charitable sector is people-rich but money-poor. Adding another person to the labor pool just isn’t as valuable as providing more money, so that more workers can be hired.
In private correspondence, MacAskill clarified that he wasn’t arguing that “earning to give” is the best way to do good, only that it’s often better than working at a given nonprofit. In a recent comment MacAskill wrote
I think there's too much emphasis on “earning to give” as the *best* option rather than as the *baseline* option
and raised a number of counter-considerations against “earning to give.” Despite this, the idea that “earning to give” is optimal has caught on in the effective altruist community, and so it’s important to discuss it.
Over the past three years, I myself have shifted from the position that “earning to give” is philanthropically optimal, to the position that it’s generally the case that one can do more good by choosing a career with high direct social value than by choosing a lucrative career with a view toward donating as much as possible.
In this post I’ll outline some arguments in favor of this view.
A Rational Altruist Punch in the Stomach
Robin Hanson wrote, five years ago:
Very distant future times are ridiculously easy to help via investment. A 2% annual return adds up to a googol (10^100) return over 12,000 years, even if there is only a 1/1000 chance they will exist or receive it.
So if you are not incredibly eager to invest this way to help them, how can you claim to care the tiniest bit about them? How can you think anyone on Earth so cares? And if no one cares the tiniest bit, how can you say it is "moral" to care about them, not just somewhat, but almost equally to people now? Surely if you are representing a group, instead of spending your own wealth, you shouldn’t assume they care much.
So why do many people seem to care about policy that affects far future folk? I suspect our paternalistic itch pushes us to control the future, rather than to enrich it. We care that the future celebrates our foresight, not that they are happy.
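The compounding claim is easy to verify. A minimal sanity check in Python (working in log space, since 1.02^12000 overflows a float):

```python
from math import log10

annual_return = 1.02
years = 12_000
chance_received = 1 / 1000  # the 1/1000 chance from the quote

# Work in log10 space; 1.02**12000 is too large for a float.
log_growth = years * log10(annual_return)            # ~103.2
log_expected = log_growth + log10(chance_received)   # ~100.2

print(f"growth factor  ~ 10^{log_growth:.1f}")
print(f"expected value ~ 10^{log_expected:.1f}")
```

Even after discounting by the 1/1000 chance, the expected multiplier is still around 10^100 - Robin's googol.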
In the comments some people gave counterarguments. For those in a rush, the best ones are Toby Ord's. But I didn't find any of the counterarguments compelling enough to counter the 10^100. I have some trouble conceiving of what could beat a consistent argument by a googol-fold margin.
Few things have changed my behavior significantly over the last few years, but I think I'm facing one of them. Understanding biological immortality was one: it meant 150,000 non-deaths per day. Understanding the posthuman potential was another. Then came the 10^52 potential lives lost in case of X-risk - or, if you are conservative and think only biological beings can have morally relevant lives, 10^31. You can argue about which movie you'll watch, which teacher would be best to have, whom you should marry. But (if consequentialist) you can't argue your way out of 10^31 or 10^52. You won't find a counteracting force that exactly matches, or that really reduces the value of future stuff by
3 000 000 634 803 867 000 000 000 000 000 000 777 000 000 000 999 fold
which is way less than 10^52.
You may find a fundamental and qualitative counterargument "actually I'd rather future people didn't exist", but you won't find a quantitative one. Thus I spend a lot of time on X-risk related things.
Back to Robin's argument: unless someone gives me a good argument against investing some money in the far future (and against finding some vague techniques for doing it that give it at least a one-in-a-million chance of working), I'll set aside a block of money X and a block of time Y, and will invest in the people of 12,000 years from now. If you don't think you can beat 10^100, join me.
And if you are not in a rush, read this also, for a bright reflection on similar issues.
CEV: a utilitarian critique
I'm posting this article on behalf of Brian Tomasik, who authored it but is at present too busy to respond to comments.
Update from Brian: "As of 2013-2014, I have become more sympathetic to at least the spirit of CEV specifically and to the project of compromise among differing value systems more generally. I continue to think that pure CEV is unlikely to be implemented, though democracy and intellectual discussion can help approximate it. I also continue to feel apprehensive about the conclusions that a CEV might reach, but the best should not be the enemy of the good, and cooperation is inherently about not getting everything you want in order to avoid getting nothing at all."
Introduction
I'm often asked questions like the following: If wild-animal suffering, lab universes, sentient simulations, etc. are so bad, why can't we assume that Coherent Extrapolated Volition (CEV) will figure that out and do the right thing for us?
Disclaimer
Most of my knowledge of CEV is based on Yudkowsky's 2004 paper, which he admits is obsolete. I have not yet read most of the more recent literature on the subject.
Reason 1: CEV will (almost certainly) never happen
CEV is like a dream for a certain type of moral philosopher: finally, the ideal solution for discovering what we really want upon reflection!
The fact is, the real world is not decided by moral philosophers. It's decided by power politics, economics, and Darwinian selection. Moral philosophers can certainly have an impact through these channels, but they're unlikely to convince the world to rally behind CEV. Can you imagine the US military -- during its AGI development process -- deciding to adopt CEV? No way. It would adopt something that ensures the continued military and political dominance of the US, driven by mainstream American values. Same goes for China or any other country. If AGI is developed by a corporation, the values will reflect those of the corporation or the small group of developers and supervisors who hold the most power over the project. Unless that group is extremely enlightened, CEV is not what we'll get.
Anyway, this is assuming that the developers of AGI can even keep it under control. Most likely AGI will turn into a paperclipper or else evolve into some other kind of Darwinian force over which we lose control.
Objection 1: "Okay. Future military or corporate developers of AGI probably won't do CEV. But why do you think they'd care about wild-animal suffering, etc. either?"
Well, they might not, but if we make the wild-animal movement successful, then in ~50-100 years when AGI does come along, the notion of not spreading wild-animal suffering might be sufficiently mainstream that even military or corporate executives would care about it, at least to some degree.
If post-humanity does achieve astronomical power, it will only be through AGI, so there's high value for influencing the future developers of an AGI. For this reason I believe we should focus our meme-spreading on those targets. However, this doesn't mean they should be our only focus, for two reasons: (1) Future AGI developers will themselves be influenced by their friends, popular media, contemporary philosophical and cultural norms, etc., so if we can change those things, we will diffusely impact future AGI developers too. (2) We need to build our movement, and the lowest-hanging fruit for new supporters are those most interested in the cause (e.g., antispeciesists, environmental-ethics students, transhumanists). We should reach out to them to expand our base of support before going after the big targets.
Objection 2: "Fine. But just as we can advance values like preventing the spread of wild-animal suffering, couldn't we also increase the likelihood of CEV by promoting that idea?"
Sure, we could. The problem is, CEV is not an optimal thing to promote, IMHO. It's sufficiently general that lots of people would want it, so for ourselves, the higher leverage comes from advancing our particular, more idiosyncratic values. Promoting CEV is kind of like promoting democracy or free speech: It's fine to do, but if you have a particular cause that you think is more important than other people realize, it's probably going to be better to promote that specific cause than to jump on the bandwagon and do the same thing everyone else is doing, since the bandwagon's cause may not be what you yourself prefer.
Indeed, for myself, it's possible CEV could be a net bad thing, if it would reduce the likelihood of paperclipping -- a future which might (or might not) contain far less suffering than a future directed by humanity's extrapolated values.
Reason 2: CEV would lead to values we don't like
Some believe that morality is absolute, in which case a CEV's job would be to uncover what that is. This view is mistaken, for the following reasons: (1) the existence of a separate realm of reality where ethical truths reside violates Occam's razor, and (2) even if such truths did exist, why would we care what they were?
Yudkowsky and the LessWrong community agree that ethics is not absolute, so they have different motivations behind CEV. As far as I can gather, the following are two of them:
Motivation 1: Some believe CEV is genuinely the right thing to do
As Eliezer said in his 2004 paper (p. 29), "Implementing CEV is just my attempt not to be a jerk." Some may believe that CEV is the ideal meta-ethical way to resolve ethical disputes.
I have to differ. First, the set of minds included in CEV is totally arbitrary, and hence, so will be the output. Why include only humans? Why not animals? Why not dead humans? Why not humans that weren't born but might have been? Why not paperclip maximizers? Baby eaters? Pebble sorters? Suffering maximizers? Wherever you draw the line, there you're already inserting your values into the process.
And then once you've picked the set of minds to extrapolate, you still have astronomically many ways to do the extrapolation, each of which could give wildly different outputs. Humans have a thousand random shards of intuition about values that resulted from all kinds of little, arbitrary perturbations during evolution and environmental exposure. If the CEV algorithm happens to make some more salient than others, this will potentially change the outcome, perhaps drastically (butterfly effects).
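To make the butterfly-effect worry concrete, here is a toy sketch (my own construction, nothing from CEV's actual specification): two value "shards", three candidate futures, and a small nudge to the salience weights flips which future the extrapolation selects.

```python
# Hypothetical shards and futures, purely for illustration.
futures = {
    "wilderness_everywhere": {"love_of_nature": 1.0, "concern_for_suffering": -0.8},
    "utilitronium":          {"love_of_nature": -0.5, "concern_for_suffering": 1.0},
    "status_quo":            {"love_of_nature": 0.1,  "concern_for_suffering": 0.1},
}

def extrapolate(weights):
    """Pick the future scoring highest under the given shard salience weights."""
    score = lambda f: sum(weights[s] * v for s, v in futures[f].items())
    return max(futures, key=score)

print(extrapolate({"love_of_nature": 0.50, "concern_for_suffering": 0.50}))  # utilitronium
print(extrapolate({"love_of_nature": 0.56, "concern_for_suffering": 0.44}))  # wilderness_everywhere
```

A six-point shift in salience produces a qualitatively different future - and a real extrapolation would have vastly more shards and vastly more knobs.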
Now, I would be in favor of a reasonable extrapolation of my own values. But humanity's values are not my values. There are people who want to spread life throughout the universe regardless of suffering, people who want to preserve nature free from human interference, people who want to create lab universes because it would be cool, people who oppose utilitronium and support retaining suffering in the world, people who want to send members of other religions to eternal torture, people who believe sinful children should burn forever in red-hot ovens, and on and on. I do not want these values to be part of the mix.
Maybe (hopefully) some of these beliefs would go away once people learned more about what these wishes really implied, but some would not. Take abortion, for example: Some non-religious people genuinely oppose it, and not for trivial, misinformed reasons. They have thought long and hard about abortion and still find it to be wrong. Others have thought long and hard and still find it to be not wrong. At some point, we have to admit that human intuitions are genuinely in conflict in an irreconcilable way. Some human intuitions are irreconcilably opposed to mine, and I don't want them in the extrapolation process.
Motivation 2: Some argue that even if CEV isn't ideal, it's the best game-theoretic approach because it amounts to cooperating on the prisoner's dilemma
I think the idea is that if you try to promote your specific values above everyone else's, then you're timelessly causing this to be the decision of other groups of people who want to push for their values instead. But if you decided to cooperate with everyone, you would timelessly influence others to do the same.
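For reference, here is the payoff structure the argument appeals to (textbook prisoner's dilemma numbers, not anything from the original discussion):

```python
payoffs = {  # (my_move, their_move) -> my_payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# Causal reasoning: whatever they do, "D" pays more (5 > 3 and 1 > 0).
# The timeless argument: if my decision is correlated with that of
# similar agents, choosing "C" effectively selects the (C, C) cell,
# with payoff 3, over the (D, D) cell, with payoff 1.
for mine in ("C", "D"):
    print(mine, [payoffs[(mine, theirs)] for theirs in ("C", "D")])
```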
This seems worth considering, but I'm doubtful that the argument is compelling enough to take too seriously. I can almost guarantee that if I decided to start cooperating by working toward CEV, everyone else working to shape values of the future wouldn't suddenly jump on board and do the same.
Objection 1: "Suppose CEV did happen. Then spreading concern for wild animals and the like might have little value, because the CEV process would realize that you had tried to rig the system ahead of time by making more people care about the cause, and it would attempt to neutralize your efforts."
Well, first of all, CEV is (almost certainly) never going to happen, so I'm not too worried. Second of all, it's not clear to me that such a scheme would actually be put in place. If you're trying to undo pre-CEV influences that led to the distribution of opinions to that point, you're going to have a heck of a lot of undoing to do. Are you going to undo the abundance of Catholics because their religion discouraged birth control and so led to large numbers of supporters? Are you going to undo the over-representation of healthy humans because natural selection unfairly removed all those sickly ones? Are you going to undo the under-representation of dinosaurs because an arbitrary asteroid killed them off before CEV came around?
The fact is that who has power at the time of AGI will probably matter a lot. If we can improve the values of those who will have power in the future, this will in expectation lead to better outcomes -- regardless of whether the CEV fairy tale comes true.
Intelligence explosion in organizations, or why I'm not worried about the singularity
If I understand the Singularitarian argument espoused by many members of this community (e.g. Muehlhauser and Salamon), it goes something like this:
- Machine intelligence is getting smarter.
- Once an intelligence becomes sufficiently supra-human, its instrumental rationality will drive it towards cognitive self-enhancement (Bostrom), making it a super-powerful, resource-hungry superintelligence.
- If a superintelligence isn't sufficiently human-like or 'friendly', that could be disastrous for humanity.
- Machine intelligence is unlikely to be human-like or friendly unless we take precautions.
I'm in danger of getting into politics. Since I understand that political arguments are not welcome here, I will refer to these potentially unfriendly human intelligences broadly as organizations.
Smart organizations
By "organization" I mean something commonplace, with a twist. It's commonplace because I'm talking about a bunch of people coordinated somehow. The twist is that I want to include the information technology infrastructure used by that bunch of people within the extension of "organization".
Do organizations have intelligence? I think so. Here are some of the reasons why:
- We can model human organizations as having preference functions. (Economists do this all the time; see the toy sketch after this list.)
- Human organizations have a lot of optimization power.
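As a minimal illustration of both points - a preference function plus optimization power over it - here is a toy revealed-preference model (the names, weights, and numbers are all hypothetical):

```python
def org_preference(outcome):
    """Score an outcome for the organization; the weights are made up."""
    return 3.0 * outcome["profit"] - 2.0 * outcome["regulatory_risk"]

actions = {
    "expand_overseas": {"profit": 10.0, "regulatory_risk": 8.0},
    "automate_ops":    {"profit": 6.0,  "regulatory_risk": 1.0},
    "do_nothing":      {"profit": 0.0,  "regulatory_risk": 0.0},
}

# Optimization power in miniature: choose whichever action maximizes
# the preference function.
best = max(actions, key=lambda a: org_preference(actions[a]))
print(best)  # -> automate_ops
```

Economists' models are far richer than this, but the structure - a preference function and a search over actions - is the same.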
I talked with Mr. Muehlhauser about this specifically. I gather that, at least at the time, he thought human organizations should not be counted as intelligences (or at least as intelligences with the potential to become superintelligences) because they are not as versatile as human beings.
So when I am talking about super-human intelligence, I specifically mean an agent that is as good as or better than humans at just about every skill set that humans possess for achieving their goals. So that would include not just things like mathematical ability or theorem proving and playing chess, but also things like social manipulation and composing music and so on, which are all functions of the brain, not the kidneys.
...and then...
It would be a kind of weird [organization] that was better than the best human or even the median human at all the things that humans do. [Organizations] aren’t usually the best in music and AI research and theorem proving and stock markets and composing novels. And so there certainly are [organizations] that are better than median humans at certain things, like digging oil wells, but I don’t think there are [organizations] as good or better than humans at all things. More to the point, there is an interesting difference here because [organizations] are made of lots of humans and so they have the sorts of limitations on activities and intelligence that humans have. For example, they are not particularly rational in the sense defined by cognitive science. And the brains of the people that make up organizations are limited to the size of skulls, whereas you can have an AI that is the size of a warehouse.
I think that Muehlhauser is slightly mistaken on a few subtle but important points. I'm going to assert my position on them without much argument because I think they are fairly sensible, but if any reader disagrees I will try to defend them in the comments.
- When judging whether an entity has intelligence, we should consider only the skills relevant to the entity's goals.
- So, if organizations are not as good as a human being at composing music, that shouldn't disqualify them from being considered broadly intelligent if that has nothing to do with their goals.
- Many organizations are quite good at AI research, or outsource their AI research to other organizations with which they are intertwined.
- The cognitive power of an organization is not limited to the size of skulls. The computational power of many organizations comprises both the skulls of their members and, possibly, "warehouses" of digital computers.
- With the ubiquity of cloud computing, it's hard to say that a particular computational process has a static spatial bound at all.
Mean organizations
* My preferred standard of rationality is communicative rationality, a Habermasian ideal of a rationality aimed at consensus through principled communication. As a consequence, when I believe a position to be rational, I believe that it is possible and desirable to convince other rational agents of it.
Recognizing memetic infections and forging resistance memes
What does a memetic infection look like? Well, you would encounter something (probably on the internet) that seems very compelling. You think intensely about it for a while, and it spurs you to do something - probably to post something related on the internet. After a while, the meme may not seem that compelling to you anymore, and you wonder why you invested that time and energy. The meme has reproduced itself. For example, Bruce Sterling's response to the 'New Aesthetic' is a paradigmatic example of memetic infection: he encountered it, he found it compelling, he wrote about it, I read about it and now I know about it. (Note that the word 'infection' has a stigma to it, but I don't mean it to be necessarily a bad thing. I will use 'disease' to mean 'infection with bad consequences'.)
Now, let me jump to an apparently unrelated concept - Viral Eukaryogenesis. If I understand correctly, Viral Eukaryogenesis is the theory that eukaryotes (including you and me) are inheritors of a bargain between two kinds of life - metabolic life and viral life, something like the way lichens are a bargain between fungi and algae. The nucleus that characterizes eukaryotes is supposed to be descended from a virus protein shell, and the membrane-fusion proteins that we use for gamete fusion (crucial for sex) are supposed to be descended from viral infection proteins. I am not a biologist, but my understanding of the state of biology is that it is an interesting hypothesis, as yet neither proven nor disproven. However, I'm going to talk as if it were true, because I'm actually trying to make an analogy with memes.
Mitt Romney's $10,000 bet
For those who don't follow politics, Mitt Romney offered to bet Rick Perry $10,000 that Perry had misquoted Romney. (video)
Most political commentators see the move as a gaffe. They claim the bet made Romney look out of touch, because it reminded voters that Romney is rich enough to casually wager $10,000.
As a believer in prediction markets, I am disappointed in the public's reaction. Romney made a bold move by making his beliefs pay rent. Critics point out that $10,000 is "chump change" for Romney, but Romney still put himself at risk. If he had lost the bet, Perry could have made a production of cashing a $10,000 check from a disgraced Romney. Besides, if money were the issue, Perry could have countered with a non-monetary bet: "Loser has to attend the next debate in a clown suit" or something.
If politicians had to face real consequences every time they made a false statement, they would have a larger incentive to tell the truth. It's a shame Romney's bet probably won't catch on.
----
This post is not an endorsement of Mitt Romney or his politics. All I am endorsing is political betting.
It's not like anything to be a bat
...at least not if you accept a certain line of anthropic argument.
Thomas Nagel famously challenged the philosophical world to come to terms with qualia in his essay "What is it Like to Be a Bat?". Bats, with sensory systems so completely different from those of humans, must have exotic bat qualia that we could never imagine. Even if we deduce all the physical principles behind echolocation, even if we could specify the movement of every atom in a bat's senses and nervous system that represents its knowledge of where an echolocated insect is, we still have no idea what it's like to feel a subjective echolocation quale.
Anthropic reasoning is the idea that you can reason conditioning on your own existence. For example, the Doomsday Argument says that you would be more likely to exist in the present day if the overall number of future humans were medium-sized instead of humongous; therefore, since you exist in the present day, there must be only a medium-sized number of future humans, and the apocalypse must be nigh, for values of nigh equal to "within a few hundred years or so".
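Here is the Bayesian arithmetic behind that argument in toy form (my illustration; the population figures are stock round numbers, not anything precise):

```python
medium = 2e11  # "medium-sized": ~200 billion humans ever
huge   = 2e14  # "humongous": a thousand times more
rank   = 1e11  # your rough birth rank today (~100 billionth human)
assert rank <= medium  # the rank must be possible under both hypotheses

# Self-sampling: P(having this birth rank | total N) = 1/N, for rank <= N.
prior = {"medium": 0.5, "huge": 0.5}
likelihood = {"medium": 1 / medium, "huge": 1 / huge}

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)  # ~{'medium': 0.999, 'huge': 0.001}
```

Finding yourself with an early birth rank shifts the odds about a thousand to one towards the smaller total - hence "doomsday".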
The Buddhists have a parable to motivate young seekers after enlightenment. They say - there are zillions upon zillions of insects, trillions upon trillions of lesser animals, and only a relative handful of human beings. For a reincarnating soul to be born as a human being, then, is a rare and precious gift, and an opportunity that should be seized with great enthusiasm, as it will be endless eons before it comes around again.
Whatever one thinks of reincarnation, the parable raises an interesting point. Considering the vast number of non-human animals compared to humans, the probability of finding yourself a human is vanishingly low. Therefore, chances are that if it were possible for me to be an animal, I would be one. Since I am not, this makes a strong anthropic argument that it was never possible for me to be an animal in the first place.
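To put rough, illustrative numbers on that (mine, not the post's): there are perhaps 10^19 insects alive against 10^10 humans, so a mind sampled uniformly from all animals would find itself human with probability around 10^10 / 10^19 = 10^-9. Observing that you are human is then a billion-to-one surprise under the hypothesis that animals belong in the reference class, and correspondingly strong evidence for the title's alternative: that there is nothing it is like to be a bat.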
'Newcomblike' Video Game: Frozen Synapse
Disregarding for the moment the question of whether video games are a rational use of one's time:
Frozen Synapse is a turn-based strategy combat game that appears to be particularly interesting from a rationalist standpoint. I haven't played it, but according to the reviews, it's actually a combination of turn-based and real-time play. Each turn encompasses 5 seconds of real time, but those 5 seconds don't play out until both players have constructed their moves, which they may take as long as they'd like to do. Constructing a move involves giving your several units and your opponent's several units commands, watching what happens when the units play out those commands, and repeating that process until one has a set of commands for one's units that one considers optimal given what one predicts one's opponent will do. This happens on a procedurally generated battlefield; there are reports of this occasionally giving one player or the other an insurmountable advantage, but the reviews seem to indicate that being able to play on a fresh field each time, and having to think about proper use of its layout on the fly, outweighs this issue.
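The commit-then-resolve structure is what gives each turn its simultaneous-move, Newcomblike flavor. A bare sketch (my own pseudocode-ish rendering, not the game's actual engine):

```python
def resolve_turn(plan_a, plan_b, seconds=5):
    """Neither plan executes until BOTH players have committed, so each
    player must act on a prediction of the other's committed plan."""
    for t in range(seconds):
        for player, plan in (("A", plan_a), ("B", plan_b)):
            for unit, order in plan.items():
                print(f"t={t}s: player {player}'s {unit} executes '{order}'")

# Plans are committed blind, then the 5 seconds play out deterministically.
resolve_turn({"unit_1": "advance"}, {"unit_1": "flank"}, seconds=2)
```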
Also, the game came to my attention because there's a Humble Bundle available for it now, which means that it can be acquired very nearly for free; just ignore the 'beat the average to get more games' hook.
Too many cooks
I was in a game last weekend where, at one point, the players needed to solve an in-game problem. The mechanics for "solving" the problem were for us to assemble a 3D "jigsaw" puzzle. (One of those geometric shapes made by getting a lot of little shapes to fit together in just the right way.)
Three of us sat down to solve the puzzle together. Looking at the pieces, looking at the picture of the assembled figure, we made observations about what constraints we saw, what piece on the table might correspond to something in the picture, convinced whoever was holding the piece in question to put it in a particular place, and gradually assembled the puzzle cooperatively. We had an instruction sheet with pictures of the puzzle at 3 different stages of completion. It took us something under 10 minutes. (Several of those were taken up mutually deciding in which order to follow the pictures, as one person had started using the picture showing the first stage, one had started with the picture showing the last stage and was disassembling the semi-assembled first stage for parts, and one had no clear strategy.)
A few hours later, after the game ended, I sat at the table with the disassembled puzzle pieces, and put it together by myself, following the pictures from first stage to last. I was not aware of any memories of how it had been assembled the last time; and anyway, the instruction sheet was much more valuable than any memories I had. (It wasn't one of those symmetric 3D puzzles where there's a pattern or trick to it; it was a collection of oddly-shaped unique pieces assembled in three layers.) It took less than a minute.