I think Eliezer's meta-ethics is wrong because it's possible that we live in a world where Eliezer's "right" doesn't actually designate anything. That is, where a typical human's morality, when extrapolated, fails to be coherent. "Right" should still mean something in a world like that, but it doesn't under Eliezer's theory.
Also, to jump the gun a bit, your own meta-ethics, desirism, says:
Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.
What does this mean in the FAI context? To a super-intelligent AI, its own desires, as well as those of everyone else on Earth, can be considered "malleable", in the sense that it could change all of them if it wanted to. But there might be some other super-intelligent AIs (created by aliens) whose desires it is powerless to change. I hope desirism doesn't imply that it should change my desires so as to fulfill the alien AIs' desires...
I haven't found a satisfactory meta-ethics yet, so I still don't know. But whatever the answer is, it has to be at least as good as "my current (unextrapolated) preferences". "Nothing" is worse than that, so it can't be the correct answer.
it follows that no human can know what they care about
This sounds weird, like you've driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:
a) Can a paperclipper know what it cares about?
b) How is a human fundamentally different from a paperclipper with respect to (a)?
how is it possible that Eliezer's "right" doesn't designate anything
Eliezer identifies "right" with "the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent." It is possible that human morality, when extrapolated, shows no coherence, in which case Eliezer's "right" doesn't designate anything.
how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property
Are you saying that Eliezer's general approach might still turn out to be correct, if we substitute better definitions or understandings of "extrapolation" and/or "coherence"? If so, I agree, and I didn't mean to exclude this possibility with my original statement. Should I have made it clearer when I said "I think Eliezer's meta-ethics is wrong" that I meant "based on my understanding of Eliezer's current ideas"?
So let's say that you go around saying that philosophy has suddenly been struck by a SERIOUS problem, as in lives are at stake, and philosophers don't seem to pay any attention. Not to the problem itself, at any rate, though some of them may seem annoyed at outsiders infringing on their territory, and nonplussed at the thought of their field trying to arrive at answers to questions where the proper procedure is to go on coming up with new arguments and respectfully disputing them with other people who think differently, thus ensuring a steady flow of papers for all.
Let us say that this is what happens; which of your current beliefs, which seem to lead you to expect something else to happen, would you update?
No, that is exactly what I expect to happen with more than 99% of all philosophers. But we already have David Chalmers arguing it may be a serious problem. We have Nick Bostrom and the people at Oxford's Future of Humanity Institute. We probably can expect some work on SIAI's core concerns from philosophy grad students we haven't yet heard from because they haven't published much, for example Nick Beckstead, whose interests are formal epistemology and the normative ethics of global catastrophic risks.
As you've said before, any philosophy that would be useful to you and SIAI is hard to find. But it's out there, in tiny piles, and more of it is coming.
I don't remember the specifics, and so don't have the terms to do a proper search, but I think I recall being taught in one course about a philosopher who, based on the culmination of all his own arguments on ethics, came to the conclusion that being a philosopher was useless, and thus changed careers.
As a layman I'm still puzzled as to how the LW sequences do not fall into the category of philosophy. Bashing philosophy seems over the top; there is probably just as much "useless" mathematics.
I think the problem is that philosophy has, as a field, done a shockingly bad job of evicting obsolete and incorrect ideas (not just useless ones). Someone who seeks a philosophy degree can expect to waste most of their time and potential on garbage. To use a mathematics analogy, it's as if mathematicians were still holding debates between binaryists, decimists, tallyists and nominalists.
Most of what's written on Less Wrong is philosophy, there's just so much garbage under philosophy's name that it made sense to invent a new name ("rationalism"), pretend it's unrelated, and guard that name so that people can use it as a way to find good philosophy without wading through the bad. It's the only reference class I know of for philosophy writings that's (a) larger than one author, (b) mostly sane, and (c) enumerable by someone who isn't an expert.
They do. (Many of EY's own posts are tagged "philosophy".) Indeed, FAI will require robust solutions to several standard big philosophical problems, not just metaethics; e.g. subjective experience (to make sure that CEV doesn't create any conscious persons while extrapolating, etc.), the ultimate nature of existence (to sort out some of the anthropic problems in decision theory), and so on. The difference isn't (just) in what questions are being asked, but in how we go about answering them. In traditional philosophy, you're usually working on problems you personally find interesting, and if you can convince a lot of other philosophers that you're right, write some books, and give a lot of lectures, then that counts as a successful career. LW-style philosophy (as in the "Reductionism" and "Mysterious Answers" sequences) is distinguished in that there is a deep need for precise right answers, with more important criteria for success than what anyone's academic peers think.
Basically, it's a computer science approach to philosophy: any progress on understanding a phenomenon is measured by how much closer it gets you to an algorithmic description of it. Academic philosophy occasionally generates insights on that level, but overall it doesn't operate with that ethic, and it's not set up to reward that kind of progress specifically; too much of it is about rhetoric, formality as an imitation of precision, and apparent impressiveness instead of usefulness.
I have mixed feelings about that. One big difference in style between the sciences and the humanities lies in the complete lack of respect for tradition in the sciences. The humanities deal in annotations and critical comparisons of received texts. The sciences deal with efficient pedagogy.
I think that the sequences are good in that they try to cover this philosophical material in the great-idea oriented style of the sciences rather than the great-thinker oriented style of the humanities. My only complaint about the sequences is that in some places the pedagogy is not really great - some technical ideas are not explained as clearly as they might be, some of the straw men are a little too easy to knock down, and in a few places Eliezer may have even reached the wrong conclusions.
So, rather than annotating The Sequences (in the tradition of the humanities), it might be better to re-present the material covered by the sequences (in the tradition of the sciences). Or, produce a mixed-mode presentation which (like Eliezer's) focuses on getting the ideas across, but adds some scholarship (unlike Eliezer) in that it provides the standard Googleable names to the ideas discussed - both the good ideas and the bad ones.
Just want to flag that it's not entirely obvious that we need to settle questions in meta-ethics in order to get the normative and applied ethics right. Why not just call for more work directly in the latter fields?
I like this post, but I'd like a better idea of how it's meant to be carried down to the concrete level.
Should SIAI try to hire or ask for contributions from the better academic philosophers? (SIAI honchos could do that.)
Should there be a concerted effort to motivate more research in "applied" meta-ethics, the kind that talks to neuroscience and linguistics and computer science? (Philosophers and philosophy students anywhere could do that.)
Should we LessWrong readers, and current or potential SIAI workers, educate ourselves about mainstream meta-ethics, so that we know more about it than just the Yudkowsky version, and be able to pick up on errors? (Anyone reading this site can do that.)
Note that the Future of Humanity Institute is currently hiring postdocs, either with backgrounds in philosophy or alternatively in math/cognitive science/computer science. There is close collaboration between FHI and SIAI, and the FHI is part of Oxford University, which is a bit less of a leap for a philosophy graduate student.
or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function
...this sentence makes me think that we really aren't on the same page at all with respect to naturalistic metaethics. What is a reason for action? How would a computer program enumerate them all?
Okay, see, this is why I have trouble talking to philosophers in their quote standard language unquote.
I'll ask again: How would a computer program enumerate all reasons for action?
I wonder, since it's important to stay pragmatic, if it would be good to design a "toy example" for this sort of ethics.
It seems like the hard problem here is to infer reasons for action, from an individual's actions. People do all sorts of things; but how can you tell from those choices what they really value? Can you infer a utility function from people's choices, or are there sets of choices that don't necessarily follow any utility function?
The sorts of "toy" examples I'm thinking of here are situations where the agent has a finite number of choices. Let's say you have Pac-Man in a maze. His choices are his movements in four cardinal directions. You watch Pac-Man play many games; you see what he does when he's attacked by a ghost; you see what he does when he can find something tasty to eat; you see when he's willing to risk the danger to get the food.
From this, I imagine you could do some hidden Markov stuff to infer a model of Pac-Man's behavior -- perhaps an if-then tree.
Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don't know how to...
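Here is a minimal sketch of what that inference could look like in code, assuming a "noisily rational" (Boltzmann) choice model; the observations, the (fruit, death-risk) feature encoding, and the candidate weight grid are all invented for illustration, and real preference inference (inverse reinforcement learning) is much harder than this:

```python
import math
from itertools import product

# Hypothetical observations: each entry is (features of the available moves, index chosen).
# Features per move: (fruit gained, probability of dying).
observations = [
    ([(1, 0.0), (0, 0.0)], 0),   # took the fruit when it was safe
    ([(1, 0.9), (0, 0.0)], 1),   # avoided the fruit when a ghost guarded it
    ([(1, 0.2), (0, 0.0)], 0),   # accepted a small risk to get the fruit
]

def choice_log_likelihood(w_fruit, w_death, beta=5.0):
    """Log-likelihood of the observed choices if Pac-Man picks moves
    with probability proportional to exp(beta * utility)."""
    total = 0.0
    for options, chosen in observations:
        scaled = [beta * (w_fruit * f + w_death * d) for f, d in options]
        m = max(scaled)
        log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
        total += scaled[chosen] - log_z
    return total

# Brute-force search over a small grid of candidate utility weights.
best = max(
    product([x / 10 for x in range(11)], [-x / 5 for x in range(11)]),
    key=lambda w: choice_log_likelihood(*w),
)
print("inferred weights (fruit, death):", best)
```

The same setup also bears on the utility-function question: if no setting of the weights fits the observed choices well, that is evidence the choices don't follow any utility function over these particular features.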
Eliezer,
I think the reason you're having trouble with the standard philosophical category of "reasons for action" is because you have the admirable quality of being confused by that which is confused. I think the "reasons for action" category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let's call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.
How would a computer program enumerate all motives (which, according to me, is co-extensional with "all reasons for action")? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.
Bu...
As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems
Now, it's just a wild guess here, but I'm guessing that a lot of philosophers who use the language "reasons for action" would disagree that "knowing the Baby-eaters evolved to eat babies" is a reason to eat babies. Am I wrong?
I'm merely raising questions that need to be considered very carefully.
I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don't expect those two tracks to meet much.
Interesting. I always assumed that raising a question was the first step toward answering it
Only if you want an answer. There is no curiosity that does not want an answer. There are four very widespread failure modes around "raising questions" - the failure mode of paper-writers who regard unanswerable questions as a biscuit bag that never runs out of biscuits, the failure mode of the politically savvy who'd rather not offend people by disagreeing too strongly with any of them, the failure mode of the religious who don't want their questions to arrive at the obvious answer, the failure mode of technophobes who mean to spread fear by "raising questions" that are meant more to create anxiety by their raising than by being answered, and all of these easily sum up to an accustomed bad habit of thinking where nothing ever gets answered and true curiosity is dead.
So yes, if there's an interim solution on the table and someone says "Ah, but surely we must ask more questions" instead of "No, you idiot, can't you see that there's a better way" or "But it looks to me like the preponderance of evidence is actually pointing in this here other ...
Awesome. Now your reaction here makes complete sense to me. The way I worded my original article above looks very much like I'm in either the 1st category or the 4th category.
Let me, then, be very clear:
I do not want to raise questions so that I can make a living endlessly re-examining philosophical questions without arriving at answers.
I want me, and rationalists in general, to work aggressively enough on these problems so that we have answers by the time AI+ arrives. As for the fact that I don't have answers yet, please remember that I was a fundamentalist Christian 3 years ago, with no rationality training at all, and a horrendous science education. And I didn't discover the urgency of these problems until about 6 months ago. I've had to make extremely rapid progress from that point to where I am today. If I can arrange to work on these problems full time, I think I can make valuable contributions to the project of dealing safely with Friendly AI. But if that doesn't happen, well, I hope to at least enable others who can work on this problem full time, like yourself.
I want to solve these problems in 15 years, not 20. This will make most academic philosophers, and most
Well, the part about you being a fundamentalist Christian three years ago is damned impressive and does a lot to convince me that you're moving at a reasonable clip.
On the other hand, a good metaethical answer to the question "What sort of stuff is morality made out of?" is essentially a matter of resolving confusion; and people can get stuck on confusions for decades, or they can breeze past confusions in seconds. Comprehending the most confusing secrets of the universe is more like realigning your car's wheels than like finding the Lost Ark. I'm not entirely sure what to do about the partial failure of the metaethics sequence, or what to do about the fact that it failed for you in particular. But it does sound like you're setting out to heroically resolve confusions that, um, I kinda already resolved, and then wrote up, and then only some people got the writeup... but it doesn't seem like the sort of thing where you spending years working on it is a good idea. 15 years to a piece of paper with the correct answer written on it is for solving really confusing problems from scratch; it doesn't seem like a good amount of time for absorbing someone else's solution. If y...
Have you considered applying to the SIAI Visiting Fellows program? It could be worth a month or 3 of having your living expenses taken care of while you research, and could lead to something longer term.
A 'reason for action' is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically - i.e. that they have intrinsic value apart from being valued by an agent.
Okay, but all of those (to the extent that they're coherent) are observations about human axiology. Beware of committing the mind projection fallacy with respect to compellingness — you find those to be plausible sources of normativity because your brain is that of "a particular species of primate on planet Earth". If your AI were looking for "reasons for action" that would compel all agents, it would find nothing, and if it were looking for all of the "reasons for action" that would compel each possible agent, it would spend an infinite amount of time enumerating stupid pointless motivatio...
CEV also has the problem that nothing short of a superintelligence could actually use it, so unless AI has a really hard takeoff you're going to need something less complicated for your AI to use in the meantime.
Personally I've always thought EY places too much emphasis on solving the whole hard problem of ultimate AI morality all at once. It would be quite valuable to see more foundation-building work on moral systems for less extreme sorts of AI, with an emphasis on avoiding bad failure modes rather than trying to get the best possible outcome. That’s the sort of research that could actually grow into an academic sub-discipline, and I’d expect it to generate insights that would help with attempts to solve the SI morality problem.
Of course, the last I heard EY was still predicting that dangerous levels of AI will come along in less time than it would take such a discipline to develop. The gradual approach could work if it takes 100 years to go from mechanical kittens to Skynet’s big brother, but not if it only takes 5.
I'd like to add the connection between the notions of "meta-ethics" and "decision theory" (of the kind we'd want a FAI/CEV to start out with). For the purpose of solving FAI, these seem to be the same, with "decision theory" emphasizing the outline of the target, and "meta-ethics" the source of correctness criteria for such theory in human intuition.
Would someone familiar with the topic be able to do a top level treatment similar to the recent one on self-help? A survey of the literature, etc.
I am a software engineer, but I don't know much about general artificial intelligence. The AI research I am familiar with is very different from what you are talking about here.
Who is currently leading the field in attempts at providing mathematical models for philosophical concepts? Are there simple models that demonstrate what is meant by computational meta-ethics? Is that a correct search term -- as in a term ...
How much thought has been given to hard coding an AI with a deontological framework rather than giving it some consequentialist function to maximize? Is there already a knockdown argument showing why that is a bad idea?
EDIT: I'm not talking about what ethical system to give an AI that has the potential to do the most good, but one that would be capable of the least bad.
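For concreteness, here is a toy sketch (my own illustration, not a worked-out proposal) of the contrast: a hard-coded deontological filter that discards rule-violating actions, versus an unconstrained maximizer. The actions, rules, and numbers are invented, and nothing this simple addresses the real difficulties (rule interpretation, unforeseen action types, and so on):

```python
def deontological_choice(actions, rules, preference):
    """Discard any action that violates a hard rule, then pick the most preferred survivor."""
    permitted = [a for a in actions if all(rule(a) for rule in rules)]
    if not permitted:
        return None  # refuse to act rather than break a rule
    return max(permitted, key=preference)

def consequentialist_choice(actions, utility):
    """Pick whichever action maximizes estimated utility, with no side constraints."""
    return max(actions, key=utility)

# Hypothetical actions described by (expected paperclips produced, humans harmed).
actions = [(100, 0), (10**6, 3), (0, 0)]
rules = [lambda a: a[1] == 0]  # "never harm a human"

print(deontological_choice(actions, rules, preference=lambda a: a[0]))  # -> (100, 0)
print(consequentialist_choice(actions, utility=lambda a: a[0]))         # -> (1000000, 3)
```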
Rather, my point is that we need lots of smart people working on these meta-ethical questions.
I'm curious if the SIAI shares that opinion. Is Michael Vassar trying to hire more people, or is his opinion that a small team will be able to solve the problem? Can the problem be subdivided into parts; is it subject to taskification?
I'm curious if the SIAI shares that opinion.
I do. More people doing detailed moral psychology research (such as Jonathan Haidt's work), or moral philosophy with the aim of understanding what procedure we would actually want followed, would be amazing.
Research into how to build a powerful AI is probably best not done in public, because it makes it easier to make unsafe AI. But there's no reason not to engage as many good researchers as possible on moral psychology and meta-ethics.
I would ordinarily vote down a post that restated things that most people on LW should already know, but... LW is curiously devoid of discussion on this issue, whether criticism of CEV, or proposals of alternatives. And LP's post hits all the key points, very efficiently.
If LW has a single cultural blind spot, it is that LWers claim to be Bayesians, yet routinely analyze potential futures as if the single "most-likely" scenario, hypothesis, or approach accepted as dogma on LessWrong (fast takeoff, Friendly AI, multiple worlds, CEV, etc.) had probability 1.
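As a toy illustration of the complaint (all numbers invented), compare choosing a strategy by treating the single most-likely scenario as certain with weighting every scenario by its probability:

```python
scenarios = {
    # scenario: (probability, payoff of strategy A, payoff of strategy B)
    "fast takeoff": (0.40, 10, 2),
    "slow takeoff": (0.35, 0, 8),
    "no takeoff":   (0.25, 0, 8),
}

def expected_payoff(strategy_index):
    """Probability-weighted payoff across all scenarios."""
    return sum(p * payoffs[strategy_index] for p, *payoffs in scenarios.values())

# "Probability 1" reasoning: back whatever wins in the most-likely scenario.
most_likely = max(scenarios.values(), key=lambda v: v[0])
print("winner if the most-likely scenario is treated as certain:",
      "A" if most_likely[1] > most_likely[2] else "B")        # A

# Bayesian reasoning: weight every scenario by its probability.
print("expected payoffs:", expected_payoff(0), "(A) vs", expected_payoff(1), "(B)")  # 4.0 vs 5.6
```

With these made-up numbers the two procedures recommend different strategies, which is the point: planning only for the modal future can be dominated by a strategy that hedges across futures.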
It's not just a matter of pace; this perspective also implies a certain prioritization of the questions.
For example, as you say, it's important to conclude soon whether animal welfare is important. But if we preserve the genetic information that creates new animals, we preserve the ability to optimize animal welfare in the future, should we at that time conclude that it is important. If we don't, then later concluding it's important doesn't get us much.
It seems to follow that preserving that information (either in the form of a breeding popula...
Am I correct in saying that there is not necessarily any satisfactory solution to this problem?
Also, this seems relevant: The Terrible, Horrible, No Good Truth About Morality.
Here is a simple moral rule that should make an AI much less likely to harm the interests of humanity:
Never take any action that would reduce the number of bits required to describe the universe by more than X.
where X is some number smaller than the number of bits needed to describe an infant human's brain. For information-reductions smaller than X, the AI should get some disutility, but other considerations could override. This 'information-based morality' assigns moral weight to anything that makes the universe a more information-filled or complex place,...
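Here is a rough sketch of how such a rule might be operationalized, using compressed size as a (very crude) stand-in for description length; the threshold X, the penalty scale, and the example "world states" are all placeholders:

```python
import os
import zlib

X = 1_000_000  # bits; placeholder for "fewer bits than an infant brain" (the real number is far larger)

def description_length_bits(world_state: bytes) -> int:
    """Approximate the description length of a state by its compressed size, in bits."""
    return 8 * len(zlib.compress(world_state, 9))

def information_penalty(state_before: bytes, state_after: bytes) -> float:
    """Disutility for reducing the universe's description length, per the rule above."""
    reduction = description_length_bits(state_before) - description_length_bits(state_after)
    if reduction <= 0:
        return 0.0            # nothing was simplified away
    if reduction > X:
        return float("inf")   # reductions beyond the threshold are ruled out entirely
    return reduction / X      # graded disutility below X, overridable by other considerations

# Hypothetical usage: a varied world state turned into uniform paperclips.
varied_world = os.urandom(200_000)        # stands in for a complex, information-rich state
paperclip_world = b"paperclip" * 22_000   # stands in for a highly regular state
print(information_penalty(varied_world, paperclip_world))  # effectively forbidden
```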
Barring a major collapse of human civilization (due to nuclear war, asteroid impact, etc.), many experts expect the intelligence explosion Singularity to occur within 50-200 years.
That fact means that many philosophical problems, about which philosophers have argued for millennia, are suddenly very urgent.
Those concerned with the fate of the galaxy must say to the philosophers: "Too slow! Stop screwing around with transcendental ethics and qualitative epistemologies! Start thinking with the precision of an AI researcher and solve these problems!"
If a near-future AI will determine the fate of the galaxy, we need to figure out what values we ought to give it. Should it ensure animal welfare? Is growing the human population a good thing?
But those are questions of applied ethics. More fundamental are the questions about which normative ethics to give the AI: How would the AI decide if animal welfare or large human populations were good? What rulebook should it use to answer novel moral questions that arise in the future?
But even more fundamental are the questions of meta-ethics. What do moral terms mean? Do moral facts exist? What justifies one normative rulebook over another?
The answers to these meta-ethical questions will determine the answers to the questions of normative ethics, which, if we are successful in planning the intelligence explosion, will determine the fate of the galaxy.
Eliezer Yudkowsky has put forward one meta-ethical theory, which informs his plan for Friendly AI: Coherent Extrapolated Volition. But what if that meta-ethical theory is wrong? The galaxy is at stake.
Princeton philosopher Richard Chappell worries about how Eliezer's meta-ethical theory depends on rigid designation, which in this context may amount to something like a semantic "trick." Previously and independently, an Oxford philosopher expressed the same worry to me in private.
Eliezer's theory also employs something like the method of reflective equilibrium, about which there are many grave concerns from Eliezer's fellow naturalists, including Richard Brandt, Richard Hare, Robert Cummins, Stephen Stich, and others.
My point is not to beat up on Eliezer's meta-ethical views. I don't even know if they're wrong. Eliezer is wickedly smart. He is highly trained in the skills of overcoming biases and properly proportioning beliefs to the evidence. He thinks with the precision of an AI researcher. In my opinion, that gives him large advantages over most philosophers. When Eliezer states and defends a particular view, I take that as significant Bayesian evidence for reforming my beliefs.
Rather, my point is that we need lots of smart people working on these meta-ethical questions. We need to solve these problems, and quickly. The universe will not wait for the pace of traditional philosophy to catch up.