
The Urgent Meta-Ethics of Friendly Artificial Intelligence

43 Post author: lukeprog 01 February 2011 02:15PM

Barring a major collapse of human civilization (due to nuclear war, asteroid impact, etc.), many experts expect the intelligence explosion Singularity to occur within 50-200 years.

That expectation means that many philosophical problems, about which philosophers have argued for millennia, are suddenly very urgent.

Those concerned with the fate of the galaxy must say to the philosophers: "Too slow! Stop screwing around with transcendental ethics and qualitative epistemologies! Start thinking with the precision of an AI researcher and solve these problems!"

If a near-future AI will determine the fate of the galaxy, we need to figure out what values we ought to give it. Should it ensure animal welfare? Is growing the human population a good thing?

But those are questions of applied ethics. More fundamental are the questions about which normative ethics to give the AI: How would the AI decide if animal welfare or large human populations were good? What rulebook should it use to answer novel moral questions that arise in the future?

But even more fundamental are the questions of meta-ethics. What do moral terms mean? Do moral facts exist? What justifies one normative rulebook over another?

The answers to these meta-ethical questions will determine the answers to the questions of normative ethics, which, if we are successful in planning the intelligence explosion, will determine the fate of the galaxy.

Eliezer Yudkowsky has put forward one meta-ethical theory, which informs his plan for Friendly AI: Coherent Extrapolated Volition. But what if that meta-ethical theory is wrong? The galaxy is at stake.

Princeton philosopher Richard Chappell worries about how Eliezer's meta-ethical theory depends on rigid designation, which in this context may amount to something like a semantic "trick." Previously and independently, an Oxford philosopher expressed the same worry to me in private.

Eliezer's theory also employs something like the method of reflective equilibrium, about which there are many grave concerns from Eliezer's fellow naturalists, including Richard Brandt, Richard Hare, Robert Cummins, Stephen Stich, and others.

My point is not to beat up on Eliezer's meta-ethical views. I don't even know if they're wrong. Eliezer is wickedly smart. He is highly trained in the skills of overcoming biases and properly proportioning beliefs to the evidence. He thinks with the precision of an AI researcher. In my opinion, that gives him large advantages over most philosophers. When Eliezer states and defends a particular view, I take that as significant Bayesian evidence for reforming my beliefs.

Rather, my point is that we need lots of smart people working on these meta-ethical questions. We need to solve these problems, and quickly. The universe will not wait for the pace of traditional philosophy to catch up.

Comments (249)

Comment author: Wei_Dai 02 February 2011 12:44:01AM *  21 points [-]

I think Eliezer's meta-ethics is wrong because it's possible that we live in a world where Eliezer's "right" doesn't actually designate anything. That is, where a typical human's morality, when extrapolated, fails to be coherent. "Right" should still mean something in a world like that, but it doesn't under Eliezer's theory.

Also, to jump the gun a bit, your own meta-ethics, desirism, says:

Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.

What does this mean in the FAI context? To a super-intelligent AI, its own desires, as well as those of everyone else on Earth, can be considered "malleable", in the sense that it can change all of them if it wanted to. But there might be some other super-intelligent AIs (created by aliens) whose desires it is powerless to change. I hope desirism doesn't imply that it should change my desires so as to fulfill the alien AIs' desires...

Comment author: Eliezer_Yudkowsky 02 February 2011 12:56:33AM 6 points [-]

What should it mean in a world like that?

Comment author: Wei_Dai 02 February 2011 01:31:23AM 16 points [-]

I haven't found a satisfactory meta-ethics yet, so I still don't know. But whatever the answer is, it has to be at least as good as "my current (unextrapolated) preferences". "Nothing" is worse than that, so it can't be the correct answer.

Comment author: Vladimir_Nesov 03 February 2011 05:27:03PM *  4 points [-]

This is actually a useful way of looking at what metaethics (decision theory) is: tools for self-improvement, explaining specific ways in which the correctness of actions (or the correctness of other tools of the same kind) can be judged. In this sense, a useless metaethics is one that doesn't help you determine what should be done, and a wrong metaethics is one that's actively stupid, suggesting that you do things you clearly shouldn't (with an FAI based on that metaethics correspondingly doing things that it shouldn't).

In this sense, the injunction of doing nothing in response to failed assumptions (i.e. no coherence actually present) in CEV is not stupid, since your own non-extrapolated mind is all you'll end up with in case CEV shuts down. It is a contingency plan for the case where CEV turns out to be useless.
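The contingency plan described here can be caricatured in a few lines of code. This is a purely illustrative toy, not any actual CEV algorithm; every function and value below is invented for the sketch. The point is just the control flow: if extrapolation yields no coherent core, fall back to the current, unextrapolated preferences rather than to "nothing".

```python
# Toy illustration (not the real CEV proposal): if extrapolated
# volitions fail to cohere, degrade gracefully to the agent's
# current, unextrapolated preferences instead of doing nothing.

def extrapolate(volition):
    """Stand-in for 'what this person would want if they knew more'.
    Here it is just an identity-like placeholder transformation."""
    return frozenset(volition)

def cev(volitions, current_preferences):
    extrapolated = [extrapolate(v) for v in volitions]
    coherent_core = frozenset.intersection(*extrapolated)
    if not coherent_core:            # failed assumption: no coherence
        return current_preferences   # contingency: fall back
    return coherent_core

# Coherent case: every extrapolated volition shares "save the child".
print(cev([{"save the child", "eat pie"},
           {"save the child", "skip pie"}],
          current_preferences={"muddle through"}))

# Incoherent case: no shared core, so the fallback is returned.
print(cev([{"eat pie"}, {"skip pie"}],
          current_preferences={"muddle through"}))
```

This also mirrors Wei_Dai's point above: the fallback branch is exactly what makes the plan "at least as good as my current (unextrapolated) preferences".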

(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to whole agent's judgment, even "preference" or "logical correctness". This also explains a bit of our talking past each other in the other thread.)

Comment author: Wei_Dai 03 February 2011 08:51:24PM 6 points [-]

(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to whole agent's judgment, even "preference" or "logical correctness". This also explains a bit of our talking past each other in the other thread.)

I don't have much idea what you mean here. This seems important enough to write up as more than a parenthetical remark.

Comment author: Vladimir_Nesov 04 February 2011 12:39:14PM *  7 points [-]

I spent a lot of time laboring under the intuition that there's some "preference" thingie that summarizes all we care about, that we can "extract" from (define using a reference to) people and have an AI optimize. In the lingo of meta-ethics, that would be "right" or "morality", and it distanced itself from the overly specific "utility", which also has the disadvantage of forgetting that the prior is essential.

Then, over the last few months, as I was capitalizing on finally understanding UDT in May 2010 (despite having convinced a lot of people that I understood it long before that, I completely failed to get the essential aspect of controlling the referents of fixed definitions, and only recognized in retrospect that what I had figured out by then was actually UDT), I noticed that a decision problem requires many more essential parts than just preference, and so to specify what people care about, we need a whole human decision problem. But the intuition that attached to preference in particular, which was by then merely a part of the decision problem, still lingered, and so I failed to notice that it is now the whole decision problem, not preference, that is analogous to "right" and "morality" (though not quite, since that decision problem still won't be the definition of right; it can be judged in turn), and that the whole agent implementing such a decision problem is the best tool available to judge them.

This agent, in particular, can find itself judging its own preference, or its own inference system, or its whole architecture that might or might not specify an explicit inference system as its part, and so on. Whatever explicit consideration it's moved by, that is whatever module in the agent (decision problem) it considers, there's a decision problem of self-improvement where the agent replaces that module with something else, and things other than that module can have a hand in deciding.

Also, there's little point in distinguishing "decision problem" and "agent", even though there is a point in distinguishing a decision problem and what's right. A decision problem is merely a set of tricks that the agent is willing to use, as is the agent's own implementation. What that set of tricks wants to do is not specified in any of the tricks, and the tricks can well fail the agent.

When we apply these considerations to humans, it follows that no human can know what they care about; they can only guess (and, indeed, design) heuristic rules for figuring out what they care about, and the same applies when they construct FAIs. So extracting "preference" exactly is not possible; instead, FAI should be seen as a heuristic that would still be subject to moral judgment and probably won't capture it in whole, just as humans themselves don't reliably implement what's right. Recognizing that FAI won't be perfect, and that the things it does are merely ways of more reliably doing the right thing, looks like an important intuition.

(This is apparently very sketchy and I don't expect it to get significantly better for at least a few months. I could talk more (thus describing more of the intuition), but not clearer, because I don't understand this well myself. An alternative would have me write up some unfinished work that would clarify each particular intuition, but would be likely of no lasting value, and so should wait for a better rendition instead.)

Comment author: cousin_it 04 February 2011 12:58:02PM *  7 points [-]

it follows that no human can know what they care about

This sounds weird, like you've driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:

a) Can a paperclipper know what it cares about?

b) How is a human fundamentally different from a paperclipper with respect to (a)?

Comment author: Vladimir_Nesov 04 February 2011 01:07:51PM *  1 point [-]

If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty.

Hence "explicit considerations", that is not up to logical uncertainty. Also, you need to know that you care about logic to talk of "up to logical uncertainty" as getting you closer to what you want.

Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty.

Can a paperclipper know what it cares about?

No, at least while it's still an agent in the same sense, so that it still has the problem of self-improvement on its hands, and hasn't disassembled itself into actual paperclips. For a human, its philosophy of precise reasoning about paperclips won't look like an adequate activity to spend resources on, but for the paperclipper, understanding paperclips really well is important.

Comment author: cousin_it 04 February 2011 01:22:02PM *  3 points [-]

OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it.

ETA:

Also, you need to know that you care about logic to talk of "up to logical uncertainty" as getting you closer to what you want.

I defy the possibility that we may "not care about logic" in the sense that you suggest.

Comment author: Vladimir_Nesov 04 February 2011 01:29:06PM *  3 points [-]

OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality?

(Not "morality" here, of course, but its counterpart in the analogy.)

What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about consequences of such an action? What is meant by "actually proving it"? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given that there's a risk that this improvement will turn out to increase the time necessary to perform the proof, perhaps even eventually leading to moving this time outside what's physically available in our universe? Heuristics everywhere, no easy methods for deciding what should be done.

Comment author: Vladimir_Nesov 04 February 2011 01:33:00PM *  1 point [-]

I defy the possibility that we may "not care about logic" in the sense that you suggest.

In a decision between what's logical and what's right, you ought to choose what's right.

Comment author: Wei_Dai 04 February 2011 09:16:23PM 2 points [-]

I'm generally sympathetic towards these intuitions, but I have a few reservations:

  1. Isn't it possible that it only looks like "heuristics all the way down" because we haven't dug deep enough yet? Perhaps in the not too distant future, someone will come up with some insights that will make everything clear, and we can just implement that.
  2. What is the nature of morality according to your approach? You say that a human can't know what they care about (which I assume you use interchangeably with "right", correct me if I'm wrong here). Is it because they can't, in principle, fully unfold the logical definition of right, or is it that they can't even define "right" in any precise way?
  3. This part assumes that your answer to the last question is "the latter". Usually when someone says "heuristic" they have a fully precise theory or problem statement that the heuristic is supposed to be an approximate solution to. How is an agent supposed to design a set of heuristics without such a precise definition to guide it? Also, if the agent itself uses the words "morality" or "right", what do they refer to?
  4. If the answer to the question in 2 is "the former", do you have any idea what the precise definition of "right" looks like?
Comment author: Vladimir_Nesov 04 February 2011 09:57:51PM *  2 points [-]

Isn't it possible that it only looks like "heuristics all the way down" because we haven't dug deep enough yet?

Everything's possible, but it doesn't seem plausible at this point, and certainly not at the human level. To conclude that something is not a heuristic but the thing itself, one would need more certainty than can be expected on such a question.

What is the nature of morality according to your approach? You say that a human can't know what they care about (which I assume you use interchangeably with "right", correct me if I'm wrong here).

I did use that interchangeably.

Is it because they can't, in principle, fully unfold the logical definition of right, or is it that they can't even define "right" in any precise way?

Both (the latter). Having an explicit definition would correspond to "preference" which I discussed in the grandparent comment. But when we talk of merely "precise", at least in principle we could hope to obtain a significantly more precise description, maybe even on human level, which is what meta-ethics should strive to give us. Every useful heuristic is an element of such a description, and some of the heuristics, such as laws of physics, are very precise.

How is an agent supposed to design a set of heuristics without such a precise definition to guide it?

The current heuristics, its current implementation, which is understood to be fallible.

Also, if the agent itself uses the words "morality" or "right", what do they refer to?

Don't know (knowing would give a definition). To the extent it's known, see the current heuristics (long list), maybe brains.

Comment author: Wei_Dai 04 February 2011 10:18:36PM *  3 points [-]

Essentially, what you're describing is just the situation that we are actually faced with. I mean, when I use the word "right" I think I mean something but I don't know what. And I have to use my current heuristics, my current implementation without having a precise theory to guide me.

And you're saying that this situation is unlikely to change significantly by the time we build an FAI, so the best we can expect to do is equivalent to a group of uploads improving themselves to the best of their abilities.

I tend to agree with this (although I think I assign a higher probability that someone does make a breakthrough than you perhaps do), but it doesn't really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.

Comment author: Vladimir_Nesov 04 February 2011 10:28:02PM *  2 points [-]

Essentially, what you're describing is just the situation that we are actually faced with.

I'm glad it all adds up to normality, given the amount of ink I spilled getting to this point.

And you're saying that you don't expect this situation to change significantly by the time we build an FAI, so the best we can do is equivalent to a group of uploads improving themselves to the best of their abilities.

Not necessarily. The uploads construct could in principle be made abstract, with efficient algorithms figuring out the result of the process much more quickly than if it were actually simulated. More specific heuristics could be figured out that make use of computational resources to make better progress, maybe in the early stages by the uploads construct.

it doesn't really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.

I'm not sure about that. If it's indeed all we can say about morality right now, then that's what we have to say, even if it doesn't belong to the expected literary genre. It's too easy to invent fake explanations, and absence of conclusions invites that, where a negative conclusion could focus the effort elsewhere.

(Also, I don't remember particular points on which my current view disagrees with Eliezer's sequence, although I'd need to re-read it to have a better idea, which I really should, since I only read it as it was posted, when my understanding of the area was zilch.)

Comment author: Perplexed 03 February 2011 09:20:52PM 1 point [-]

I second this request. In particular, please clarify whether "preference" and "logical correctness" are presented here as examples of "explicit considerations". And whether whole agent should be parsed as including all the sub-agents? Or perhaps as extrapolated agent?

Comment author: aletheilia 04 February 2011 11:52:52AM 0 points [-]

Perhaps he's referring to the part of CEV that says "extrapolated as we wish that extrapolated, interpreted as we wish that interpreted". Even logical coherence becomes in this way a focus of the extrapolation dynamics, and if this criterion should be changed to something else - as judged by the whole of our extrapolated morality in a strange-loopy way - well, so be it. The dynamics should reflect on itself and consider the foundational assumptions it was built upon, including the compellingness of the basic logic we are currently so certain about - and, of course, whether it really should reflect on itself in this way.

Anyway, I'd really like to hear what Vladimir has to say about this. Even though it's often quite hard for me to parse his writings, he does seem to clear things up for me or at least direct my attention towards some new, unexplored areas...

Comment author: Vladimir_Nesov 03 February 2011 05:50:26PM *  2 points [-]

...and continuing from the other comment, the problem here is that one meta-ethical conclusion seems to be that no meta-ethics can actually define what "right" is. So any meta-ethics would shed only a limited amount of light on the question, and is expected to have failure modes where the structure of the theory is not quite right. It's a virtue of a meta-ethical theory to point out explicitly some of its assumptions, which, if not right, would make the advice it gives incorrect. In this case, we have an assumption of reflective coherence in human value, and a meta-ethics that says that if it's not so, then it doesn't know anything. I'm pretty sure that Eliezer would disagree with the assertion that if any given meta-ethics, including some version of his own, states that the notion of "right" is empty, then "right" is indeed empty (see "the moral void").

Comment author: timtyler 04 February 2011 09:18:48AM *  -2 points [-]

I haven't found a satisfactory meta-ethics yet, so I still don't know. But whatever the answer is, it has to be at least as good as "my current (unextrapolated) preferences". "Nothing" is worse than that, so it can't be the correct answer.

Better - for you, maybe. Under your hypothesis, what is good for you would be bad for others - so unless your meta-ethical system privileges you, this line of argument doesn't seem to follow.

Comment author: lukeprog 06 February 2011 07:18:19AM 2 points [-]

Wei_Dai,

Alonzo Fyfe and I are currently researching and writing a podcast on desirism, and we'll eventually cover this topic. The most important thing to note right now is that desirism is set up as a theory that explains very specific things: human moral concepts like negligence, excuse, mens rea, and a dozen other things. You can still take the foundational meta-ethical principles of desirism - which are certainly not unique to desirism - and come up with implications for FAI. But they may have little in common with the bulk of desirism that Alonzo usually talks about.

But I'm not trying to avoid your question. These days, I'm inclined to do meta-ethics without using moral terms at all. Moral terms are so confused, and carry such heavy connotational weights, that using moral terms is probably the worst way to talk about morality. I would rather just talk about reasons and motives and counterfactuals and utility functions and so on.

Leaving out ethical terms, what implications do my own meta-ethical views have for Friendly AI? I don't know. I'm still catching up with the existing literature on Friendly AI.

Comment author: Wei_Dai 07 February 2011 03:29:34AM 2 points [-]

What are the foundational meta-ethical principles of desirism? Do you have a link?

Comment author: lukeprog 07 February 2011 04:39:05AM *  1 point [-]

Hard to explain. Alonzo Fyfe and I are currently developing a structured and technical presentation of the theory, so what you're asking for is coming but may not be ready for many months. It's a reasons-internalist view, and actually I'm not sure how much of the rest of it would be relevant to FAI.

Comment author: TheOtherDave 02 February 2011 01:01:43AM 1 point [-]

So if a system of ethics entails that "right" doesn't designate anything actual, you reject that system. Can you say more about why?

Comment author: Wei_Dai 02 February 2011 01:42:28AM 0 points [-]

Does my answer to Eliezer answer your question as well?

Comment author: TheOtherDave 02 February 2011 03:04:48AM 2 points [-]

I'm not sure.

Projecting that answer onto my question I get something like "Because ethical systems in which 'right' has an actual referent are better, for unspecified reasons, than ones in which it doesn't, and Wei Dai's current unextrapolated preferences involve an actual though unspecified referent for 'right', so we can at the very least reject all systems where 'right' doesn't designate anything actual in favor of the system Wei Dai's current unextrapolated preferences implement, even if nothing better ever comes along."

Is that close enough to your answer?

Comment author: Wei_Dai 02 February 2011 03:50:06AM 0 points [-]

Yes, close enough.

Comment author: TheOtherDave 02 February 2011 03:57:54AM 2 points [-]

In that case, not really... what I was actually curious about is why "right" having a referent is important.

Comment author: Wei_Dai 02 February 2011 08:25:10PM 5 points [-]

I can make the practical case: If "right" refers to nothing, and we design an FAI to do what is right, then it will do nothing. We want the FAI to do something instead of nothing, so "right" having a referent is important.

Or the philosophical case: If "right" refers to nothing, then "it's right for me to save that child" would be equivalent to the null sentence. From introspection I think I must mean something non-empty when I say something like that.

Do either of these answer your question?

Comment author: XiXiDu 02 February 2011 08:47:02PM 7 points [-]

I can make the practical case: If "right" refers to nothing, and we design an FAI to do what is right, then it will do nothing.

Congratulations, you just solved the Fermi paradox.

Comment author: TheOtherDave 02 February 2011 10:09:14PM *  3 points [-]

(sigh) Sure, agreed... if our intention is to build an FAI to do what is right, it's important that "what is right" mean something. And I could ask why we should build an FAI that way, and you could tell me that that's what it means to be Friendly, and on and on.

I'm not trying to be pedantic here, but this does seem sort of pointlessly circular... a discussion about words rather than things.

When a Jewish theist says "God has commanded me to save that child," they may be entirely sincere, but that doesn't in and of itself constitute evidence that "God" has a referent, let alone that the referent of "God" (supposing it exists) actually so commanded them.

When you say "It's right for me to save that child," the situation may be different, but the mere fact that you can utter that sentence with sincerity doesn't constitute evidence of difference.

If we really want to save children, I would say we should talk about how most effectively to save children, and design our systems to save children, and that talking about whether God commanded us to save children or whether it's right to save children adds nothing of value to the process.

More generally, if we actually knew everything we wanted, as individuals and groups, then we could talk about how most effectively to achieve that and design our FAIs to achieve that and discussions about whether it's right would seem as extraneous as discussions about discussions about whether it's God-willed.

The problem is that we don't know what we want. So we attach labels to that-thing-we-don't-understand, and over time those labels adopt all kinds of connotations that make discussion difficult. The analogy to theism applies here as well.

At some point, it becomes useful to discard those labels.

A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be. An FAI implementing some other strategy will do something else. Whether those things are right is just as useless to talk about as whether they are God's will; those terms add nothing to the conversation.

Comment author: Wei_Dai 02 February 2011 10:56:32PM 4 points [-]

TheOtherDave, I don't really want to argue about whether talking about "right" adds value. I suspect it might (i.e., I'm not so confident as you that it doesn't), but mainly I was trying to argue with Eliezer on his own terms. I do want to correct this:

A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be.

CEV will not do "what we collectively want done"; it will do what's "right" according to Eliezer's meta-ethics, which is whatever is coherent amongst the volitions it extrapolates from humanity, which, as others and I have argued, might turn out to be "nothing". If you're proposing that we build an AI that does do "what we collectively want done", you'd have to define what that means first.

Comment author: TheOtherDave 02 February 2011 11:09:35PM 1 point [-]

I don't really want to argue about whether talking about "right" adds value.

OK. The question I started out with, way at the top of the chain, was precisely about why having a referent for "right" was important, so I will drop that question and everything that descends from it.

As for your correction, I actually don't understand the distinction you're drawing, but in any case I agree with you that it might turn out that human volition lacks a coherent core of any significance.

Comment author: Vladimir_Nesov 02 February 2011 01:35:57AM 1 point [-]

I think Eliezer's meta-ethics is wrong because it's possible that we live in a world where Eliezer's "right" doesn't actually designate anything.

In what way? Since the idea hasn't been given much technical clarity, even if it moves conceptual understanding a long way, it's hard for me to imagine how one can arrive at confidence in a strong statement like that.

Comment author: Wei_Dai 02 February 2011 01:41:43AM 3 points [-]

I'm not sure what you're asking. Are you asking how it is possible that Eliezer's "right" doesn't designate anything, or how that implies Eliezer's meta-ethics is wrong?

Comment author: Vladimir_Nesov 02 February 2011 01:44:18AM *  1 point [-]

I'm asking (1) how is it possible that Eliezer's "right" doesn't designate anything, and (2) how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property (this is a meta-point possibly subsumed by the first point).

Comment author: Wei_Dai 02 February 2011 03:50:02AM 13 points [-]

how is it possible that Eliezer's "right" doesn't designate anything

Eliezer identifies "right" with "the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent." It is possible that human morality, when extrapolated, shows no coherence, in which case Eliezer's "right" doesn't designate anything.

how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property

Are you saying that Eliezer's general approach might still turn out to be correct, if we substitute better definitions or understandings of "extrapolation" and/or "coherence"? If so, I agree, and I didn't mean to exclude this possibility with my original statement. Should I have made it clearer when I said "I think Eliezer's meta-ethics is wrong" that I meant "based on my understanding of Eliezer's current ideas"?

Comment author: Vladimir_Nesov 02 February 2011 04:28:44AM 3 points [-]

It is possible that human morality, when extrapolated, shows no coherence

For example, I have no idea what this means. I don't know what "extrapolated" means, apart from some vague intuitions, or even what "coherent" means.

Are you saying that Eliezer's general approach might still turn out to be correct, if we substitute better definitions or understandings of "extrapolation" and/or "coherence"?

Better than what? I have no specific adequate candidates, only a direction of research.

Comment author: Wei_Dai 02 February 2011 08:52:38PM *  2 points [-]

It is possible that human morality, when extrapolated, shows no coherence

For example, I have no idea what this means.

Did you read the thread I linked to in my opening comment, where Marcello and I argued in more detail why we think that? Perhaps we can move the discussion there, so you can point out where you disagree with or not understand us?

Comment author: Vladimir_Nesov 02 February 2011 09:38:31PM *  3 points [-]

To respond to that particular argument: I don't see how it substantiates the point that morality according to Eliezer's meta-ethics could be void.

When you're considering what a human mind would conclude upon considering certain new arguments, you're thinking of ways to improve it. A natural heuristic is to add opportunity for reflection, but obviously exposing one to "unbalanced" argument can lead a human mind anywhere. So you suggest a heuristic of looking for areas of "coherence" in conclusions reached upon exploration of different ways of reflecting.

But this "coherence" is also merely a heuristic. What you want is to improve the mind in the right way, not in a coherent way, or a balanced way. So you let the mind reflect on strategies for exposing itself to more reflection, and then on strategies for reflecting on reflecting on strategies for getting more reflection, and so on, in any way deemed appropriate by the current implementation. There's probably no escaping this unguided stage, for the most right guide available is the agent itself (unfortunately).

What you end up with won't have opportunity to "regret" past mistakes, for every regret is recognition of an error, and any error can be corrected (for the most part). What's wrong with "incoherent" future growth? Does lack of coherence indicate a particular error, something not done right? If it does, that could be corrected. If it doesn't, everything is fine.

(By the way, this argument could potentially place advanced human rationality and human understanding of decision theory and meta-ethics directly on track to a FAI, with the only way of making a FAI using a human (upload) group self-improvement.)

Comment author: Wei_Dai 02 February 2011 10:56:02PM 4 points [-]

I believe that in Eliezer's meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just "heuristics" to be freely chosen by the subject being extrapolated. You seem to be describing your own ideas, which are perhaps similar enough to Eliezer's to be said to fall under his general approach, but I don't think can be said to be Eliezer's meta-ethics.

making a FAI using a human (upload) group self-improvement

Seems like a reasonable idea, but again, almost surely not what Eliezer intended.

Comment author: Vladimir_Nesov 02 February 2011 11:03:47PM 1 point [-]

I believe that in Eliezer's meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just "heuristics" to be freely chosen by the subject being extrapolated.

Why "part of meta-ethics"? That would make sense as part of FAI design. Surely the details are not to be chosen "freely", but still there's only one criterion for anything, and that's full morality. For any fixed logical definition, any element of any design, there's a question of what could improve it, make the consequences better.

Comment author: Blueberry 02 February 2011 05:00:16AM 1 point [-]

For example, I have no idea what this means. I don't know what "extrapolated" means, apart from some vague intuitions, nor even what "coherent" means.

It means, for instance, that segments of the population who have different ideas on controversial moral questions like abortion or capital punishment actually have different moralities and different sets of values, and that we as a species will never agree on what answers are right, regardless of how much debate or discussion or additional information we have. I strongly believe this to be true.

Comment author: Vladimir_Nesov 02 February 2011 10:48:42AM 2 points [-]

Clearly, I know all this stuff, so I meant something else: namely, that we lack a more precise understanding (one that could also easily collapse this surface philosophizing).

Comment author: Blueberry 02 February 2011 07:21:23PM 1 point [-]

Well, yes, I know you know all this stuff. Are you saying we can't meaningfully discuss it unless we have a precise algorithmic definition of CEV? People's desires and values are not that precise. I suspect we can only discuss it in vague terms until we come up with some sort of iterative procedure that fits our intuition of what CEV should be, at which point we'll have to operationally define CEV as that procedure.

Comment author: ewbrownv 01 February 2011 04:56:14PM 9 points [-]

CEV also has the problem that nothing short of a superintelligence could actually use it, so unless AI has a really hard takeoff you're going to need something less complicated for your AI to use in the meantime.

Personally I've always thought EY places too much emphasis on solving the whole hard problem of ultimate AI morality all at once. It would be quite valuable to see more foundation-building work on moral systems for less extreme sorts of AI, with an emphasis on avoiding bad failure modes rather than trying to get the best possible outcome. That's the sort of research that could actually grow into an academic sub-discipline, and I'd expect it to generate insights that would help with attempts to solve the SI morality problem.

Of course, the last I heard EY was still predicting that dangerous levels of AI will come along in less time than it would take such a discipline to develop. The gradual approach could work if it takes 100 years to go from mechanical kittens to Skynet’s big brother, but not if it only takes 5.

Comment author: jacob_cannell 02 February 2011 07:02:07AM 1 point [-]

Agreed.

Also note that a really hard, fast takeoff is even more of a reason to shift emphasis away from distant, uncomputable, impractical problems and focus on the vastly smaller set of practical choices that we can actually make now.

Comment author: RichardChappell 01 February 2011 04:54:10PM 9 points [-]

Just want to flag that it's not entirely obvious that we need to settle questions in meta-ethics in order to get the normative and applied ethics right. Why not just call for more work directly in the latter fields?

Comment author: lukeprog 01 February 2011 06:18:25PM 1 point [-]

Yes, that's a claim that, in my experience, most philosophers disagree with. It's one I'll need to argue for. But I do think one's meta-ethical views have large implications for one's normative views that are often missed.

Comment author: utilitymonster 01 February 2011 09:38:59PM 1 point [-]

Even if we grant that one's meta-ethical position will determine one's normative theory (which is very contentious), one would like some evidence that it would be easier to find the correct meta-ethical view than it would be to find the correct (or appropriate, or whatever) normative ethical view. Otherwise, why not just do normative ethics?

Comment author: lukeprog 01 February 2011 09:49:30PM 3 points [-]

My own thought is that doing meta-ethics may illuminate normative theory, but I could be wrong about that. For example, I think doing meta-ethics right seals the deal for consequentialism, but not utilitarianism.

Comment author: Vladimir_Nesov 01 February 2011 11:14:55PM 1 point [-]

My own thought is that doing meta-ethics may illuminate normative theory, but I could be wrong about that.

Since nobody understands these topics with enough clarity, and they seem related, I don't see how anyone can claim with confidence that they actually aren't related. So your saying that you "could be wrong about that" doesn't communicate anything about your understanding.

Comment author: torekp 02 February 2011 01:05:37AM 0 points [-]

Many attempts to map out normative ethics wander substantially into meta-ethics, and vice versa. Especially the better ones. So I doubt it matters all that much where one starts - the whole kit and caboodle soon will figure into the discussion.

Comment author: [deleted] 01 February 2011 10:01:18PM 8 points [-]

I like this post but I'd like a better idea of how it's meant to be taken to the concrete level.

Should SIAI try to hire or ask for contributions from the better academic philosophers? (SIAI honchos could do that.)

Should there be a concerted effort to motivate more research in "applied" meta-ethics, the kind that talks to neuroscience and linguistics and computer science? (Philosophers and philosophy students anywhere could do that.)

Should we LessWrong readers, and current or potential SIAI workers, educate ourselves about mainstream meta-ethics, so that we know more about it than just the Yudkowsky version, and be able to pick up on errors? (Anyone reading this site can do that.)

Comment author: CarlShulman 02 February 2011 12:39:41AM *  6 points [-]

Note that the Future of Humanity Institute is currently hiring postdocs, either with backgrounds in philosophy or alternatively in math/cognitive science/computer science. There is close collaboration between FHI and SIAI, and the FHI is part of Oxford University, which is a bit less of a leap for a philosophy graduate student.

Comment author: AnnaSalamon 01 February 2011 10:04:23PM 3 points [-]

Folks who are anyhow heading into graduate school, and who have strengths and interests in social science, should perhaps consider focusing on moral psychology research.

But I'm not at all sure of that -- if someone is aiming at existential risk reduction, there are many other useful paths to consider, and a high opportunity cost to choosing one and not others.

Comment author: [deleted] 01 February 2011 10:21:10PM 2 points [-]

That's true -- I'm just trying to get a sense of what lukeprog is aiming at.

Just thinking out loud, for a moment: if AI really is an imminent possibility, AI strong enough that what it chooses to do is a serious issue for humanity's safety, and if we think that we can lessen the probability of disaster by defining and building moral machines, then it's very, very important to get our analysis right before anyone starts programming. (This is just my impression of what I've read from the site, please correct me if I misunderstood.) In which case, more moral psychology research (or research in other fields related to metaethics) is really important, unless you think that there's no further work to be done. Is it the best possible use of any one person's time? I'd say, probably not, except if you are already in an unusual position. There are not many top students or academics in these fields, and even fewer who have heard of existential risk; if you are one, and you want to, this doesn't seem like a terrible plan.

Comment author: lukeprog 01 February 2011 11:04:08PM 9 points [-]

I don't yet have much of an opinion on what the best way to do it is; I'm just saying it needs doing. We need more brains on the problem. Eliezer's meta-ethics is, I think, far from obviously correct. Moving toward normative ethics, CEV is also not obviously the correct solution for Friendly AI, though it is a good research proposal. The fate of the galaxy cannot rest on Eliezer's moral philosophy alone.

We need critically-minded people to say, "I don't think that's right, and here are four arguments why." And then Eliezer can argue back, or change his position. And then the others can argue back, or change their positions. This is standard procedure for solving difficult problems, but as of yet I haven't seen much published dialectic like this in trying to figure out the normative foundations for the Friendly AI project.

Let me give you an explicit example. CEV takes extrapolated human values as the source of an AI's eventually-constructed utility function. Is that the right way to go about things, or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth? What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?

Comment author: Eliezer_Yudkowsky 02 February 2011 05:48:02PM 8 points [-]

or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function

...this sentence makes me think that we really aren't on the same page at all with respect to naturalistic metaethics. What is a reason for action? How would a computer program enumerate them all?

Comment author: lukeprog 02 February 2011 08:47:41PM *  6 points [-]

A 'reason for action' is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically - i.e. that they have intrinsic value apart from being valued by an agent.

A source of normativity (a reason for action) is anything that grounds/justifies an 'ought' or 'should' statement. Why should I look both ways before crossing the street? Presumably, this 'should' is justified by reference to my desires, which could be gravely thwarted if I do not look both ways before crossing the street. If I strongly desired to be run over by cars, the 'should' statement might no longer be justified. Some people might say I should look both ways anyway, because God's command to always look before crossing a street provides me with reason for action to do that even if it doesn't help fulfill my desires. But I don't believe that proposed reason for action exists.

Comment author: ata 02 February 2011 10:39:22PM *  7 points [-]

A 'reason for action' is the standard term in Anglophone philosophy for a source of normativity of any kind. For example, a desire is the source of normativity in a hypothetical imperative. Others have proposed that categorical imperatives exist, and provide reasons for action apart from desires. Some have proposed that divine commands exist, and are sources of normativity apart from desires. Others have proposed that certain objects or states of affairs can ground normativity intrinsically - i.e. that they have intrinsic value apart from being valued by an agent.

Okay, but all of those (to the extent that they're coherent) are observations about human axiology. Beware of committing the mind projection fallacy with respect to compellingness — you find those to be plausible sources of normativity because your brain is that of "a particular species of primate on planet Earth". If your AI were looking for "reasons for action" that would compel all agents, it would find nothing, and if it were looking for all of the "reasons for action" that would compel each possible agent, it would spend an infinite amount of time enumerating stupid pointless motivations. It would eventually notice categorical imperatives, fairness, compassion, etc. but it would also notice drives based on the phase of the moon, based on the extrapolated desires of submarines (according to any number of possible submarine-volition-extrapolating dynamics), based on looking at how people would want to be treated and reversing that, based on the number of living cats in the world modulo 241, based on modeling people as potted plants and considering the direction their leaves are waving...

Comment author: Eliezer_Yudkowsky 03 February 2011 12:45:55AM *  6 points [-]

Okay, see, this is why I have trouble talking to philosophers in their quote standard language unquote.

I'll ask again: How would a computer program enumerate all reasons for action?

Comment author: lukeprog 03 February 2011 03:43:56AM 8 points [-]

Eliezer,

I think the reason you're having trouble with the standard philosophical category of "reasons for action" is because you have the admirable quality of being confused by that which is confused. I think the "reasons for action" category is confused. At least, the only action-guiding norm I can make sense of is desire/preference/motive (let's call it motive). I should eat the ice cream because I have a motive to eat the ice cream. I should exercise more because I have many motives that will be fulfilled if I exercise. And so on. All this stuff about categorical imperatives or divine commands or intrinsic value just confuses things.

How would a computer program enumerate all motives (which, according to me, are co-extensional with "all reasons for action")? It would have to roll up its sleeves and do science. As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems (as it had done already with us), and thereby enumerate all the motives it encounters in the universe, their strengths, the relations between them, and so on.

But really, I'm not yet proposing a solution. What I've described above doesn't even reflect my own meta-ethics. It's just an example. I'm merely raising questions that need to be considered very carefully.

And of course I'm not the only one to do so. Others have raised concerns about CEV and its underlying meta-ethical assumptions. Will Newsome raised some common worries about CEV and proposed computational axiology instead. Tarleton's 2010 paper compares CEV to an alternative proposed by Wallach & Collin.

The philosophical foundations of the Friendly AI project need more philosophical examination, I think. Perhaps you are very confident about your meta-ethical views and about CEV; I don't know. But I'm not confident about them. And as you say, we've only got one shot at this. We need to make sure we get it right. Right?

Comment author: Eliezer_Yudkowsky 03 February 2011 02:38:30PM *  13 points [-]

As it expands across the galaxy, perhaps encountering other creatures, it could do some behavioral psychology and neuroscience on these creatures to decode their intentional action systems

Now, it's just a wild guess here, but I'm guessing that a lot of philosophers who use the language "reasons for action" would disagree that "knowing the Baby-eaters evolved to eat babies" is a reason to eat babies. Am I wrong?

I'm merely raising questions that need to be considered very carefully.

I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don't expect those two tracks to meet much.

Comment author: lukeprog 03 February 2011 03:09:44PM 6 points [-]

I count myself among the philosophers who would say that "knowing the Baby-eaters want to eat babies" is not a reason (for me) to eat babies. Some philosophers don't even think that the Baby-eaters' desires to eat babies are reasons for them to eat babies, not even defeasible reasons.

I tend to be a bit gruff around people who merely raise questions

Interesting. I always assumed that raising a question was the first step toward answering it - especially if you don't want yourself to be the only person who tries to answer it. The point of a post like the one we're commenting on is that hopefully one or more people will say, "Huh, yeah, it's important that we get this issue right," and devote some brain energy to getting it right.

I'm sure the "figure it out and move on" track doesn't meet much with the "I'm uncomfortable settling on an answer" track, but what about the "pose important questions so we can work together to settle on an answer" track? I see myself on that third track, engaging in both the 'pose important questions' and the 'settle on an answer' projects.

Comment author: lukeprog 24 February 2011 11:38:40PM 3 points [-]

For the record, I currently think CEV is the most promising path towards solving the Friendly AI problem, I'm just not very confident about any solutions yet, and am researching the possibilities as quickly as possible, using my outline for Ethics and Superintelligence as a guide to research. I have no idea what the conclusions in Ethics and Superintelligence will end up being.

Comment author: lukeprog 14 February 2011 06:48:00AM *  3 points [-]

Here's an interesting juxtaposition...

Eliezer-2011 writes:

I tend to be a bit gruff around people who merely raise questions; I tend to view the kind of philosophy I do as the track where you need some answers for a specific reason, figure them out, move on, and dance back for repairs if a new insight makes it necessary; and this being a separate track from people who raise lots of questions and are uncomfortable with the notion of settling on an answer. I don't expect those two tracks to meet much.

Eliezer-2007 quotes Robyn Dawes, saying that the below is "so true it's not even funny":

Norman R. F. Maier noted that when a group faces a problem, the natural tendency of its members is to propose possible solutions as they begin to discuss the problem. Consequently, the group interaction focuses on the merits and problems of the proposed solutions, people become emotionally attached to the ones they have suggested, and superior solutions are not suggested. Maier enacted an edict to enhance group problem solving: "Do not propose solutions until the problem has been discussed as thoroughly as possible without suggesting any."

...

I have often used this edict with groups I have led - particularly when they face a very tough problem, which is when group members are most apt to propose solutions immediately. While I have no objective criterion on which to judge the quality of the problem solving of the groups, Maier's edict appears to foster better solutions to problems.

Is this a change of attitude, or am I just not finding the synthesis?

Eliezer-2011 seems to want to propose solutions very quickly, move on, and come back for repairs if necessary. Eliezer-2007 advises that for difficult problems (one would think that FAI qualifies) we take our time to understand the relevant issues, questions, and problems before proposing solutions.

Comment author: Perplexed 03 February 2011 03:04:08PM *  1 point [-]

If you don't spend much time on the track where people just raise questions, how do you encounter the new insights that make it necessary to dance back for repairs on your track?

Just asking. :)

Though I do tend to admire your attitude of pragmatism and impatience with those who dither forever.

Comment author: utilitymonster 03 February 2011 03:02:23PM *  1 point [-]

I can see that you might question the usefulness of the notion of a "reason for action" as something over and above the notion of "ought", but I don't see a better case for thinking that "reason for action" is confused.

The main worry here seems to have to do with categorical reasons for action. Diagnostic question: are these more troubling/confused than categorical "ought" statements? If so, why?

Perhaps I should note that philosophers talking this way make a distinction between "motivating reasons" and "normative reasons". A normative reason to do A is a good reason to do A, something that would help explain why you ought to do A, or something that counts in favor of doing A. A motivating reason just helps explain why someone did, in fact, do A. One of my motivating reasons for killing my mother might be to prevent her from being happy. By saying this, I do not suggest that this is a normative reason to kill my mother. It could also be that R would be a normative reason for me to A, but R does not motivate me to do A. (ata seems to assume otherwise, since ata is getting caught up with who these considerations would motivate. Whether reasons could work like this is a matter of philosophical controversy. Saying this more for others than you, Luke.)

Back to the main point, I am puzzled largely because the most natural ways of getting categorical oughts can get you categorical reasons. Example: simple total utilitarianism. On this view, R is a reason to do A if R is the fact that doing A would cause someone's well-being to increase. The strength of R is the extent to which that person's well-being increases. One weighs one's reasons by adding up all of their strengths. One then does the thing that one has most reason to do. (It's pretty clear in this case that the notion of a reason plays an inessential role in the theory. We can get by just fine with well-being, ought, causal notions, and addition.)

Utilitarianism, as always, is a simple case. But it seems like many categorical oughts can be thought of as being determined by weighing factors that count in favor of and count against the course of action in question. In these cases, we should be able to do something like what we did for util (though sometimes that method of weighing the reasons will be different/more complicated; in some bad cases, this might make the detour through reasons pointless).

The reasons framework seems a bit more natural in non-consequentialist cases. Imagine I try to maximize aggregate well-being, but I hate lying to do it. I might count the fact that an action would involve lying as a reason not to do it, but not believe that my lying makes the world worse. To get oughts out of a utility function instead, you might model my utility function as the result of adding up aggregate well-being and subtracting a factor that scales with the number of lies I would have to tell if I took the action in question. Again, it's pretty clear that you don't HAVE to think about things this way, but it is far from clear that this is confused/incoherent.
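The weighing model described above can be made concrete with a toy sketch. Everything here is illustrative (the actions, the reasons, and the strength values are made up, and this is no one's actual theory); the point is just that "sum the strengths of the reasons, then pick the act with the highest total" is a well-defined procedure, and that a non-consequentialist factor like a lying penalty slots in as just another signed weight:

```python
# Toy model: an "ought" computed by weighing reasons for and against actions.
# Each reason is a (description, signed strength) pair; strengths are illustrative.

def best_action(actions_with_reasons):
    """Pick the action whose reasons sum to the highest total strength."""
    def total(reasons):
        return sum(strength for _, strength in reasons)
    return max(actions_with_reasons, key=lambda item: total(item[1]))[0]

# Simple total utilitarianism: each reason's strength is a well-being increase.
util_choice = best_action([
    ("donate", [("raises A's well-being", +5), ("raises B's well-being", +3)]),
    ("keep money", [("raises my well-being", +4)]),
])  # "donate" wins, 8 vs. 4

# A non-consequentialist tweak: count the lie against the act directly,
# even though no one's well-being is lowered by the lie itself.
deont_choice = best_action([
    ("lie to spare feelings", [("raises C's well-being", +2), ("involves a lie", -4)]),
    ("tell the truth", [("lowers C's well-being", -1)]),
])  # "tell the truth" wins, -1 vs. -2
```

As the parenthetical above notes, the reason talk is doing no essential work here: the same choices fall out of a plain utility function over outcomes.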

Perhaps the LW crowd is perplexed because people here take utility functions as primitive, whereas philosophers talking this way tend to take reasons as primitive and derive ought statements (and, on a very lucky day, utility functions) from them. This paper, which tries to help reasons folks and utility function folks understand/communicate with each other, might be helpful for anyone who cares much about this. My impression is that we clearly need utility functions, but don't necessarily need the reason talk. The main advantage of picking up the reason talk would be being able to understand philosophers who talk that way, if that's important to you. (Much of the recent work in meta-ethics relies heavily on the notion of a normative reason, as I'm sure Luke knows.)

Comment author: lukeprog 03 February 2011 03:25:37PM *  1 point [-]

utilitymonster,

For the record, as a good old Humean I'm currently an internalist about reasons, which leaves me unable (I think) to endorse any form of utilitarianism, where utilitarianism is the view that we ought to maximize X. Why? Because internal reasons don't always, and perhaps rarely, support maximizing X, and I don't think external reasons for maximizing X exist. For example, I don't think X has intrinsic value (in Korsgaard's sense of "intrinsic value").

Thanks for the link to that paper on rational choice theories and decision theories!

Comment author: Nick_Tarleton 03 February 2011 09:19:01AM 1 point [-]

Tarleton's 2010 paper compares CEV to an alternative proposed by Wallach & Collin.

Nitpick: Wallach & Collin are cited only for the term 'artificial moral agents' (and the paper is by myself and Roko Mijic). The comparison in the paper is mostly just to the idea of specifying object-level moral principles.

Comment author: lukeprog 03 February 2011 02:57:13PM 0 points [-]

Oops. Thanks for the correction.

Comment author: [deleted] 04 February 2011 02:18:35AM 7 points [-]

I wonder, since it's important to stay pragmatic, if it would be good to design a "toy example" for this sort of ethics.

It seems like the hard problem here is to infer reasons for action from an individual's actions. People do all sorts of things; but how can you tell from those choices what they really value? Can you infer a utility function from people's choices, or are there sets of choices that don't necessarily follow any utility function?

The sorts of "toy" examples I'm thinking of here are situations where the agent has a finite number of choices. Let's say you have Pac-Man in a maze. His choices are his movements in four cardinal directions. You watch Pac-Man play many games; you see what he does when he's attacked by a ghost; you see what he does when he can find something tasty to eat; you see when he's willing to risk the danger to get the food.

From this, I imagine you could do some hidden Markov stuff to infer a model of Pac-Man's behavior -- perhaps an if-then tree.

Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don't know how to systematize that more broadly.)

From this, could you do an "extrapolated" model of what Pac-Man would do if he knew when and where the ghosts were coming? Sure -- and that would be, if I've understood correctly, CEV for Pac-Man.

It seems to me that, more subtle philosophy aside, this is what we're trying to do. I haven't read the literature lukeprog has, but it seems to me that Pac-Man's "reasons for actions" are completely described by that if-then tree of his behavior. Why didn't he go left that time? Because there was a ghost there. Why does that matter? Because Pac-Man always goes away from ghosts. (You could say: Pac-Man desires to avoid ghosts.)

It also seems to me, not that I really know this line of work, that one incremental thing that can be done towards CEV (or some other sort of practical metaethics) is this kind of toy model. Yes, ultimately understanding human motivation is a huge psychology and neuroscience problem, but before we can assimilate those quantities of data we may want to make sure we know what to do in the simple cases.
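A crude version of this toy model fits in a few lines. Here the observed games are summarized as (situation, action) pairs over hand-chosen features (in a real system the features themselves would have to be learned, and the majority vote stands in for the "hidden Markov stuff"); all the data below is made up for illustration:

```python
from collections import Counter, defaultdict

# Observed (situation, action) pairs from watching many Pac-Man games.
observations = [
    (("ghost_near", "fruit_left"), "move_right"),  # abandons fruit to flee
    (("ghost_near", "fruit_left"), "move_right"),
    (("no_ghost", "fruit_left"), "move_left"),     # seeks fruit when safe
    (("no_ghost", "fruit_left"), "move_left"),
    (("no_ghost", "fruit_left"), "move_left"),
    (("no_ghost", "no_fruit"), "move_up"),
]

def infer_policy(obs):
    """Majority-vote if-then 'tree': map each situation to its most common action."""
    by_situation = defaultdict(Counter)
    for situation, action in obs:
        by_situation[situation][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in by_situation.items()}

policy = infer_policy(observations)
# Reading the table off: fruit is abandoned only when a ghost is near,
# so avoiding ghosts apparently outranks seeking fruit in Pac-Man's values.
```

The "extrapolation" step would then amount to re-running this policy against better information, e.g. asking what the ghost-avoiding rule recommends when ghost positions are known in advance.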

Comment author: whpearson 04 February 2011 02:46:25AM 4 points [-]

Could you guess from this tree that Pac-Man likes fruit and dislikes dying, and goes away from fruit only when he needs to avoid dying? Yeah, you could (though I don't know how to systematize that more broadly.)

Something like:

Run simulations of agents that can choose randomly out of the same actions as the agent has. Look for regularities in the world state that occur more or less frequently under the sensible agent compared to the random agent. Those things could be said to be what it likes and dislikes respectively.

To determine terminal vs instrumental values look at the decision tree and see which of the states gets chosen when a choice is forced.
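That procedure can be sketched directly: count how often world-states occur under the observed agent versus under a random agent, and read likes and dislikes off the ratio. The states, counts, smoothing, and threshold below are all made up for illustration:

```python
from collections import Counter

def infer_values(agent_states, random_states, threshold=1.5):
    """Label a state 'liked' if the agent reaches it much more often than
    a random agent would, and 'disliked' if much less often."""
    agent = Counter(agent_states)
    rand = Counter(random_states)
    likes, dislikes = [], []
    for state in set(agent) | set(rand):
        a = agent[state] + 1  # add-one smoothing so ratios are always defined
        r = rand[state] + 1
        if a / r >= threshold:
            likes.append(state)
        elif r / a >= threshold:
            dislikes.append(state)
    return likes, dislikes

# Illustrative data: the sensible agent eats fruit often and rarely dies,
# while the random agent mostly blunders into ghosts.
likes, dislikes = infer_values(
    agent_states=["ate_fruit"] * 8 + ["died"] * 1,
    random_states=["ate_fruit"] * 2 + ["died"] * 6,
)  # likes: ate_fruit; dislikes: died
```

The terminal-vs-instrumental step would go beyond frequency counting: as the comment says, you would inspect the inferred decision tree to see which states win when the agent is forced to choose between them.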

Comment author: [deleted] 04 February 2011 02:48:17AM *  0 points [-]

Thanks. Come to think of it, that's exactly the right answer.

Comment author: Nisan 04 February 2011 05:57:04PM 0 points [-]

Perhaps the next step would be to add to the model a notion of second-order desire, or analyze a Pac-Man whose apparent terminal values can change when he's exposed to certain experiences or moral arguments.

Comment author: bigjeff5 03 February 2011 12:50:56AM 5 points [-]

If you want to be run over by cars, you should still look both ways.

You might miss otherwise!

Comment author: Benquo 03 February 2011 02:51:03AM 0 points [-]

One way might be enough, in that case.

Comment author: bigjeff5 03 February 2011 05:31:48AM 0 points [-]

That depends entirely on the street, and the direction you choose to look. ;)

Comment author: Sniffnoy 03 February 2011 12:57:29AM 0 points [-]

Depends on how soon you insist it happen.

Comment author: lukeprog 02 February 2011 08:58:20PM 1 point [-]

Sorry... what I said above is not quite right. There are norms that are not reasons for action. For example, epistemological norms might be called 'reasons to believe.' 'Reasons for action' are the norms relevant to, for example, prudential normativity and moral normativity.

Comment author: jimrandomh 02 February 2011 10:21:41PM 3 points [-]

This is either horribly confusing, or horribly confused. I think that what's going on here is that you (or the sources you're getting this from) have taken a bundle of incompatible moral theories, identified a role that each of them has a part playing, and generalized a term from one of those theories inappropriately.

The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that "reason for action" does not carve reality, or morality, at the joints.

Comment author: utilitymonster 03 February 2011 02:31:22PM *  0 points [-]

I'm sort of surprised by how people are taking the notion of "reason for action". Isn't this a familiar process when making a decision?

  1. For all courses of action you're thinking of taking, identify the features (consequences, if that's how you think about things) that count in favor of taking that course of action and those that count against it.

  2. Consider how those considerations weigh against each other. (Do the pros outweigh the cons, by how much, etc.)

  3. Then choose the thing that does best in this weighing process.
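The three steps above have a straightforward mechanical rendering. This sketch uses invented options and weights purely for illustration:

```python
# Step 1: each course of action has features (reasons) with signed
# weights -- pros positive, cons negative. All values here are toy.
options = {
    "take the job": {"higher pay": +3, "long commute": -2, "new skills": +2},
    "stay put":     {"stability": +2, "stagnation": -1},
}

def total_weight(reasons):
    # Step 2: weigh the considerations against each other by summing.
    return sum(reasons.values())

# Step 3: choose the option that does best in the weighing process.
best = max(options, key=lambda o: total_weight(options[o]))
```

Nothing in the sketch presupposes a particular moral theory; it just makes explicit the familiar weigh-and-choose process the comment describes.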

The same thing can be a reason for action, a reason for inaction, a reason for belief and a reason for disbelief all at once, in different contexts depending on what consequences these things will have. This makes me think that "reason for action" does not carve reality, or morality, at the joints.

It is not a presupposition of the people talking this way that if R is a reason to do A in a context C, then R is a reason to do A in all contexts.

The people talking this way also understand that a single R might be both a reason to do A and a reason to believe X at the same time. You could also have R be a reason to believe X and a reason to cause yourself not to believe X. Why do you think these things make the discourse incoherent/non-perspicuous? This seems no more puzzling than the familiar fact that holding a certain belief could be epistemically irrational but prudentially rational to cause yourself to hold.

Comment author: ata 01 February 2011 11:39:44PM *  6 points [-]

or should we instead program an AI to figure out all the reasons for action that exist and account for them in its utility function, whether or not they happen to be reasons for action arising from the brains of a particular species of primate on planet Earth?

All the reasons for action that exist? Like, the preferences of all possible minds? I'm not sure that utility function would be computable...

Edit: Actually, if we suppose that all minds are computable, then there's only a countably infinite number of possible minds, and for any mind with a utility function U(x), there is a mind somewhere in that set with the utility function -U(x). So, depending on how you weight the various possible utility functions, it may be that they'd all cancel out.
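The cancellation argument in the edit above can be checked concretely. This toy sketch pairs each of two invented utility functions with its negation and gives all four equal weight:

```python
# Toy check of the cancellation argument: if every mind with utility U
# is paired with a mind holding -U, and both get equal weight, the
# aggregate utility of every outcome is zero. The utility functions
# here are invented examples.
def U1(x): return x          # one toy mind's utility
def U2(x): return x ** 2     # another

minds = [U1, lambda x: -U1(x), U2, lambda x: -U2(x)]
weights = [0.25] * len(minds)

def aggregate(x):
    return sum(w * u(x) for w, u in zip(weights, minds))

# Under this symmetric weighting, every outcome nets out to zero.
values = [aggregate(x) for x in range(-3, 4)]
```

Of course, the interesting question is whether the weighting really would be symmetric; any asymmetry in how possible minds are weighted breaks the cancellation.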

What if there are 5 other intelligent species in the galaxy whose interests will not at all be served when our Friendly AI takes over the galaxy? Is that really the right thing to do? How would we go about answering questions like that?

Notice that you're a human but you care about that. If there weren't something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn't have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that's exactly the sort of thing that CEV is designed to take into account. Don't you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it?

And if we were all much smarter and still largely didn't think it was a good idea to care about the interests of other intelligent species... I really don't think that'll happen, but honestly, I'll have to defer to the judgment of our extrapolated selves. They're smarter and wiser than me, and they've heard more of the arguments and evidence than I have. :)

Comment author: Vladimir_Nesov 02 February 2011 12:06:12AM *  6 points [-]

Notice that you're a human but you care about that. If there weren't something in human axiology that could lead to sufficiently smart and reflective people concluding that nonhuman intelligent life is valuable, you wouldn't have even thought of that — and, indeed, it seems that in general as you look at smarter, more informed, and more thoughtful people, you see less provincialism and more universal views of ethics. And that's exactly the sort of thing that CEV is designed to take into account.

The same argument applies to just using one person as the template and saying that their preference already includes caring about all the other people.

The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it's what you should do. Fairness, also being merely a heuristic, is subject to further improvement, as can be inclusion of volition of aliens in the original definition.

Of course, you might want to fall back to a "reflective injunction" of not inventing overly elaborate plans, since you haven't had the capability of examining them well enough to rule them superior to more straightforward plans, such as using volition of a single human. But this is still a decision point, and the correct answer is not obvious.

Comment author: Nisan 04 February 2011 07:04:56PM 0 points [-]

The reason CEV might be preferable to starting from your own preference (I now begin to realize) is that the decision to privilege yourself vs. grant other people fair influence is also subject to morality, so to the extent you can be certain about this being more moral, it's what you should do.

This reminds me of the story of the people who encounter a cake, one of whom claims that what's "fair" is that they get all the cake for themself. It would be a mistake for us to come to a compromise with them on the meaning of "fair".

Does the argument for including everyone in CEV also argue for including everyone in a discussion of what fairness is?

Comment author: XiXiDu 02 February 2011 06:24:18PM *  2 points [-]

Don't you think that there would be (at least) strong support for caring about the interests of other intelligent life, if all humans were far more intelligent, knowledgeable, rational, and consistent, and heard all the arguments for and against it?

But making humans more intelligent and more rational would mean altering their volition. An FAI that proactively made people more educated would be similar to one that altered human desires directly. If it told them that the holy Qur'an is not the word of God, it would dramatically change their desires. But what if people actually don't want to learn that truth?

In other words, any superhuman intelligence will have a very strong observer effect and will set off a feedback loop that shapes the future according to the original seed AI, or the influence of its creators. You can't expect to create a God and still be able to extrapolate the natural desires of human beings. Human desires are not just a fact about their evolutionary history but also a mixture of superstructural parts like environmental and cultural influences. If some AI God leads humans into the future, then at some point it will have altered all those structures and consequently changed human volition. The smallest bias in the original seed AI will be magnified over time by the feedback between the FAI and its human pets.

ETA You could argue that all that matters is the evolutionary template for the human brain. The best way to satisfy it maximally is what we want, what is right. But leaving aside the evolution of culture and the environment seems drastic. Why not go a step further and create a new better mind as well?

I also think it is a mistake to generalize from the people you currently know to be intelligent and reasonable as they might be outliers. Since I am a vegetarian I am used to people telling me that they understand what it means to eat meat but that they don't care. We should not rule out the possibility that the extrapolated volition of humanity is actually something that would appear horrible and selfish to us "freaks".

I really don't think that'll happen, but honestly, I'll have to defer to the judgment of our extrapolated selves. They're smarter and wiser than me, and they've heard more of the arguments and evidence than I have.

That is only reasonable if matters of taste are really subject to rational argumentation and judgement. If it really doesn't matter if we desire pleasure or pain then focusing on smarts might either lead to an infinite regress or nihilism.

Comment author: TheOtherDave 02 February 2011 12:23:39AM 3 points [-]

Judging from his posts and comments here, I conclude that EY is less interested in dialectic than in laying out his arguments so that other people can learn from them and build on them. So I wouldn't expect critically-minded people to necessarily trigger such a dialectic.

That said, perhaps that's an artifact of discussion happening with a self-selected crowd of Internet denizens... that can exhaust anybody. So perhaps a different result would emerge if a different group of critically-minded people, people EY sees as peers, got involved. The Hanson/Yudkowsky debate about FOOMing had more of a dialectic structure, for example.

With respect to your example, the discussion here might be a starting place for that discussion, btw. The discussions here and here and here might also be salient.

Incidentally: the anticipated relationship between what humans want, what various subsets of humans want, and what various supersets including humans want, is one of the first questions I asked when I encountered the CEV notion.

I haven't gotten an explicit answer, but it does seem (based on other posts/discussions) that on EY's view a nonhuman intelligent species valuing something isn't something that should motivate our behavior at all, one way or another. We might prefer to satisfy that species' preferences, or we might not, but either way what should be motivating our behavior on EY's view is our preferences, not theirs. What matters on this view is what matters to humans; what doesn't matter to humans doesn't matter.

I'm not sure if I buy that, but satisfying "all the reasons for action that exist" does seem to be a step in the wrong direction.

Comment author: lukeprog 02 February 2011 01:17:52AM 0 points [-]

TheOtherDave,

Thanks for the links! I don't know whether "satisfying all the reasons for action that exist" is the solution, but I listed it as an example alternative to Eliezer's theory. Do you have a preferred solution?

Comment author: TheOtherDave 02 February 2011 02:42:56AM 1 point [-]

Not really.

Rolling back to fundamentals: reducing questions about right actions to questions about likely and preferred results seems reasonable. So does treating the likely results of an action as an empirical question. So does approaching an individual's interests empirically, and as distinct from their beliefs about their interests, assuming they have any. The latter also allows for taking into account the interests of non-sapient and non-sentient individuals, which seems like a worthwhile goal.

Extrapolating a group's collective interests from the individual interests of its members is still unpleasantly mysterious to me, except in the fortuitous special case where individual interests happen to align neatly. Treating this as an optimization problem with multiple weighted goals is the best approach I know of, but I'm not happy with it; it has lots of problems I don't know how to resolve.

Much to my chagrin, some method for doing this seems necessary if we are to account for individual interests in groups whose members aren't peers (e.g., children, infants, fetuses, animals, sufferers of various impairments, minority groups, etc., etc., etc.), which seems good to address.

It's also at least useful to addressing groups of peers whose interests don't neatly align... though I'm more sanguine about marketplace competition as an alternative way of addressing that.

Something like this may also turn out to be critical for fully accounting for even an individual human's interests, if it turns out that the interests of the various sub-agents of a typical human don't align neatly, which seems plausible.

Accounting for the probable interests of probable entities (e.g., aliens) I'm even more uncertain about. I don't discount them a priori, but without a clearer understanding of what such an accounting would actually look like, I really don't know what to say about them. I guess if we have grounds for reliably estimating the probability of a particular interest being had by a particular entity, then it's just a subset of the general weighting problem, but... I dunno.

I reject accounting for the posited interests of counterfactual entities, although I can see where the line between that and probabilistic entities as above is hard to specify.

Does that answer your question?

Comment author: JGWeissman 01 February 2011 11:26:31PM 2 points [-]

To respond to your example (while agreeing that it is good to have more intelligent people evaluating things like CEV and the meta-ethics that motivates it):

I think the CEV approach is sufficiently meta that if we would conclude on meeting and learning about the aliens, and considering their moral significance, that the right thing to do involves giving weight to their preferences, then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.

Comment author: Vladimir_Nesov 02 February 2011 01:06:10AM 2 points [-]

then an FAI constructed from our current CEV would give weight to their preferences once it discovers them.

If they are to be given weight at all, then this could as well be done in advance, so prior to observing aliens we give weight to preferences of all possible aliens, conditionally on future observations of which ones turn out to actually exist.
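The "weight in advance, conditionally on future observations" idea has a simple expected-value rendering. This sketch is purely illustrative; the civilizations, priors, and preference values are all invented:

```python
# Give weight to each possible alien civilization's preferences in
# proportion to its probability of existing, and let observation
# collapse that probability. All numbers here are made up.
priors = {"civ_A": 0.10, "civ_B": 0.01}   # P(this civilization exists)
prefs  = {"civ_A": 5.0,  "civ_B": -2.0}   # its utility for some plan

def alien_weight(evidence=None):
    # A discovered civilization gets full weight; the rest keep
    # their prior probability of existing.
    p = dict(priors)
    if evidence in p:
        p[evidence] = 1.0
    return sum(p[c] * prefs[c] for c in p)

before = alien_weight()           # weight given prior to observation
after = alien_weight("civ_A")     # after actually discovering civ_A
```

As the math-vs-practice exchange below notes, the two approaches coincide in principle; the disagreement is only about whether computing the "before" term over a vast space of possible civilizations is a good use of resources.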

Comment author: JGWeissman 02 February 2011 01:46:21AM 0 points [-]

From a perspective of pure math, I think that is the same thing, but in considering practical computability, it does not seem like a good use of computing power to figure what weight to give the preference of a particular alien civilization out of a vast space of possible civilizations, until observing that the particular civilization exists.

Comment author: Vladimir_Nesov 02 February 2011 01:54:30AM 1 point [-]

Such considerations could have some regularities even across all the diverse possibilities, which are easy to notice with a Saturn-sized mind.

Comment author: jimrandomh 02 February 2011 07:07:06PM 0 points [-]

One such regularity comes to mind: most aliens would rather be discovered by a superintelligence that was friendly to them than not be discovered, so spreading and searching would optimize their preferences.

Comment author: Vladimir_Nesov 01 February 2011 05:30:00PM 8 points [-]

I'd like to add the connection between the notions of "meta-ethics" and "decision theory" (of the kind we'd want an FAI/CEV to start out with). For the purpose of solving FAI, these seem to be the same, with "decision theory" emphasizing the outline of the target, and "meta-ethics" the source of correctness criteria for such theory in human intuition.

Comment author: cousin_it 01 February 2011 05:40:22PM *  4 points [-]

Hmm. I thought metaethics was about specifying a utility function, and decision theory was about algorithms for achieving the optimum of a given utility function. Or do you have a different perspective on this?

Comment author: Vladimir_Nesov 01 February 2011 06:02:16PM *  3 points [-]

Even if we assume that a "utility function" has anything to do with FAI-grade decision problems, you'd agree that the prior is also part of the specification of which decisions should be made. Then there's the way one should respond to observations, the way one handles logical uncertainty and decides that a given amount of reflection is sufficient to suspend an ethical injunction (such as "don't act yet"), the way one finds particular statements first in thinking about counterfactuals (which forms agent-provability), which can be generalized to non-standard inference systems, and on and on this list goes. This list is as long as morality, and it is morality, but it parses morality in a specific way that extracts the outline of its architecture and not just individual pieces of data.

When you consider methods of more optimally solving a decision problem, how do you set criteria of optimality? Some things are intuitively obvious, and very robust to further reflection, but ultimately you'd want the decision problem itself to decide what counts as an improvement in the methods of solving it. For example, obtaining superintelligent ability to generate convincing arguments for a wrong statement can easily ruin your day. So efficient algorithms are, too, a subject of meta-ethics, but of course in the same sense as we can conclude that we can include an "action-definition" as a part of general decision problems, we can conclude that "more computational resources" is an improvement. And as you know from agent-simulates-predictor, that is not universally the case.

Comment author: lukeprog 01 February 2011 06:23:39PM 3 points [-]

Vladimir_Nesov,

Gary Drescher has an interesting way of grabbing deontological normative theory from his meta-ethics, coupled with decision theory and game theory. He explains it in Good and Real, though I haven't had time to evaluate it much yet.

Comment author: Vladimir_Nesov 01 February 2011 06:43:50PM 1 point [-]

Given your interest, you probably should read it (if not having read it is included in what you mean by not having had time to evaluate it). Although I still haven't, I know Gary is right on most things he talks about, and expresses himself clearly.

Comment author: Perplexed 01 February 2011 05:55:05PM *  3 points [-]

I think it is important to keep in mind that the approach currently favored here, in which your choice of meta-ethics guides your choice of decision theory, and in which your decision theory justifies your metaethics (in a kind of ouroborean epiphany of reflective equilibrium) - that approach is only one possible research direction.

There are other approaches that might be fruitful. In fact, it is far from clear to many people that the problem of preventing uFAI involves moral philosophy at all. (ETA: Or decision theory.)

To a small group, it sometimes appears that the only way of making progress is to maintain a narrow focus and to ruthlessly prune research subtrees as soon as they fall out of favor. But pruning in this way is gambling - it is an act of desperation by people who are made frantic by the ticking of the clock.

My preference (which may turn out to be a gamble too), is to ignore the ticking and to search the tree carefully with the help of a large, well-trained army of researchers.

Comment author: jacob_cannell 02 February 2011 07:09:51AM 2 points [-]

Much depends of course on the quantity of time we have available. If the market progresses to AGI on its own in 10 years, our energies are probably best spent focused on a narrow set of practical alternatives.

If we have a hundred years, then perhaps we can afford to entertain several new generations of philosophers.

Comment author: Vladimir_Nesov 04 February 2011 10:37:08AM *  1 point [-]

If the market progresses to AGI on its own in 10 years, our energies are probably best spent focused on a narrow set of practical alternatives.

But the problem itself seems to suggest that if you don't solve it on its own terms, and instead try to mitigate the practical difficulties, you still lose completely. AGI is a universe-exploding A-Bomb which the mad scientists are about to test experimentally in a few decades, you can't improve the outcome by building better shelters (or better casing for the bomb).

Comment author: timtyler 04 February 2011 09:33:26AM *  0 points [-]

Yudkowsky apparently counsels ignoring the ticking as well - here:

Until you can turn your back on your rivals and the ticking clock, blank them completely out of your mind, you will not be able to see what the problem itself is asking of you. In theory, you should be able to see both at the same time. In practice, you won't.

I have argued repeatedly that the ticking is a fundamental part of the problem - and that if you ignore it, you just lose (with high probability) to those who are paying their clocks more attention. The "blank them completely out of your mind" advice seems to be an obviously-bad way of approaching the whole area.

It is unfortunate that getting more time looks very challenging. If we can't do that, we can't afford to dally around very much.

Comment author: Perplexed 04 February 2011 11:41:17AM 1 point [-]

Yudkowsky apparently counsels ignoring the ticking as well

Yes, and that comment may be the best thing he has ever written. It is a dilemma. Go too slow and the bad guys may win. Go too fast, and you may become the bad guys. For this problem, the difference between "good" and "bad" has nothing to do with good intentions.

Comment author: timtyler 04 February 2011 09:36:16PM *  2 points [-]

Another analysis is that there are at least two types of possible problem:

  • One is the "runaway superintelligence" problem - which the SIAI seems focused on;

  • Another type of problem involves the preferences of only a small subset of humans being respected.

The former problem has potentially more severe consequences (astronomical waste), but an engineering error like that seems pretty unlikely - at least to me.

The latter problem could still have some pretty bad consequences for many people, and seems much more probable - at least to me.

In a resource-limited world, too much attention on the first problem could easily contribute to running into the second problem.

Comment author: Will_Newsome 02 February 2011 09:29:20AM *  -2 points [-]

Right; you can extend decision theory to include reasoning about which computations the decision theoretic 'agent' is situated in and how that matters for which decisions to make, shaper/anchor semantics-style.

Meta-ethics per se is just the set of (hopefully mathematical-ish) intuitions we draw on and that guide how humans go about reasoning about what is right, and that we kind of expect to align somewhat with what a good situated decision theory would do, at least before the AI starts trading with other superintelligences that represented different contexts. If meta-level contextual/situated decision theory is convergent, the only differences between superintelligences are differences about what kind of world they're in. Meta-ethics is thus kind of superfluous except as a vague source of intuitions that should probably be founded in math, whereas practical axiology (fueled by evolutionary psychology, evolutionary game theory, social psychology, etc) is indicative of the parts of humanity that (arguably) aren't just filled in by the decision theory.

Comment author: BrandonReinhart 02 February 2011 05:57:25AM 5 points [-]

Would someone familiar with the topic be able to do a top level treatment similar to the recent one on self-help? A survey of the literature, etc.

I am a software engineer, but I don't know much about general artificial intelligence. The AI research I am familiar with is very different from what you are talking about here.

Who is currently leading the field in attempts at providing mathematical models for philosophical concepts? Are there simple models that demonstrate what is meant by computational meta-ethics? Is that a correct search term -- as in, a term that I could key into Google and get meaningful results? What are the correct search terms? I see lots of weird and varied results when I search on "computational meta-ethics" and "algorithmic epistemology" and other combinations. There is no popular set of references for this (like there would be for psychological terminology), so I don't even have pop-culture reference points to use to evaluate the meaning of search results.

Does the language above represent the current approach to the problem? In other words, is the process currently employed one of trying to reduce human ethical concepts to algorithmic models via linguistic methods? That seems incredibly dangerous. Are there other approaches? Can someone detail the known methods being used, their most successful implementors, and references to examples?

I have read that intelligence can be viewed as a type of optimization process. There is a set of detailed research around optimization processes (genetic algorithms, etc). Is there AI research in this area? I don't have the language to describe what I think I mean.

I do know that there is very high quality, useful research being done on directed optimization processes. I could sit down with this research, read it, and formulate new research plans to progress it in a very rigorous and detailed way. Much of this stuff is accessible to moderately capable software engineers (Haupt, 2004). I don't see that with general AI because I don't know where to look. A post that helps me know where to look would be great.

What are the works that I need to internalize in order to begin to attack this problem in a meaningful way?

Comment author: lukeprog 06 February 2011 07:22:43AM 4 points [-]

It sounds like you're asking for something broader than this, but I did just post a bibliography on Friendly AI, which would make for a good start.

Unfortunately, meta-ethics is one of the worst subjects to try to "dive into," because it depends heavily on so many other fields. I was chatting with Stephen Finlay, a meta-ethicist at USC, and he said something like: "It's hard to have credibility as a professor teaching meta-ethics, because meta-ethics depends on so many fields, and in most of the graduate courses I teach on meta-ethics, I know that every one of my students knows more about one of those fields than I do."

Comment author: XiXiDu 01 February 2011 02:44:18PM 5 points [-]

Rather, my point is that we need lots of smart people working on these meta-ethical questions.

I'm curious if the SIAI shares that opinion. Is Michael Vassar trying to hire more people or is his opinion that a small team will be able to solve the problem? Can the problem be subdivided into parts, is it subject to taskification?

Comment author: AnnaSalamon 01 February 2011 03:31:27PM 11 points [-]

I'm curious if the SIAI shares that opinion.

I do. More people doing detailed moral psychology research (such as Jonathan Haidt's work), or moral philosophy with the aim of understanding what procedure we would actually want followed, would be amazing.

Research into how to build a powerful AI is probably best not done in public, because it makes it easier to make unsafe AI. But there's no reason not to engage as many good researchers as possible on moral psychology and meta-ethics.

Comment author: XiXiDu 01 February 2011 06:24:58PM *  5 points [-]

Research into how to build a powerful AI is probably best not done in public...

Is the SIAI concerned with the data security of its research? Is the latest research saved unencrypted on EY's laptop and shared between all SIAI members? Could a visiting fellow just walk into the SIAI house, plug in a USB stick, and run off with the draft for a seed AI? Those questions arise when you make a distinction between you and the "public".

But there's no reason not to engage as many good researchers as possible on moral psychology and meta-ethics.

Can that research be detached from decision theory? Since you're working on solutions applicable to AGI, is it actually possible to differentiate between the mathematical formalism of an AGI's utility function and the fields of moral psychology and meta-ethics? In other words, can you learn a lot by engaging with researchers if you don't share the math? That is why I asked whether the work can effectively be subdivided if you are concerned with security.

Comment author: jacob_cannell 02 February 2011 06:58:26AM 2 points [-]

Research into how to build a powerful AI is probably best not done in public

I find this dubious - has this belief been explored in public on this site?

If AI research is completely open and public, then more minds and computational resources will be available to analyze safety. In addition, in the event that a design actually does work, it is far less likely to have any significant first mover advantage.

Making SIAI's research public and open also appears to be nearly mandatory for proving progress and joining the larger scientific community.

Comment author: lukeprog 01 February 2011 02:57:32PM 4 points [-]

I tend to think that the broadest issues, such as those of meta-ethics, may be discussed by a wider professional community, though of course most meta-ethicists will have little to contribute (divine command theorists, for example). It may still be the case that a small team is best for solving more specific technical problems in programming the AI's utility function and proving that it will not reprogram its own terminal values.

But I don't know what SIAI's position is.

Comment author: Perplexed 01 February 2011 04:45:44PM 7 points [-]

It may still be the case that a small team is best for solving more specific technical problems in programming the AI's utility function and proving that it will not reprogram its own terminal values.

My intuition leads me to disagree with the suggestion that a small team might be better. The only conceivable (to me) advantage of keeping the team small would be to minimize the pedagogical effort of educating a large team on the subtleties of the problem and the technicalities of the chosen jargon. But my experience has been that investment in clarity of pedagogy yields dividends in your own understanding of the problem, even if you never get any work or useful ideas out of the yahoos you have trained. And, of course, you probably will get some useful ideas from those people. There are plenty of smart folks out there. Whole generations of them.

Comment author: XiXiDu 01 February 2011 08:12:11PM *  2 points [-]

My intuition leads me to disagree with the suggestion that a small team might be better.

People argue that small is better for security reasons. But one could as well argue that large is better for security reasons, because there is more supervision and competition. Do you rather trust 5 people (likely friends) or a hundred strangers working for fame and money? After all, we're talking about a project that will result in the implementation of a superhuman AI to determine the future of the universe. A handful of people might do anything, regardless of what they are signaling. But a hundred people are much harder to control. So the security argument runs both ways. The question is what will maximize the chance of success. Here I agree that it will take many more people than are currently working on the various problems.

Comment author: Perplexed 01 February 2011 08:23:40PM 1 point [-]

I agree. But, with Luke, I am assuming that the problem of AGI Friendliness can be addressed independently of the question of actually achieving AGI. Only the second of those two questions requires security - there is no reason not to pursue Friendliness theory openly.

Comment author: timtyler 02 February 2011 12:03:33AM *  0 points [-]

I am assuming that the problem of AGI Friendliness can be addressed independently of the question of actually achieving AGI.

That is probably not true. There may well be some differences, though. For instance, it is hard to see how the corner cases in decision theory that are so discussed around here have much relevance to the problem of actually constructing a machine intelligence - UNLESS you want to prove things about how its goal system behaves under iterative self-modification.

Comment author: timtyler 02 February 2011 12:07:33AM *  -1 points [-]

People argue that small is better for security reasons. But one could as well argue that large is better for security reasons because there exist more supervision and competition.

The "smaller is better" idea seems linked to "security through obscurity" - a common term of ridicule in computer security circles.

The NSA manage to get away with some security through obscurity - but they are hardly a very small team.

Comment author: JGWeissman 01 February 2011 06:14:11PM 3 points [-]

Is Michael Vassar trying to hire more people or is his opinion that a small team will be able to solve the problem?

On this sort of problem that can be safely researched publicly, SIAI need not hire people to get them to work on the problem. They can also engage the larger academic community to get them interested in finding practical answers to these questions.

Comment author: knb 01 February 2011 06:58:16PM -3 points [-]

Has there ever been a time when philosophical questions were solved by a committee?

Comment author: Perplexed 01 February 2011 07:40:58PM 3 points [-]

Has there ever been a time when philosophical questions were solved by a committee?

It could be argued that they are never usefully answered in any other way. Isolated great thinkers are rare in most fields. They are practically non-existent in philosophy.

Comment author: Eliezer_Yudkowsky 01 February 2011 05:53:39PM 10 points [-]

So let's say that you go around saying that philosophy has suddenly been struck by a SERIOUS problem, as in lives are at stake, and philosophers don't seem to pay any attention. Not to the problem itself, at any rate, though some of them may seem annoyed at outsiders infringing on their territory, and nonplussed at the thought of their field trying to arrive at answers to questions where the proper procedure is to go on coming up with new arguments and respectfully disputing them with other people who think differently, thus ensuring a steady flow of papers for all.

Let us say that this is what happens; which of your current beliefs, which seem to lead you to expect something else to happen, would you update?

Comment author: Perplexed 01 February 2011 06:53:44PM 5 points [-]

At the risk of repeating myself, or worse, sounding like an organizational skills guru rambling on about win-win opportunities, might it not be possible to change the environment so that philosophers can do both - publish a steady flow of papers containing respectful disputation AND work on a serious problem?

Comment author: lukeprog 01 February 2011 06:22:29PM 12 points [-]

No, that is exactly what I expect to happen with more than 99% of all philosophers. But we already have David Chalmers arguing it may be a serious problem. We have Nick Bostrom and the people at Oxford's Future of Humanity Institute. We probably can expect some work on SIAI's core concerns from philosophy grad students we haven't yet heard from because they haven't published much, for example Nick Beckstead, whose interests are formal epistemology and the normative ethics of global catastrophic risks.

As you've said before, any philosophy that would be useful to you and SIAI is hard to find. But it's out there, in tiny piles, and more of it is coming.

Comment author: Desrtopa 01 February 2011 08:03:41PM 3 points [-]

The problems appear to be urgent, and in need of actual solutions, not simply further debate, but it's not at all clear to me that people who currently identify as philosophers are, as a group, those most suited to work on them.

Comment author: lukeprog 01 February 2011 10:53:22PM 1 point [-]

I'm not saying they are 'most suited to work on them', either. But I think they can contribute. Do you think that Chalmers and Bostrom have not already contributed, in small ways?

Comment author: Desrtopa 01 February 2011 11:14:56PM 1 point [-]

Bostrom, yes, Chalmers, I have to admit that I haven't followed his work enough to issue an opinion.

Comment author: Vladimir_Nesov 01 February 2011 06:23:42PM 4 points [-]

"Most philosophers" is not necessarily the target audience of such argument.

Comment author: benelliott 01 February 2011 06:37:33PM *  4 points [-]

I might be wrong here, but I wonder if at least some philosophers have a niggling little worry that they are wasting their considerable intellectual gifts (no, I don't think that all philosophers are stupid) on something useless. If such people exist, they might be pleased rather than annoyed to hear that the problems they are thinking about are actually important, and this might spur them to rise to the challenge.

This all sounds hideously optimistic of course, but it suggests a line of attack if we really do want their help.

Comment author: Desrtopa 01 February 2011 07:59:36PM *  10 points [-]

I don't remember the specifics, and so don't have the terms to do a proper search, but I think I recall being taught in one course about a philosopher who, based on the culmination of all his own arguments on ethics, came to the conclusion that being a philosopher was useless, and thus changed careers.

Comment author: katydee 01 February 2011 08:08:16PM 3 points [-]

I know of a philosopher who claimed to have finished a grand theory he was working on, concluded that all life was meaningless, and thus withdrew from society and lived on a boat for many years fishing to live and practicing lucid dreaming. His doctrine was that we can't control reality, so we might as well withdraw to dreams, where complete control can be exercised by the trained.

I also remember reading about a philosopher who finished some sort of ultra-nihilist theory, concluded that life was indeed completely meaningless, and committed suicide-- getting wound up too tightly in a theory can be hazardous to your physical as well as epistemic health!

Comment author: MBlume 04 February 2011 01:38:07AM 1 point [-]

getting wound up too tightly in a theory can be hazardous to your physical as well as epistemic health!

This doesn't automatically follow unless you first prove he was wrong =P

Comment author: XiXiDu 01 February 2011 07:43:05PM 6 points [-]

As a layman, I'm still puzzled as to how the LW sequences do not fall into the category of philosophy. Bashing philosophy seems over the top; there is probably just as much "useless" mathematics.

Comment author: jimrandomh 01 February 2011 08:47:03PM 20 points [-]

I think the problem is that philosophy has, as a field, done a shockingly bad job of evicting obsolete and incorrect ideas (not just useless ones). Someone who seeks a philosophy degree can expect to waste most of their time and potential on garbage. To use a mathematics analogy, it's as if mathematicians were still holding debates between binaryists, decimists, tallyists and nominalists.

Most of what's written on Less Wrong is philosophy, there's just so much garbage under philosophy's name that it made sense to invent a new name ("rationalism"), pretend it's unrelated, and guard that name so that people can use it as a way to find good philosophy without wading through the bad. It's the only reference class I know of for philosophy writings that's (a) larger than one author, (b) mostly sane, and (c) enumerable by someone who isn't an expert.

Comment author: Jack 03 February 2011 06:39:22PM *  3 points [-]

I think the problem is that philosophy has, as a field, done a shockingly bad job of evicting obsolete and incorrect ideas (not just useless ones).

Totally agree.

Someone who seeks a philosophy degree can expect to waste most of their time and potential on garbage.

Not exactly. The subfields are more than specialized enough to make it pretty easy to avoid garbage. Once you're in the field it isn't hard to locate the good stuff. For institutional and political reasons the sane philosophers tend to ignore the insane philosophers and vice versa, with just the occasional flare up. It is a problem.

It's the only reference class I know of for philosophy writings that's (a) larger than one author, (b) mostly sane, and (c) enumerable by someone who isn't an expert.

Er, I suspect the majority of "naturalistic philosophy in the analytic tradition" would meet the sanity waterline of Less Wrong, particularly the sub-fields of epistemology and philosophy of science.

Comment author: ata 01 February 2011 09:10:08PM *  13 points [-]

They do. (Many of EY's own posts are tagged "philosophy".) Indeed, FAI will require robust solutions to several standard big philosophical problems, not just metaethics; e.g. subjective experience (to make sure that CEV doesn't create any conscious persons while extrapolating, etc.), the ultimate nature of existence (to sort out some of the anthropic problems in decision theory), and so on. The difference isn't (just) in what questions are being asked, but in how we go about answering them. In traditional philosophy, you're usually working on problems you personally find interesting, and if you can convince a lot of other philosophers that you're right, write some books, and give a lot of lectures, then that counts as a successful career. LW-style philosophy (as in the "Reductionism" and "Mysterious Answers" sequences) is distinguished in that there is a deep need for precise right answers, with more important criteria for success than what anyone's academic peers think.

Basically, it's a computer science approach to philosophy: any progress on understanding a phenomenon is measured by how much closer it gets you to an algorithmic description of it. Academic philosophy occasionally generates insights on that level, but overall it doesn't operate with that ethic, and it's not set up to reward that kind of progress specifically; too much of it is about rhetoric, formality as an imitation of precision, and apparent impressiveness instead of usefulness.

Comment author: NancyLebovitz 01 February 2011 09:33:30PM 4 points [-]

e.g. subjective experience (to make sure that CEV doesn't create any conscious persons while extrapolating, etc.),

Also, to figure out whether particular uploads have qualia, and whether those qualia resemble pre-upload qualia, if that's wanted.

Comment author: jacob_cannell 02 February 2011 07:12:36AM 1 point [-]

I should just point out that these two goals (researching uploads, and not creating conscious persons) are starkly antagonistic.

Comment author: shokwave 02 February 2011 07:40:56AM 3 points [-]

are starkly antagonistic.

Not in the slightest. First, uploads are continuing conscious persons. Second, creating conscious persons is a problem if they might be created in uncomfortable or possibly hellish conditions - if, say, the AI was brute-forcing every decision it would simulate countless numbers of humans in pain before it found the least painful world. I do not think we would have a problem with the AI creating conscious persons in a good environment. I mean, we don't have that problem with parenthood.

Comment author: NancyLebovitz 02 February 2011 10:14:07PM 1 point [-]

What if it's researching pain qualia at ordinary levels because it wants to understand the default human experience?

I don't know if we're getting into eye-speck territory, but what are the ethics of simulating an adult human who's just stubbed their toe, and then ending the simulation?

Comment author: shokwave 03 February 2011 07:39:05AM 1 point [-]

I feel like the consequences are net positive, but I don't trust my human brain to correctly determine this question. I would feel uncomfortable with an FAI deciding it, but I would also feel uncomfortable with a person deciding it. It's just a hard question.

Comment author: DSimon 02 February 2011 02:54:58PM *  0 points [-]

What if they were created in a good environment and then abruptly destroyed because the AI only needed to simulate them for a few moments to get whatever information it needed?

Comment author: shokwave 02 February 2011 03:40:58PM 2 points [-]

Well - what if a real person went through the same thing? What does your moral intuition say?

Comment author: DSimon 02 February 2011 06:01:55PM *  1 point [-]

That it would be wrong. If I had the ability to spontaneously create fully-formed adult people, it would be wrong to subsequently kill them, even if I did so painlessly and in an instant. Whether a person lives or dies should be under the control of that person, and exceptions to this rule should lean towards preventing death, not encouraging it.

Comment author: johnlawrenceaspden 30 October 2012 06:38:51PM 1 point [-]

What if they were created in a good environment, (20) stopped, and then restarted (goto 20) ?

Is that one happy immortal life or an infinite series of murders?

Comment author: DSimon 02 November 2012 05:53:37AM 0 points [-]

I think closer to the latter. Starting a simulated person, running them for a while, and then ending and discarding the resulting state effectively murders the person. If you then start another copy of that person, then depending on how you think about identity, that goes two ways:

Option A: The new person, being a separate running copy, is unrelated to the first person identity-wise, and therefore the act of starting the second person does not change the moral status of ending the first. Result: Infinite series of murders.

Option B: The new person, since they are running identically to the old person, is therefore actually the same person identity-wise. Thus, you could in a sense un-murder them by letting the simulation continue to run after the reset point. If you do the reset again, however, you're just recreating the original murder as it was. Result: Single murder.

Neither way is a desirable immortal life, which I think is a more useful way to look at it than "happy".

Comment author: lukeprog 01 February 2011 08:49:40PM 7 points [-]

The sequences are definitely philosophy, but written (mostly) without referencing the philosophers who have given (roughly) the same arguments or defended (roughly) the same positions.

I really like Eliezer's way of covering many of these classic debates in philosophy. In other cases, for example in the meta-ethics sequence, I found EY's presentation unnecessarily difficult.

Comment author: Blueberry 02 February 2011 03:46:40AM 6 points [-]

I'd appreciate an annotation to EY's writings that includes such references, as I'm not aware of philosophers who have given similar arguments (except Dennett and Drescher).

Comment author: lukeprog 02 February 2011 05:06:39AM 2 points [-]

That would make for a very interesting project! If I find the time, maybe I'll do this for a post here or there. It would integrate Less Wrong into the broader philosophical discussion, in a way.

Comment author: Perplexed 02 February 2011 02:37:10PM *  10 points [-]

I have mixed feelings about that. One big difference in style between the sciences and the humanities lies in the complete lack of respect for tradition in the sciences. The humanities deal in annotations and critical comparisons of received texts. The sciences deal with efficient pedagogy.

I think that the sequences are good in that they try to cover this philosophical material in the great-idea oriented style of the sciences rather than the great-thinker oriented style of the humanities. My only complaint about the sequences is that in some places the pedagogy is not really great - some technical ideas are not explained as clearly as they might be, some of the straw men are a little too easy to knock down, and in a few places Eliezer may have even reached the wrong conclusions.

So, rather than annotating The Sequences (in the tradition of the humanities), it might be better to re-present the material covered by the sequences (in the tradition of the sciences). Or, produce a mixed-mode presentation which (like Eliezer's) focuses on getting the ideas across, but adds some scholarship (unlike Eliezer) in that it provides the standard Googleable names to the ideas discussed - both the good ideas and the bad ones.

Comment author: lukeprog 02 February 2011 04:25:38PM 1 point [-]

I like this idea.

Comment author: TheOtherDave 02 February 2011 05:56:44AM 1 point [-]

You and EY might find it particularly useful to provide such an annotation as an appendix for the material that he's assembling into his book.

Or not.

Comment author: lukeprog 02 February 2011 06:19:11AM 5 points [-]

I certainly think that positioning the philosophical foundations assumed by the quest for Friendly AI would give SIAI more credibility in academic circles. But right now SIAI seems to be very anti-academia in some ways, which I think is unfortunate.

Comment author: komponisto 02 February 2011 06:34:59AM *  3 points [-]

But right now SIAI seems to be very anti-academia in some ways,

I really don't think it is, as a whole. Vassar and Yudkowsky are somewhat, but there are other people within and closely associated with the organization who are actively trying to get papers published, etc. And EY himself just gave a couple of talks at Oxford, so I understand.

(In fact it would probably be more accurate to say that academia is somewhat more anti-SIAI than the other way around, at the moment.)

As for EY's book, my understanding is that it is targeted at popular rather than academic audiences, so it presumably won't be appropriate for it to trace the philosophical history of all the ideas contained therein, at least not in detail. But there's no reason it can't be done elsewhere.

Comment author: TheOtherDave 02 February 2011 01:48:56PM 2 points [-]

I'm thinking of what Dennett did in Consciousness Explained, where he put all the academic-philosophy stuff in an appendix so that people interested in how his stuff relates to the broader philosophical discourse can follow that, and people not interested in it can ignore it.

Comment author: rhollerith_dot_com 02 February 2011 09:18:07AM *  0 points [-]

Near the end of the meta-ethics sequence, Eliezer wrote that he chose to postpone reading Good and Real until he finished writing about meta-ethics because otherwise he might not finish it. For most of his life, writing for public consumption was slow and tedious, and he often got stuck. That seemed to change after he started blogging daily on Overcoming Bias, but the change was recent enough that he probably questioned its permanence.

Comment author: Vladimir_Nesov 01 February 2011 07:49:31PM *  2 points [-]

Why, they do fall in the category of philosophy (for the most part). You can imagine that bashing bad math is just as rewarding.

Comment author: benelliott 01 February 2011 08:21:17PM 0 points [-]

There definitely is, and I would suspect that many pure mathematicians have the same worry (in fact I don't need to suspect it, sources like A Mathematician's Apology provide clear evidence of this). These people might be another good source of thinkers for a different side of the problem, although I do wonder if anything they can do to help couldn't be done better by an above-average computer programmer.

I would say the difference between the sequences and most philosophy is one of approach rather than content.

Comment author: Jayson_Virissimo 01 February 2011 03:38:57PM *  5 points [-]

How much thought has been given to hard coding an AI with a deontological framework rather than giving it some consequentialist function to maximize? Is there already a knockdown argument showing why that is a bad idea?

EDIT: I'm not talking about what ethical system to give an AI that has the potential to do the most good, but one that would be capable of the least bad.

Comment author: endoself 01 February 2011 08:28:34PM 1 point [-]

It is very hard to get an AI to understand the relevant deontological rules. Once you have accomplished that, there is no obvious next step easier and safer than CEV.

Comment author: Jayson_Virissimo 01 February 2011 11:55:26PM -1 points [-]

Intuitively, it seems easier to determine if a given act violates the rule "do not lie" than the rule "maximize the expected average utility of population x". Doesn't this mean that I understand the first rule better than the second?
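Jayson's contrast can be made concrete with a toy sketch. All names and representations here are purely hypothetical illustrations, not anyone's actual proposal:

```python
# Toy contrast: a deontological rule is a cheap predicate on the act itself,
# while a consequentialist objective needs a model of outcomes and a utility
# function over them.

def violates_do_not_lie(act):
    # Checkable by inspecting the act alone.
    return act.get("is_lie", False)

def expected_average_utility(act, outcome_model, utility):
    # Needs a distribution over outcomes and a utility over population states.
    return sum(p * utility(outcome) for outcome, p in outcome_model(act))
```

The first function inspects only the act; the second needs everything the act might cause, which is one sense in which the first rule is "easier to understand".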

Comment author: ata 01 February 2011 11:59:36PM *  6 points [-]

Yes, but you're a human, not an AI. Your brain comes factory-equipped with lots of machinery for understanding deontological injunctions, and no (specific) machinery for understanding the concept of expected utility maximization.

Programming each of those concepts into an AI and conveying them to a human are entirely different tasks.

Comment author: Vladimir_Nesov 02 February 2011 01:12:34AM *  3 points [-]

Logical uncertainty, which is unavoidable no matter how smart you are, blurs the line. The AI won't "understand" expected utility maximization completely either; it won't see all the implications no matter how much computational resources it has. And so it needs more heuristics to guide its decisions where it can't figure out all the implications. Those are the counterparts of deontological injunctions, although of course they must be subject to revision on sufficient reflection (and what "sufficient" means is one of these injunctions, also subject to revision). Some of them will even have normative implications; in fact, that's one reason preference is not a utility function.

Comment author: jacob_cannell 02 February 2011 12:09:18AM *  -1 points [-]

You are making a huge number of assumptions here:

Your brain comes factory-equipped with lots of machinery for understanding deontological injunctions

Such as? Where is this machinery?

no (general) machinery for understanding the concept of expected utility maximization

How do you understand the concept of expected utility maximization? Is it not through the highly general machinery of your cortex?

And how can we expect that the algorithm of "expected utility maximization" actually represents our best outcome?

Programming each of those concepts into an AI and conveying them to a human are entirely different tasks.

debatable

Comment author: ata 02 February 2011 01:25:58AM 1 point [-]

Such as? Where is this machinery?

"Machinery" was a figure of speech, I'm not saying we're going to find a deontology lobe. I was referring, for instance, to the point that there are evolutionary reasons why we'd expect to find (as we do) that an understanding of deontological injunctions is fairly universal among humans.

How do you understand the concept of expected utility maximization? Is it not through the highly general machinery of your cortex?

Oops, sorry, I accidentally used the opposite of the word I meant. That should have been "specific", not "general". Yes, we understand expected utility maximization with highly general machinery, and in very abstract terms.

Comment author: jacob_cannell 02 February 2011 06:08:32AM *  0 points [-]

Such as? Where is this [deontological] machinery?

I was referring, for instance, to the point that there are evolutionary reasons why we'd expect to find (as we do) that an understanding of deontological injunctions is fairly universal among humans.

EY's theory, linked in the 1st post, that deontological injunctions evolved as some sort of additional defense against black swan events does not seem especially convincing to me. The cortex is intrinsically predictive and consequentialist at a low level, but simple deontological rules are vast computational shortcuts.

An animal brain learns the hard way, the way AIXI does, thoroughly consequentialist at first, but once predictable pattern matches are learned at higher levels they can be sometimes simplified down to simpler rules for quick decisions.

Even non-verbal animals find ways to pass down some knowledge to their offspring, but in humans this is vastly amplified through language.

Every time a parent tells a child what to do, the parent is transmitting complex consequentialist results down to the younger mind in the form of simpler cached deontological behaviors. Ex: it would be painful for the child to learn a firsthand consequentialist account of why stealing is detrimental (the tribe will punish you).

Once this machinery was in place, it could extend over generations and develop into more complex cultural and religious deontologies. All of this can be accomplished through cortical reinforcement learning as the child develops.

Feral children, for all intents and purposes, act like feral animals. Human minds are cultural/linguistic software phenomena.

Not to mention that conveying a concept to a human carries no instructions; programming concepts into an AI is all instructions

I'm not aware of any practical approach to AI which consists of programming concepts directly into an AI. All modern approaches program only the equivalent of an empty brain, the concepts and resulting mind forms through learning.

Human concepts are expressed in natural language, and for an AGI to compete with humans it will need to learn extant human knowledge. Learning natural language thus seems like the most practical approach.

"Expected utility maximisation" is, by definition what actually represents our best outcome. To the extent that it doesn't, it is a failure of our ability to grasp and apply the concept, not a failure in the concept itself.

The problem is this: if we define an algorithm to represent our best outcome and use that as the standard of rationality, and the algorithm's predictions then differ significantly from actual human decisions: is it a problem with the algorithm or the human mind?

If we had an algorithm that represented a human mind perfectly, then that mind would always be rational by that definition.

Comment author: Lightwave 02 February 2011 09:25:47AM *  0 points [-]

Even if deontological injunctions are only transmitted through language, they are based on human predispositions (read: brain wiring) to act morally and cooperate, which have evolved.

This somewhat applies to animals too, there's been research on altruism in animals.

Comment author: shokwave 02 February 2011 12:57:54AM *  1 point [-]

That he makes assumptions is no point against him; the question is whether those assumptions hold.

To support the first one: the popularity and success of the fallacy of appealing to authority, Milgram's comments on his experiment, and the well-supported "God-shaped hole" theory.

For the second one: First, it's not entirely clear we do understand expected utility maximisation. Certainly, I know of no-one who acts as though they are maximising their expected utility. Second, to the extent that we do understand it, I would draw the metaphor of a Turing tarpit - I would say that we understand it only in the sense that we can hack together a bunch of neural processes that do other things, in such a way that they produce the words "expected utility maximisation" and the concept "act to get the most of what you really want". This is still an understanding, of course, but in no way do we have machinery for that purpose like how we have machinery for orders from authority / deontological injunctions.

"Expected utility maximisation" is, by definition what actually represents our best outcome. To the extent that it doesn't, it is a failure of our ability to grasp and apply the concept, not a failure in the concept itself.

As for the third, and for your claim of "debatable": yes, you could debate it. You would have to stand on some very wide definitions of "entirely" and "different", and you'd lose the debate. For example: speaking aloud to an AI and speaking aloud to a human are entirely different tasks. Not to mention that conveying a concept to a human carries no instructions; programming concepts into an AI is all instructions. Another entire difference.
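For concreteness, the decision rule under discussion - pick the action with the highest probability-weighted utility - can be sketched in a few lines. This is a toy model with hypothetical names, not anyone's actual proposal:

```python
# Expected utility maximization over a finite set of actions, where each
# action maps to a probability distribution over outcomes.

def expected_utility(distribution, utility):
    # distribution: {outcome: probability}
    return sum(p * utility(outcome) for outcome, p in distribution.items())

def best_action(actions, utility):
    # actions: {action_name: {outcome: probability}}
    return max(actions, key=lambda a: expected_utility(actions[a], utility))
```

The hard part, of course, is not this arithmetic but supplying the outcome distributions and the utility function - which is where the human failure to "act as though they are maximising their expected utility" comes in.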

Comment author: Vladimir_Nesov 02 February 2011 02:25:58AM 0 points [-]

"Expected utility maximisation" is, by definition what actually represents our best outcome.

No, it's based on certain axioms that are not unbreakable in strange contexts, which in turn assume a certain conceptual framework (where you can, say, enumerate possibilities in a certain way).
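The axioms in question are presumably the von Neumann-Morgenstern conditions on a preference relation over lotteries; in their standard form:

```latex
\begin{enumerate}
  \item Completeness: for all lotteries $L, M$, either $L \succeq M$ or $M \succeq L$.
  \item Transitivity: if $L \succeq M$ and $M \succeq N$, then $L \succeq N$.
  \item Continuity: if $L \succeq M \succeq N$, there exists $p \in [0,1]$
        such that $pL + (1-p)N \sim M$.
  \item Independence: $L \succeq M$ if and only if
        $pL + (1-p)N \succeq pM + (1-p)N$ for all $N$ and $p \in (0,1]$.
\end{enumerate}
```

On the usual numbering, the third axiom is continuity; a relation satisfying all four is representable by an expected-utility function.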

Comment author: endoself 02 February 2011 02:38:26AM 0 points [-]

Name one exception to any axiom other than the third or to the general conceptual framework.

Comment author: Vladimir_Nesov 02 February 2011 02:53:50AM *  0 points [-]

There's no point in assuming completeness, being able to compare events that you won't be choosing between (in the context of utility function having possible worlds as domain). Updateless analysis says that you never actually choose between observational events. And there are only so many counterfactuals to consider (which in this setting are more about high-level logical properties of a fixed collection of worlds, which lead to their different utility, and not presence/absence of any given possible world, so in one sense even counterfactuals don't give you nontrivial events).

Comment author: endoself 02 February 2011 03:23:56AM 0 points [-]

There's no point in assuming completeness, being able to compare events that you won't be choosing between (in the context of utility function having possible worlds as domain).

Are there ever actually two events for which this would not hold if you did need to make such a choice?

Updateless analysis says that you never actually choose between observational events.

I'm not sure what you mean. Outcomes do not have to be observed in order to be chosen between.

And there are only so many counterfactuals to consider (which in this setting are more about high-level logical properties of a fixed collection of worlds, which lead to their different utility, and not presence/absence of any given possible world, so in one sense even counterfactuals don't give you nontrivial events).

Isn't this just separating degrees of freedom and assuming that some don't affect others? It can be derived from the utility axioms.

Comment author: Will_Newsome 02 February 2011 09:26:54AM *  0 points [-]

That said, it's hard to reason about what preferences/morality/meta-ethics/etc. an AI actually converges to if you give it vague deontological injunctions like "be nice" or "produce paperclips". It'd be really cool if more people were thinking about likely attractors on top of or instead of the recognized universal AI drives.

(Also I'll note that I agree with Nesov that logical uncertainty / the grounding problem / no low level language etc. problems pose similar difficulties to the 'you can't just do ethical injunctions' problem. That said, humans are able to do moral reasoning somehow, so it can't be crazy difficult.)

Comment author: PhilGoetz 06 February 2011 05:17:25AM *  4 points [-]

I would ordinarily vote down a post that restated things that most people on LW should already know, but... LW is curiously devoid of discussion on this issue, whether criticism of CEV, or proposals of alternatives. And LP's post hits all the key points, very efficiently.

If LW has a single cultural blind spot, it is that LWers claim to be Bayesians, yet routinely analyze potential futures as if the single "most-likely" scenario, hypothesis, or approach accepted as dogma on LessWrong (fast takeoff, Friendly AI, multiple worlds, CEV, etc.) had probability 1.

Comment author: wedrifid 06 February 2011 06:58:03AM 4 points [-]

If LW has a single cultural blind spot, it is that LWers claim to be Bayesians, yet routinely analyze potential futures as if the single "most-likely" scenario, hypothesis, or approach accepted as dogma on LessWrong (fast takeoff, Friendly AI, multiple worlds, CEV, etc.) had probability 1.

Eliezer has stated that he will not give his probability for the successful creation of Friendly AI. Presumably because people would get confused about why working desperately towards it is the rational thing to do despite a low probability.

As for CEV 'having a probability of 1', that doesn't even make sense. But an awful lot of people have said that CEV as described in Eliezer's document would be undesirable even assuming the undeveloped parts were made into more than hand-wavy verbal references.

Comment author: katydee 06 February 2011 09:21:01AM 2 points [-]

I dunno, I perceive a lot of criticism of CEV here-- if I recall correctly there have been multiple top-level posts expressing skepticism of it. And doesn't Robin Hanson (among others) disagree with the hard takeoff scenario?

Comment author: PhilGoetz 06 February 2011 04:57:32PM *  0 points [-]

That's true. (Although notice that not one of those posts has ever gotten the green button.)

CEV does not fit well into my second paragraph, since it is not a prerequisite for anything else, and therefore not a point of dependency in an analysis.

Comment author: TheOtherDave 01 February 2011 04:59:33PM 4 points [-]

It's not just a matter of pace; this perspective also implies a certain prioritization of the questions.

For example, as you say, it's important to conclude soon whether animal welfare is important. (1) (2) But if we preserve the genetic information that creates new animals, we preserve the ability to optimize animal welfare in the future, should we at that time conclude that it is important. (2) If we don't, then later concluding it's important doesn't get us much.

It seems to follow that preserving that information (either in the form of a breeding population, or some other form) is a higher priority, on this view, than proving that animal welfare is important. That is, for the next century, genetics research might be more relevant to maximizing long-term animal welfare than ethical philosophy research.

Of course, killing off animals is only one way to (hypothetically) irreversibly fail to optimize the future. Building an optimizing system that is incapable of correcting its initially mistaken terminal values -- either because it isn't designed to alter its programming, or because it has already converted all the mass-energy in the universe into waste heat, or whatever -- is another. There are many more.

In other words, there are two classes of questions: the ones where a wrong answer is irreversible, and the ones where it isn't. Philosophical work to determine which is which, and to get a non-wrong answer to the former ones, seems like the highest priority on this view.

===

(1) Not least because humans are already having an impact on it, but that's beside your point.

(2) By "conclude that it's important" I don't mean adopting a new value, I mean become aware of an implication of our existing values. I don't reject adopting new values, either, but I'm explicitly not talking about that here.

Comment author: John_Maxwell_IV 03 February 2011 02:40:50AM 2 points [-]

Am I correct in saying that there is not necessarily any satisfactory solution to this problem?

Also, this seems relevant: The Terrible, Horrible, No Good Truth About Morality.

Comment author: TheOtherDave 03 February 2011 04:02:02AM 1 point [-]

Depends on what would satisfy us, I suppose.

I mean, for example, if it turns out that implementing CEV creates a future that everyone living in it desires and is made happy and fulfilled and satisfied by, and continues to be indefinitely, and that everyone living now would, if informed of the details, also desire, and so on... but we are never able to confirm that any of that is right... or worse yet, later philosophical analysis somehow reveals that it isn't right, despite being desirable and fulfilling and satisfying and so forth... well, OK, we can decide at that time whether we want to give up what is desirable and so on in exchange for what is right, but in the meantime I might well be satisfied by that result. Maybe it's OK to leave future generations some important tasks to implement.

Or, if it turns out that EY's approach is all wrong because nobody agrees on anything important to anyone, so that extrapolating humanity's coherent volition leaves out everything that's important to everyone, so that implementing it doesn't do anything important... in that case, coming up with an alternate plan that has results as above would satisfy me.

Etc.

Comment author: XiXiDu 03 February 2011 12:03:56PM 2 points [-]

Depends on what would satisfy us, I suppose.

It might turn out that what does satisfy us is to be "free", to do what we want, even if that means that we will mess up our own future. It might turn out that humans are only satisfied if they can work on existential problems, "no risk no fun". Or we might simply want to learn about the nature of reality. The mere existence of an FAI might spoil all of it. Would you care to do science if there was some AI-God that already knew all the answers? Would you be satisfied if it didn't tell you the answers, or made you forget that it exists so that you'd try to invent AGI without ever succeeding?

But there is another possible end. Even today many people are really bored and don't particularly enjoy life. What if it turns out that there is no "right" out there, or that it can be reached fairly easily without any way to maximize it further? In other words, what if fun is something that isn't infinite but a goal that can be reached? What if it all turns out to be wireheading, the only difference between 10 minutes of wireheading and 10^1000 years being the number enumerating the elapsed time? Think about it, would you care about 10^1000 years of inaction? What would you do if that was the optimum? Maybe we'll just decide to choose the void instead.

Comment author: TheOtherDave 03 February 2011 05:25:34PM 2 points [-]

This is a different context for satisfaction, but to answer your questions:

  • yes, I often find satisfying working through problems that have already been solved, though I appreciate that not everyone does;

  • no, I would not want to be denied the solutions if I asked (assuming there isn't some other reason why giving me the solution is harmful), or kept in ignorance of the existence of those solutions (ibid);

  • if it turns out that all of my desires as they currently exist are fully implemented, leaving me with no room for progress and no future prospects better than endless joy, fulfillment and satisfaction, I'd be satisfied and fulfilled and joyful.

  • Admittedly, I might eventually become unsatisfied with that and desire something else, at which point I would devote efforts to satisfying that new desire. It doesn't seem terribly likely that my non-existence would be the best possible way of doing so, but I suppose it's possible, and if it happened I would cease to exist.

Comment author: shokwave 03 February 2011 12:20:49PM 1 point [-]

It might turn out that

It might indeed.

Comment author: Wei_Dai 03 February 2011 06:20:33PM *  0 points [-]

OK, we can decide at that time whether we want to give up what is desirable and etc. in exchange for what is right, but in the meantime I might well be satisfied by that result

No, once ostensibly-Friendly AI has run CEV and knows what it wants, it won't matter if we eventually realize that CEV was wrong after all. The OFAI will go on to do what CEV says it should do, and we won't have a say in the matter.

Comment author: TheOtherDave 03 February 2011 06:30:04PM 2 points [-]

Agreed: avoiding irreversible steps is desirable.

Comment author: [deleted] 01 February 2011 03:31:22PM -3 points [-]

So what are you doing about it? If you think these problems need to be solved, what are you doing to solve them? This reads like a call for other people ('philosophers') to do this rather than doing it yourself. Pointing out the existence of problems without at least suggesting a line of attack is not especially productive.

Comment author: lukeprog 01 February 2011 06:25:34PM 23 points [-]

AndrewHickey,

What am I doing about it? I'm making the meta-ethics relevant to Friendly AI the central research program of my career.

Even if that weren't the case, I think that pointing out the problem without immediately, in the same blog post, suggesting possible solutions is productive.

Comment author: grouchymusicologist 01 February 2011 04:15:59PM 12 points [-]

I don't read it as a call-for-action-by-others at all. It seems that with this post and the previous one lukeprog is responsibly trying to verify that a problem exists, presumably as a necessary prologue to working on the problem or at least trying to strategize about how one might go about working on it. And I also don't agree that it's generally unproductive to point out the existence of problems without simultaneously offering suggestions about how to solve them: sometimes the first step really is admitting (or realizing, or verifying) you have a problem.

Comment author: khafra 02 February 2011 02:37:37PM 6 points [-]

Andrew, I congratulate you on your intestinal fortitude in leaving that comment up, unedited, for others to learn from. Reminds me of http://news.ycombinator.com/item?id=35079

Comment author: Richard_Bruns 06 February 2011 09:52:17PM -2 points [-]

Here is a simple moral rule that should make an AI much less likely to harm the interests of humanity:

Never take any action that would reduce the number of bits required to describe the universe by more than X.

where X is some number smaller than the number of bits needed to describe an infant human's brain. For information-reductions smaller than X, the AI should get some disutility, but other considerations could override. This 'information-based morality' assigns moral weight to anything that makes the universe a more information-filled or complex place, and it does so without any need to program complex human morality into the thing. It is just information theory, which is pretty fundamental. Obviously actions are evaluated based on how they alter the expected net present value of the information in the universe, and not just the immediate consequences.
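As a rough illustration only: true description length (Kolmogorov complexity) is uncomputable, so any real implementation would need a proxy. The sketch below, which is my own toy stand-in and not part of the proposal, uses compressed size as that proxy and penalizes actions that shrink the world's description length by more than X bits:

```python
import zlib

def description_bits(state: bytes) -> int:
    """Crude proxy for description length: bits in the zlib-compressed
    encoding of a world-state. Kolmogorov complexity is uncomputable,
    so a compressor stands in for it here."""
    return 8 * len(zlib.compress(state, 9))

def action_penalty(before: bytes, after: bytes, x_bits: int) -> float:
    """Disutility under the proposed rule: any action that destroys more
    than x_bits of description length is ruled out entirely (infinite
    penalty); smaller reductions incur a proportional, overridable
    disutility; increases incur none."""
    lost = description_bits(before) - description_bits(after)
    if lost > x_bits:
        return float("inf")
    return max(0.0, float(lost))
```

For example, "flattening" a diverse world-state into a uniform one loses description length, so the flattening action draws the infinite penalty, while the reverse action costs nothing.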

This rule, by itself, prevents the AI from doing many of the things we fear. It will not kill people; a human's brain is the most complex known structure in the universe and killing a person reduces it to a pile of fat and protein. It will not hook people up to experience machines; doing so would dramatically reduce the uniqueness of each individual and make the universe a much simpler place.

Human society is extraordinarily complex. The information needed to describe a collection of interacting humans is much greater than the information needed to describe isolated humans. Breaking up a society of humans destroys information, just like breaking up a human brain into individual neurons. Thus an AI guided by this rule would not do anything to threaten human civilization.

This rule also prevents the AI from making species extinct or destroying ecosystems and other complex natural systems. It ensures that the future will continue to be inhabited by a society of unique humans interacting in a system where nature has been somewhat preserved. As a first approximation, that is all we really care about.

Clearly this rule is not complete, nor is it symmetric. The AI should not be solely devoted to increasing information. If I break a window in your house, it takes more information to describe your house. More seriously, a human body infected with diseases and parasites requires more information to describe than a healthy body. The AI should not prevent humans from reducing the information content of the universe if we choose to do so, and it should assign some weight to human happiness.

The worst-case scenario is that this rule generates an AI that is an extreme pacifist and conservationist, one that refuses to end disease or alter the natural world to fit our needs. I can live with that. I'd rather have to deal with my own illnesses than be turned into paperclips.

One final note: I generally agree with Robin Hanson that rule-following is more important than values. If we program an AI with an absolute respect for property rights, such that it refuses to use or alter anything that it has not been given ownership of, we should be safe no matter what its values or desires are. But I'd like information-based morality in there as well.

Comment author: jimrandomh 06 February 2011 10:09:08PM 2 points [-]

This doesn't work, because the universe could require many bits to describe while those bits are allocated to describing things we don't care about. Most of the information in the universe is in non-morally-significant aspects of the arrangement of molecules, such that things like simple combustion increase the number of bits required to describe the universe (aka the entropy) by a large amount, while tiling the universe with paperclips only decreases it by a small amount.
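This objection can be made concrete with a toy demonstration (mine, not the commenter's): a compression-based description-length proxy assigns far more bits to morally irrelevant thermodynamic noise than to regular, morally significant structure.

```python
import os
import zlib

def description_bits(state: bytes) -> int:
    # Compressed size as a crude proxy for description length.
    return 8 * len(zlib.compress(state, 9))

# Regular structure we care about: compresses to very few bits.
structured = b"a society of unique interacting humans. " * 100

# "Waste heat" we don't care about: incompressible random noise
# of the same length, which the proxy values far more highly.
noise = os.urandom(len(structured))
```

Under this measure, burning the structured state into noise *increases* the measured information while destroying everything of value, which is exactly the failure mode described above.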