I've been working on metaethics/CEV research for a couple months now (publishing mostly prerequisite material) and figured I'd share some of the sources I've been using.
CEV sources.
- Yudkowsky, Metaethics sequence
- Yudkowsky, 'Coherent Extrapolated Volition'
- Tarleton, 'Coherent extrapolated volition: A meta-level approach to machine ethics'
Motivation. CEV extrapolates human motivations/desires/values/volition. As such, it will help to understand how human motivation works.
- Neuroeconomics studies motivation as a driver of action under uncertainty. Start with Neuroeconomics: Decision Making and the Brain (2008) and Foundations of Neuroeconomic Analysis (2010), and see my bibliography here.
- Affective neuroscience studies motivation as an emotion. Start with Pleasures of the Brain (2009) and my bibliography here.
- Motivation science integrates psychological approaches to studying motivation. Start with The Psychology of Goals (2009), Oxford Handbook of Human Action (2008), and Handbook of Motivation Science (2007).
Extrapolation. Is it plausible to think that some kind of extrapolation of human motivations will converge on a single motivational set? How would extrapolation work, exactly?
- Reflective equilibrium. Yudkowsky's proposed extrapolation works analogously to what philosophers call 'reflective equilibrium.' The most thorough work here is Daniels' 1996 book, and there have been many papers since, but this genre is only barely relevant for CEV. Basically, an entirely new literature on volition-extrapolation algorithms needs to be created (a toy sketch of the basic iterative idea follows this list).
- Full-information accounts of value and ideal observer theories. These are the philosophers' terms for theories of value that appeal to 'what we would want if we were fully informed, etc.' or 'what a perfectly informed agent would want,' much as CEV does. There's some literature on this, but it's only marginally relevant to CEV. Again, an entirely new literature needs to be written to solve this problem.
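To make the question concrete, here is a deliberately toy sketch (not anyone's actual proposal) of extrapolation as a reflective-equilibrium-style loop: start from a set of stated values and apply a revision rule until nothing changes. The value set and the `revise` rule are purely hypothetical stand-ins.

```python
# Toy sketch only: iterate a revision rule over a value set until it hits a
# fixed point, in the spirit of reflective equilibrium. This is not a
# proposed CEV algorithm; 'values' and 'revise' are illustrative stand-ins.

def extrapolate(values, revise, max_rounds=1000):
    """Apply the revision rule until the value set stops changing."""
    for _ in range(max_rounds):
        revised = revise(values)
        if revised == values:        # fixed point: no further revisions apply
            return revised
        values = revised
    raise RuntimeError("no equilibrium reached")   # convergence is not guaranteed

def revise(values):
    # Hypothetical rule: if both 'retribution' and 'mercy' are held, drop
    # 'retribution' on reflection. Purely illustrative, not a claim about ethics.
    if {"retribution", "mercy"} <= values:
        return values - {"retribution"}
    return values

print(extrapolate({"retribution", "mercy", "honesty"}, revise))
# -> {'mercy', 'honesty'}
```

The hard open problems, of course, are where a defensible revision rule would come from and whether different people's starting points converge at all; the sketch only shows the shape of the iteration.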
Metaethics. Should we use CEV, or something else? What does 'should' mean?
- Yudkowsky, Metaethics sequence
- An Introduction to Contemporary Metaethics is a good introduction to mainstream metaethics. Unfortunately, nearly all of mainstream metaethics is horribly misguided, but the book will at least give you a good sense of the questions involved and what some of the wrong answers are. The chapter on moral reductionism is the most profitable.
- Also see 'Which Consequentialism? Machine ethics and moral divergence.'
Building the utility function. How can a seed AI be built? How can it learn what to value? (A rough illustrative sketch follows the list below.)
- Dewey, 'Learning What to Value'
- Yudkowsky, 'Coherent Extrapolated Volition'
- Yudkowsky, 'Artificial Intelligence as a Positive and Negative Factor in Global Risk'
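As a rough, purely illustrative picture of the 'learn what to value' idea (a toy stand-in, not the formalism in Dewey's paper), an agent can keep a distribution over candidate utility functions, update it on evidence, and act on expected utility under that distribution:

```python
# Toy value-learning sketch: maintain a posterior over candidate utility
# functions and choose actions by expected utility under that posterior.
# Hypothetical stand-in, not the machinery from 'Learning What to Value'.

candidate_utilities = {
    "likes_apples":  lambda outcome: 1.0 if outcome == "apple" else 0.0,
    "likes_oranges": lambda outcome: 1.0 if outcome == "orange" else 0.0,
}
posterior = {"likes_apples": 0.5, "likes_oranges": 0.5}   # prior over hypotheses

def bayes_update(posterior, likelihoods):
    """Reweight each hypothesis by how well it explains the observed evidence."""
    unnorm = {h: p * likelihoods[h] for h, p in posterior.items()}
    total = sum(unnorm.values())
    return {h: w / total for h, w in unnorm.items()}

def expected_utility(outcome, posterior):
    return sum(p * candidate_utilities[h](outcome) for h, p in posterior.items())

# Observe evidence that 'likes_apples' explains much better, then act.
posterior = bayes_update(posterior, {"likes_apples": 0.9, "likes_oranges": 0.1})
print(posterior)   # posterior shifts to ~0.9 on 'likes_apples'
print(max(["apple", "orange"], key=lambda o: expected_utility(o, posterior)))
```

The sketch dodges everything hard: where the hypothesis space comes from, how evidence about human values is obtained, and how the agent avoids corrupting its own evidence stream.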
Preserving the utility function. How can the motivations we put into a superintelligence be preserved over time and self-modification? (A minimal sketch of the goal-preservation argument follows the list below.)
- Yudkowsky, 'Coherent Extrapolated Volition'
- De Blanc, 'Ontological Crises in Artificial Agents' Value Systems'
- Omohundro, 'Basic AI Drives' and 'The Nature of Self-Improving Artificial Intelligence' (instrumental drives to watch out for, and more)
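A minimal sketch of the goal-preservation argument those papers make, under toy assumptions: if a candidate self-modification is scored by the agent's current utility function, a modification that changes that utility function predicts a worse future by current lights, so it gets rejected. Everything here (paperclips, staples, the one-line world model) is a hypothetical stand-in.

```python
# Toy sketch of the goal-preservation argument: candidate self-modifications
# are evaluated with the agent's *current* utility function, so rewriting the
# utility function looks bad by present lights. Purely illustrative.

def current_utility(world):
    return world.get("paperclips", 0)        # stand-in for the current goal

def altered_utility(world):
    return world.get("staples", 0)           # goal after a proposed rewrite

def predicted_world(utility_in_charge):
    """One-line world model: the future optimizes whatever utility is in charge."""
    if utility_in_charge is current_utility:
        return {"paperclips": 10}
    return {"staples": 10}

def should_adopt(modified_utility):
    # Both futures are scored by the agent's current values.
    keep = current_utility(predicted_world(current_utility))
    change = current_utility(predicted_world(modified_utility))
    return change > keep

print(should_adopt(altered_utility))   # False: the value-changing rewrite is rejected
```

This is the cleanest case; the cited papers examine when this reasoning holds and how it can break down, e.g. when the agent's ontology shifts out from under its utility function (De Blanc).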
Reflective decision theory. Current decision theories tell us little about software agents that make decisions to modify their own decision-making mechanisms. (A toy Newcomb-style illustration follows the list below.)
- See the Less Wrong wiki page on decision theory.
- Wei Dai's Updateless Decision Theory
- Yudkowsky's Timeless Decision Theory
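For a feel of why these newer decision theories were developed at all, here is the standard toy Newcomb's-problem calculation (a textbook illustration, not an implementation of UDT or TDT): scoring actions with the predictor's prediction held fixed recommends two-boxing, while scoring them with the prediction correlated with the agent's own decision procedure recommends one-boxing.

```python
# Toy Newcomb's problem: the same payoffs scored two different ways.
# Standard illustration only; not an implementation of UDT or TDT.

BOX_A = 1_000          # transparent box: always contains $1,000
BOX_B = 1_000_000      # opaque box: filled only if one-boxing was predicted

def payoff(action, prediction):
    b = BOX_B if prediction == "one-box" else 0
    return b if action == "one-box" else b + BOX_A

def fixed_prediction_score(action, prediction="one-box"):
    # Treat the prediction as already settled, independent of the choice.
    return payoff(action, prediction)

def correlated_score(action):
    # Treat the predictor as modeling your decision procedure, so the
    # prediction matches whatever you actually decide.
    return payoff(action, prediction=action)

print(max(["one-box", "two-box"], key=fixed_prediction_score))   # two-box
print(max(["one-box", "two-box"], key=correlated_score))         # one-box
```

One commonly cited motivation for these theories is reflective consistency: an agent that scores actions the first way has an incentive to rewrite itself into one that scores them the second way, and that self-modification step is exactly what current decision theories say little about.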
Additional suggestions welcome. I'll try to keep this page up-to-date.
Sounds good. I sort of feel obligated to point out that CEV is about policy, public relations, and abstract philosophy significantly more than it is about the real problem of FAI. Thus I'm a little worried about what "working on CEV" might look like if the optimization targets aren't very clear from the start.
Bringing CEV up to date sounds more straightforwardly good, ideally while emphasizing that whatever line of reasoning you are using to object to some imagined CEV scenario is itself contained within you, so CEV will by its very nature also take that line of reasoning into account. (Actually, Steve had some analysis of why even smart people so consistently miss this point (besides the typical diagnosis of 'insufficient Hofstadter during adolescence syndrome'), which should really go into a future CEV doc. A huge part of the common confusion about CEV comes from people not really noticing or understanding the whole "if you can think of a failure mode, the AI can think of it" thing.)
This assumes that CEV actually works as intended (and the intention was the right one), which would be exactly the question under discussion (hopefully), so in that context you aren't allowed to make that assumption.
The adequate response is not that it's "correct by definition" (because it isn't; it's a constructed artifact that could ...