hg00 comments on Agential Risks: A Topic that Almost No One is Talking About - Less Wrong Discussion

6 Post author: philosophytorres 15 October 2016 06:41PM

Comments (30)

Comment author: hg00 17 October 2016 10:25:59PM * 2 points

I'm familiar with lots of the things Eliezer Yudkowsky has said about AI. That doesn't mean I agree with them. Less Wrong has an unfortunate culture of not discussing topics once the Great Teacher has made a pronouncement.

Plus, I don't think philosophytorres' claim is obvious even if you accept Yudkowsky's arguments.

Fragility of value thesis. Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example an alien species which shared almost all of human value except that their parameter setting for “boredom” was much lower, might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). Friendly AI is more like a satisficing threshold than something where we’re trying to eke out successive 10% improvements. See: Yudkowsky (2009, 2011).

From here.

OK, so do my best friend's values constitute a 90% match? A 99.9% match? Do they pass the satisficing threshold?

Also, Eliezer's boredom-free scenario sounds like a pretty good outcome to me, all things considered. If an AGI modified me so I could no longer get bored, and then replayed a peak experience for me for millions of years, I'd consider that a positive singularity. Certainly not a "catastrophe" in the sense that an earthquake is a catastrophe. (Well, perhaps a catastrophe of opportunity cost, but basically every outcome is a catastrophe of opportunity cost on a long enough timescale, so that's not a very interesting objection.) The utility function is not up for grabs--I am the expert on my values, not the Great Teacher.

Here's the abstract from his 2011 paper:

A common reaction to first encountering the problem statement of Friendly AI (“Ensure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome”) is to propose a single moral value which allegedly suffices; or to reject the problem by replying that “constraining” our creations is undesirable or unnecessary. This paper makes the case that a criterion for describing a “positive outcome,” despite the shortness of the English phrase, contains considerable complexity hidden from us by our own thought processes, which only search positive-value parts of the action space, and implicitly think as if code is interpreted by an anthropomorphic ghost-in-the-machine. Abandoning inheritance from human value (at least as a basis for renormalizing to reflective equilibria) will yield futures worthless even from the standpoint of AGI researchers who consider themselves to have cosmopolitan values not tied to the exact forms or desires of humanity.

It sounds to me like Eliezer's point is more about the complexity of values than about the need to prevent slight misalignment. In other words, Eliezer seems to argue here that a naively programmed definition of "positive value" constitutes a gross misalignment, NOT that a slight misalignment constitutes a catastrophic outcome.

Please think critically.

Comment author: turchin 20 October 2016 10:42:02PM 0 points

I think that a small error inside the description of a single value could produce a bad result, but that is not so if we have a list of independent values.

In the phone example, if I lose one digit from someone's number, I will not get 90 percent of him, but if I lose one phone number from my phone book, the book will still be 90 percent intact.

Humans tend to have many somewhat independent values: some may like fishing, snorkeling, girls, clouds, etc. If someone lost one of them, it would not be a big deal: he would still be almost the same person. This happens all the time with real humans, as their predispositions can change overnight.