You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

hg00 comments on Agential Risks: A Topic that Almost No One is Talking About - Less Wrong Discussion

6 Post author: philosophytorres 15 October 2016 06:41PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (30)

You are viewing a single comment's thread.

Comment author: hg00 17 October 2016 09:03:50AM 0 points [-]

Good post!

While not all sociopaths are violent, a disproportionate number of criminals and dictators have (or very likely have) had the condition.

Luckily sociopaths tend to have poor impulse control.

It follows that some radical environmentalists in the future could attempt to use technology to cause human extinction, thereby “solving” the environmental crisis.

Reminds me of Derrick Jensen. He doesn't talk about human extinction, but he does talk about bringing down civilization.

Fortunately, this version of negative utilitarianism is not a position that many non-academics tend to hold, and even among academic philosophers it is not especially widespread.

For details see http://www.amirrorclear.net/academic/ideas/negative-utilitarianism/

This is worrisome because recent research shows that even slight misalignments between our values and those motivating a superintelligence could have existentially catastrophic consequences.

Citation? This is commonly asserted by AI risk proponents, but I'm not sure I believe it. My best friend's values are slightly misaligned relative to my own, but if my best friend became superintelligent, that seems to me like it'd be a pretty good outcome.

Comment author: SithLord13 17 October 2016 12:10:19PM 0 points [-]

Citation? This is commonly asserted by AI risk proponents, but I'm not sure I believe it. My best friend's values are slightly misaligned relative to my own, but if my best friend became superintelligent, that seems to me like it'd be a pretty good outcome.

I highly recommend reading this.

Comment author: hg00 17 October 2016 10:25:59PM *  2 points [-]

I'm familiar with lots of the things Eliezer Yudkowsky has said about AI. That doesn't mean I agree with them. Less Wrong has an unfortunate culture of not discussing topics once the Great Teacher has made a pronouncement.

Plus, I don't think philosophytorres' claim is obvious even if you accept Yudkowsky's arguments.

Fragility of value thesis. Getting a goal system 90% right does not give you 90% of the value, any more than correctly dialing 9 out of 10 digits of my phone number will connect you to somebody who’s 90% similar to Eliezer Yudkowsky. There are multiple dimensions for which eliminating that dimension of value would eliminate almost all value from the future. For example an alien species which shared almost all of human value except that their parameter setting for “boredom” was much lower, might devote most of their computational power to replaying a single peak, optimal experience over and over again with slightly different pixel colors (or the equivalent thereof). Friendly AI is more like a satisficing threshold than something where we’re trying to eke out successive 10% improvements. See: Yudkowsky (2009, 2011).

From here.

OK, so do my best friend's values constitute a 90% match? A 99.9% match? Do they pass the satisficing threshold?

Also, Eliezer's boredom-free scenario sounds like a pretty good outcome to me, all things considered. If an AGI modified me so I could no longer get bored, and then replayed a peak experience for me for millions of years, I'd consider that a positive singularity. Certainly not a "catastrophe" in the sense that an earthquake is a catastrophe. (Well, perhaps a catastrophe of opportunity cost, but basically every outcome is a catastrophe of opportunity cost on a long enough timescale, so that's not a very interesting objection.) The utility function is not up for grabs--I am the expert on my values, not the Great Teacher.

Here's the abstract from his 2011 paper:

A common reaction to first encountering the problem statement of Friendly AI (“Ensure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome”) is to propose a single moral value which allegedly suffices; or to reject the problem by replying that “constraining” our creations is undesirable or unnecessary. This paper makes the case that a criterion for describing a “positive outcome,” despite the shortness of the English phrase, contains considerable complexity hidden from us by our own thought processes, which only search positive-value parts of the action space, and implicitly think as if code is interpreted by an anthropomorphic ghost-in-the-machine. Abandoning inheritance from human value (at least as a basis for renormalizing to reflective equilibria) will yield futures worthless even from the standpoint of AGI researchers who consider themselves to have cosmopolitan values not tied to the exact forms or desires of humanity.

It sounds to me like Eliezer's point is more about the complexity of values, not the need to prevent slight misalignment. In other words, Eliezer seems to argue here that a naively programmed definition of "positive value" constitutes a gross misalignment, NOT that a slight misalignment constitutes a catastrophic outcome.

Please think critically.

Comment author: turchin 20 October 2016 10:42:02PM 0 points [-]

I think that small error inside a value description could result in bad result, but it is not so, if we have a list of independent values.

In phone example if I lose one digit from someone number, I will not get 90 per cent of him, but if I lose 1 phone number from my phone book, it will be 90 per cent intact.

Humans tend to have many somewhat independent values, like some may like fishing, snorkeling, girls, clouds, etc. If he lost one of them it is not a big deal, it is almost him and it happens all the time with real humans, as their predispositions could change overnight.