eli_sennesh comments on Why I haven't signed up for cryonics - Less Wrong

Post author: Swimmer963 12 January 2014 05:16AM




Comment author: [deleted] 15 January 2014 09:20:12AM 1 point [-]

I'm thinking of cases where the programmers tried to write a FAI but they did something slightly wrong.

I'm having trouble coming up with a realistic model of what that would look like. I'm also wondering why aspiring FAI designers didn't bother to test-run their utility function before actually "running" it in a real optimization process.

Comment author: Lumifer 15 January 2014 03:39:18PM 1 point [-]

I'm also wondering why aspiring FAI designers didn't bother to test-run their utility function before actually "running" it in a real optimization process.

Because if you don't construct a FAI but only construct a seed out of which a FAI will build itself, it's not obvious that you'll have the ability to do test runs.

Comment author: [deleted] 15 January 2014 05:55:16PM 0 points [-]

Well, that sounds like a new area of AI safety engineering to explore, no? How to check your work before doing something potentially dangerous?

Comment author: Eugine_Nier 16 January 2014 06:10:18AM 0 points [-]

I believe that is MIRI's stated purpose.

Comment author: [deleted] 16 January 2014 08:06:44AM 2 points [-]

Quite so, which is why I support MIRI, even though their marketing leans much too heavily on fearmongering, in my opinion.

Even though I do understand why they are: Eliezer believes he was dangerously close to actually building an AI before he realized it would destroy the human race, back in the SIAI days. Fair enough on him, being afraid of what all the other People Like Eliezer might do, but without being able to see his AI designs from that period, there's really no way for the rest of us to judge whether it would have destroyed the human race or just gone kaput like so many other supposed AGI designs. Private experience, however, does not serve as persuasive marketing material.

Comment author: Kaj_Sotala 15 January 2014 04:08:52PM 0 points [-]

Have you read Failed Utopia #4-2?

Comment author: [deleted] 15 January 2014 05:53:44PM *  0 points [-]

I have, but it's running with the dramatic-but-unrealistic "genie model" of AI, in which you could simply command the machine, "Be a Friendly AI!" or "Be the CEV of humanity!", and it would do it. In real life, verbal descriptions are mere shorthand for actual mental structures, and porting the necessary mental structures for even the slightest act of direct normativity over from one mind-architecture to another is (I believe) actually harder than just using some form of indirect normativity.

(That doesn't mean any form of indirect normativity will work rightly, but it does mean that Evil Genie AI is a generalization from fictional evidence.)

Hence my saying I have trouble coming up with a realistic model.

Comment author: MugaSofer 15 January 2014 09:50:25AM -1 points [-]

Perhaps it had implications that only became clear to a superintelligence?

Comment author: [deleted] 15 January 2014 12:57:47PM 2 points [-]

Hmmm... Upon thinking it over in my spare brain-cycles for a few hours, I'd say the most likely failure mode of an attempted FAI is to extrapolate from the wrong valuation machinery in humans. For instance, you could end up with a world full of things people want and like, but don't approve of. You would thus end up having a lot of fun while simultaneously knowing that everything about it is all wrong and it's never, ever going to stop.

Of course, that's just one cell in a 2^3-cell grid, and that's assuming Yvain's model of human motivations is accurate enough that FAI designers actually tried to use it, and then hit a very wrong square out of 8 possible squares.

Within that model, I'd say "approving" is what we're calling the motivational system that imposes moral limits on our behavior, so I would say if you manage to combine wanting and/or liking with a definite +approving, you've got a solid shot at something people would consider moral. Ideally, I'd say Friendliness should shoot for +liking/+approving while letting wanting vary. That is, an AI should do things people both like and approve of without regard to whether those people would actually feel motivated enough to do them.
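The 2^3 grid above can be made concrete with a short sketch. This is purely illustrative: the axis names come from Yvain's wanting/liking/approving model, and the annotations on particular cells are my own reading of the comment, not anything from the original post.

```python
from itertools import product

# Yvain's three-way model of human motivation treats wanting, liking,
# and approving as independent axes, giving 2**3 = 8 cells.
axes = ("wanting", "liking", "approving")

grid = list(product((True, False), repeat=3))
assert len(grid) == 8

def label(cell):
    """Render a cell like '+wanting/+liking/-approving'."""
    return "/".join(("+" if v else "-") + name for name, v in zip(axes, cell))

for cell in grid:
    wanting, liking, approving = cell
    if liking and approving:
        # The proposed Friendliness target: liked and approved of,
        # with wanting left free to vary.
        note = "Friendliness target"
    elif liking and not approving:
        # The failure mode described in the comment: fun that feels
        # all wrong and never stops.
        note = "feared failure mode"
    else:
        note = ""
    print(f"{label(cell):32} {note}")
```

Enumerating the cells this way makes the comment's point visible: only two of the eight cells are liked-but-unapproved, yet an extrapolation that keys off the wrong valuation machinery could land squarely in one of them.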

Comment author: Eliezer_Yudkowsky 16 January 2014 06:32:09AM 0 points [-]

You would thus end up having a lot of fun while simultaneously knowing that everything about it is all wrong and it's never, ever going to stop.

Are we totally sure this is not what utopia initially feels like from the inside? Because I have to say, that sentence sounded kinda attractive for a second.

Comment author: [deleted] 16 January 2014 07:37:10AM 3 points [-]

It's what an ill-designed "utopia" might feel like. Note the link to Yvain's posting: I'm referring to a "utopia" that basically consists of enforced heroin usage, or its equivalent. Surely you can come up with better things to do than that in five minutes' thinking.

Comment author: MugaSofer 16 January 2014 11:45:58PM *  1 point [-]

What kinds of weirdtopias are you imagining that would fulfill those criteria?

Because the ones that first sprang to mind for me (this might make an interesting exercise for people, actually) were all emphatically, well, wrong. Bad. Unethical. Evil... could you give some examples?

Comment author: TheOtherDave 17 January 2014 12:59:57AM -1 points [-]

I of course don't speak for EY, but if I made a similar comment, what I would mean is this: I expect my experience of "I know that everything about this is all wrong" to correlate with anything radically different from what I expected and am accustomed to, whether or not it is bad, unethical, or evil, and even if I would endorse it (on sufficient reflection) more than any alternatives.

Given that I expect my ideal utopia to be radically different from what I was expecting and am accustomed to (because, really, how likely is the opposite?), I should therefore expect to react that way to it initially.

Comment author: MugaSofer 17 January 2014 01:27:20AM -1 points [-]

Although I don't usually include a description of the various models of the other speaker I'm juggling during conversation, that's my current best guess. However, principle of charity and so forth.

(Plus Eliezer is very good at coming up with weirdtopias - probably better than I am.)