christopherj comments on AI risk, new executive summary - Less Wrong

12 Post author: Stuart_Armstrong 18 April 2014 10:45AM

Comments (76)

Comment author: [deleted] 20 April 2014 04:08:45PM * 1 point

What are your reasons for thinking this? I find myself disagreeing: one big disanalogy is that while we have language and chimps do not, we and the AGI both have language. I find it implausible that the AGI could not in principle communicate its goals to us: give the AGI and ourselves an arbitrarily large amount of time and resources to talk, do you really think we'd never come to a common understanding? Because even if we don't, the AGI effectively does have such resources by which it might, I dunno, choose its words with care.

I'm also not sure why we should think it would even be particularly challenging to understand the goals of an AGI. It's not easy even with other humans, but why would it be much harder with an AGI? Do we have some reason to expect its goals to be more complex than ours? It's been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be. My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.

Comment author: christopherj 27 April 2014 02:59:59PM 0 points

"Do we have some reason to expect [an AGI's] goals to be more complex than ours?"

I find myself agreeing with you -- human goals are a complex mess, which we seldom understand ourselves. We don't come with clear inherent goals, and what goals we do have we abuse by using things like sugar and condoms instead of eating healthy and reproducing like we were "supposed" to. People have been asking about the meaning of life for thousands of years, and we still have no answer.

An AI, on the other hand, could have very simple goals -- make paperclips, for example. An AI's goals might be completely specified in two words. It's the AI's sub-goals and plans to reach its goals that I doubt I could comprehend. It's the very single-mindedness of an AI's goals and our inability to comprehend our own goals, plus the prospect of an AI being both smarter and better at goal-hacking than us, that has many of us fearing that we will accidentally kill ourselves via non-friendly AI. Not everyone will think to clarify "make paperclips" with "don't exterminate humanity", "don't enslave humanity", "don't destroy the environment", "don't reprogram humans to desire only to make paperclips", and various other disclaimers that wouldn't be necessary if you were addressing a human (and we don't know the full disclaimer list either).