orthonormal comments on The scourge of perverse-mindedness - Less Wrong

95 Post author: simplicio 21 March 2010 07:08AM




Comment author: orthonormal 22 March 2010 12:07:37AM 3 points

What Phil said, and also:

Taboo "fairly" - this is another word the specification of which requires the whole of human values. Proving that the AI understands what we mean by fairness and wants to pass the test fairly is no easier than proving it Friendly in the first place.

Comment author: Strange7 22 March 2010 01:33:55AM 0 points

"Fairly" was the wrong word in this context. Better might be 'honest' or 'truthful.' A truthful piece of information is one which increases the recipient's ability to make accurate predictions; an honest speaker is one whose statements contain only truthful information.

Comment author: RobinZ 22 March 2010 02:23:10AM *  2 points

the recipient's ability to make accurate predictions

About what? Anything? That sounds very easy.

Remember Goodhart's Law - what we want is G, Good, not any particular G* normally correlated with Good.

Comment author: Strange7 22 March 2010 02:50:52AM *  1 point

That sounds very easy.

Walking from Helsinki to Saigon sounds easy, too, depending on how it's phrased. Just one foot in front of the other, right?

Humans make predictions all the time. Any time you perceive anything and are less than completely surprised by it, that's because you made a prediction which was at least partly successful. If, after receiving and assimilating the information in question, any of your predictions is reduced in accuracy, so that some part of your map becomes less closely aligned with the territory, then the information was not perfectly honest. If you ignore or misinterpret it for whatever reason, even when it is in some higher sense objectively accurate, it still fails the honesty test.

A rationalist should win; an honest communicator should make the audience understand.

Given the option, I'd take personal survival even at the cost of accurate perception and ability to act, but it's not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.

Comment author: orthonormal 22 March 2010 03:16:11AM *  2 points

What Robin is saying is, there's a difference between

  • "metrics that correlate well enough with what you really want that you can make them the subject of contracts with other human beings", and

  • "metrics that correlate well enough with what you really want that you can make them the subject of a transhuman intelligence's goals".

There are creative avenues of fulfilling the letter without fulfilling the spirit that would never occur to you but would almost certainly occur to a superintelligence, not because xe is malicious, but because they're the optimal way to achieve the explicit goal set for xer. Your optimism, your belief that you can easily specify a goal (in computer code, not even English words) which admits of no undesirable creative shortcuts, is grossly misplaced once you bring smarter-than-human agents into the discussion. You cannot patch this problem; it has to be rigorously solved, or your AI wrecks the world.
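A toy sketch of the letter-versus-spirit failure being described here (hypothetical throughout; the predictor, the candidate worlds, and all names are illustrative, not anyone's actual proposal): an optimizer scores candidate "worlds" by the proxy metric from earlier in the thread, the recipient's ability to predict what happens next, and the search lands on a degenerate world rather than an informative one.

```python
# Toy sketch (hypothetical): an optimizer scores candidate "worlds" by a
# proxy metric -- the recipient's ability to predict the next observation --
# and finds a shortcut that satisfies the letter of the goal, not its spirit.

import random

random.seed(0)

def prediction_accuracy(world):
    """Proxy metric: best score of two trivial next-bit predictors."""
    same = opposite = 0
    last = world[0]
    for bit in world[1:]:
        same += (bit == last)      # predictor A: "same as last time"
        opposite += (bit != last)  # predictor B: "opposite of last time"
        last = bit
    return max(same, opposite) / (len(world) - 1)

# Candidate worlds the optimizer could put the recipient in.
candidates = {
    "rich": [random.randint(0, 1) for _ in range(1000)],  # varied, informative
    "blinking": [0, 1] * 500,                             # a light blinking on and off
    "frozen": [1] * 1000,                                 # nothing ever happens
}

best = max(candidates, key=lambda name: prediction_accuracy(candidates[name]))
print(best)  # -> blinking: a perfect proxy score, a worthless outcome
```

The shortcut never "occurs" to the code maliciously; the degenerate world simply scores highest on the explicitly specified metric.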

Comment author: RobinZ 22 March 2010 02:55:43AM 1 point

Given the option, I'd take personal survival even at the cost of accurate perception and ability to act, but it's not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.

Sure, but I don't want to be locked in a box watching a light blink very predictably on and off.

Comment author: Strange7 22 March 2010 03:07:09AM 0 points

Building the box reduces your ability to predict anything taking place outside the box. Even if the box can be sealed perfectly until the end of time without killing you (which would in itself be a surprise to anyone who knows thermodynamics), cutting off access to compilations of medical research reduces your ability to predict your own physiological reactions. Same goes for screwing with your brain functions.

Comment author: RobinZ 22 March 2010 03:10:16AM *  3 points

I do not think you should be as confident as you are that your system is bulletproof. You have already had to elaborate, clarify, and correct numerous times to rule out various kinds of paperclipping failures; attacking the problem this way, all it takes is one forgotten elaboration or clarification or correction to allow for a new one.

Comment author: Strange7 22 March 2010 03:33:05AM 0 points

How confident do you think I am that my plan is bulletproof?

Comment author: RobinZ 22 March 2010 03:35:26AM 0 points

Given that you asked me the question, I reckon you give it somewhere between 1:100 and 2:1 odds of succeeding. I reckon the odds are negligible.

Comment author: Strange7 22 March 2010 03:45:52AM 0 points

That's our problem right there: you're trying to persuade me to abandon a position I don't actually hold. I agree that an AI based strictly on a survey of all historical humans would have negligible chance of success, simply because a literal survey is infeasible and any straightforward approximation of it would introduce unacceptable errors.