Epictetus comments on Debunking Fallacies in the Theory of AI Motivation - Less Wrong

Post author: Richard_Loosemore 05 May 2015 02:46AM


Comment author: TheAncientGeek 16 May 2015 01:58:15PM * 3 points

For example, if you give me a black box and tell me that when the box receives the inputs (1,2,3) then it gives the outputs (1,4,9), I will think backwards from the outputs to the inputs and say "it seems likely that the box is squaring its inputs." If you tell me that a black box squares its inputs, I will think forwards from the definition and say "then if I give it the inputs (1,2,3), then it'll likely give me the outputs (1,4,9)."

So when I hear that the box gets the inputs (source code, goal statement, world model) and produces the output "this goal is inconsistent with the world model!" iff the goal statement is inconsistent with the world model, I reason backwards and say "the source code needs to somehow collide the goal statement with the world model in a way that checks for consistency."
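A minimal sketch of that forwards/backwards pattern (the hypothesis space and the names below are invented for illustration; nothing here comes from the original comment):

```python
# Toy forwards/backwards reasoning over a tiny, made-up hypothesis space.

HYPOTHESES = {
    "identity": lambda x: x,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}

def forwards(hypothesis, inputs):
    """Given a claimed mechanism, predict the outputs."""
    return [HYPOTHESES[hypothesis](x) for x in inputs]

def backwards(inputs, outputs):
    """Given observed I/O, keep only the hypotheses consistent with it."""
    return [name for name, f in HYPOTHESES.items()
            if [f(x) for x in inputs] == outputs]

print(forwards("square", [1, 2, 3]))    # [1, 4, 9]
print(backwards([1, 2, 3], [1, 4, 9]))  # ['square']
```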

You have assumed that the AI will have some separate, boxed-off goal system, so that some unspecified component is needed to relate its inferred knowledge of human happiness back to the goal system.

Loosemore is assuming that the AI will be homogeneous, and is then wondering how contradictory beliefs could coexist in such a system, and what extra component firewalls off the contradiction.

See the problem? Both parties are making different assumptions, assuming their assumptions are too obvious to need stating, and stating differing conclusions that correctly follow from their differing assumptions.
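To make the two assumptions concrete, here is a purely illustrative sketch; the class names and the literal-based consistency test are invented, since neither party specifies an implementation:

```python
# Two toy architectures for the same consistency question. Beliefs and goals
# are (proposition, truth_value) pairs; a contradiction is p asserted both ways.

def is_consistent(literals):
    """False iff some proposition appears with both truth values."""
    seen = {}
    for prop, value in literals:
        if prop in seen and seen[prop] != value:
            return False
        seen[prop] = value
    return True

class BoxedGoalAI:
    """The quoted commenter's implicit picture: a boxed-off goal system, so a
    dedicated component must 'collide' the goal with the world model."""
    def __init__(self, goal_literals, world_literals):
        self.goal = set(goal_literals)
        self.world = set(world_literals)

    def check_goal(self):
        # The "unspecified component": explicitly merge the two stores and test.
        return is_consistent(self.goal | self.world)

class HomogeneousAI:
    """Loosemore's implicit picture: goals and beliefs live in one store, so
    any contradiction is internal and needs no special bridging component."""
    def __init__(self, literals):
        self.store = set(literals)

    def check(self):
        return is_consistent(self.store)

# A goal that contradicts inferred world knowledge, in both architectures:
goal = [("forced_smiles_are_happiness", True)]
world = [("forced_smiles_are_happiness", False)]
print(BoxedGoalAI(goal, world).check_goal())  # False
print(HomogeneousAI(goal + world).check())    # False
```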

Almost. As a minor terminological point, I separate out "efficiency," which is typically "outputs divided by inputs," and "efficacy," which is typically just "outputs." Efficacy is more general, since one can trivially use a system designed to find effective plans to find efficient plans by changing how "output" is measured. It doesn't seem unfair to view an AI with a truth goal as an AI with an efficacy goal: to effectively produce truth.
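A minimal sketch of the substitution the quoted passage describes, assuming plans are scored by a caller-supplied output measure:

```python
# The same plan-picking machinery serves both goals; only the
# "output" measure changes.

def best_plan(plans, output):
    """Pick the plan with the greatest measured output (pure efficacy)."""
    return max(plans, key=output)

plans = [
    {"name": "A", "outputs": 10.0, "inputs": 2.0},
    {"name": "B", "outputs": 12.0, "inputs": 6.0},
]

# Efficacy: just outputs.
print(best_plan(plans, lambda p: p["outputs"])["name"])               # B

# Efficiency: same machinery, output re-measured as outputs / inputs.
print(best_plan(plans, lambda p: p["outputs"] / p["inputs"])["name"]) # A
```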

If efficiency can be substituted for truth, why is there so much emphasis on truth in the advice given to human rationalists?

But while artificial systems with truth goals seem possible but as yet unimplemented, artificial systems with efficacy goals have been successfully implemented many, many times, with widely varying levels of sophistication. I have a solid sense of what it looks like to take a thermostat and dial it up to 11; I have only the vaguest sense of what it looks like to take a thermostat and get it to measure truth instead of temperature.
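For concreteness, a toy version of the thermostat example, where "dialing it up to 11" is just raising the setpoint; the bang-bang control logic below is a standard textbook loop, not anything from the original comment:

```python
# A thermostat as an implemented efficacy goal: drive temperature toward a
# setpoint. There is no analogous knob for "measure truth instead".

def thermostat_step(current_temp, setpoint, heater_on):
    # Bang-bang control with a small hysteresis band around the setpoint.
    if current_temp < setpoint - 0.5:
        return True     # too cold: turn heater on
    if current_temp > setpoint + 0.5:
        return False    # too hot: turn heater off
    return heater_on    # inside the band: leave state unchanged

heater = False
for temp in [18.0, 19.0, 20.4, 21.6, 20.9]:
    heater = thermostat_step(temp, setpoint=21.0, heater_on=heater)
    print(temp, "->", "heat" if heater else "off")
```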

In order to achieve an AI that's smart enough to be dangerous, a number of currently unsolved problems will have to be solved. That's a given.

Comment author: Epictetus 16 May 2015 06:14:23PM 1 point

Loosemore is assuming that the AI will be homogeneous, and is then wondering how contradictory beliefs could coexist in such a system, and what extra component firewalls off the contradiction.

How do you check for contradictions? It's easy enough when you have two statements that are negations of one another. It's much harder when you have many statements that each seem plausible, but an edge case somewhere messes things up. If contradictions can't be found efficiently, then you have to live with the fact that they might be there, and hope that if they are, they're bad enough to be discovered quickly. You can have some tests to try to find the obvious ones, of course.
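A sketch of this asymmetry, under the simplifying assumption that beliefs are propositional clauses: direct negations fall to a linear scan, while hidden contradictions amount to a satisfiability check, which is exponential in the worst case (SAT is NP-complete):

```python
# Easy case vs. hard case for contradiction detection over propositional beliefs.

from itertools import product

def has_direct_negation(literals):
    """Easy: some p and not-p both asserted outright (linear scan)."""
    return any((var, not val) in literals for (var, val) in literals)

def is_satisfiable(clauses, variables):
    """Hard: brute-force SAT over all 2**n truth assignments."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(any(assignment[v] == val for (v, val) in clause)
               for clause in clauses):
            return True
    return False  # unsatisfiable: the belief set hides a contradiction

# Each clause is a disjunction of (variable, truth_value) literals.
# Individually plausible, jointly inconsistent: a; (not a or b); not b.
clauses = [[("a", True)], [("a", False), ("b", True)], [("b", False)]]
print(has_direct_negation({("a", True)}))   # False: nothing obvious
print(is_satisfiable(clauses, ["a", "b"]))  # False: hidden contradiction
```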

Comment author: TheAncientGeek 16 May 2015 07:03:18PM * 1 point

Checking for contradictions could be easy, hard, or impossible, depending on the architecture. That architecture dependence is the point here.