9eB1 comments on Superintelligence 13: Capability control methods - Less Wrong

Post author: KatjaGrace 09 December 2014 02:00AM


Comment author: diegocaleiro 09 December 2014 04:17:06AM 4 points

When I'm faced with problems in which the principal-agent problem is present, my usual take is that one should: 1) Pick three or more different metrics that correlate with what you want to measure. Using the classic Soviet example of a needle factory, these could be: a) number of needles produced, b) weight of needles produced, c) average similarity between the actual needle design and the ideal needle. Then 2) At every time step where you test, start by choosing one of the correlates at random, and then use that one to measure production.
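
A minimal sketch of this randomized-metric scheme, assuming the needle-factory example; the metric functions and the batch representation here are hypothetical, chosen only to make the idea concrete:

```python
import random

# Hypothetical correlated metrics for the needle-factory example.
# Each takes a production batch (a list of needle records) and returns a score.
def needle_count(batch):
    return len(batch)

def total_weight(batch):
    return sum(needle["weight"] for needle in batch)

def design_similarity(batch):
    # Mean similarity between each needle and the ideal design, in [0, 1].
    return sum(needle["similarity"] for needle in batch) / len(batch)

METRICS = [needle_count, total_weight, design_similarity]

def evaluate(batch):
    """At each test step, draw one correlate at random and use it alone
    to measure production, as the scheme above prescribes."""
    metric = random.choice(METRICS)
    return metric.__name__, metric(batch)

# Example: one evaluation round on a toy batch.
batch = [{"weight": 0.51, "similarity": 0.92},
         {"weight": 0.49, "similarity": 0.95}]
print(evaluate(batch))
```

Because the agent learns which metric was used only after committing to a batch, gaming any single metric is a gamble rather than a sure win.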

This seems simple enough for Soviet companies. You are still stuck with them trying to optimize all three metrics at once without actually optimizing production, but the more dimensions and degrees of orthogonality between the metrics you find, the more confident you can be that your system will be hard to game.
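
One hedged way to gauge those "degrees of orthogonality", assuming you have historical scores for each candidate metric, is to check their pairwise correlations: low correlation between metrics suggests fewer shared blind spots. The numbers below are purely illustrative:

```python
# Requires Python 3.10+ for statistics.correlation (Pearson's r).
from itertools import combinations
from statistics import correlation

# Illustrative historical scores per candidate metric over past batches.
scores = {
    "count": [98, 102, 95, 110, 99],
    "weight": [50.1, 51.0, 48.7, 55.2, 49.9],
    "similarity": [0.91, 0.88, 0.93, 0.70, 0.90],
}

# Print Pearson correlation for every pair of metrics; values near 0
# indicate the rough "orthogonality" the scheme benefits from.
for (name_a, a), (name_b, b) in combinations(scores.items(), 2):
    print(f"corr({name_a}, {name_b}) = {correlation(a, b):.2f}")
```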

How do you think this would fail for AI?

Comment author: 9eB1 09 December 2014 06:04:41AM 0 points

This is a surprisingly brilliant idea, which should definitely have a name. For humans, part of the benefit is that it appeals to risk aversion: people wouldn't want to completely write off any one of the scenarios. It also makes the system so complex to analyze that many people would simply fall back on "doing the right thing" naturally. I could definitely foresee benefits from, for example, randomizing whether members of a team will be judged on individual performance or team performance.

I'm not totally sure it would work as well for AIs, which would naturally try to optimize in the gaps much more than a human would, and would potentially be less risk-averse than a human.

Comment author: diegocaleiro 09 December 2014 07:24:01AM 1 point

Let's name it:

Hidden Agency Solution

Caleiro Agency Conjecture

Principal agent blind spot

Cal-agency

Stochastic principal agency solution

(Wow, this naming thing is hard and awfully awkward, whether I'm optimizing for mnemonics or for fame. Is there anything else to optimize for here?)

Comment author: Halfwitz 10 December 2014 01:10:25AM 1 point

Fuzzy metrics?

Comment author: diegocaleiro 11 December 2014 12:10:29AM 1 point

That doesn't refer to the principal-agent problem.