Comment author: buybuydandavis 16 July 2015 11:38:27PM 2 points [-]

I see this failure in analysis all the time.

When people want to change the behavior of others, they find some policy or incentive that would encourage the change they desire, but never stop to ask how else people might react to that change in incentives.

Anyone ever come across any catchy name or formulation for this particular failure mode?

Comment author: robertzk 02 August 2015 02:27:41AM 1 point [-]

Isn't this an example of a reflection problem? We induce this change in a system, in this case an evaluation metric, and now we must predict not only the next iteration but the stable equilibria of this system.
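A toy fixed-point sketch of what I mean, with an entirely made-up response curve (nothing here comes from the parent comment): the naive analysis predicts the first reaction to the new incentive, but the quantity we actually care about is the equilibrium the system settles into once everyone keeps re-optimizing.

```python
# Toy illustration: after changing an incentive, the naive prediction is the
# agents' first response; the quantity of interest is the fixed point they
# settle into once everyone has re-optimized repeatedly.

def fixed_point(respond, state, tol=1e-9, max_iter=10_000):
    """Iterate a best-response map until the state stops changing."""
    for _ in range(max_iter):
        new_state = respond(state)
        if abs(new_state - state) < tol:
            return new_state
        state = new_state
    return state

# Hypothetical example: a metric rewards reported output, and agents inflate
# their reports partly in proportion to how much others inflate theirs.
respond = lambda inflation: 0.5 + 0.5 * inflation  # assumed response curve

first_reaction = respond(0.0)             # what the naive analysis predicts: 0.5
equilibrium = fixed_point(respond, 0.0)   # what the system settles to: ~1.0
print(first_reaction, equilibrium)
```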

Comment author: robertzk 15 March 2015 06:54:30AM 2 points [-]

Did you remove the vilification of proving arcane theorems in algebraic number theory because the LessWrong audience is more likely to fall within this demographic? (I used to be very excited about proving arcane theorems in algebraic number theory, and fully agree with you.)

Comment author: robertzk 10 March 2015 02:54:09AM *  3 points [-]

The thing that eventually leapt out when comparing the two behaviours is that behaviour 2 is far more informative about what the restriction was, than behaviour 1 was.

It sounds to me like the agent overfit to the restriction R. I wonder if you can draw some parallels to the Vapnik-style classical problem of empirical risk minimization, where you are not merely fitting your behavior to the training set, but instead achieving the optimal trade-off between generalization ability and adherence to R.
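As a minimal sketch of the trade-off I have in mind (synthetic data, with ridge regression standing in for the general regularized-risk setup; none of this is from the post): minimizing empirical risk alone fits noise in the observed examples of R, while adding a complexity penalty trades some fit for generalization.

```python
import numpy as np

# Synthetic stand-in: each row is an observed situation, y is observed compliance.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                  # 20 observed situations, 5 features
w_true = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # the "real" rule is simple
y = X @ w_true + 0.1 * rng.normal(size=20)    # noisy observations

def fit(X, y, lam):
    """Ridge regression: argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_overfit = fit(X, y, lam=0.0)   # pure empirical risk minimization
w_regular = fit(X, y, lam=1.0)   # fit traded off against complexity
print(np.round(w_overfit, 2))
print(np.round(w_regular, 2))
```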

In your example, an agent that inferred the boundaries of our restriction could generate a family of restrictions R_i that derive from slightly modifying its postulates. For example, if it knows you usually check in at midnight, it should consider the counterfactual scenarios of you usually checking in at 11:59, 11:58, etc., and come up with the union of (R_i = play quietly only around time i), i.e., play quietly the whole time, since this achieves maximum generalization.

Unfortunately, things are complicated by the fact that you said "I'll be checking up on you!" instead of "I'll be checking up on you at midnight!" The agent needs to go one step further than the machine teaching problem: first decide how many counterfactual training points (the R_i's above) it should generate, and then infer your intention from them.
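A rough sketch of this thought experiment, with made-up numbers and a made-up quiet_window helper (nothing here is from the post itself): perturb the single observed check-in time into counterfactual check-in times, turn each one into a restriction, and take the union. The less we know about when the check happens, the more the union approaches "play quietly the whole time."

```python
def quiet_window(check_hour, radius=1.0):
    """R_i: the hours during which to play quietly, around one check-in time."""
    return {h for h in range(24)
            if min(abs(h - check_hour), 24 - abs(h - check_hour)) <= radius}

observed_check = 0  # midnight, the one data point actually observed

# Counterfactual check-in times: a few near midnight, or (if the warning gave
# no time at all) spread across the whole night.
near_midnight = [23, 0, 1]
whole_night = list(range(20, 24)) + list(range(0, 7))

union_near = set().union(*(quiet_window(t) for t in near_midnight))
union_night = set().union(*(quiet_window(t) for t in whole_night))
print(sorted(union_near))   # quiet only around midnight
print(sorted(union_night))  # quiet essentially the whole night
```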

A high-level conjecture: human CEV, if it can be modeled as a region within some natural high-dimensional real-valued space (e.g., R^n for large n, where each dimension is a utility function?), admits minimal or near-minimal curvature as a Riemannian manifold, assuming we could populate the space with the maximum available set of training data as mined from all human literature.

A positive resolution of the above conjecture would be philosophically satisfying, as it would imply a potential AI would not have to set up corner cases and thus give the appearance of overfitting to the restrictions.

EDIT: Framed in this way, could we use cross-validation on the above mentioned training set to test our CEV region?
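Purely as a sketch of the mechanics of such a cross-validation (the dataset is an invented placeholder; nothing like it exists, and the region model here is just a one-class SVM chosen for illustration): treat each mined sample as a point in the hypothesized preference space, fit a region on some folds, and check how much of the held-out data the fitted region still covers.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import OneClassSVM

# Placeholder "CEV" samples: 200 points in 5 synthetic dimensions standing in
# for utility-function coordinates mined from text.
rng = np.random.default_rng(0)
points = rng.normal(loc=0.0, scale=1.0, size=(200, 5))

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(points):
    region = OneClassSVM(nu=0.1, gamma="scale").fit(points[train_idx])
    inside = region.predict(points[test_idx]) == 1  # +1 means inside the learned region
    scores.append(inside.mean())

print(np.mean(scores))  # fraction of held-out points the fitted region still covers
```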

Comment author: robertzk 10 March 2015 02:58:21AM 3 points [-]

Incidentally, for a community whose most important goal is solving a math problem, why is there no MathJax or other built-in LaTeX support?

Comment author: V_V 06 March 2015 04:17:09PM *  2 points [-]

AIs deviate from their intended programming in ways that are dangerous for humans. And it's not thousands of years away; it's as close as a self-driving car crashing into a group of people to avoid a dog crossing the street.

But that's a very different kind of issue than AI taking over the world and killing or enslaving all humans.

EDIT:

To expand: all technologies introduce safety issues.
Once we got fire, some people got burnt. This doesn't imply that UFFire (Unfriendly Fire) is the most pressing existential risk for humanity, or that we must devote huge amounts of resources to preventing it and never use fire until we have proved that it will not turn "unfriendly".

Comment author: robertzk 07 March 2015 06:12:05AM *  0 points [-]

However, UFFire does not uncontrollably and exponentially reproduce, nor does it improve its own functioning. Certainly, a conflagration on a planet covered entirely by dry forest would become an unmitigable problem rather quickly.

In fact, in such a scenario, we should dedicate a huge amount of resources to preventing it and never use fire until we have proved it will not turn "unfriendly".