gjm comments on Stupid Questions June 2015 - Less Wrong
Suppose we are considering an agent with a more "positive" mission than that of your pest control drone (whose purpose is best expressed negatively: get rid of small pests). For instance, perhaps the agent is working for a hedge fund and trying to increase the value of its holdings, or perhaps it's trying to improve human health and give people longer healthier lives.
How do you express that in terms of "disutility"?
I think what is doing the work here is not the use of "disutility" rather than "utility", but having a utility function that is (something like) bounded above and that can't be driven sky-high by (what we would regard as) weird and counterproductive actions. So, for the "positive" agents above, rather than forcing what they do into a "disutility" framework, one could give the hedge-fund machine a utility function that stops increasing once the value of the fund reaches $100bn, and the health machine a utility function that stops increasing once 95% of people are getting 70 QALYs or more, or something like that. Then some counterbalancing negative term that is not artificially bounded ("number of humans harmed" in your example; perhaps more generally some measure of "amount of change", though I suspect that would be hard to express rigorously) should ensure that the machine never has reason to do anything too drastic.
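The bounded-above-plus-unbounded-penalty shape can be sketched in a few lines. This is only an illustration of the idea, not a proposal; the $100bn cap and the harm weight are placeholder values I've chosen, and "harm" stands in for whatever counterbalancing measure one actually used:

```python
def bounded_utility(value, harm, cap=100e9, harm_weight=1.0):
    """Utility that saturates at `cap` but is penalized without bound.

    `cap` and `harm_weight` are illustrative placeholders: the comment
    only suggests e.g. a $100bn ceiling, not specific numbers.
    """
    # The positive term stops increasing once `value` reaches `cap`,
    # so there is no incentive to drive it sky-high...
    capped = min(value, cap)
    # ...while the negative term keeps growing with harm done, so
    # drastic actions always cost something.
    return capped - harm_weight * harm
```

Note that past the cap, extra gains buy nothing while extra harm still costs, which is the asymmetry doing the work.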
So: yeah, I think this is far from crazy, but I don't think it's going to solve the Friendly AI problem, for a few reasons.
I agree on all points. It seems "bounded utility" might be a better term than "disutility". The main point is that a halting condition triggered by success (a system that is essentially trying to find the conditions under which it can shut itself off) seems less likely to go horribly wrong than an unbounded search for ever more utility.
This is not an attempt to solve Friendly AI. I just figure a simple hard-coded limit to how much of anything a learning machine could want chops off a couple of avenues for disaster.
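The "halting condition triggered by success" idea can be sketched as a loop that shuts the agent off once its bounded utility saturates. The `step` and `utility` callables here are hypothetical stand-ins, not anything specified in the thread:

```python
def run_until_satisfied(step, utility, bound, max_steps=1000):
    """Run an agent whose goal is to reach `bound` and then stop.

    Because utility cannot exceed `bound`, the agent has nothing to
    gain by acting past the point where the condition is met: success
    itself is the shutdown trigger.
    """
    state = 0
    for n in range(max_steps):
        if utility(state) >= bound:
            return ("done", n)  # success condition met: halt
        state = step(state)
    return ("budget_exhausted", max_steps)  # hard step limit as backstop
```

For example, an agent that increments a counter (`step=lambda s: s + 1`, `utility=lambda s: s`) with `bound=5` halts after five steps rather than incrementing forever.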