jimrandomh comments on New(ish) AI control ideas - Less Wrong

24 Post author: Stuart_Armstrong 05 March 2015 05:03PM




Comment author: jimrandomh 05 March 2015 06:27:34PM 4 points [-]

The main problem with making a satisficer, as I currently see it, is that I don't actually know how to define utility functions in a way that's amenable to well-defined satisficing. The goal is, in effect, to define goals in a way that limits the total resources the AI will consume in service of the goal, without having to formally define what a resource is.

This seems to work pretty well with values defined in terms of the existence of things: a satisficer that just wants to make 100 paperclips and has access to a factory will probably do so in an unsurprising way. But it doesn't work so well with non-existence-type values; a satisficer that wants to ensure there are no more than 100 cases of malaria might turn all matter into space exploration probes to hunt down cases on other planets. A satisficer that wants to reduce the number of cases of malaria by at least 100 might work, but then your utility function is defined in terms of a tricky counterfactual. For example, if it's "100 fewer cases than if I hadn't been turned on", then... what if its not having been turned on would have led to a minimizer being created instead? Then you're back to converting all atoms into space probes.
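The contrast between these three goal formulations can be made concrete with a toy sketch. This is my own illustration, not anything from the comment; world states are collapsed to raw counts, and the only point is the *shape* of each utility function:

```python
# Toy sketch (illustrative only): three satisficer-style goal shapes.

def make_paperclips_satisficer(target=100):
    # Existence-type goal: utility saturates once the target exists,
    # so there is no marginal reward for consuming further resources.
    return lambda paperclips_made: min(paperclips_made, target)

def malaria_cap_satisfied(cases_remaining, cap=100):
    # Non-existence-type goal: satisfaction depends on the *global*
    # count, so the agent is pushed to hunt down every case, anywhere.
    return cases_remaining <= cap

def malaria_reduction_satisfied(cases_now, cases_counterfactual, delta=100):
    # Counterfactual goal: satisfied once cases are at least `delta`
    # below what they would have been had the agent not been turned on.
    # The trouble is that `cases_counterfactual` is ill-defined: if the
    # counterfactual world contains a minimizer, the baseline is near
    # zero and the goal again demands extreme action.
    return (cases_counterfactual - cases_now) >= delta

# The existence-type satisficer is indifferent past its target:
u = make_paperclips_satisficer(100)
print(u(100), u(10**9))  # both print 100: no reward for overshooting
```

The paperclip case is benign precisely because its utility is capped by a locally checkable condition; the malaria cases are not, because their satisfaction conditions quantify over the whole world or over an underspecified counterfactual baseline.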