earthwormchuck163 comments on Superintelligent AGI in a box - a question. - Less Wrong

Post author: Dmytry, 23 February 2012 06:48PM




Comment author: jacobt, 24 February 2012 10:55:03PM, 5 points

If you only want the AI to solve things like optimization problems, why would you give it a utility function? I can see a design for a self-improving optimization problem solver that is completely safe because it doesn't operate using utility functions:

  1. Have a bunch of sample optimization problems.
  2. Have some code that, given an optimization problem (stated in some standardized format), finds a good solution. This can be seeded by a human-created program.
  3. When considering an improvement to program (2), allow the improvement only if it raises average performance on the sample optimization problems without making the program significantly more complex (to prevent overfitting). That is, the fitness function would be something like (average performance - k * bits of optimizer program).
  4. Run (2) to optimize its own code using criterion (3). This can be done concurrently with human improvements to (2), also using criterion (3).
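The acceptance test in step (3) can be sketched as follows. This is a minimal illustration, not the author's implementation: the names `problems`, `performance`, `size_in_bits`, and the penalty weight `K` are all assumptions introduced for the example.

```python
# Sketch of criterion (3): accept a candidate optimizer only if its
# complexity-penalized average performance beats the incumbent's.
# An "optimizer" here is any callable mapping a problem to a solution.

K = 0.01  # complexity penalty weight k (assumed value)

def fitness(optimizer, problems, performance, size_in_bits):
    """Average performance on the sample problems, minus k * program size."""
    avg = sum(performance(p, optimizer(p)) for p in problems) / len(problems)
    return avg - K * size_in_bits(optimizer)

def accept_improvement(incumbent, candidate, problems, performance, size_in_bits):
    """Allow the change only if the penalized fitness strictly improves."""
    return (fitness(candidate, problems, performance, size_in_bits)
            > fitness(incumbent, problems, performance, size_in_bits))
```

Because the same test gates both human edits and self-modifications in step (4), the system never needs to trust the candidate program itself, only its measured scores.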

This would produce a self-improving AGI that would do quite well on the sample optimization problems and on new, unobserved ones. I don't see much danger in this setup because the program has no incentive to create malicious output: malicious output would only add complexity without improving performance on the training set, so it would be rejected under criterion (3).

EDIT: after some discussion, I've decided to add some notes:

  1. This only works for verifiable (e.g. NP) problems. These problems include general induction, writing programs to specifications, math proofs, etc. This should be sufficient for the problems mentioned in the original post.
  2. Don't just plug a possibly unfriendly AI in as the seed for (2). Instead, have a group of programmers write program (2) to do well on the training problems. This can be crowd-sourced, because any proposed improvement can be evaluated mechanically using criterion (3). Any improvements the system then makes to itself should be safe.
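The "verifiable" restriction in note (1) is what makes mechanical evaluation possible: for NP-style problems, a claimed solution can be checked cheaply even when finding one is hard. A toy illustration using subset sum (the problem choice and function name are mine, not from the comment):

```python
# For an NP problem like subset sum, a proposed solution from an
# untrusted program (2) can be verified without trusting that program.

def verify_subset_sum(numbers, target, subset):
    """Check that `subset` is drawn from `numbers` (with multiplicity)
    and sums to `target`."""
    pool = list(numbers)
    for x in subset:
        if x not in pool:
            return False  # claimed element not available in the instance
        pool.remove(x)
    return sum(subset) == target
```

Scoring a candidate optimizer then reduces to running this kind of checker over its outputs on the training set, which is exactly what criterion (3) needs.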

I claim that an AI created this way will be safe and will do very well on verifiable optimization problems. So if this scheme works, I've solved friendly AI for verifiable problems.

Comment author: earthwormchuck163, 26 February 2012 12:14:46AM, 0 points

At best, this will produce cleverly efficient solutions to your sample problems.