Goal completion: noise, errors, bias, prejudice, preference and complexity
A putative new idea for AI control; index here.
This is a preliminary look at how an AI might assess and deal with various types of errors and uncertainties when estimating true human preferences. I'll be using the circular rocket model to illustrate how these might be distinguished by an AI. Recall that the rocket can accelerate by -2, -1, 0, 1, or 2, and that the human wishes to reach the space station (at point 0 with velocity 0) while avoiding accelerations of ±2. In what follows there will generally be some noise, so to make the whole setup more flexible, assume the space station is a bit bigger than usual, covering five squares. "Docking" at the space station then means reaching one of {-2,-1,0,1,2} with 0 velocity.
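To fix the setup concretely, here is a minimal sketch of the circular rocket environment as described above. The ring size (`RING = 100`) and the order of updates (accelerate, then move) are assumptions for illustration, not part of the original model.

```python
# Sketch of the circular rocket environment.
# Assumptions: ring size of 100 and "accelerate then move" update order
# are illustrative choices, not specified in the original model.
RING = 100                    # positions are taken modulo RING (circular track)
ACCELS = (-2, -1, 0, 1, 2)    # available accelerations; the human dislikes +/-2
STATION = {-2, -1, 0, 1, 2}   # the enlarged, five-square space station

def step(pos, vel, accel):
    """Apply an acceleration, then move; positions wrap around the ring."""
    assert accel in ACCELS
    vel += accel
    pos = (pos + vel) % RING
    return pos, vel

def docked(pos, vel):
    """Docking = sitting on one of the station squares with zero velocity."""
    return vel == 0 and (pos % RING) in {p % RING for p in STATION}
```

For example, a rocket at position 98 with zero velocity counts as docked, since the station wraps around the ring, while one sitting at position 3 does not.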
