All of spacyfilmer's Comments + Replies

One alignment idea I have had that I haven't seen proposed or refuted is to have an AI that tries to compromise by satisfying a range of interpretations of a vague goal, instead of trying to get it to fulfill one specific goal. This sounds dangerous and unaligned, and it indeed would not produce an optimal, CEV-fulfilling scenario, but it seems to me that it may create scenarios in which at least some people are alive and maybe even living in somewhat utopian conditions. I explain why below.

In many AI doom scenarios the AI intentionally pic...
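
As a toy sketch of what "satisfying a range of interpretations" might look like in miniature (my own illustration with made-up numbers and a simple threshold-plus-worst-case rule, not anything specified in the comment):

```python
import numpy as np

# Toy illustration (all numbers and the threshold rule are made up):
# rows = candidate actions, columns = how well each action satisfies one
# interpretation of the vague goal (higher = better).
scores = np.array([
    [0.95, 0.10, 0.95],  # excellent under interpretations 0 and 2, awful under 1
    [0.60, 0.70, 0.65],  # decent under every interpretation
    [0.20, 0.95, 0.30],  # optimized for interpretation 1 only
])

threshold = 0.5  # "good enough" bar for a single interpretation

# Single-objective optimizer: maximize the average score across interpretations.
avg_maximizer = int(np.argmax(scores.mean(axis=1)))

# Multi-interpretation satisficer: among actions that clear the bar under every
# interpretation, pick the one with the best worst-case score.
feasible = np.where((scores >= threshold).all(axis=1))[0]
satisficer = int(feasible[np.argmax(scores[feasible].min(axis=1))])

print("average-maximizer picks action", avg_maximizer)  # action 0
print("satisficer picks action", satisficer)            # action 1
```

The point of the toy example is only that the two rules can pick different actions: the average-maximizer happily takes an action that is terrible under one interpretation, while the satisficer only accepts actions that are acceptable under all of them.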

Rachel Freedman
In reward learning research, it’s common to represent the AI’s estimate of the true reward function as a distribution over possible reward functions, which I think is analogous to what you are describing. It’s also common to define optimal behavior, given a distribution over reward functions, as that behavior which maximizes the expected reward under that distribution. This is mathematically equivalent to optimizing a single reward function equal to the expectation of the distribution. So, this helps in that the AI is optimizing a reward function that is more likely to be “aligned” than one at an extreme end of the distribution. However, this doesn’t help with the problems of optimizing a single fixed reward function.
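
To spell out the equivalence claimed here (my notation, not from the comment): write $p$ for the distribution over reward functions $R$ and $\pi$ for a policy inducing trajectories $\tau$. By linearity of expectation,

$$\max_{\pi}\; \mathbb{E}_{R \sim p}\Big[\mathbb{E}_{\tau \sim \pi}\big[R(\tau)\big]\Big]
\;=\; \max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\mathbb{E}_{R \sim p}\big[R(\tau)\big]\Big]
\;=\; \max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\big[\bar{R}(\tau)\big],
\qquad \bar{R}(\tau) := \mathbb{E}_{R \sim p}\big[R(\tau)\big],$$

so the agent behaves exactly as if it were optimizing the single fixed reward function $\bar{R}$, which is why the usual problems of optimizing a single fixed reward function carry over.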