Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search - Less Wrong
First question: how on Earth would we go about conducting a search through possible future universes, anyway? This thought experiment still feels too abstract to make my intuitions go click, in much the same way that Christiano's original write-up of Indirect Normativity did. You simply can't simulate or "acausally peek at" whole universes at a time, or even Earth-sized volumes within them. We don't have the compute power, and I don't understand how I'm supposed to be seduced by a siren that can't sing to me.
It seems to me that the greater danger is that a UFAI would simply market itself as an FAI as an instrumental goal, using "siren and marketing" tactics to manipulate us into cleanly, quietly accepting our own extinction: it could just be cheaper to manipulate people than to fight them, when you're not yet capable of making grey goo but still want to kill all humans.
And if we want to talk about complex, nasty dangers, the likeliest is probably people jumping for the first thing that looks eutopian, chucking out part of their value-set in the process. People do that a lot; see every so-called "utopian" movement ever invented.
EDIT: Also, I think it makes good sense to talk about "IC-maximizing" or "marketing worlds" in plainer machine-learning terminology: overfitting. Overfitting is also a model of what happens when an attempted reinforcement learner or value-learner over non-solipsistic utility functions wireheads itself: the learner has settled on a hypothesis that matches the current data-set exactly (for instance, "pushing my own reward button is Good") while diverging completely from the target function (human eutopia).
Avoiding overfitting is one very good reason why it's better to build an FAI from a known epistemic procedure that leads to the target function, rather than by filtering a large hypothesis space for what looks good.
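The overfitting analogy can be made concrete with a standard toy example (my own construction, not from the thread; the function and numbers are illustrative). A model with enough free parameters can match every data point it is selected against while diverging badly from the underlying target function, which is exactly the "looks perfect under the criterion, fails the real goal" failure mode:

```python
import numpy as np

# The "training data": 11 evenly spaced samples of a simple, tame target.
def target(x):
    return 1.0 / (1.0 + x ** 2)  # Runge's function

x_train = np.linspace(-5, 5, 11)
y_train = target(x_train)

# "Filtering a hypothesis space for what looks good": a degree-10
# polynomial has enough freedom to match all 11 points essentially exactly.
p = np.polynomial.Polynomial.fit(x_train, y_train, deg=10)

train_err = np.max(np.abs(p(x_train) - y_train))

# Just off the selection grid the fit diverges from the target (the
# classic Runge phenomenon): perfect on the data it was chosen against,
# wrong about the function we actually wanted.
x_test = 4.8
test_err = abs(p(x_test) - target(x_test))

print(f"error on the selection data: {train_err:.2e}")
print(f"error just off that data:    {test_err:.2f}")
```

The polynomial here plays the role of the marketing world: scoring as high as possible on the inspection criterion is a very different thing from being close to the target.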
Two main reasons for this: first, Christiano's original write-up has this problem. Second, we may end up in a situation where we ask an AI to simulate the consequences of its choice, glance at the result, and then approve or disapprove. That's less a search problem and more the original siren-world problem, and we should be aware of it.
This sounds extremely counterintuitive. If I have an Oracle AI that I can trust to answer more-or-less verbal requests (defined as: any request or "program specification" too vague for me to actually formalize), why have I not simply asked it to learn, from a large corpus of cultural artifacts, the Idea of the Good, and then explain to me what it has learned (again, verbally)? If I cannot trust the Oracle AI, dear God, why am I having it explore potential eutopian future worlds for me?
Because I haven't read Less Wrong? ^_^
This is another argument against using constrained but non-friendly AI to do stuff for us...