Steelmanning MIRI critics

fowlertm

I'm giving a talk to the Boulder Future Salon in Boulder, Colorado in a few weeks on the Intelligence Explosion hypothesis. I've given it once before in Korea but I think the crowd I'm addressing will be more savvy than the last one (many of them have met Eliezer personally). It could end up being important, so I was wondering if anyone considers themselves especially capable of playing Devil's Advocate so I could shape up a bit before my talk? I'd like there to be no real surprises.

I'd be up for just messaging back and forth or Skyping, whatever is convenient.

You admit that friendliness is not guaranteed. That means that you're not wrong, which is a good sign, but it doesn't fix the problem that friendliness isn't guaranteed. You have as many tries as you want for intelligence, but only one for friendliness. How do you expect to manage it on the first try?

It also doesn't seem to be clear to me that this is the best strategy. In order to get that provably friendly thing to work, you have to deal with an explicit, unchanging utility function, which means that friendliness has to be right from the beginning. If you deal with an implicit utility function that will change as the AI comes to understand itself better, you could program an AI to recognise pictures of smiles, then let it learn that the smiles correspond to happy humans and update its utility function accordingly, until it (hopefully) decides on "do what we mean".

It seems to me that part of the friendliness proof would require proving that the AI will follow its explicit utility function. This would be impossible. The AI is not capable of perfect Solomonoff induction, and will always have some bias, no matter how small. This means that its implicit utility function will never quite match its explicit utility function. Am I missing something here?

Those points were excellent, and it is no credit to LW that the comment was on negative karma when I encountered it.

No, the approach based on provable correctness isn't a 100% guarantee, and, since it involves an unupdateable UF, it has the additional disadvantage that if you don't get the UF right the first time, you can't tweak it.

The alternative family of approaches, based on flexibility, training and acculturation, has often been put forward by MIRI's critics... and MIRI has never quantified why the one approach is better than the other.

VAuroch

I think this is incorrect. If it isn't, it at least requires some proof.

lukeprog

Typo? [...] Again, I think "provably friendly thing" mischaracterizes what MIRI thinks will be possible. I'm not sure exactly what you're saying in the rest of your comment. Have you read the section on indirect normativity in Superintelligence? I'd start there.
