DanielLC comments on Steelmanning MIRI critics - Less Wrong

6 Post author: fowlertm 19 August 2014 03:14AM


Comment author: DanielLC 20 August 2014 03:29:25AM 1 point

For one thing, you'd have to explicitly specify the utility function before you could prove that the AI follows it.

You can either make an AI that will proveably do what you mean, or make one that will hopefully figure out what you meant when you said "do what I mean," and do that.

Comment author: VAuroch 20 August 2014 07:10:54AM 0 points

When I picture what a proven-Friendly AI looks like, I think of something whose goals are: 1) using a sample of simulated humans, generalize to unpack 'do what I mean', followed by 2) make satisfying that your utility function.

Rigorously proving each of those two steps would produce a proven-Friendly AI without an explicit utility function. Proving step 1 safe would obviously be very difficult; proving step 2 safe would probably be comparatively easy. Both, however, are plausibly rigorously provable.

Comment author: DanielLC 20 August 2014 04:27:02PM 1 point

2) make satisfying that your utility function

This is what I mean by an explicit utility function. An implicit one is where the agent never actually calculates utility, the way humans work.
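The distinction can be sketched in code. Below is a toy illustration (all names and preference orderings here are hypothetical, invented for this example, and not anyone's actual proposal): an "explicit" agent computes a numeric utility for each outcome and maximizes it, while an "implicit" agent only makes pairwise preference comparisons and never calculates a utility number at all.

```python
def explicit_utility(outcome: str) -> float:
    """An explicit utility function: assigns a real number to each outcome.
    (The outcomes and scores are made up for illustration.)"""
    scores = {"nothing": 0.0, "paperclips": 1.0, "human_flourishing": 100.0}
    return scores.get(outcome, 0.0)

def explicit_agent(options):
    # Explicitly calculates utility for each option and maximizes it.
    return max(options, key=explicit_utility)

def prefers(a: str, b: str) -> bool:
    """An implicit preference: a pairwise comparison with no number ever
    computed. Human-like choice can be modeled this way, and such
    comparisons need not even be consistent with any utility function."""
    order = ["nothing", "paperclips", "human_flourishing"]
    return order.index(a) > order.index(b)

def implicit_agent(options):
    # Picks a most-preferred option by pairwise comparison only;
    # at no point does it assign a utility value to anything.
    best = options[0]
    for option in options[1:]:
        if prefers(option, best):
            best = option
    return best

options = ["paperclips", "nothing", "human_flourishing"]
print(explicit_agent(options))  # human_flourishing
print(implicit_agent(options))  # human_flourishing
```

Both toy agents pick the same option here, but only the first one ever produces a utility value you could state a theorem about; the second behaves purposefully without any such quantity existing inside it.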