Rafael Cosman

Comments

What is wrong with this approach to corrigibility?

I really appreciate all the thoughtful and substantive comments! Thanks very much; honestly, this was exactly what I was hoping for from posting.

What is wrong with this approach to corrigibility?

Rafael Cosman, 2y

If implemented as described, the AI should be exactly indifferent to pushing the button? I guess the AI's behavior in that situation is not well defined. And if we instead make the button give a reward of the expected value minus epsilon, then the AI might kill you to stop you from pressing the button (because it wants that extra epsilon of reward!).

So overall, I suppose this is a fair criticism of the approach, and it is possibly what Paul means by issues with precise balancing!
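
To make the balancing point concrete, here is a toy sketch in Python. It is my own illustration rather than anything from the original proposal: the function name `preferred_action`, the numbers, and the assumption that the AI is a pure reward maximizer comparing two scalar values are all hypothetical. The third case (button reward slightly above the expected value) is the mirror image of the one discussed above and is included only for symmetry.

```python
def preferred_action(expected_value: float, button_reward: float) -> str:
    """Toy model: which outcome does a pure reward maximizer prefer?

    expected_value: reward the AI expects if the button is never pressed.
    button_reward:  reward the AI receives if the button is pressed.
    Both quantities are hypothetical scalars chosen for illustration.
    """
    if button_reward > expected_value:
        # Pressing pays more, so the AI wants the button pressed.
        return "press_button"
    if button_reward < expected_value:
        # Pressing pays less, so the AI wants to stop anyone pressing it.
        return "prevent_press"
    # Exact tie: the preference is undefined, so behavior is arbitrary.
    return "indifferent"


expected_value = 10.0  # assumed value, for illustration only
epsilon = 0.01         # assumed small offset

print(preferred_action(expected_value, expected_value))            # exactly balanced
print(preferred_action(expected_value, expected_value - epsilon))  # slightly under
print(preferred_action(expected_value, expected_value + epsilon))  # slightly over
```

Only the exactly balanced case avoids an incentive to interfere with the button, and in that case the sketch cannot say what the AI does, which is the "not well defined" worry above.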
