RomeoStevens comments on Introducing Corrigibility (an FAI research subfield) - LessWrong

Post author: So8res 20 October 2014 09:09PM




Comment author: RomeoStevens 20 October 2014 11:29:57PM 4 points

Is a "corrigibility module" a plausible safeguard against some (significant) classes of UFAIs?

Comment author: jimrandomh 20 October 2014 11:39:38PM 6 points

Yes. It's not a sure-fire safeguard and it doesn't work against all UFAIs, but if done correctly, corrigibility grants a saving throw. Note, though, that while this paper is a huge step forward, "how to do corrigibility correctly" is nowhere near a solved problem yet.

(Corrigibility was a topic at the second MIRIx Boston workshop, and we have results that build on this paper which we are working on writing up.)

Comment author: Eliezer_Yudkowsky 22 October 2014 09:35:12PM 5 points

No, at least not anything like the corrigibility we're currently considering. Everything we've written about so far relies on the ability to specify the utility function in detail; on the utility function being reflectively stable; and on the utility function being able to contain references to external objects like 'the shutdown button', with the corresponding problem of adapting to new ontologies as the surrounding system shifts representations (see the notion of an 'ontological crisis'). It's a precaution for a Friendly AI in the process of being built; you couldn't tack it onto super-Eurisko.
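To make the point about "a utility function containing references to external objects" concrete, here is a minimal toy sketch (not MIRI's actual formalism; all names are hypothetical): a composite utility that switches on an explicitly represented shutdown button.

```python
# Toy illustration of a shutdown-aware utility function.
# Hypothetical names throughout; this is NOT the paper's construction,
# just a sketch of why the utility must refer to an external object.

from dataclasses import dataclass

@dataclass
class WorldState:
    button_pressed: bool   # state of the external shutdown button
    paperclips: int        # stand-in for the agent's normal goal
    is_shut_down: bool     # whether the agent has actually halted

def u_normal(s: WorldState) -> float:
    # Utility while the button is unpressed: pursue the base goal.
    return float(s.paperclips)

def u_shutdown(s: WorldState) -> float:
    # Utility once the button is pressed: reward only halting.
    return 1.0 if s.is_shut_down else 0.0

def u(s: WorldState) -> float:
    # The composite utility refers to the button directly, which is why
    # the agent's world-model must keep track of that external object
    # even as its ontology (its representation of the world) shifts.
    return u_shutdown(s) if s.button_pressed else u_normal(s)
```

Even this toy version shows the difficulty the thread discusses: naively composing utilities this way gives the agent an incentive to press (or prevent the pressing of) the button, depending on which branch scores higher, which is part of what the corrigibility paper tries to address.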