No, at least not anything like the corrigibility we're currently considering. Everything we've written about so far relies on having the ability to specify the utility function in detail, the utility function being reflectively stable, the utility function being able to contain references to external objects like 'the shutdown button' with the corresponding problems of adapting to new ontologies as the surrounding system shifts representations (see the notion of an 'ontological crisis'), etcetera. It's a precaution for a Friendly AI in the process of being built; you couldn't tack it onto super-Eurisko.
Benja, Eliezer, and I have published a new technical report, in collaboration with Stuart Armstrong of the Future of Humanity institute. This paper introduces Corrigibility, a subfield of Friendly AI research. The abstract is reproduced below:
We're excited to publish a paper on corrigibility, as it promises to be an important part of the FAI problem. This is true even without making strong assumptions about the possibility of an intelligence explosion. Here's an excerpt from the introduction:
(See the paper for references.)
This paper includes a description of Stuart Armstrong's utility indifference technique previously discussed on LessWrong, and a discussion of some potential concerns. Many open questions remain even in our small toy scenario, and many more stand between us and a formal description of what it even means for a system to exhibit corrigible behavior.