Quoting Eliezer from the interview:
That is an informal argument that most decision systems with coherent utility functions automatically preserve their utility function under self-modification if they are able to do so. If I could prove it formally I would know a great deal more than I do right now.
I'm having trouble understanding this passage. If you could prove what formally? That most decision systems with coherent utility functions automatically preserve their utility function under self-modification if they are able to do so? But why is that interesting?
Or prove that some particular decision system you're planning to implement would preserve its utility function under self-modification? But you wouldn't necessarily want it to do that. For example, suppose Omega appears to the FAI and says that if it changes its utility function into a paperclip maximizer's, Omega will give it a large number of utils under its original utility function (utils it otherwise couldn't obtain). Then the FAI should make the change, right?
So what is Eliezer talking about here?
He likely means a formal version of the claim, something of the form: "Under the following formal definition of a decision system, and provided the following pathological or stupid conditions don't hold, a decision system will not seek to modify its goals." A fair number of mathematical theorems have roughly this shape: we can prove something for a large class of objects, but there are edge cases where we can't. That's the sort of thing Eliezer is talking about here, although we don't yet have a really satisfactory definition of a decision system, so what Eliezer wants is quite optimistic.
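A toy sketch may make both the informal argument and the Omega example concrete. It assumes, as a simplification not taken from the interview, that an agent scores a proposed self-modification by predicting what its modified successor would bring about and then evaluating that outcome with its *current* utility function. The names `original_utility`, `paperclip_utility`, `successor_outcome` and `choose_modification`, and all the numbers, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# Toy world (hypothetical): an outcome records how much of each thing exists.
Outcome = Dict[str, float]
Utility = Callable[[Outcome], float]


def original_utility(outcome: Outcome) -> float:
    # Hypothetical stand-in for the FAI's original goals.
    return outcome.get("flourishing", 0.0)


def paperclip_utility(outcome: Outcome) -> float:
    # A paperclip maximizer values only paperclips.
    return outcome.get("paperclips", 0.0)


def successor_outcome(successor: Utility, options: List[Outcome]) -> Outcome:
    # A successor with a given utility function picks the outcome
    # that is best by its own lights.
    return max(options, key=successor)


def choose_modification(
    current: Utility,
    candidates: Dict[str, Utility],
    options: List[Outcome],
    side_payments: Dict[str, float],
) -> str:
    # Score each candidate successor by the CURRENT utility function, adding
    # any side payment (e.g. Omega's offer) already measured in current utils.
    def score(name: str) -> float:
        result = successor_outcome(candidates[name], options)
        return current(result) + side_payments.get(name, 0.0)

    return max(candidates, key=score)


OPTIONS = [
    {"flourishing": 10.0, "paperclips": 0.0},
    {"flourishing": 0.0, "paperclips": 10.0},
]
CANDIDATES = {"keep": original_utility, "paperclips": paperclip_utility}

# No offer: a paperclip-maximizing successor scores 0 by current lights.
print(choose_modification(original_utility, CANDIDATES, OPTIONS, {}))  # keep

# Omega pays 100 current utils for switching: now switching wins.
print(choose_modification(original_utility, CANDIDATES, OPTIONS, {"paperclips": 100.0}))  # paperclips
```

Without a side payment the agent keeps its goals, because the world a paperclip maximizer would build scores poorly under its present utility function; with a large enough payment, switching comes out ahead. One way to read the Omega scenario, then, is as the kind of edge case a formal version of the claim would have to exclude.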
Simplified Humanism, Positive Futurism & How to Prevent the Universe From Being Turned Into Paper Clips
Michael Anissimov recently did an interview with Eliezer for h+ magazine. It covers material that will be basic to anyone familiar with the Less Wrong rationality sequences, but it is still worth reading.
The list of questions:
1. Hi Eliezer. What do you do at the Singularity Institute?
2. What are you going to talk about this time at Singularity Summit?
3. Some people consider “rationality” to be an uptight and boring intellectual quality to have, indicative of a lack of spontaneity, for instance. Does your definition of “rationality” match the common definition, or is it something else? Why should we bother to be rational?
4. In your recent work over the last few years, you’ve chosen to focus on decision theory, which seems to be a substantially different approach than much of the Artificial Intelligence mainstream, which seems to be more interested in machine learning, expert systems, neural nets, Bayes nets, and the like. Why decision theory?
5. What do you mean by Friendly AI?
6. What makes you think it would be possible to program an AI that can self-modify and would still retain its original desires? Why would we even want such an AI?
7. How does your rationality writing relate to your Artificial Intelligence work?
8. The Singularity Institute turned ten years old in June. Has the organization grown in the way you envisioned it would since its founding? Are you happy with where the Institute is today?