Steelmanning MIRI critics

fowlertm

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

And we do try to be clear about the role that proof plays in our research. E.g. see the tiling agents LW post:

The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).

And later, in an Eliezer comment:

Reply to: "My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI."

By default, if you can build a Friendly AI you were not troubled by the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else) but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

My guess is that people hear the words "proof" and "Friendliness" in the same sentence but (quite understandably!) don't take time to read the actual papers, and end up with the impression that MIRI is working on "provably Friendly AI" even though, as far as I can tell, we've never claimed that.

8

Steelmanning MIRI critics

8

8

8

Steelmanning MIRI critics

8

8