I thought the example was pretty terrible.
Glad to see you're doing well, Benja :)
Sorry for being curmudgeonly there -- I did afterwards wish that I had tempered that. The thing is that when you write something like
I also agree that the idea of "logical uncertainty" is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty.
that sounds to me like you're painting MIRI as working on these topics just because it's fun, and supporting its work by arguments that are obviously naive to someone who knows the field, and that you're supporting this by arguments that miss the point of what MIRI is trying to say. That's why I found the example of program analysis so annoying -- people who think that the halting problem means that program analysis is impossible really are misinformed (actually Rice's theorem, really, but someone with this misconception wouldn't be aware of that), both about the state of the field and about why these theorems say what they say. E.g., yes, of course your condition is undecidable as long as there is any choice f(s) of chooseAction2(s) that satisfies it; proof: let chooseAction2(s) be the program that checks whether chooseAction2(s) satisfies your criterion, and if yes return chooseAction(s), otherwise f(s). That's how these proofs always go, and of course that doesn't mean that there are no programs that are able to verify the condition for an interesting subclass of chooseAction2's; the obvious interesting example is searching for a proof of the condition in ZFC, and the obvious boring example is that there is a giant look-up table which decides the condition for all choices of chooseAction2(s) of length less than L.
One possibility is that MIRI's arguments actually do look that terrible to you, but that this is because MIRI hasn't managed to make them clearly enough. I'm thinking this may be the case because you write:
In addition, there are more general proof strategies than the above if that one does not satisfy you. For instance, I could just require that any proposed modification to chooseAction2 come paired with a proof that that modification will be safe. Now I agree that there exist choices of chooseAction2 that are safe but not provably safe and this strategy disallows all such modifications. But that doesn't seem so restrictive to me.
First, that's precisely the "obvious" strategy that's the starting point for MIRI's work.
Second, yes, Eliezer's arguing that this isn't good enough, but the reason isn't that it there are some safe modifications which aren't provably safe. The work around the Löbian obstacle has nothing to do with trying to work around undecidability. (I will admit that for a short period at the April workshop I thought this might be a good research direction, because I had my intuitions shaken by the existence of Paul's system and got overly optimistic, but Paul quickly convinced me that this was an unfounded hope, and in any case the main work around the Löbian obstacle was never really related to this.) MIRI's argument definitely isn't that "the above algorithm can't decide for all chooseAction2 whether they're safe, therefore it probably can't decide it for the kind of chooseAction2 we're interested in, therefore it's unacceptable". If that's how you've understood the argument, then I see why you would think that the program analysis example is relevant. (The argument is indeed that it seems to be unable to decide safety for the chooseAction2 we're interested in, but not because it's unable to decide this for any generic chooseAction2.)
Third, you seem to imply that your proposal will only take safe actions. You haven't given an argument for why we should think so, but the implication seems clear: You're using a chooseAction that is obviously safe as long as it doesn't rewrite itself, and it will only accept a proposed modification if it comes with a proof that it is safe, so if it does choose to rewrite itself then its successor will in fact be safe as well. Now I think this is fine reasoning, but you don't seem to agree:
Finally, I agree that such a naieve proof strategy as "doing the following trivial self-modification is safe because the modified me will only do things that it proves won't destroy the world, thus it won't destroy the world" does not work. I'm not proposing that. The proof system clearly has to do some actual work.
You seem to be quite willing to use that reasoning yourself to show that the initial AI is safe, but you don't think the AI should be able to use the same sort of reasoning. Eliezer's argument is that this is in fact reasoning you want to use when building a self-improving AI: Yes, you can reason in more detail about how the AI you are building works, but this AI_1 will build an AI_2 and so on, and when proving that it's ok to build AI_1 you don't want to reason in detail about how AI_1,000,000 is going to work (which is built using design principles you don't understand, by AI_999,999 which is much smarter than you); rather, you want to use general principles to reason that the because of the way AI_1,000,000 came to be, it has to be safe (because AI_999,999 only builds safe AIs, because it was built by AI_999,998 which only builds safe AIs...). But not only you need to reason like that, because you don't know and aren't smart enough to comprehend AI_1,000,000's exact design; AI_1, which also isn't that smart, will need to be able to use the same sort of reasoning. Hence, the interest in the Löbian obstacle.
There are caveats to add to this and parts of your comment I haven't replied to, but I'm running into the same problem as you with your original comment in this thread, having already spent too much time on this. I'd be happy to elaborate if useful. For my part, I'd be interested in your reply to the other part of my comment: do you think I have localized our disagreement correctly?
Oh, one last point that I shouldn't skip over: I assume the point about MIRI lacking "an understanding of the surrounding literature" refers to the thing about being tripped up at the July workshop by not knowing Gaifman's work on logical uncertainty well enough. If so, I agree that that was an avoidable fail, but I don't think it's indicative of always ignoring the relevant literature or something like that. I'll also admit that I still haven't myself more with Gaifman's work, but that's because I'm not currently focusing on logical uncertainty, and I intend to do so in the future.
Thought experiment:
We have an AI which controls a robotic arm inside a box. The box also contains various cooking tools and ingredients to make cake, and a big red button that kills a puppy.
We prefer to cake to no cake, and we prefer our cake to be delicious and moist cake, but above all we prefer the companion cu-... puppy to stay alive.
Therefore, we implement in the AI a "puppy safety module" (PSM) which vetoes any course of action proposed by the planning module if it determines that there is any non-negligible probability of the red button ...
Previously: Why Neglect Big Topics.
Why was there no serious philosophical discussion of normative uncertainty until 1989, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?
Why did no professional philosopher analyze I.J. Good’s important “intelligence explosion” thesis (from 19591) until 2010?
Why was reflectively consistent probabilistic metamathematics not described until 2013, given that the ideas it builds on go back at least to the 1940s?
Why did it take until 2003 for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until 2013 to formulate a reliabilist metatheory of rationality?
By analogy to financial market efficiency, I like to say that “theoretical discovery is fairly inefficient.” That is: there are often large, unnecessary delays in theoretical discovery.
This shouldn’t surprise us. For one thing, there aren’t necessarily large personal rewards for making theoretical progress. But it does mean that those who do care about certain kinds of theoretical progress shouldn’t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.
Where should we look for low-hanging fruit? I’d guess that theoretical progress may be relatively easy where:
These guesses make sense of the abundant low-hanging fruit in much of MIRI’s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in multi-agent systems, and in philosophy it’s clear that visible progress in decision theory is one way to “make a name” for oneself and advance one’s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don’t seem necessary to much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?
Anyway, three questions:
1 Good (1959) is the earliest statement of the intelligence explosion: “Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an ”explosion“ will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.” The term itself, “intelligence explosion,” originates with Good (1965). Technically, artist and philosopher Stefan Themerson wrote a "philosophical analysis" of Good's intelligence explosion thesis called Special Branch, published in 1972, but by "philosophical analysis" I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson's literary Special Branch. ↩