I thought the example was pretty terrible.
Glad to see you're doing well, Benja :)
Here's a concrete way you could try to get stable self-modification:
Suppose for concreteness that we have a C program, call it X, and that within the C program there is an array called "world_state" of length M and a floating point number called "utility". A simple instantiation of X would look something like:
while(true){ action = chooseAction(worldState); world_state = propgateWorldState(worldState, action); utility = calculateUtility(worldState); }
We would like to consider modifications to X where we replace chooseAction with some new method chooseAction2 to get a program X2. Suppose we want to ensure some condition such as: from the current world state, if we use X2 instead of X, then after some finite period of time the sequence of utilities we get from using chooseAction2 will always be larger than the corresponding sequence if we have used chooseAction. Abusing notation a bit, this is the same as verifying the statement:
"there exists N such that for all n > N, utility2(n) > utility(n)"
[although note that utility2 and utility have fairly complicated descriptions if you actually try to write them out]. Now I agree that reasoning about this for arbitrary choices of chooseAction and chooseAction2 will be quite difficult (probably undecidable although I haven't proved that). But the key point is that I get to choose chooseAction2, and there are many decision procedures that can prove such a statement in special cases. For instance, I could partition the space of world states into finitely many pieces, write down a transition function that over-approximates the possible transitions (for instance, by having a transition from Piece1 to Piece2 if any element of Piece1 can transition to any element of Piece2). Then I only need to reason about finite automata and those are trivially decidable.
You could argue that this proof system is fairly weak, but again, the AI gets to tailor its choices of chooseAction2 to be easy to reason about. You could also argue that an AI that could only create instances of chooseAction2 that could be reasoned about as above would be severely hampered, but I think this overlooks the fact that such techniques have been extraordinarily successful at reasoning about fairly complex systems.
In addition, there are more general proof strategies than the above if that one does not satisfy you. For instance, I could just require that any proposed modification to chooseAction2 come paired with a proof that that modification will be safe. Now I agree that there exist choices of chooseAction2 that are safe but not provably safe and this strategy disallows all such modifications. But that doesn't seem so restrictive to me.
Finally, I agree that such a naieve proof strategy as "doing the following trivial self-modification is safe because the modified me will only do things that it proves won't destroy the world, thus it won't destroy the world" does not work. I'm not proposing that. The proof system clearly has to do some actual work. But your argument seems to me to be saying "this extremely naieve strategy doesn't work, let's develop an entire subfield of logic to try to make it work" whereas I advocate "this extremely naieve strategy doesn't work, let's use a non-naieve approach instead, and look at the relevant literature on how to solve similar problems".
I also agree that the idea of "logical uncertainty" is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty. But it seems like a mistake to me to tackle such problems without at least one, and preferably both, of: an understanding of the surrounding literature; experience with concrete instantiations of the problems at hand.
I thought the example was pretty terrible.
Glad to see you're doing well, Benja :)
Sorry for being curmudgeonly there -- I did afterwards wish that I had tempered that. The thing is that when you write something like
I also agree that the idea of "logical uncertainty" is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty.
that sounds to me like you're painting MIRI as working on these topics just because it's fun, and supporting its work by arguments tha...
Previously: Why Neglect Big Topics.
Why was there no serious philosophical discussion of normative uncertainty until 1989, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?
Why did no professional philosopher analyze I.J. Good’s important “intelligence explosion” thesis (from 19591) until 2010?
Why was reflectively consistent probabilistic metamathematics not described until 2013, given that the ideas it builds on go back at least to the 1940s?
Why did it take until 2003 for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until 2013 to formulate a reliabilist metatheory of rationality?
By analogy to financial market efficiency, I like to say that “theoretical discovery is fairly inefficient.” That is: there are often large, unnecessary delays in theoretical discovery.
This shouldn’t surprise us. For one thing, there aren’t necessarily large personal rewards for making theoretical progress. But it does mean that those who do care about certain kinds of theoretical progress shouldn’t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.
Where should we look for low-hanging fruit? I’d guess that theoretical progress may be relatively easy where:
These guesses make sense of the abundant low-hanging fruit in much of MIRI’s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in multi-agent systems, and in philosophy it’s clear that visible progress in decision theory is one way to “make a name” for oneself and advance one’s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don’t seem necessary to much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?
Anyway, three questions:
1 Good (1959) is the earliest statement of the intelligence explosion: “Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an ”explosion“ will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.” The term itself, “intelligence explosion,” originates with Good (1965). Technically, artist and philosopher Stefan Themerson wrote a "philosophical analysis" of Good's intelligence explosion thesis called Special Branch, published in 1972, but by "philosophical analysis" I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson's literary Special Branch. ↩