I strongly disagree. I will just state my position for now, and hopefully this will serve as a commitment to explain it further when I have the time. Here are my unjustified assertions about the nature of philosophy regarding OP's topics:
Philosophy has the largest search space known to man: it encompasses everything that (a) lacks a clear-cut solution and (b) has any hope of being solved (this rules out two extremes: science and religion).
Philosophy, by its very nature, has few systematized methods for efficient search. It seems we discovered logical and clear thinking only recently, and that's about it.
Because it is so difficult, philosophy is wrong 99.9% of the time.
When philosophy is right, major breakthroughs are made: sciences are created, new reasoning tools are developed, higher moral standards are set, and so on.
There is massive hindsight bias. Once solved, a problem is no longer in the realm of philosophy, and the solution tends to seem extremely obvious after one or two generations.
Thus, low-hanging fruit in philosophy is nowhere to be found. Most of your examples were already found; they just need to be worked on. I challenge you to present a yet-unknown low-hanging fruit, one that all your peers don't already know, one that would knock Nick's socks off.
I will second this. It's not that the process of theoretical discovery is inefficient through any fault of its own; it's that the problem is intractable (e.g., we don't know how to do better than exhaustive search). So that linear-looking search path from concept A to concept B did not take linear time to find...
Luke,
I think you are mistaken about the relative efficiency / inefficiency of scientific research. I believe that research is comparably efficient to much of industry, and that many of the things that look like inefficiencies are actually trading off small local gains for large global gains. I've come to this conclusion as the result of years of doing scientific research, where almost every promising idea I've come up with (including some that I thought quite clever) had already been explored by someone else. In fact, the typical case for when I was able to make progress was when solving the problem required a combination of tools, each of which individually was relatively rare in the field.
For instance, my paper on stochastic verification required: (i) familiarity with sum-of-squares programming; (ii) the application of supermartingale techniques from statistics; and (iii) the ability to produce relatively non-trivial convex relaxations of a difficult optimization problem. In robotics, most people are familiar with convex optimization, and at least some are familiar with sum-of-squares programming and supermartingales. In fact, at least one other person had already published a fairl...
I agree that Luke here overstates the significance of my result, but I do think you miss the point a bit or are uncharitable. Regardless of whether making predictions about your own behavior is fundamentally difficult, we don't yet understand any formal framework that can capture reasoning of the form “my decisions are good because my beliefs correspond to reality.” Assuming there is a natural formal framework capturing human reasoning (I think the record so far suggests optimism) then there is something interesting that we don’t yet understand. It seems like you are applying the argument: “We know that humans can do X, so why do you think that X is an important problem?” The comment about undecidability issues not applying in practice also seems a bit unfair; for programs that do proof search we know that we cannot prove claims of the desired type based on simple Godelian arguments, and almost all interesting frameworks for reasoning are harder to prove things about than a simple proof search. (Of course the game is that we don’t want to prove things about the algorithms in question, we are happy to form justified beliefs about them in whatever way we can, including inductive infe...
Jacob, have you seen Luke's interview with me, where I've tried to reply to some arguments of the sort you've given in this thread and elsewhere?
I don't think [the fact that humans' predictions about themselves and each other often fail] is sufficient to dismiss my example. Whether or not we prove things, we certainly have some way of reasoning at least somewhat reliably about how we and others will behave. It seems important to ask why we expect AI to be fundamentally different; I don't think that drawing a distinction between heuristics and logical proofs is sufficient to do so, since many of the logical obstacles carry over to the heuristic case, and to the extent they don't this seems important and worth grappling with.
Perhaps here is a way to get a handle on where we disagree: Suppose we make a whole-brain emulation of Jacob Steinhardt, and you start modifying yourself in an attempt to achieve superintelligence while preserving your values, so that you can save the world. You try to go through billions of (mostly small) changes. In this process, you use careful but imperfect human (well, eventually transhuman) reasoning to figure out which changes are sufficiently safe to ...
I thought the example was pretty terrible.
Glad to see you're doing well, Benja :)
Sorry for being curmudgeonly there -- I did afterwards wish that I had tempered that. The thing is that when you write something like
I also agree that the idea of "logical uncertainty" is very interesting. I spend much of my time as a grad student working on problems that could be construed as versions of logical uncertainty.
that sounds to me like you're painting MIRI as working on these topics just because it's fun and as supporting its work with arguments that are obviously naive to someone who knows the field, arguments that miss the point of what MIRI is trying to say. That's why I found the example of program analysis so annoying -- people who think that the halting problem means that program analysis is impossible really are misinformed (actually Rice's theorem is the relevant one, but someone with this misconception wouldn't be aware of that), both about the state of the field and about why these theorems say what they say. E.g., yes, of course your condition is undecidable as long as there is any choice f(s) of chooseAction2(s) that satisfies it; proof: le...
Regardless of why the opportunity has presented itself, can we hope that the MIRI research team and associated researchers will use (or are using) the fact that "visible progress in decision theory is one way to “make a name” for oneself" and proceed to do so? Seems like pretty low-hanging status-fruit given the team's progress so far.
For MIRI, the hard part is writing up the results in a way that appeals to philosophers. That's a highly specialized skill, and not one we've focused on hiring for (at our current budget). We tried to pay Rachael Briggs $20k to do it, since she had two decision theory papers selected for the Philosopher's Annual, but it was too work-intensive even for her. I think it would drive Eliezer mad to write in that style. I suspect I could do it, but it would take a lot of my time. I might be able to persuade Preston Greene to do it some day. Or maybe Kenny Easwaran, who attended our September 2013 decision theory workshop.
If possible, I'd be curious to hear more details about why Briggs found it too work-intensive. Her giving up on it was definitely not an outcome I would have predicted.
Maybe the difficulties that you face are part of the answer to the question of why theoretical progress doesn't happen faster.
After a paper published in 2011: "[Original draft was available in 2003. Hurrah for academic publishing. One journal reviewed the manuscript for nearly two years before determining that it was too long. No wonder philosophy has not advanced farther in the past 2,500 years.] "
Just wanted to mention that physics is not immune to this. Bell's theorem requires only first-year college math, yet it took thirty-odd years after EPR to formulate it. Not even Einstein himself was able to do it. Event horizons and the inescapability of the singularity required virtually no new math beyond 1917, yet it took some 50 years for physicists to understand the picture. There are clearly mental blocks that take decades and new generations to overcome.
Charles Fort, Lo!: "If human thought is a growth, like all other growths, its logic is without foundation of its own, and is only the adjusting constructiveness of all other growing things. A tree can not find out, as it were, how to blossom, until comes blossom-time. A social growth cannot find out the use of steam engines, until comes steam-engine-time. For whatever is supposed to be meant by progress, there is no need in human minds for standards of their own: this is in the sense that no part of a growing plant needs guidance of its own devising, ...
Some examples of a different kind of inefficiency, from AntiFragile:
...It struck me how lacking in imagination we are: we had been putting our suitcases on top of a cart with wheels, but nobody thought of putting tiny wheels directly under the suitcase. Can you imagine that it took close to six thousand years between the invention of the wheel (by, we assume, the Mesopotamians) and this brilliant implementation (by some luggage maker in a drab industrial suburb)? And billions of hours spent by travelers like myself schlepping luggage through corridors full
I think conformity effects play a huge role in this area. The large majority of modern philosophers all have similar educational and cultural backgrounds. They go to elite universities. They read the standard Western philosophical canon. They work very hard to publish a lot of papers in prestigious journals. They are friends with other academics and with other high achievers in the "standard" fields like finance, law, and medicine. Their parents were probably academics or from the upper middle class. They have spent most of their lives in a university setting.
If I had to take an honest guess? Theoretical discovery will behave "inefficiently" when it requires a breadth-first (or at least, breadth-focused) search through the idea space before you can find things that "fit together". Only once you have a bunch of things which "fit together" can you look at the shape of the "hole in idea-space" they all border, dive to the bottom of that lake, and bring up an entirely new idea which links them or unifies them.
So:
1) Mostly agreed, as described above.
2) As described above. M...
It strikes me that there's a second set of reasons that read something like:
See: the free will problem.
Normative uncertainty does not seem particularly interesting as a technical problem. You can just take the product of your probability distribution over futures with your probable evaluations of them, and wind up with a regular decision problem with certain evaluations. It even agrees with normal methods if you assume a single, certain evaluation.
Now, as a concept it's important: realizing that you might not know what you want is a tremendous source of uncertainty that is easy to overlook. I just don't think that any new concepts or mathematical tools are needed to tackle it.
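For concreteness, here is a minimal sketch of that reduction; the credences, outcomes, and toy value systems are all made-up assumptions for illustration:

```python
# Hedged sketch: reduce normative uncertainty to an ordinary expected-value
# problem by weighting each candidate value system by your credence in it.
# All names and numbers below are illustrative assumptions.

def expected_value(outcomes, value_systems):
    """outcomes: dict mapping outcome -> probability of that outcome.
    value_systems: list of (credence, utility_fn) pairs."""
    return sum(
        p * credence * utility(outcome)
        for outcome, p in outcomes.items()
        for credence, utility in value_systems
    )

# Two toy moral theories, held with 70% / 30% credence.
value_systems = [
    (0.7, lambda o: {"save_five": 5, "save_one": 1}[o]),
    (0.3, lambda o: {"save_five": 1, "save_one": 5}[o]),
]

# Each action deterministically produces one outcome in this toy case.
print(expected_value({"save_five": 1.0}, value_systems))  # 0.7*5 + 0.3*1 = 3.8
print(expected_value({"save_one": 1.0}, value_systems))   # 0.7*1 + 0.3*5 = 2.2
```

With a single value system held at credence 1, this collapses to ordinary expected utility, which is the sense in which it agrees with the normal methods.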
Do you agree about the relative inefficiency of theoretical discovery?
In presence, yes. In degree, no. Even the efficient market hypothesis presumes (indeed, requires) some delay inefficiencies, and even restricting to fields where economic incentives are very, very high we see some pretty significant delay inefficiencies.
The modern containerization revolution is dependent on certain types of math and metallurgy being available, but the underlying tools were probably available before the first world war (and, indeed, would have been even more useful...
Don't worry, I wasn't offended :)
Good to hear, and thanks for the reassurance :-) And yeah, I know all too well the problem of having too little time to write something polished, and I certainly prefer having the discussion in fairly raw form to not having it at all.
One possibility is that MIRI's arguments actually do look that terrible to you
What I would say is that the arguments start to look really fishy when one thinks about concrete instantiations of the problem.
I'm not really sure what you mean by a "concrete instantiation". I can think of concrete toy models, of AIs using logical reasoning which know an exact description of their environment as a logical formula, which can't reason in the way I believe is what we want to achieve, because of the Löbian obstacle. I can't write down a self-rewriting AGI living in the real world that runs into the Löbian obstacle, but that's because I can't write down any AGI that lives in the real world.
My reason for thinking that the Löbian obstacle may be relevant is that, as mentioned in the interview, I think that a real-world seed FAI will probably use (something very much like) formal proofs to achieve the high level of confidence it needs in most of its self-modifications. I feel that formally specified toy models + this informal picture of a real-world FAI are as close to thinking about concrete instantiations as I can get at this point.
I may be wrong about this, but it seems to me that when you think about concrete instantiations, you look towards solutions that reason about the precise behavior of the program they're trying to verify -- reasoning like "this variable gets decremented in each iteration of this loop, and when it reaches zero we exit the loop, so we won't loop forever". But heuristically, while it seems possible to reason about the program you're creating in this way, our task is to ensure that we're creating a program which creates a program which creates a program which goes out to learn about the world and look for the most efficient way to use transistors it finds in the external environment to achieve its goals, and we want to verify that those transistors won't decide to blow up the world; it seems clear to me that this is going to require reasoning of the type "the program I'm creating is going to reason correctly about the program it is creating", which is the kind of reasoning that runs into the Löbian obstacle, rather than the kind of reasoning applied by today's automated verification techniques.
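As a toy illustration of the first kind of reasoning (the code and names here are just my own made-up example, not from any actual verifier):

```python
# Sketch of "concrete" verification reasoning: the counter n is decremented
# in each iteration and the loop exits when it reaches zero, so the loop
# provably terminates. (Illustrative assumption, not an actual verifier.)

def countdown(n: int) -> int:
    assert n >= 0
    steps = 0
    while n > 0:
        n -= 1       # n strictly decreases on every iteration...
        steps += 1
    return steps     # ...and the loop exits at n == 0, after n iterations

print(countdown(5))
```

This style of reasoning tracks the precise behavior of one fixed program; the question in the text is whether it can scale to a program reasoning about the programs *it* creates.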
Writing this, I'm not too confident that it will be helpful in getting the idea across. I hope the face-to-face with Paul will help, perhaps also with translating your intuitions into a language that better matches the way I think about things.
I think that the point above would be really helpful to clarify, though. This seems to be a recurring theme in my reactions to your comments on MIRI's arguments -- e.g. there was that LW conversation you had with Eliezer where you pointed out that it's possible to verify properties probabilistically in more interesting ways than running a lot of independent trials, and I go, yeah, but how is that going to help with verifying whether the far-future descendant of an AI we build now, when it has entire solar systems of computronium to run on, is going to avoid running simulations which by accident contain suffering sentient beings? It seems that to achieve confidence that this far-future descendant will behave in a sensible way, without unduly restricting the details of how it is going to work, is going to need fairly abstract reasoning, and the sort of tools you point to don't seem to be capable of this or to extend in some obvious way to dealing with this.
You seem to be quite willing to use that reasoning yourself to show that the initial AI is safe
I'm not sure I understand what you're saying here, but I'm not convinced that this is the sort of reasoning I'd use.
I'm fairly sure that the reason your brain goes "it would be safe if we only allow self-modifications when there's a proof that they're safe" is that you believe that if there's a proof that a self-modification is safe, then it is safe -- I think this is probably a communication problem between us rather than you actually wanting to use different reasoning. But again, hopefully the face-to-face with Paul can help with that.
I don't think that "whole brain emulations can safely self-modify" is a good description of our disagreements. I think that this comment (the one you just made) does a better job of it. But I should also add that my real objection is something more like: "The argument in favor of studying Lob's theorem is very abstract and it is fairly unintuitive that human reasoning should run into that obstacle. [...]"
Thanks for the reply! Thing is, I don't think that ordinary human reasoning should run into that obstacle, and the "ordinary" is just to exclude humans reasoning by writing out formal proofs in a fixed proof system and having these proofs checked by a computer. But I don't think that ordinary human reasoning can achieve the level of confidence an FAI needs to achieve in its self-rewrites, and the only way I currently know how an FAI could plausibly reach that confidence is through logical reasoning. I thought that "whole brain emulations can safely self-modify" might describe our disagreement because that would explain why you think that human reasoning not being subject to Löb's theorem would be relevant.
My next best guess is that you think that even though human reasoning can't safely self-modify, its existence suggests that it's likely that there is some form of reasoning which is more like human reasoning than logical reasoning and therefore not subject to Löb's theorem, but which is sufficiently safe for a self-modifying FAI. Request for reply: Would that be right?
I can imagine that that might be the case, but I don't think it's terribly likely. I can more easily imagine that there would be something completely different from both human reasoning and logical reasoning, or something quite similar to normal logical reasoning but not subject to Löb's theorem. But if so, how will we find it? Unless essentially every kind of reasoning except human reasoning can easily be made safe, it doesn't seem likely that AGI research will hit on a safe solution automatically. MIRI's current research seems to me like a relatively promising way of searching for a solution that's close to logical reasoning.
When I say "failure to understand the surrounding literature", I am referring more to a common MIRI failure mode of failing to sanity-check their ideas / theories with concrete examples / evidence. I doubt that this comment is the best place to go into that, but perhaps I will make a top-level post about this in the near future.
Ok, I think I probably don't understand this yet, and making a post about it sounds like a good plan!
Sorry for ducking most of the technical points, as I said, I hope that talking to Paul will resolve most of them.
No problem, and hope so as well.
I don't have time to reply to all of this right now, but since you explicitly requested a reply to:
My next best guess is that you think that even though human reasoning can't safely self-modify, its existence suggests that it's likely that there is some form of reasoning which is more like human reasoning than logical reasoning and therefore not subject to Löb's theorem, but which is sufficiently safe for a self-modifying FAI. Request for reply: Would that be right?
The answer is yes, I think this is essentially right although I would probably want to a...
Previously: Why Neglect Big Topics.
Why was there no serious philosophical discussion of normative uncertainty until 1989, given that all the necessary ideas and tools were present at the time of Jeremy Bentham?
Why did no professional philosopher analyze I.J. Good’s important “intelligence explosion” thesis (from 1959¹) until 2010?
Why was reflectively consistent probabilistic metamathematics not described until 2013, given that the ideas it builds on go back at least to the 1940s?
Why did it take until 2003 for professional philosophers to begin updating causal decision theory for the age of causal Bayes nets, and until 2013 to formulate a reliabilist metatheory of rationality?
By analogy to financial market efficiency, I like to say that “theoretical discovery is fairly inefficient.” That is: there are often large, unnecessary delays in theoretical discovery.
This shouldn’t surprise us. For one thing, there aren’t necessarily large personal rewards for making theoretical progress. But it does mean that those who do care about certain kinds of theoretical progress shouldn’t necessarily think that progress will be hard. There is often low-hanging fruit to be plucked by investigators who know where to look.
Where should we look for low-hanging fruit? I’d guess that theoretical progress may be relatively easy where:
These guesses make sense of the abundant low-hanging fruit in much of MIRI’s theoretical research, with the glaring exception of decision theory. Our September decision theory workshop revealed plenty of low-hanging fruit, but why should that be? Decision theory is widely applied in multi-agent systems, and in philosophy it’s clear that visible progress in decision theory is one way to “make a name” for oneself and advance one’s career. Tons of quality-adjusted researcher hours have been devoted to the problem. Yes, new theoretical advances (e.g. causal Bayes nets and program equilibrium) open up promising new angles of attack, but they don’t seem necessary for much of the low-hanging fruit discovered thus far. And progress in decision theory is definitely not valuable only to those with unusual views. What gives?
Anyway, three questions:
1 Good (1959) is the earliest statement of the intelligence explosion: “Once a machine is designed that is good enough… it can be put to work designing an even better machine. At this point an ‘explosion’ will clearly occur; all the problems of science and technology will be handed over to machines and it will no longer be necessary for people to work. Whether this will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.” The term itself, “intelligence explosion,” originates with Good (1965). Technically, artist and philosopher Stefan Themerson wrote a “philosophical analysis” of Good’s intelligence explosion thesis called Special Branch, published in 1972, but by “philosophical analysis” I have in mind a more analytic, argumentative kind of philosophical analysis than is found in Themerson’s literary Special Branch. ↩