Mark_Friedenbach comments on Leaving LessWrong for a more rational life
Thanks for taking the time to explain your reasoning, Mark. I'm sorry to hear you won't be continuing the discussion group! Is anyone else here interested in leading that project, out of curiosity? I was getting a lot out of seeing people's reactions.
I think John Maxwell's response to your core argument is a good one. Since we're talking about the Sequences, I'll note that this dilemma is the topic of the Science and Rationality sequence:
This is why there's a lot of emphasis on hard-to-test ("philosophical") questions in the Sequences, even though people are notorious for getting those wrong more often than scientific questions -- because sometimes (e.g., in the case of cryonics and existential risk) the answer matters a lot for our decision-making, long before we have a definitive scientific answer. That doesn't mean we should despair of empirically investigating these questions, but it does mean that our decision-making needs to be high-quality even during periods where we're still in a state of high uncertainty.
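To make that structural point concrete, here's a minimal toy sketch (in Python, with purely hypothetical payoff and cost numbers rather than anyone's actual estimates) of expected-value reasoning under uncertainty. The point is only that whether acting is worthwhile can hinge on a probability we can't yet pin down scientifically, which is why the quality of our reasoning about that probability matters now rather than after the science is settled.

    # Toy sketch: every number here is a hypothetical placeholder, chosen only
    # to illustrate the structure of decision-making under high uncertainty.

    def expected_value(p_success, value_if_success, cost):
        """Expected payoff of acting, relative to doing nothing."""
        return p_success * value_if_success - cost

    # We don't know p_success, so instead of waiting for a settled estimate,
    # we check how the decision behaves across a wide range of guesses.
    for p in (0.001, 0.01, 0.05, 0.20):
        ev = expected_value(p, value_if_success=1_000_000, cost=50_000)
        print(f"p = {p:5.3f}: expected value = {ev:12,.0f} -> {'act' if ev > 0 else 'pass'}")

The decision flips sign somewhere in that range, which is exactly the situation where better-calibrated 'philosophical' reasoning pays rent.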
The Sequences talk about the Many Worlds Interpretation precisely because it's an unusually-difficult-to-test topic. The idea isn't that this is a completely typical example, or that it's a good idea to disregard evidence when it is available; the idea, rather, is that we sometimes do need to predicate our decisions on our best guess in the absence of perfect tests.
Its placement in Rationality: From AI to Zombies immediately after the 'zombies' sequence is deliberate. (That sequence, incidentally, is an example of how and why we should reject philosophical thought experiments, no matter how intuitively compelling they are, when they don't accord with established scientific theories and data.) Rather than reading either sequence as an attempt to defend a specific fleshed-out theory of consciousness or of physical law, we should primarily read them as attempts to show that extreme uncertainty about a domain doesn't always bleed over into 'we don't know anything about this topic' or 'we can't rule out any of the candidate solutions'.
We can effectively rule out epiphenomenalism as a candidate solution to the hard problem of consciousness even if we don't know the answer to the hard problem (which we don't), and we can effectively rule out 'consciousness causes collapse' and 'there is no objective reality' as candidate solutions to the measurement problem in QM even if we don't know the answer to the measurement problem (which, again, we don't). Just advocating 'physicalism' or 'many worlds' is a promissory note, not a solution.
In discussions of EA and x-risk, we likewise need to be able to prioritize more promising hypotheses over less promising ones long before we've answered all the questions we'd like answered. Even deciding what studies to fund presupposes that we've 'philosophized', in the sense of mentally aggregating, heuristically analyzing, and drawing tentative conclusions from giant complicated accumulated-over-a-lifetime data sets.
You wrote:
That's true, and it's one of the basic assumptions behind MIRI research: that understanding agents smarter than us isn't obviously hopeless, because our human capacity for abstract reasoning makes it possible for us to model systems even when they're extremely complex and dynamic. MIRI's research is intended to make this likelier to happen.
It's not the default that we're always able to predict what our inventions will do before we run them to see what happens; and there are some basic limits on our ability to do so when the system we're predicting is smarter than the predictor. But with enough intellectual progress we may become able to model abstract safety-relevant features of AGI behavior, even though we can't predict in detail the exact decisions the AGI will make. (If we could predict the exact decisions of the AGI, we'd have to be at least as smart as the AGI.)
If it isn't possible to learn a variety of generalizations about smarter autonomous systems, then, interestingly, that also undermines the case for intelligence explosion. Both 'humans trying to make superintelligent AI safe' and 'AI undergoing a series of recursive self-improvements' are cases where less intelligent agents are trying to reliably generate agents that meet various abstract criteria (including superior intelligence). The orthogonality thesis, likewise, simultaneously supports the claims 'many possible AI systems won't have humane goals' and 'it is possible for an AI system to have humane goals'. This is why Bostrom/Yudkowsky-type arguments don't uniformly inspire pessimism.
Are you familiar with MIRI's technical agenda? You may also want to check out the AI Impacts project, if you think we should be prioritizing forecasting work at this point rather than object-level mathematical research.
Yes, I'm familiar with the technical agenda. What do you mean by "forecasting work" -- AI Impacts? That seems to be of near-zero utility to me.
What MIRI should be doing, what I've advocated it do from the start, and what I've never gotten a straight answer about (at least, not one that doesn't in some way terminate in referencing the more speculative sections of the Sequences I take issue with), is this: build artificial general intelligence and study it. Not a provably-safe-from-first-principles-before-we-touch-a-single-line-of-code AGI. Just a regular, run-of-the-mill AGI using any one of the architectures presently being researched in the artificial intelligence community. Build it and study it.
A few quick concerns:
The closer we get to AGI, the more profitable further improvements in AI capabilities become. This means that the more we move the clock toward AGI, the more likely we are to engender an AI arms race between different nations or institutions, and the more (apparent) incentives there are to cut corners on safety and security. At the same time, AGI is an unusual technology in that it can potentially be used to autonomously improve on our AI designs -- so that the more advanced and autonomous AI becomes, the likelier it is to undergo a speed-up in rates of improvement (and the likelier these improvements are to be opaque to human inspection). Both of these facts could make it difficult to put the brakes on AI progress.
Both of these facts also make it difficult to safely 'box' an AI. First, different groups in an arms race may simply refuse to stop reaping the economic or military/strategic benefits of employing their best AI systems. If there are many different projects that are near or at AGI-level when your own team suddenly stops deploying your AI algorithms and boxes them, it's not clear there is any force on earth that can compel all other projects to freeze their work too, and to observe proper safety protocols. We are terrible at stopping the flow of information, and we have no effective mechanisms in place to internationally halt technological progress on a certain front. It's possible we could get better at this over time, but the sooner we get AGI, the less intervening time we'll have to reform our institutions and scientific protocols.
A second reason speed-ups make it difficult to safely box an AGI is that we may not arrest its self-improvement in the (narrow?) window between 'too dumb to radically improve on our understanding of AGI' and 'too smart to keep in a box'. We can try to measure capability levels, but only using imperfect proxies; there is no way to test how hard it would be for an AGI to escape a box short of 'put the AGI in the box and see what happens', which means we can't get much of a safety assurance until after we've done the research you're proposing we do on the boxed AI. If you aren't clear on exactly how capable the AI is, or how well measures of its apparent capabilities in other domains transfer to its capability at escaping boxes, there are limits to how confident you can be that the AI is incapable of finding clever ways to bridge air gaps, or of adjusting its own software so that the very methods we're using to inspect and analyze it compromise the box.
'AGI' is not actually a natural kind. It's just an umbrella term for 'any mind we could build that's at least as powerful as a human'. Safe, highly reliable AI in particular is likely to be an extremely special and unusual subcategory. Studying a completely arbitrary AGI may tell us about as much about how to build a safe AGI as studying nautilus ecology would tell us about how to safely keep bees and farm their honey. Yes, they're both 'animals', and we probably could learn a lot, but not as much as if we studied something a bit more bee-like. But in this case that presupposes that we understand AI safety well enough to build an AGI that we expect to look at least a little like our target safe AI. And our understanding just isn't there yet.
We already have seven billion general intelligences we can study in the field, if we so please; it's not obvious that a rushed-to-completion AGI would resemble a highly reliable safe AGI in all that much more detail than humans resemble either of those two hypothetical AGIs.
(Of course, our knowledge would obviously improve! Knowing about a nautilus and a squirrel really does tell us a lot more about beekeeping than either of those species would on its own, assuming we don't have prior experience with any other animals. But if the nautilus is a potential global catastrophic risk, we need to weigh those gains against the risk and promise of alternative avenues of research.)
Was any of that unclear?