In the past, people like Eliezer Yudkowsky (see 1, 2, 3, 4, and 5) have argued that MIRI has a medium probability of success. What is this probability estimate based on and how is success defined?
(Meta: I don't think this deserves a discussion thread, but I posted this on the open thread and no-one responded, and I think it's important enough to merit a response.)
A non-exhaustive list of some reasons why I strongly disagree with this combination of views:
AI which is not vastly superhuman can be restrained from crime, because humans can be so restrained, and AI designers additionally have the ability to alter the mind's parameters (desires, intuitions, inhibitions, capability for action, duration of extended thought, etc.), test copies in detail, read out its internal states, and so on, making the problem vastly easier (although control may need to be tight if one is holding back an intelligence explosion while this is going on)
If 10-50 humans can solve AI safety (and build AGI!) in less than 50 years, then 100-500 not very superhuman AIs at 1200x speedup should be able to do so in less than a month
There are a variety of mechanisms by which humans could monitor, test, and verify the work conducted by such systems
The AIs can also work on incremental improvements to the control mechanisms being used initially, with steady progress allowing greater AI capabilities to develop better safety measures, until one approaches perfect safety
If a small group can solve all the relevant problems over a few decades, then probably a large portion of the AI community (and beyond) can solve the problems in a fraction of the time if mobilized
As AI becomes visibly closer, such mobilization becomes more likely
Developments in other fields may make things much easier: better forecasting, cognitive enhancement, global governance, brain emulations coming first, global peace/governance
The broad shape of AI risk is known and considered much more widely than MIRI: people like Bill Gates and Peter Norvig consider it, but think that acting on it now is premature; if they saw AGI as close, or were creating it themselves, they would attend to the control problems
Paul Christiano, and now you, have started using the phrase "AI control problems". I've gone along with it in my discussions with Paul, but before many people start adopting it maybe we ought to talk about whether it makes sense to frame the problem that way (as opposed to "Friendly AI"). I see a number of problems with it:
1. Control != Safe or Friendly. An AI can be perfectly controlled by a human and be extremely dangerous, because most humans aren't very altruistic or rational.
2. The framing implicitly suggests (and you also explicitly suggest) that the control problem can be solved incrementally. But I think we have reason to believe this is not the case, that in short "safety for superintelligent AIs" = "solving philosophy/metaphilosophy" which can't be done by "incremental improvements to the control mechanisms being used initially".
3. "Control" suggests that the problem falls in the realm of engineering (i.e., belongs to the reference class of "control problems" in engineering, such as "aircraft flight control"), whereas, again, I think the real problem is one of philosophy (plus lots of engineering as well of course, but philosophy is where most of the difficulty lies). This makes a big difference in trying to predict the success of various potential attempts to solve the problem, and I'm concerned that people will underestimate the difficulty of the problem or overestimate the degree to which it's parallelizable or generally amenable to scaling with financial/human resources, if the problem becomes known as "AI control".
Do you disagree with this, on either the terminological issue ("AI control" suggests "incremental engineering problem") or the substantive issue (the actual problem we face is more like philosophy than engineering)? If the latter, I'm surprised not to have seen you talk about your views on this topic earlier, unless you did and I missed it?
Kawoomba
Not that it should be used to dismiss any of your arguments, but reading your other comments in this thread I thought you must be playing devil's advocate. Your phrasing here seems to preclude that possibility.
If you are so strongly convinced that while AGI is a non-negligible x-risk, MIRI will probably turn out to have been without value even if a good AGI outcome were to be eventually achieved, why are you a research fellow there?
I'm puzzled. Let's consider an edge case: even if MIRI's factual research turned out to be strictly non-contributing to an eventual solution, there's no reasonable doubt that it has raised awareness of the issue significantly (in relative terms).
Would the current situation with the CSER or FHI be unchanged or better if MIRI had never existed? Do you think those have a good chance of being valuable in bringing about a good outcome? Answering 'no' to the former and 'yes' to the latter would transitively imply that MIRI is valuable as well.
I.e., that alone (never mind actual research contributions) would make it valuable in hindsight, given an eventual positive outcome. Yet you're strongly opposed to that view?
I've read standard MIRI literature (like "Evidence and Import" and "Five Theses"), but I may have missed something.