Whether the individual in question has other motivations doesn't by itself make the questions raised any less valid.
It could be evidence that the questioner isn't worth engaging, because the conversation is unlikely to be productive. The questioner might have significantly motivated cognition or have written the bottom line.
I'm not sure why being very agenty would necessarily mean that starting your own company is the best bet. Being an employee lets you reap the benefits of specialization and of others having taken all the risks for you (most new companies fail), and it can be very comfortable if you can find a position that lets you use your skills to the fullest. Then you can focus on doing what you actually enjoy, as opposed to having to spend large amounts of extra energy on running a company.
Yes - absolutely true. 'Entrepreneurship' requires a unique skill set, and like every other skill set, comparative advantage and division of labor apply. It's entirely consistent to be a god of programming, relatively worse at entrepreneurship, and otherwise excellent at achieving your goals (agenty).
I'd spend a few weeks writing a set of incredibly convincing essays arguing that the reader should attempt to revive me. They would be aimed at a few major possible future civilizations, based on my best extrapolations of current civilization. I'd try to appeal to future values in as general a way as possible, while remaining convincing. I'd write them in English.
If you get revived, it's going to be because someone decided to revive you; they have total power over you before they revive you, so bringing a weapon wouldn't be very helpful. In fact, I probably wouldn't revive someone who put a weapon in their time capsule, at least without arming myself first!
Intelligence amplification technology is widespread, preventing any differential adoption by the FAI team. However, FAI researchers are able to keep up with competing efforts to use that technology for AI research.
If the "FAI is important" position is correct, but requires intelligence to understand, would widespread IA cause more people to become interested in working on FAI?
Yeah, I've heard that argument before. The idea is that intelligence not only makes you better at stuff, but also impacts how you make decisions about what to work on.
The alternate hypothesis is that intelligence-amplified people would just get better at being crazy. Perhaps one could start to tease apart the hypotheses by distinguishing 'intelligence' from 'reflectiveness' and 'altruism', and trying to establish how those quantities interact.
I like how you've partitioned things into IA/government/status/memes/prediction/xrisk/security and given excellent/good/ok options for each. This makes it easy to imagine mix-and-match scenarios, e.g. "the FAI team has gotten good at security but sucks at memes and status".
A few quick points:
The fantastic list has 8 points and the others have 7 (as there are two "government" points). This brings me on to...
Should there be a category for "funding"? The fantastic/good/ok options could be something like:
- Significant government funding and/or FAI team use their superior rationality to acquire very large amounts of money through business
- Attracts small share of government/philanthropy budgets and/or a lot of small donations from many individuals
- Shoestring budget, succeeds anyway because the problem turns out to not be so hard after all once you've acquired the right insights
Does it have to be the FAI team implementing the "other xrisk reduction efforts" or can it just be "such institutions exist"?
I'll add this, and the one from your other comment. (By the way, thank you for being the only person so far to actually answer the goddamn prompt!)
The main way complexity of this sort would be addressable is if the intellectual artifact that you tried to prove things about were simpler than the process that you meant the artifact to unfold into. For example, the mathematical specification of AIXI is pretty simple, even though the hypotheses that AIXI would (in principle) invent upon exposure to any given environment would mostly be complex. Or, for a more concrete example, the Gallina kernel of the Coq proof engine is small and was verified to be correct using other proof tools, while most of the complexity of Coq is in built-up layers of proof search strategies which don't themselves need to be verified, since the proofs they generate are checked by Gallina.
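(For concreteness, here is roughly what that "pretty simple" specification looks like, following Hutter's statement of the AIXI equation; the entire agent is this one expression, while the hypotheses $q$ being summed over can be arbitrarily complex:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big(r_k + \cdots + r_m\big) \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

Here $U$ is a fixed universal monotone Turing machine, $q$ ranges over programs serving as environment-hypotheses, $\ell(q)$ is the length of $q$ in bits, and $m$ is the horizon. The equation fits on a line; the programs it quantifies over do not.)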
Isn't that as unbelievable as the idea that you can prove that a particular zygote will never grow up to be an evil dictator? Surely this violates some principles of complexity, chaos [...]
Yes, any physical system could be subverted with a sufficiently unfavorable environment. You wouldn't want to prove perfection. The thing you would want to prove would be more along the lines of, "will this system become at least somewhere around as capable of recovering from any disturbances, and of going on to achieve a good result, as it would be if its designers had thought specifically about what to do in case of each possible disturbance?". (Ideally, this category of "designers" would also sort of bleed over in a principled way into the category of "moral constituency", as in CEV.) Which, in turn, would require a proof of something along the lines of "the process is highly likely to make it to the point where it knows enough about its designers to be able to mostly duplicate their hypothetical reasoning about what it should do, without anything going terribly wrong".
We don't know what an appropriate formalization of something like that would look like. But there is reason for considerable hope that such a formalization could be found, and that this formalization would be sufficiently simple that an implementation of it could be checked. This is because a few other aspects of decision-making which were previously mysterious, and which could only be discussed qualitatively, have had powerful and simple core mathematical descriptions discovered for cases where simplifying modeling assumptions perfectly apply. Shannon information was discovered for the informal notion of surprise (with the assumption of independent identically distributed symbols from a known distribution). Bayesian decision theory was discovered for the informal notion of rationality (with assumptions like perfect deliberation and side-effect-free cognition). And Solomonoff induction was discovered for the informal notion of Occam's razor (with assumptions like a halting oracle and a taken-for-granted choice of universal machine). These simple conceptual cores can then be used to motivate and evaluate less-simple approximations for situations where the assumptions about the decision-maker don't perfectly apply.

For the AI safety problem, the informal notions (for which the mathematical core descriptions would need to be discovered) would be a bit more complex -- like the "how to figure out what my designers would want to do in this case" idea above. Also, you'd have to formalize something like our informal notion of how to generate and evaluate approximations, because approximations are more complex than the ideals they approximate, and you wouldn't want to directly verify the safety of any more approximations than you had to. (But note that, for reasons related to Rice's theorem, you can't (and therefore shouldn't want to) lay down universally perfect rules for approximation in any finite system.)
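(To make that pattern concrete, here are the textbook one-line forms of the three conceptual cores mentioned above, in standard notation, with the modeling assumptions just listed taken for granted:

$$I(x) = -\log_2 p(x) \qquad \text{(Shannon surprisal; entropy is } H(X) = -\textstyle\sum_x p(x)\log_2 p(x)\text{)}$$

$$a^* = \arg\max_a \sum_s p(s \mid \text{data})\, U(a,s) \qquad \text{(Bayesian decision theory: maximize posterior expected utility)}$$

$$M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)} \qquad \text{(Solomonoff prior: total weight of programs } p \text{ whose output begins with } x\text{)}$$

Each is short enough to check by hand; the hard part, in each case, was discovering that the informal notion had a core this simple.)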
Two other related points are discussed in this presentation: the idea that a digital computer is a nearly deterministic environment, which makes safety engineering easier for the stages before the AI is trying to influence the environment outside the computer, and the idea that you can design an AI in such a way that you can tell what goal it will at least try to achieve even if you don't know what it will do to achieve that goal. Presumably, the better your formal understanding of what it would mean to "at least try to achieve a goal", the better you would be at spotting and designing to handle situations that might make a given AI start trying to do something else.
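(A toy way to see that second idea, and not the specific formulation used in the presentation: if an agent is defined as an expected-utility maximizer,

$$\pi(h) = \arg\max_a \ \mathbb{E}\left[\, U \mid h, a \,\right],$$

then the goal is legible in the definition. You can read the utility function $U$ off the source code and know what the agent is at least trying to achieve, even though predicting the argmax, i.e. what it will actually do, may be intractable.)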
(Also: can you offer some feedback as to what features of the site would have helped you become aware sooner that there were arguments behind the positions you felt were being asserted blindly in a vacuum? The "things can be surprisingly formalizable, here are some examples" argument can be found in lukeprog's "Open Problems Related to the Singularity" draft and the later "So You Want to Save the World", though the argument there is very short, and its significance is hard to recognize if you don't already know most of the mathematical formalisms mentioned. A backup "you shouldn't just assume that there's no way to make this work" argument is in "Artificial Intelligence as a Positive and Negative Factor in Global Risk", pp. 12-13.)
what will prevent them from becoming "bad guys" when they wield this much power
That's a problem where successful/practically applicable formalizations are harder to hope for, so it's been harder for people to find things to say about it that pass the threshold of being plausible conceptual progress instead of being noisy verbal flailing. See the related "How can we ensure that a Friendly AI team will be sane enough?". But it's not like people aren't thinking about the problem.
This is actually one of the best comments I've seen on Less Wrong, especially this part:
Shannon information was discovered for the informal notion of surprise (with the assumption of independent identically distributed symbols from a known distribution). Bayesian decision theory was discovered for the informal notion of rationality (with assumptions like perfect deliberation and side-effect-free cognition). And Solomonoff induction was discovered for the informal notion of Occam's razor (with assumptions like a halting oracle and a taken-for-granted choice of universal machine). These simple conceptual cores can then be used to motivate and evaluate less-simple approximations for situations where the assumptions about the decision-maker don't perfectly apply. For the AI safety problem, the informal notions (for which the mathematical core descriptions would need to be discovered) would be a bit more complex -- like the "how to figure out what my designers would want to do in this case" idea above.
Thanks for the clear explanation.
I haven't offered a rigorous definition, and I'm not going to, but I think you know what I mean.
I might have some inkling of what you want to mean, but on this forum, you ought to be able to define your terms to be taken seriously. I suspect that if you honestly try defining "good guys", you will find that it is harder than it looks and not at all obvious.
I'm not saying that the definition is obvious - I'm saying that it's beside the point. It was clearly detracting from the quality of the conversation, though, so I've removed the term.
The good guys implement deliberate X-risk reduction efforts to stave off non-AI X-risks. Those might include a global nanotech immune system, cheap and rigorous biotech tests and safeguards, an asteroid defense system, nuclear safeguards, etc.
Why are these part of the "fantastic scenario"? An asteroid defense system will almost certainly not be needed: the overwhelmingly likely case (backed up by telescope observations and outside view statistics) is that there won't be any big threatening asteroids over the relevant timescales.
Similarly, many of the other scenarios you list are concerned with differences that would slightly (or perhaps substantially, for some) shift the probability of a good global outcome, not determine it. That's pretty different from being a central requirement of a successful outcome. The framework here could be clearer.
Why are these part of the "fantastic scenario"? An asteroid defense system will almost certainly not be needed: the overwhelmingly likely case (backed up by telescope observations and outside view statistics) is that there won't be any big threatening asteroids over the relevant timescales.
I imagined the 'fantastic scenario' as being one in which "The good guys implement deliberate X-risk reduction efforts to stave off non-AI X-risks". I meant to cite "a global nanotech immune system, cheap and rigorous biotech tests and safeguards, an asteroid defense system, nuclear safeguards" as examples of "X-risk reduction efforts" in order to fill out the category, regardless of the individual relevance of any of the examples. Anyway, it's confusing, and I should remove it.
Similarly, many of the other scenarios you list are concerned with differences that would slightly (or perhaps substantially, for some) shift the probability of a good global outcome, not determine it. That's pretty different from being a central requirement of a successful outcome.
Yeah, I think I want a picture of what the world looks like where the probability of success was as high as possible, and then we succeeded. I think the central requirements of successful outcomes are far fewer, and less helpful for figuring out where to go.
you know what I mean.
Right, but this is a public-facing post. A lot of readers might not know why you could think it was obvious that "good guys" would imply things like information security, concern for Friendliness so-named, etc., and they might think that the intuition you mean to evoke with a vague affect-laden term like "good guys" is just the same argument-disdaining groupthink that would be implied if they saw it on any other site.
To prevent this impression, if you're going to use the term "good guys", then at or before the place where you first use it, you should probably put an explanation, like
(I.e. people who are familiar with the kind of thinking that can generate arguments like those in "The Detached Lever Fallacy", "Fake Utility Functions" and the posts leading up to it, "Anthropomorphic Optimism" and "Contaminated by Optimism", "Value is Fragile" and the posts leading up to it, and the "Envisioning perfection" and "Beyond the adversarial attitude" discussions in Creating Friendly AI or most of the philosophical discussion in Coherent Extrapolated Volition, and who understand what it means to be dealing with a technology that might be able to bootstrap to the singleton level of power that could truly engineer a "forever" of the "a boot stamping on a human face — forever" kind.)
Okay, I'm convinced. I think I will just remove the term altogether, because it's confusing the issue.
This list is focused on scenarios where FAI succeeds by creating an AI that explodes and takes over the world. What about scenarios where FAI succeeds by creating an AI that provably doesn't take over the world? This isn't a climactic ending (although it may be a big step toward one), but it's still a success for FAI, since it averts a UFAI catastrophe.
(Is there a name for the strategy of making an oracle AI safe by making it not want to take over the world? Perhaps 'Hermit AI' or 'Anchorite AI', because it doesn't want to leave its box?)
This scenario deserves more attention than it has been getting, because it doesn't depend on solving all the problems of FAI in the right order. Unlike Nanny AI, which takes over the world but only uses its powers for certain purposes, Anchorite AI might be a much easier problem than full-fledged FAI, so it might be developed earlier.
In the form of the OP:
Thanks! I've added it to the post. I particularly like that you included the 'sufficiently good' scenario - I hadn't directly thought about that before.