Intelligence Amplification and Friendly AI

lukeprog

Part of the series AI Risk and Opportunity: A Strategic Analysis. Previous articles on this topic: Some Thoughts on Singularity Strategies, Intelligence enhancement as existential risk mitigation, Outline of possible Singularity scenarios that are not completely disastrous.

Below are my quickly-sketched thoughts on intelligence amplification and FAI, without much effort put into organization or clarity, and without many references.[1] But first, I briefly review some strategies for increasing the odds of FAI, one of which is to work on intelligence amplification (IA).

Some possible “best current options” for increasing the odds of FAI

Suppose you find yourself in a pre-AGI world,[2] and you’ve been convinced that the status quo world is unstable, and within the next couple centuries we’ll likely[3] settle into one of four stable outcomes: FAI, uFAI, non-AI extinction, or a sufficiently powerful global government which can prevent AGI development[4]. And you totally prefer the FAI option. What should you do to get there?

Obvious direct approach: start solving the technical problems that must be solved to get FAI: goal stability under self-modification, decision algorithms that handle counterfactuals and logical uncertainty properly, indirect normativity, and so on. (MIRI’s work, some FHI work.)
Do strategy research, to potentially identify superior alternatives to the other items on this list, or superior versions of the things on this list already. (FHI’s work, some MIRI work, etc.)
Accelerate IA technologies, so that smarter humans can tackle FAI. (E.g. cognitive genomics.)
Try to make sure we get high-fidelity WBEs before AGI, without WBE work first enabling dangerous neuromorphic AGI. (Dalrymple’s work?)
Improve political and scientific institutions so that the world is more likely to handle AGI wisely when it comes. (Prediction markets? Vannevar Group?)
Capacity-building. Grow the rationality community, the x-risk reduction community, the effective altruism movement, etc.
Other stuff. (More in later posts.)

The IA route

Below are some key considerations about the IA route. I’ve numbered them so they’re easy to refer to later. My discussion assumes MIRI’s basic assumptions, including timelines similar to my own AGI timelines.

1. Maybe FAI is so hard that we can only get FAI with a large team of IQ 200+ humans, whereas uFAI can be built by a field of IQ 130–170 humans with a few more decades and lots of computing power and trial and error. So to have any chance of FAI at all, we’ve got to do WBE or IA first.
2. You could accelerate FAI relative to AGI if you somehow kept IA technology secret, for use only by FAI researchers (and maybe their supporters).
3. Powerful IA technologies would likely get wide adoption, and accelerate economic growth and scientific progress in general. If you think Earths with slower economic growth have a better chance at FAI, that could be bad for our FAI chances. If you think the opposite, then broad acceleration from IA could be good for FAI.
4. Maybe IA increases one’s “rationality” and “philosophical ability” (in scare quotes because we mostly don’t know how to measure them yet), and thus IA increases the frequency with which people will realize the risks of AGI and do sane things about it.
5. Maybe IA increases the role of intelligence and designer understanding, relative to hardware and accumulated knowledge, in AI development.[5]

Below are my thoughts about all this. These are only my current views: other MIRI personnel (including Eliezer) disagree with some of the points below, and I wouldn’t be surprised to change my mind about some of these things after extended discussion (hopefully in public, on Less Wrong).

I doubt (1) is true. I think IQ 130–170 humans could figure out FAI in 50–150 years if they were trying to solve the right problems, and if FAI development wasn’t in a death race with the strictly easier problem of uFAI. If normal smart humans aren’t capable of building FAI in that timeframe, that’s probably for lack of rationality and philosophical skill, not for lack of IQ. And I’m not confident that rationality and philosophical skill predictably improve with IQ after about IQ 140. It’s a good sign that atheism increases with IQ after IQ 140, but on the other hand I know too many high-IQ people who think that (e.g.) an AI that maximizes K-complexity is a win, and also there’s Stanovich’s research on how IQ and rationality come apart. For these reasons, I’m also not convinced (4) would be a large positive effect on our FAI chances.

Can we train people in rationality and philosophical skill beyond that of, say, the 95th percentile Less Wronger? CFAR has plans to find out, but they need to grow a lot first to execute such an ambitious research program.

(2) looks awfully hard, unless we can find a powerful IA technique that also, say, gives you a 10% chance of cancer. Then some EAs devoted to building FAI might just use the technique, and maybe the AI community in general doesn’t.

(5) seems right, though I doubt it’ll be a big enough effect to make a difference for the final outcome.

I think (3) is the dominant consideration here, along with the worry about lacking the philosophical skill (but not IQ) to build FAI at all. At the moment, I (sadly) lean toward the view that slower Earths have a better chance at FAI. (Much of my brain doesn’t know this, though: I remember reading the Summers news with glee, and then remembering that on my current model this was actually bad news for FAI.)

I could say more, but I’ll stop for now and see what comes up in discussion.

[1] My thanks to Justin Shovelain for sending me his old notes on the “IA first” case, and to Wei Dai, Carl Shulman, and Eliezer Yudkowsky for their feedback on this post.
[2] Not counting civilizations that might be simulating our world. This matters, but I won’t analyze that here.
[3] There are other possibilities. For example, there could be a global nuclear war that kills all but about 100,000 people, which could set back social, economic, and technological progress by centuries, thus delaying the crucial point in Earth’s history in which it settles into one of the four stable outcomes.
[4] And perhaps also advanced nanotechnology, intelligence amplification technologies, and whole brain emulation.
[5] Thanks to Carl Shulman for making this point.

(1) I've butted heads with you on timelines before. We're about a single decade away from AGI, if reasonable and appropriate resources are allocated to such a project. FAI in the sense that MIRI defines the term (provably friendly) may or may not be possible, and just finding that out is likely to take more time than we have left. I'm glad you made this post because if your estimate for a FAI theory timeline is correct, then MIRI is entirely on the wrong track, and alternatives or hybrid alternatives involving IA need to be considered. This is a discussion which needs to happen, and in public. (Aside: this is why I have not donated, and continue to refuse to donate, to MIRI. You're solving the wrong problem, albeit with the best of intentions, and my money and time are better spent elsewhere.)

(2) Secrecy rarely has the intended outcome, is too easily undone, and is itself a self-destructive battle that would introduce severe risks. Achieving and maintaining operational security is a major entropy-fighting effort which distracts from the project goals, often drives away participants, and amplifies power dynamics among project leaders. That's a potent and very bad mix.

(3-5) Seem mostly right, or wrong without negative consequences. I don't think there are any specific reasons in there not to take an IA route.

Take, for example, a UFAI (not actively unfriendly, just MIRI's definition of not-provably-FAI) tasked with the singular goal of augmenting the intelligence of humanity. This would be much safer than the Scary Idea strawman that MIRI usually paints, as it would in practice be engineering its own demise through an explicit goal to create runaway intelligence in humans.

If you were to actually implement this, the goal may need a little clarity:

Instead of “humanity” you may need to explicitly specify a group of humans (pre-chosen by the community or a committee for their own history of moral action and rational decision making), as well as constraints that they all advance / are augmented at approximately the same rate.
The AI should be penalized for any route it takes which results in the AI being even temporarily smarter than the humans. Presumably there is a tradeoff and an approximation here as the AI needs to be at least some level of superhuman smart in order to start the augmentation process, and needs to continue to improve itself in order to come up with even better augmentations. But it should favor plans which require smaller AI/human intelligence differentials.
To prevent weird outcomes, the utility of future states should be weighted by an exponential decay. The AI should be focused on just getting existing humans augmented in the near term, and should not worry itself over what it thinks the outcome would be millennia from now; that's for the augmented humans to worry about.
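
As a rough illustration of how the three tweaks above might combine into a single scoring rule, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `StepOutcome` fields, the `plan_score` function, the linear penalties, and the default weights are hypothetical and do not come from any existing design.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StepOutcome:
    """One time step of a hypothetical plan (all fields are illustrative)."""
    human_gain: float      # average capability gain across the chosen group
    human_spread: float    # gap between the most and least augmented member
    ai_minus_human: float  # AI capability minus average human capability


def plan_score(
    steps: Sequence[StepOutcome],
    decay: float = 0.5,
    gap_penalty: float = 1.0,
    spread_penalty: float = 1.0,
) -> float:
    """Score a plan as a sum of exponentially decayed per-step values."""
    total = 0.0
    for t, step in enumerate(steps):
        # Penalize only the steps where the AI is ahead of the humans.
        gap = max(0.0, step.ai_minus_human)
        value = (
            step.human_gain
            - gap_penalty * gap
            - spread_penalty * step.human_spread
        )
        # Exponential decay: step t counts for decay**t of its face value.
        total += (decay ** t) * value
    return total
```

Under these assumed weights, a plan that delivers modest gains immediately outscores one that promises a much larger gain many steps later, and any step in which the AI pulls ahead of the humans lowers the score. Choosing the actual penalties and decay rate is exactly the kind of open problem the list above points at.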

And I'm sure there are literally hundreds of other potential problems and small protective tweaks required. I would rather that MIRI spent its time and money working on scenarios like this and formulating the various risks and countermeasures, rather than obsessing over Löb obstacles (a near-complete waste of time).

This is similar to how things are done in computer security. We have a well-understood repertoire of attacks and general countermeasures. Cryptographers then design specific protocols which, through their unique construction, are not vulnerable to the known attacks, and auditors make sure that implementations are free of side-channel vulnerabilities and such. How many security systems are provably secure? Very few, and none if you consider that those which have proofs rest on underlying assumptions which are not universally true. Nevertheless the process works, and with each iterative design we move the ball forward toward the end goal of a system that is fully secure in practice.

I'm not interested in an airtight mathematical proof of an AGI design which by your own estimate would take an order of magnitude longer to develop than an unfriendly AGI. Money and time spent on that are better directed toward other projects. I'd much rather see effort toward the evaluation of existing designs for self-modifying AGI, such as the GOLEM architecture[1], and accompanying “safe” goal systems implementing hybrid IA like I outlined above, or an AGI nanny architecture, etc.

EDIT: See Wei Dai's post[2] for a similar argument.

EDIT2: If you want to down-vote, that's fine. But please explain why in a reply.

Let's suppose that nanotechnology capable of recording and manipulating brains on a subneuronal level exists, to such a level that duplicating people is straightforward. Let's also assume that everyone working on this project has the same goal function, and that they aren't too intrinsically concerned about modifying themselves. The problem you are setting this AI is: given a full brain state, modify it to be much smarter but otherwise the same "person". Same person implies the same goal function, the same memories, and the same personality quirks. ...
