
Comment author: Yosarian2 26 May 2017 10:01:10AM 2 points

Another big difference is that if there's no intelligence explosion, we're probably not talking about a singleton. If someone manages to create an AI that's, say, roughly human-level intelligence (probably stronger in some areas and weaker in others, but human-ish on average) and progress slows or stalls after that, then the most likely scenario is that a lot of those human-level AIs would be created and sold for different purposes all over the world. We would probably be dealing with a complex world that has a lot of different AIs and humans interacting with each other. That could create its own risks, but they would probably have to be handled in a different way.

Comment author: AlexMennen 26 May 2017 05:43:10PM 1 point

Good point. This seems like an important oversight on my part, so I added a note about it.

Comment author: SoerenE 26 May 2017 11:53:30AM 2 points

You might also be interested in this article by Kaj Sotala: http://kajsotala.fi/2016/04/decisive-strategic-advantage-without-a-hard-takeoff/

Even though you are writing about the exact same subject, there is (as far as I can tell) no substantial overlap between his points and the ones you highlight. Kaj Sotala titled his blog post "(Part 1)" but never wrote a subsequent part.

Comment author: AlexMennen 26 May 2017 05:31:33PM 1 point

Thanks for pointing out that article. I have added a reference to it.

Existential risk from AI without an intelligence explosion

9 AlexMennen 25 May 2017 04:44PM

[xpost from my blog]

In discussions of existential risk from AI, it is often assumed that the existential catastrophe would follow an intelligence explosion, in which an AI creates a more capable AI, which in turn creates a yet more capable AI, and so on, a feedback loop that eventually produces an AI whose cognitive power vastly surpasses that of humans. Such an AI would be able to obtain a decisive strategic advantage over humanity, allowing it to pursue its own goals without effective human interference. Victoria Krakovna points out that many arguments that AI could present an existential risk do not rely on an intelligence explosion. I want to look in slightly more detail at how that could happen. Kaj Sotala also discusses this.

An AI starts an intelligence explosion when its ability to create better AIs surpasses that of human AI researchers by a sufficient margin (provided the AI is motivated to do so). An AI attains a decisive strategic advantage when its ability to optimize the universe surpasses that of humanity by a sufficient margin. Which of these happens first depends on which skills AIs have an advantage at relative to humans. If AIs are better at programming AIs than they are at taking over the world, then an intelligence explosion will happen first, and the resulting AI will be able to get a decisive strategic advantage soon after. But if AIs are better at taking over the world than they are at programming AIs, then an AI would get a decisive strategic advantage without an intelligence explosion occurring first.

Since an intelligence explosion happening first is usually considered the default assumption, I'll just sketch a plausibility argument for the reverse. There's a lot of variation in how easy cognitive tasks are for AIs compared to humans. Since programming AIs is not yet a task that AIs can do well, it doesn't seem like it should be a priori surprising if programming AIs turned out to be an extremely difficult task for AIs to accomplish, relative to humans. Taking over the world is also plausibly especially difficult for AIs, but I don't see strong reasons for confidence that it would be harder for AIs than starting an intelligence explosion would be. It's possible that an AI with significantly but not vastly superhuman abilities in some domains could identify some vulnerability that humans would never think of, and exploit it to gain power. Or an AI could be enough better than humans at forms of engineering other than AI programming (perhaps molecular manufacturing) that it could build physical machines that could out-compete humans, though this would require it to obtain the resources necessary to produce them.

Furthermore, an AI that is capable of producing a more capable AI may refrain from doing so if it is unable to solve the AI alignment problem for itself; that is, if it can create a more intelligent AI, but not one that shares its preferences. This seems unlikely if the AI has an explicit description of its preferences. But if the AI, like humans and most contemporary AI, lacks an explicit description of its preferences, then the difficulty of the AI alignment problem could be an obstacle to an intelligence explosion occurring.

It also seems worth thinking about the policy implications of the differences between existential catastrophes from AI that follow an intelligence explosion versus those that don't. For instance, AIs that attempt to attain a decisive strategic advantage without undergoing an intelligence explosion will exceed human cognitive capabilities by a smaller margin, and thus would likely attain strategic advantages that are less decisive, and would be more likely to fail. Thus containment strategies are probably more useful for addressing risks that don't involve an intelligence explosion, while attempts to contain a post-intelligence-explosion AI are probably pretty much hopeless (although it may be worthwhile to find ways to interrupt an intelligence explosion while it is beginning). Risks not involving an intelligence explosion may be more predictable in advance, since they don't involve a rapid increase in the AI's abilities, and would thus be easier to deal with at the last minute. It might therefore make sense, far in advance, to focus disproportionately on risks that do involve an intelligence explosion.

It seems likely that AI alignment would be easier for AIs that do not undergo an intelligence explosion, for two reasons: it is more likely to be possible to monitor such an AI and do something about it if it goes wrong, and lower optimization power means lower ability to exploit the difference between the goals the AI was given and the goals that were intended, if we are only able to specify our goals approximately. The first of those reasons applies to any AI that attempts to attain a decisive strategic advantage without first undergoing an intelligence explosion, whereas the second only applies to AIs that never undergo an intelligence explosion. Because of this, it might make sense to attempt to decrease the chance that the first AI to attain a decisive strategic advantage undergoes an intelligence explosion beforehand, as well as the chance that it ever undergoes an intelligence explosion, though preventing the latter may be much more difficult. However, some strategies to achieve this may have undesirable side-effects; for instance, as mentioned earlier, AIs whose preferences are not explicitly described seem more likely to attain a decisive strategic advantage without first undergoing an intelligence explosion, but such AIs are probably more difficult to align with human values.

If an AI gets a decisive strategic advantage over humans without an intelligence explosion, the advantage would likely be obtained much more slowly, so it would be much more likely for multiple, and possibly many, AIs to gain decisive strategic advantages over humans, though not necessarily over each other, resulting in a multipolar outcome. Thus considerations about multipolar versus singleton scenarios also apply to decisive-strategic-advantage-first versus intelligence-explosion-first scenarios.

Comment author: AlexMennen 24 March 2017 10:51:37PM 0 points

What are the differences between BDT, NDT, and STD?

Comment author: AlexMennen 20 February 2017 07:37:08AM * 4 points

Let's assume prediction markets are efficient and you didn't already possess any relevant information that you weren't trading on beforehand. Then you should treat the market odds as a prior and your die roll as evidence, in exactly the way you always do Bayesian updates. In this case, it looks like that gives you a posterior probability of 5/14 each for the die being weighted in favor of 3 or 6, and 1/14 for each of the other possibilities. Contrary to what other commenters were saying, it doesn't matter what information led to the market odds under these assumptions.
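To make the update concrete, here is a minimal sketch. The original question isn't quoted here, so the specific numbers are assumptions: six hypotheses of the form "the die is weighted toward face i", a hypothetical market prior of 1/2 on "weighted toward 6" and 1/10 on each other face, a weighted die that shows its favored face with probability 1/2 and each other face with probability 1/10, and a single observed roll of 3. Those assumptions happen to reproduce the 5/14 and 1/14 figures, but other setups could as well.

```python
from fractions import Fraction

faces = range(1, 7)

# Hypothetical market prior over "the die is weighted toward face i".
prior = {i: Fraction(1, 10) for i in faces}
prior[6] = Fraction(1, 2)

def likelihood(roll, favored):
    """P(roll | die weighted toward `favored`), under the assumed weighting."""
    return Fraction(1, 2) if roll == favored else Fraction(1, 10)

observed_roll = 3  # assumed observation

# Standard Bayesian update: posterior is proportional to prior times likelihood.
unnormalized = {i: prior[i] * likelihood(observed_roll, i) for i in faces}
total = sum(unnormalized.values())
posterior = {i: p / total for i, p in unnormalized.items()}

for i in faces:
    print(i, posterior[i])  # prints 5/14 for faces 3 and 6, 1/14 for the rest
```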

Comment author: Douglas_Knight 06 January 2017 01:40:26AM 0 points

You are fighting the hypothetical.

In the St Petersburg Paradox the casino is offering a fair bet, the kind that casinos offer. It is generally an error for humans to take these.

In this scenario, the casino is magically tilting the bet in your favor. Yes, you should accept that bet and keep playing until the amount is an appreciable fraction of your net worth. But given that we are assuming the strange behavior of the casino, we could let the casino tilt the bet even further each time, so that the bet has positive expected utility. Then the problem really is infinity, not utility. (Even agents with unbounded utility functions are unlikely to have them be unbounded as a function of money, but we could imagine a magical wish-granting genie.)

Comment author: AlexMennen 07 January 2017 04:30:21AM 4 points

He's not fighting the hypothetical; he merely responded to the hypothetical with a weaker claim than he should have. That is, he correctly claimed that realistic agents have utility functions that grow too slowly with respect to money to keep betting indefinitely, but this is merely a special case of the fact that realistic agents have bounded utility, and thus will eventually stop betting no matter how great the payoff of winning the next bet is.

Comment author: Anders_H 05 January 2017 10:52:14PM * 3 points

The rational choice depends on your utility function. Your utility function is unlikely to be linear with money. For example, if your utility function is log(X), then you will accept the first bet, be indifferent to the second bet, and reject the third bet. Any risk-averse utility function (i.e. any monotonically increasing function with negative second derivative) reaches a point where the agent stops playing the game.

A VNM-rational agent with a linear utility function over money will indeed always take this bet. From this, we can infer that linear utility functions do not represent the utility of humans.

(EDIT: The comments by Satt and AlexMennen are both correct, and I thank them for the corrections. I note that they do not affect the main point, which is that rational agents with standard utility functions over money will eventually stop playing this game)

Comment author: AlexMennen 05 January 2017 11:47:26PM 3 points

Any risk-averse utility function (i.e. any monotonically increasing function with negative second derivative) reaches a point where the agent stops playing the game.

Not true. It is true, however, that any agent with a bounded utility function eventually stops playing the game.
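To illustrate the distinction with a toy example (a sketch with assumed numbers, not the game from the original post): suppose at each round the agent's wealth either doubles with probability 0.6 or is halved with probability 0.4, and the agent keeps playing as long as the bet has positive expected utility. An agent with utility log(W) is risk-averse (concave) but unbounded, and accepts this bet at every wealth level, so it never stops; an agent with the bounded utility W/(W+1) sees its expected gain turn negative once its wealth is large enough, and stops.

```python
import math

P_WIN = 0.6  # assumed probability that wealth doubles; it is halved otherwise

def expected_gain(utility, wealth):
    """Expected change in utility from taking one double-or-halve bet."""
    return (P_WIN * utility(2 * wealth)
            + (1 - P_WIN) * utility(wealth / 2)
            - utility(wealth))

def log_utility(w):
    return math.log(w)  # concave (risk-averse) but unbounded

def bounded_utility(w):
    return w / (w + 1)  # concave and bounded above by 1

for wealth in [1, 2, 4, 8, 1000, 10**6]:
    print(wealth,
          expected_gain(log_utility, wealth) > 0,      # True at every wealth level
          expected_gain(bounded_utility, wealth) > 0)  # False once wealth >= ~4
```

The exact cutoff depends on the assumed numbers; the point is just that concavity alone does not force the agent to stop, while a bound on utility eventually does in this kind of game.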

Comment author: John_Maxwell_IV 03 December 2016 12:59:59PM * 2 points

Also, if we were to pick a leader, Peter Thiel strikes me as an exceptionally terrible choice.

I agree we shouldn't pick a leader, but I'm curious why you think this. He's the only person on the list who's actually got leadership experience (CEO of PayPal), and he did a pretty good job.

Comment author: AlexMennen 03 December 2016 06:26:01PM 6 points

Leading a business and leading a social movement require different skill sets, and Peter Thiel is also the only person on the list who isn't even part of the LW community. Bringing in someone only tangentially associated with a community as its leader doesn't seem like a good idea.

Comment author: James_Miller 03 December 2016 06:21:08AM 2 points

The key to deciding if we need a leader is to look at historically similar situations and see if they benefited from having a leader. Given that we would very much like to influence government policy, Peter Thiel strikes me as the best possible choice if he would accept. I read somewhere that when Julius Caesar was going to attack Rome, several Senators approached Pompey the Great, handed him a sword, and said "save Rome." I seriously think we should try something like this with Thiel.

Comment author: AlexMennen 03 December 2016 07:15:03AM 5 points

Given that we would very much like to influence government policy

How would the position of leader of the LW community help Peter Thiel do this? Also, Peter Thiel's policy priorities seem to differ a fair amount from those of the average lesswronger, and I'd be pretty surprised if he agreed to change priorities substantially in order to fit with his role as LW leader.

Comment author: James_Miller 03 December 2016 04:25:01AM * 6 points

To coordinate we need a leader that many of us would sacrifice for. The obvious candidates are Eliezer Yudkowsky, Peter Thiel, and Scott Alexander. Perhaps we should develop a process by which a legitimate, high-quality leader could be chosen.

Edit: I see mankind as walking towards a minefield. We are almost certainly not in the minefield yet; at our current rate we will almost certainly hit the minefield this century; lots of people don't think the minefield exists or think that fate or God will protect us from the minefield; and competitive pressures (Moloch) make lots of people individually better off if they push us a bit faster towards this minefield.

Comment author: AlexMennen 03 December 2016 05:36:13AM 19 points

I disagree. The LW community already has capable high-status people who many others in the community look up to and listen to suggestions from. It's not clear to me what the benefit is from picking a single leader. I'm not sure what kinds of coordination problems you had in mind, but I'd expect that most such problems that could be solved by a leader issuing a decree could also be solved by high-status figures coordinating with each other on how to encourage others to coordinate. High-status people and organizations in the LW community communicate with each other a fair amount, so they should be able to do that.

And there are significant costs to picking a leader. It creates a single point of failure, making the leader's mistakes more costly, and inhibiting innovation in leadership style. It also creates PR problems; in fact, LW already has faced PR problems regarding being an Eliezer Yudkowsky personality cult.

Also, if we were to pick a leader, Peter Thiel strikes me as an exceptionally terrible choice.
