"Can we know what to do about AI?": An Introduction

JonahS

I'm currently working on a research project for MIRI, and I would welcome feedback on my research as I proceed. In this post, I describe the project.

As a part of an effort to steel-man objections to MIRI's mission, MIRI Executive Director Luke Muehlhauser has asked me to develop the following objection:

"Even if AI is somewhat likely to arrive during the latter half of this century, how on earth can we know what to do about it now, so far in advance?"

In Luke's initial email to me, he wrote:

I think there are plausibly many weak arguments and historical examples suggesting that P: "it's very hard to nudge specific distant events in a positive direction through highly targeted actions or policies undertaken today." Targeted actions might have no lasting effect, or they might completely miss their mark, or they might backfire.

If P is true, this would weigh against the view that a highly targeted intervention today (e.g. Yudkowsky's Friendly AI math research) is likely to positively affect the future creation of AI, and might instead weigh in favor of the view that all we can do about AGI from this distance is to engage in broad interventions likely to improve our odds of wisely handling future crises in general — e.g. improving decision-making institutions, spreading rationality, etc.

I'm interested in abstract arguments for P, but I'm even more interested in historical data. What can we learn from seemingly analogous cases, and are those cases analogous in the relevant ways? What sorts of counterfactual history can we do to clarify our picture?

Luke and I brainstormed a list of potential historical examples of people predicting the future 10+ years out, and using the predictions to inform their actions. We came up with the following potential examples, which I've listed in chronological order by approximate year:

1896: Svante Arrhenius's prediction of anthropogenic climate change.
1935: Leo Szilard's attempts to keep his patent on the atomic bomb secret from Germany.
1950-1980: Efforts to win the Cold War decades later, such as increasing education for gifted children.
1960: Norbert Wiener highlighting the dangers of artificial intelligence.
1972: The circle of ideas and actions around The Limits to Growth, a book about the consequences of unchecked population growth and economic growth.
1975: The WASH-1400 reactor safety study, which attempted to assess the risks associated with nuclear reactors.
1975: The Asilomar Conference on Recombinant DNA, which set up guidelines to ensure the safety of recombinant DNA technology.
1978: China's one-child policy to reduce population growth.
1980: The Ford Foundation setting up a policy think tank in India that helped India recover from its 1991 financial crisis.
1988: Early climate change mitigation efforts.
1992+: Asteroid strike deflection efforts.
????: Possible deliberate long-term efforts to produce revolutionary scientific technologies.
????: Long-term computer security research.

In addition, we selected

The Signal and the Noise: Why So Many Predictions Fail — but Some Don't by Nate Silver
Expert Political Judgment: How Good Is It? How Can We Know? by Philip Tetlock

as background reading.

I would greatly appreciate any ideas from the Less Wrong community concerning potential historical examples and relevant background reading.

Over the coming weeks, I'll be making a series of discussion board posts on Less Wrong reporting on my findings, and linking these posts here.

Your arguments would be much more convincing if you showed results from actual code. In engineering fields, including control theory and computer science, papers that contain mathematical arguments but no test data are much more likely to have errors than papers that include test data, and most highly-cited papers include test data. In less polite language, you appear to be doing philosophy instead of science (science requires experimental data, while philosophy does not).

What would you want this code to do? What code (short of a full-functioning AGI) would be at all useful here?

At minimum, I would like to see you try to test your theories by examining the actual performance of real-world computer systems, such as search engines, as they perform tasks analogous to making high-level ethical decisions.

Can you expand on this, possibly with example tasks? I'm not sure what you are requesting here.

Your examples about predicting the future are only useful if you can identify trends by also considering past predictions that turned out to be inaccurate. The most exciting predictions about the future tend to be wrong, and the biggest advances tend to be unexpected.

This is a trenchant critique, but it ultimately isn't that strong: if anything, having trouble predicting should be a reason to be more worried rather than less.

It appears that as technology improves, human lives become better and safer. I expect this trend to continue. I am not convinced that AI is fundamentally different -- in current societies, individuals with greatly differing intellectual capabilities and conflicting goals already coexist, and liberal democracy seems to work well for maintaining order and allowing incremental progress. If current trends continue, I would expect competing AIs to become unimaginably wealthy, while non-enhanced humans enjoy increasing welfare benefits. The failure mode I am most concerned about is a unified government turning evil (in other words, evolution stopping because the entire population becomes one unchanging organism), but it appears that this risk is minimized by existing antitrust laws (which provide a political barrier to a unified government) and by the high likelihood of space colonization occurring before superhuman AI appears (which provides a spatial barrier to a unified government).

This is missing the primary concern of people at MIRI and elsewhere. The concern isn't anything like more and more competing AIs gradually coming online, each slightly smarter than baseline humans. The concern is that the first true AGI will self-modify to become far smarter and more capable of controlling the environment around it than anything else. In that scenario, issues like antitrust or economics aren't relevant. It is true that on balance human lives have become better and safer, but that isn't by itself a strong reason to think that trend will continue, especially when considering hypothetical threats such as the AGI threat, whose actions are fundamentally discontinuous with prior human trends for standards of living.

Thanks for the thoughtful reply!

What code (short of a full-functioning AGI) would be at all useful here?

Possible experiments could include:

Simulate Prisoner's Dilemma agents that can run each others' code. Add features to the competition (e.g. group identification, resource gathering, paying a cost to improve intelligence) to better model a mix of humans and AIs in a society. Try to simulate what happens when some agents gain much more processing power than others, and what conditions make this a winning strategy. If possible, match results to real-w
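As a rough illustration of the first experiment proposed above (Prisoner's Dilemma agents that can run each other's code), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the agent names, the conventional 3/0/5/1 payoff values, and the use of a "simulation budget" as a crude stand-in for processing power. It is not anyone's actual experimental setup.

```python
from typing import Callable, Tuple

# An agent receives its opponent (as a callable it is allowed to run) and a
# remaining simulation budget, and returns "C" (cooperate) or "D" (defect).
# The budget is a hypothetical stand-in for "processing power".
Agent = Callable[["Agent", int], str]

# Conventional Prisoner's Dilemma payoffs (an assumption, not from the post):
# (my move, their move) -> (my payoff, their payoff)
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_cooperate(opponent: Agent, budget: int) -> str:
    return "C"


def always_defect(opponent: Agent, budget: int) -> str:
    return "D"


def mirror(opponent: Agent, budget: int) -> str:
    """Run the opponent's code against this agent and copy its move.

    If the budget is exhausted, fall back to cooperating. The budget also
    prevents infinite recursion when two simulating agents meet.
    """
    if budget <= 0:
        return "C"
    return opponent(mirror, budget - 1)


def play(a: Agent, b: Agent, budget_a: int = 3, budget_b: int = 3) -> Tuple[int, int]:
    """Play one round; each agent may simulate the other within its own budget."""
    move_a = a(b, budget_a)
    move_b = b(a, budget_b)
    return PAYOFF[(move_a, move_b)]


if __name__ == "__main__":
    agents = [always_cooperate, always_defect, mirror]
    for a in agents:
        for b in agents:
            print(a.__name__, "vs", b.__name__, "->", play(a, b))
```

Giving the two players different budgets in `play` is one simple way to start probing the "some agents gain much more processing power than others" question, though richer features such as group identification or resource gathering would need a more elaborate model.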
