This is Part III of a long essay. Part I introduced the concept of morality-as-cooperation (MAC) in human societies. Part II discussed moral reasoning and introduced a framework for moral experimentation. Part III: Failure modes Part I described how human morality has evolved over time to become ever more...
This is Part II of a long essay. Part I introduced the concept of morality-as-cooperation (MAC), and discussed how the principle could be used to understand moral judgements in human societies. Part III will discuss failure modes. Part II: Theory and Experiment The prior discussion of morality was human-centric,...
Abstract The AI alignment problem is usually framed in terms of power and control. Given a single, solitary AGI, how can we constrain its behavior so that its actions remain aligned with human interests? Unfortunately, the answer, to a first approximation, appears to be "we can't." There are myriad reasons,...
This was originally supposed to be a response to the new AGI Safety FAQ-in-progress, but it got a bit too long. Anonymous writes: > A lot of the AI risk arguments seem to come... with a very particular transhumanist aesthetic about the future (nanotech, ... etc.). I find these things...