There can be an inherent illogic to human evil. In AI safety, it is important to consider whether a rogue AI agent is capable of committing this type of evil and, if so, what kind of agent might pose a risk of doing so. I draw on work by Corin Katzke and Joseph Carlsmith to explore the power-seeking tendencies of AI, focusing in particular on Carlsmith’s distinction between “messy” and “clean” goal-directedness, to conceptualise how an AI agent might acquire power-seeking characteristics. I then turn to instances of catastrophic power-seeking carried out by humans, namely genocides, asserting that genocide is a product of “messy” goal-directedness, and following this I question...