LESSWRONG
Wikitags
Power Seeking (AI)
Applied to The Human Alignment Problem for AIs by rife 2mo ago
Applied to Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake by TurnTrout 4mo ago
Applied to From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance by DyingNaive 5mo ago
Applied to A framework for thinking about AI power-seeking by RobertM 8mo ago
Applied to Steering Llama-2 with contrastive activation additions by TurnTrout 1y ago
Applied to Natural Abstraction: Convergent Preferences Over Information Structures by paulom 1y ago
Applied to You can't fetch the coffee if you're dead: an AI dilemma by hennyge 2y ago
Applied to The Game of Dominance by Karl von Wendt 2y ago
Applied to Incentives from a causal perspective by tom4everitt 2y ago
Applied to Instrumental Convergence? [Draft] by Dan H 2y ago
Applied to Categorical-measure-theoretic approach to optimal policies tending to seek power by Vika 2y ago
Applied to My Overview of the AI Alignment Landscape: Threat Models by Michelle Viotti 2y ago
Applied to Ideas for studies on AGI risk by dr_s 2y ago
Applied to Instrumental convergence in single-agent systems by Jacob Pfau 2y ago
Applied to [Linkpost] Shorter version of report on existential risk from power-seeking AI by Ruby 2y ago