LESSWRONG
Wikitags
Power Seeking (AI)
Applied to The Human Alignment Problem for AIs by rife 2mo ago
Applied to Intrinsic Power-Seeking: AI Might Seek Power for Power’s Sake by TurnTrout 4mo ago
Applied to From Human to Posthuman: Transhumanism, Anarcho-Capitalism, and AI’s Role in Global Disparity and Governance by DyingNaive 5mo ago
Applied to A framework for thinking about AI power-seeking by RobertM 8mo ago
Applied to Steering Llama-2 with contrastive activation additions by TurnTrout 1y ago
Applied to Natural Abstraction: Convergent Preferences Over Information Structures by paulom 1y ago
Applied to You can't fetch the coffee if you're dead: an AI dilemma by hennyge 2y ago
Applied to The Game of Dominance by Karl von Wendt 2y ago
Applied to Incentives from a causal perspective by tom4everitt 2y ago
Applied to Instrumental Convergence? [Draft] by Dan H 2y ago
Applied to Categorical-measure-theoretic approach to optimal policies tending to seek power by Vika 2y ago
Applied to My Overview of the AI Alignment Landscape: Threat Models by Michelle Viotti 2y ago
Applied to Ideas for studies on AGI risk by dr_s 2y ago
Applied to Instrumental convergence in single-agent systems by Jacob Pfau 2y ago
Applied to [Linkpost] Shorter version of report on existential risk from power-seeking AI by Ruby 2y ago