Palus Astra — LessWrong


Evan Hubinger on Inner Alignment, Outer Alignment, and Proposals for Building Safe Advanced AI

It’s well-established in the AI alignment literature what happens when an AI system learns or is given an objective that doesn’t fully capture what we want. Human preferences and values are inevitably left out and the AI, likely being a powerful optimizer, will take advantage of the dimensions of freedom afforded by the misspecified objective and set them to extreme values. This may allow for better optimization on the goals in the objective function, but can have catastrophic consequences for human preferences and values the system fails to consider. Is it possible for misalignment to also occur between the model being trained and the objective function used for training? The answer looks...

FLI Podcast: On Superforecasting with Robert de Neufville

Palus Astra

Essential to our assessment of risk and ability to plan for the future is our understanding of the probability of certain events occurring. If we can estimate the likelihood of risks, then we can evaluate their relative importance and apply our risk mitigation resources effectively. Predicting the future is, obviously, far from easy, and yet a community of "superforecasters" is attempting to do just that. Not only are they trying, but these superforecasters are also reliably outperforming subject matter experts at making predictions in their own fields. Robert de Neufville joins us on this episode of the FLI Podcast to explain what superforecasting is, how it's done, and the ways it...

AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah

Palus Astra

Just a year ago we released a two-part episode titled An Overview of Technical AI Alignment with Rohin Shah. That conversation provided details on the views of central AI alignment research organizations and many of the ongoing research efforts for designing safe and aligned systems. Much has happened in the past twelve months, so we've invited Rohin, along with fellow researcher Buck Shlegeris, back for a follow-up conversation. Today's episode focuses especially on the state of current research efforts for beneficial AI, as well as Buck's and Rohin's thoughts about the varying approaches and the difficulties we still face. This podcast thus serves as a non-exhaustive overview of how...

FLI Podcast: The Precipice: Existential Risk and the Future of Humanity with Toby Ord

Palus Astra

Toby Ord’s "The Precipice: Existential Risk and the Future of Humanity" has emerged as a new cornerstone text in the field of existential risk. The book presents the foundations and recent developments of this budding field from an accessible vantage point, providing an overview suitable for newcomers. For those already familiar with existential risk, Toby brings new historical and academic context to the problem, along with central arguments for why existential risk matters, novel quantitative analysis and risk estimations, deep dives into the risks themselves, and tangible steps for mitigation. "The Precipice" thus serves as both a tremendous introduction to the topic and a rich source of further learning for existential risk...

AI Alignment Podcast: On Lethal Autonomous Weapons with Paul Scharre

Palus Astra

Most relevant to AI alignment, and a pertinent question for interested readers and listeners, is this: if we, as a global community, are unable to establish a governance mechanism around the principle that we should not let AI make the decision to kill humans, what effect will this have on more subtle short-term alignment considerations and long-term AI x-risk, and can we still deal with them?

Podcast page and audio: https://futureoflife.org/2020/03/16/on-lethal-autonomous-weapons-with-paul-scharre/

Transcript:

Lucas Perry: Welcome to the AI Alignment Podcast. I’m Lucas Perry. Today’s conversation is with Paul Scharre and explores the issue of lethal autonomous weapons. And so just what is the relation of lethal autonomous weapons...