JustinShovelain

Preparing for Warning Shots to Catalyze International Cooperation on AGI Risks

by Mark Kagach, EliasSchlie, Thomas Van Damme, and JustinShovelain

Summary: This is a write-up on preparing for warning shots to catalyze international cooperation on AGI risks, and the corollary list of projects one could pursue. We argue we must first (1) understand types of warning shots, then (2) prepare to catch them. We must stay vigilant: both to (3)...

Jun 5 · 39

Counter-considerations on AI arms races

by Mateusz Bagiński and JustinShovelain

(Work done at Convergence Analysis. Mateusz wrote the post and is responsible for the outline of the argument, many details of which crystallized in conversations with Justin. Thanks to Olga Babeeva for the feedback on this post.) 1. Introduction: Clarifying the DSA-AI theses. Over the last decade, AI research and...

May 15, 2025 · 24

Goodhart Typology via Structure, Function, and Randomness Distributions

(Work done at Convergence Analysis. The ideas are due to Justin. Mateusz wrote the post. Thanks to Olga Babeeva for feedback on this post.) In this post, we introduce the typology of structure, function, and randomness that builds on the framework introduced in the post Goodhart's Law Causal Diagrams. We...

Mar 25, 2025 · 35

Bounded AI might be viable

by Mateusz Bagiński and JustinShovelain

(Work done at Convergence Analysis. Mateusz wrote the post and is responsible for most of the ideas with Justin helping to think it through. Thanks to Olga Babeeva for the feedback on this post.) 1. Motivation: Suppose the prospect of pausing or significantly slowing down AI progress or solving the...

Mar 6, 2025 · 24

Information-Theoretic Boxing of Superintelligences

Boxing an agent more intelligent than ourselves is daunting, but information theory, thermodynamics, and control theory provide us with tools that can fundamentally constrain agents independent of their intelligence. In particular, we may be able to contain an AI by limiting its access to information. Constraining output and inputs: Superintelligent...

Nov 30, 2023 · 31

The risk-reward tradeoff of interpretability research

Interpretability research is conducted to improve our understanding of AI. Many see interpretability as essential for AI safety, but recently some have argued that it can also increase the risk posed by AI by facilitating improved AI capabilities. We agree, and in this post, we’ll explain why, as well as...

Jul 5, 2023 · 17

Aligning AI by optimizing for "wisdom"

In this post, we’ll introduce wisdom as a measure of the benevolence and internal coherence of an arbitrary agent. We’ll define several factors, such as the agent’s values, plans, evidence, and alignment with human values, and then define wisdom as consistency within and between these factors. We believe this is...

Jun 27, 2023 · 28

Coffee: When it helps, when it hurts

Using vector fields to visualise preferences and make them consistent

Updating Utility Functions
