Advanced AIs might be capable of various credible commitments unavailable to humans, which they could use when bargaining with each other. “Bargaining” can sound like something pretty specific: haggling over (literal) prices. But in the sense discussed in, for instance, Schelling’s The Strategy of Conflict, “bargaining” refers to any attempt...
In Part I of CLR's safe Pareto improvements (SPI) agenda, we gave our high-level strategy for evaluating models for SPI-incompatible behavior and reasoning. This guide gives more details on how I’m thinking about executing on this strategy, especially:
* the kind of workflow I think we should use, to start...
Executive summary
* Safe Pareto improvements (SPIs) are ways of changing agents’ bargaining strategies that make all parties better off, regardless of their original strategies. SPIs are an unusually robust approach to preventing catastrophic conflict between AI systems, especially AIs capable of credible commitments. This is because SPIs can reduce...
(Subtitle: “And ethics, and epistemology, and…”. Cross-posted from my Substack.) We want to make decisions for good reasons. But I worry some common approaches to decision theory stray from this purpose. They start with a bottom-line verdict, “I should choose this action”, then use this verdict to justify claims about...
(Cross-posted from my Substack.) Here’s an important way people might often talk past each other when discussing the role of intuitions in philosophy.[1]
Intuitions as predictors
When someone appeals to an intuition to argue for something, it typically makes sense to ask how reliable their intuition is. Namely, how reliable...
I’ve argued in my unawareness sequence that when we properly account for our severe epistemic limitations, we are clueless about our impact from an impartial altruistic perspective. However, this argument and my responses to counterarguments involve a lot of moving parts. And the term “clueless” gets used in various importantly...
As many folks in AI safety have observed, even if well-intentioned actors succeed at intent-aligning highly capable AIs, they’ll still face some high-stakes challenges.[1] Some of these challenges are especially exotic and could be prone to irreversible, catastrophic mistakes. E.g., deciding whether and how to do acausal trade. To deal...