All of accountability's Comments + Replies

Retracted. I apologize for mischaracterizing the report and for the unfair attack on your work.

Indeed, I find it somewhat notable that high-level arguments for AI risk rarely attend in detail to the specific structure of an AI’s motivational system, or to the sorts of detailed trade-offs a not-yet-arbitrarily-powerful-AI might face in deciding whether to engage in a given sort of problematic power-seeking. [...] I think my power-seeking report is somewhat guilty in this respect; I tried, in my report on scheming, to do better.

Your 2021 report on power-seeking does not appear to discuss the cost-benefit analysis that a misaligned AI would conduct whe... (read more)

[This comment is no longer endorsed by its author]Reply
5Joe Carlsmith
I don't think this is quite right. For example: Section 4.3.3 of the report, "Controlling circumstances" focuses on the possibility of ensuring that an AI's environmental constraints are such that the cost-benefit calculus does not favor problematic power-seeking. Quoting: See also the discussion of myopia in 4.3.1.3... And of "controlling capabilities" in section 4.3.2:  I also discuss the cost-benefit dynamic in the section on instrumental convergence (including discussion of trying-to-make-a-billion-dollars as an example), and point people to section 4.3 for more discussion.  That said, I'm happy to acknowledge that the discussion of instrumental convergence in the power-seeking report is one of the weakest parts, on this and other grounds (see footnote for more);[1] that indeed numerous people over the years, including the ones you cite, have pushed back on issues in the vicinity (see e.g. Garfinkel's 2021 review for another example; also Crawford (2023)); and that this pushback (along with other discussions and pieces of content -- e.g., Redwood Research's work on "control," Carl Shulman on the Dwarkesh Podcast) has further clarified for me the importance of this aspect of picture. I've added some citations in this respect. And I am definitely excited about people (external academics or otherwise) criticizing/refining these arguments -- that's part of why I write these long reports trying to be clear about the state of the arguments as I currently understand them. 1. ^ The way I'd personally phrase the weakness is: the formulation of instrumental convergence focuses on arguing from "misaligned behavior from an APS system on some inputs" to a default expectation of "misaligned power-seeking from an APS system on some inputs." I still think this is a reasonable claim, but per the argument in this post (and also per my response to Thorstad here), in order to get to an argument for misaligned power-seeking on the the inputs the AI will actually recei