I’ve written a draft report evaluating a version of the overall case for existential risk from misaligned AI, and taking an initial stab at quantifying the risk from this version of the threat. I’ve made the draft viewable as a public Google Doc here (Edit: arXiv version here, video presentation here, human-narrated audio version here). Feedback would be welcome.
This work is part of Open Philanthropy’s “Worldview Investigations” project. However, the draft reflects my personal (rough, unstable) views, not the “institutional views” of Open Philanthropy.
Two clarifications. First, even in the existing version, POWER can be defined for any bounded reward function distribution, not just IID ones. Second, the power-seeking results no longer require IID reward distributions: most reward function distributions incentivize POWER-seeking, both in the formal sense and in the qualitative "keeping options open" sense.
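For concreteness, here's the definition (roughly; I'm eliding some technical conditions), where $\mathcal{D}$ is any bounded distribution over reward functions, with nothing requiring $\mathcal{D}$ to be IID across states:

$$\mathrm{POWER}_{\mathcal{D}}(s, \gamma) := \frac{1-\gamma}{\gamma}\,\mathbb{E}_{R \sim \mathcal{D}}\!\left[V^*_R(s,\gamma) - R(s)\right].$$

That is, POWER at a state is the (normalized) optimal value the agent can expect to attain from that state, averaged over reward functions drawn from $\mathcal{D}$, with the current state's reward subtracted out.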
To address your main point, though, I think we'll need to get more concrete. Let's represent the situation with a state diagram.
In the diagram, up is "gap year", and right is "go to college right away".

Both you and Rohin are glossing over several relevant considerations, which might be driving the misunderstanding. For one:
Power depends on your time preferences. If your discount rate is very close to 1 and you irreversibly close off your ability to pursue ϵ percent of careers, then yes, you have decreased your POWER by going to college right away. If your discount rate is closer to 0, then college lets you pursue more careers quickly, increasing your POWER for most reward function distributions.

You shouldn't need to contort the distribution used by POWER to get reasonable outputs. Just be careful that we're talking about the same time preferences. (I can actually prove that in a wide range of situations, the POWER of state 1 vs the POWER of state 2 is ordinally robust to choice of distribution. I'll explain that in a future post, though.)
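To make the discount-rate point concrete, here's a quick Monte Carlo sketch. The toy MDP below is my own illustrative construction (not the diagram above, and the state names are made up): "college" reaches four careers in one step, while "gap year" keeps one extra career open but only reaches the other four by passing through "college" first. Estimating POWER under an IID-uniform reward distribution at two discount rates should show the ordering flip described above.

```python
# Toy deterministic MDP (purely illustrative, not the diagram from the comment).
# State indices: 0 = gap year, 1 = college, 2-5 = careers A-D (reachable from
# college in one step), 6 = career E (reachable only via the gap year).
import numpy as np

SUCC = {
    0: [1, 6],        # gap year -> college, or the gap-year-only career E
    1: [2, 3, 4, 5],  # college  -> careers A-D
    2: [2], 3: [3], 4: [4], 5: [5], 6: [6],  # careers are absorbing
}
N = len(SUCC)

def power(state, gamma, n_samples=10_000, iters=1_500, seed=0):
    """Monte Carlo estimate of POWER(state, gamma) under IID U[0,1] state rewards."""
    rng = np.random.default_rng(seed)
    R = rng.uniform(size=(n_samples, N))  # one sampled reward function per row
    V = np.zeros_like(R)
    for _ in range(iters):  # value iteration, vectorized across sampled rewards
        V = R + gamma * np.stack(
            [V[:, SUCC[s]].max(axis=1) for s in range(N)], axis=1
        )
    # POWER(s, gamma) = (1 - gamma) / gamma * E[ V*(s) - R(s) ]
    return ((1 - gamma) / gamma * (V[:, state] - R[:, state])).mean()

for gamma in (0.1, 0.99):
    print(f"gamma={gamma}:  POWER(gap year) ~ {power(0, gamma):.3f}   "
          f"POWER(college) ~ {power(1, gamma):.3f}")
```

If I've set this up right, at γ=0.1 the college state should come out with higher estimated POWER (it reaches more high-value options immediately), while at γ=0.99 the gap-year state should come out ahead (it keeps the extra career open, and the one-step delay barely matters).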
My position on "is POWER a good proxy for intuitive-power?" is that yes, it's a very good proxy; I've thought about this for many hours (and accounted for sub-optimality; see the last part of appendix B). I think the overhauled power-seeking post should help, but perhaps I have more explaining to do.
Also, I perceive an undercurrent of "goal-driven agents should tend to seek power in all kinds of situations; your formalism suggests they don't; therefore, your formalism is bad", which is wrong because the first premise is false. (Maybe this isn't your position or argument, but I figured I'd mention it in case you believe that.)
This is superficially correct, but we have to be careful because
Basically, a satisfactory formal analysis of this kind of situation is more involved than you make it seem.