Great post, thanks for sharing!
I don't have good intuitions about the Gamma distribution, and I'd like to have good intuitions for computing your Rule's outcomes in my head. Here's a way of thinking about it -- do you think it makes sense?
Let denote either or (whichever your rule says is appropriate).
I notice that for , your probability of zero events , where is what I'd call the estimated event rate .
So one nice intuitive interpretation of your rule is that, if we assume event times are exponentially distributed, we should model the rate as . Does that sound right? It's been a while since I've done a ton of math, so I wouldn't be surprised if I'm missing something here.
In general, this post has prompted me to think more about the transition period between AI that's weaker than humans and AI that's stronger than all of human civilization, and that's been interesting! A lot of people assume that this takeoff will happen very quickly, but if it lasts for multiple years (or even decades) then the dynamics of that transition period could matter a lot, and trade is one aspect of that.
Some stray thoughts on what that transition period could look like:
I love the genre of "Katja takes an AI risk analogy way more seriously than other people and makes long lists of ways the analogous thing could work." (The previous post in the genre being the classic "Beyond fire alarms: freeing the groupstruck.")
Digging into the implications of this post:
In sum, for AI systems to be to humans as we are to ants, would be for us to be able to do many tasks better than AI, and for the AI systems to be willing to pay us grandly for them, but for them to be unable to tell us this, or even to warn us to get out of the way. Is this what AI will be like? No. AI will be able to communicate with us, though at some point we will be less useful to AI systems than ants could be to us if they could communicate.
I'm curious how much you think the arguments in this post should affect our expectations of AI-human relations overall? At its core, my concern is:
I can think of a few reasons that human-AI trade might matter for the end-state:
Maybe one useful thought experiment is whether we could train a dog-level intelligence to do most of these tasks if it had the actuators of an ant colony, given our good understanding of dog training (~= "communication") and the fact that dogs still lack a bunch of key cognitive abilities humans have (so dog-human relations are somewhat analogous to human-AI relations).
(Also, ant colonies in aggregate do pretty complex things, so maybe they're not that far off from dogs? But I'm mostly just thinking of Douglas Hofstadter's "Aunt Hillary" here :)
My guess is that for a lot of Katja's proposed trades, you'd only need the ants to have a moderate level of understanding, something like "dog level" or "pretty dumb AI system level". (e.g. "do thing X in situations where you get inputs Y that were associated with thing-we-actually-care-about Z during the training session we gave you".)
The 'failure to communicate' is therefore in fact a failure to be able to think and act at the required level of flexibility and abstraction, and that seems more likely to carry over to our relations with some theoretical, super advanced AI or civilisation.
Definitely true that you're a more valuable trade partner if you're smarter. But there are some particularly useful intelligence/comms thresholds that we meet and ants don't -- e.g. the "dog level", plus some self-awareness stuff, plus not-awful world models in some domains.
Meta: the dog analogy ignores the distinction between training and trading. I'm eliding this here bc it's hard to know what an ant colony's "considered opinion" / "reflective endorsement" would mean, let alone an ant's, but ofc this matters a lot for AGI-human interactions. Consider an AGI that keeps humans around on a "human preserve" out of sentiment, but only cares about certain features of humanity and genetically modifies others out of existence (analogous to training out certain behaviors or engaging in selective breeding), or tortures / brainwashes humans to get them to act the way it wants. (These failure modes of "having things an AI wants, and being able to give it those things, but not defend yourself" are also alluded to in other comments here, e.g. gwern and Elisabeth's comments about "the noble wolf" and torture, respectively.)
In light of the FTX thing, maybe a particularly important heuristic is to notice cases where the worst case is not lower-bounded at zero. Examples:
This isn't to say you should never do things that potentially have large negative downsides, but you can be a lot more willing to experiment when the downside is capped at zero.
Thanks for your posts, Scott! This has been super interesting to follow.
Figuring out where to set the AM-GM boundary strikes me as maybe the key consideration wrt whether I should use GM -- otherwise I don't know how to use it in practical situations, plus it just makes GM feel inelegant.
From your VNM-rationality post, it seems like one way to think about the boundary is commensurability. You use AM within clusters whose members are willing to sacrifice for each other (are willing to make Kaldor-Hicks improvements, and have some common currency s.t. "K-H improvement" is well-defined; or, in another framing, have a meaningfully shared utility function). Maybe that's roughly the right notion to start with? But then it feels strange to me to not consider things commensurate across epistemic viewpoints, especially if those views are contained in a single person (though GM-ing across internal drives does seem plausible to me).
I'd love to see you (or someone else) explore this idea more, and share hot takes about how to pin down the questions you allude to in the AM-GM boundary section of this post: where to set this boundary, examples of where you personally would set it in different cases, and what desiderata we should have for boundary-setting eventually. (It feels plausible to me that having maximally large clusters is in some important sense the right thing to aim for.)
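To make the boundary question concrete for myself, here's a minimal sketch of "AM within clusters, GM across clusters." Everything in it is my own assumption rather than anything from the post: the function names, the made-up utility numbers, and especially the choice to take an unweighted GM over cluster averages (weighting clusters by size would be another reasonable choice). It just shows how moving the boundary changes the aggregate for one toy example.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Only defined here for strictly positive values.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def clustered_score(clusters):
    """AM within each cluster, then an unweighted GM across the cluster means.

    Assumption: clusters count equally regardless of size.
    """
    return geometric_mean([arithmetic_mean(c) for c in clusters])

# Hypothetical utilities for four parties (illustrative numbers only).
utilities = [4.0, 16.0, 1.0, 1.0]

# Boundary drawn around everyone: reduces to the plain arithmetic mean.
one_big_cluster = clustered_score([utilities])

# Boundary drawn around each party: reduces to the plain geometric mean.
all_singletons = clustered_score([[u] for u in utilities])

# Boundary somewhere in between.
two_clusters = clustered_score([[4.0, 16.0], [1.0, 1.0]])

print(one_big_cluster, two_clusters, all_singletons)
```

In this particular example the two-cluster score lands between the pure-GM and pure-AM scores, which is roughly the sense in which the boundary feels like a dial to me.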
Oops yes, sorry!