CEO at Redwood Research.
AI safety is a highly collaborative field: almost all the points I make were either explained to me by someone else or developed in conversation with other people. I'm saying this here because it would feel repetitive to write "these ideas were developed in collaboration with various people" in every comment, but I want it on the record that the ideas I present were almost entirely not developed by me in isolation.
I think it's very unlikely that (conditioned on no AI takeover) anything like "all humans get equal weight in deciding what happens next" actually happens; a negotiation between a small number of powerful people (some of whom represent larger groups, e.g. nations) that ends in an ad hoc distribution seems drastically more likely. The bargaining solution of "weight everyone equally" strikes me as so implausible that it seems pointless to even discuss it as a pragmatic option.
Thanks heaps for pointing out the Eliezer content!
I am very skeptical that you'll get "all of humanity equally" as the bargaining solution, as opposed to some ad hoc arrangement that weights powerful people more heavily. I'm not aware of any case where the solution to a bargaining problem was "weigh the preferences of everyone in the world equally". (This isn't even how most democracies work internally!)
I am sympathetic on the object level to the kind of perspective you're describing here, where you say we should implement something like the extrapolated preferences of some set of bargainers. Two problems:
One reason to think that the US winning concentrates power less is that the US is a democracy with a strong tradition of maintaining individual rights, and a reasonably strong history (over the last 80 years) of pursuing a world order in which it benefits from lots of countries being pretty stable and not, e.g., invading each other.
My recommendation, following Joe Carlsmith, is to treat the two terms as synonyms and to use "schemer" instead of "deceptively aligned model". I do this.
Joe's issues with the term "deceptive alignment":
I think that the term "deceptive alignment" often leads to confusion between the four sorts of deception listed above. And also: if the training signal is faulty, then "deceptively aligned" models need not be behaving in aligned ways even during training (that is, "training gaming" behavior isn't always "aligned" behavior).
- increasing endorsement/linking of right-wing figures like Hanania and Cremieux
Idk, back in the day LessWrong had a reasonable amount of discussion of relatively right-wing figures like Moldbug and other neoreactionaries, or, on the less extreme end, people like Bryan Caplan. And there's always been an undercurrent of discussion of e.g. race and IQ.
Low confidence, but I feel like I can kind of assume that the median rat has libertarian sympathies now in a way that I couldn't before?
I feel like the median rat had strong libertarian sympathies 10 years ago.
My guess is that neither of us will hear about any of these discussions until after they're finalized.