Buck

CEO at Redwood Research.

AI safety is a highly collaborative field: almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Comments

My guess is that neither of us will hear about any of these discussions until after they're finalized.

I think it's very unlikely that (conditioned on no AI takeover) something similar to "all humans get equal weight in deciding what happens next" happens; a negotiation between a small number of powerful people (some of whom represent larger groups, e.g. nations) that ends with an ad hoc distribution seems drastically more likely. The bargaining solution of "weight everyone equally" seems so implausible that it's pointless to even discuss it as a pragmatic solution.

Thanks heaps for pointing out the Eliezer content!

I am very skeptical that you'll get "all of humanity equally" as the bargaining solution, as opposed to some ad hoc thing that weighs powerful people more. I'm not aware of any case where the solution to a bargaining problem was "weigh the preference of everyone in the world equally". (This isn't even how most democracies work internally!)

I am sympathetic on the object level to the kind of perspective you're describing here, where you say we should implement something like the extrapolated preferences of some set of bargainers. Two problems:

  • I think that when people talk about CEV, they're normally not defining it in terms of humanity because humans are who you pragmatically have to coordinate with. E.g. I don't see anything like that mentioned in the wiki page or in the original paper on a quick skim; I interpret Eliezer as referencing humanity because that's who he actually cares about the values of. (I could be wrong about what Eliezer thinks here.)
  • I think it's important to note that if you settle on CEV as a bargaining solution, this probably ends up with powerful people (AI company employees, heads of state) drastically overrepresented in the bargain, which is unattractive and doesn't seem to be what people usually imagine when they talk about CEV.

One reason to think that the US winning concentrates power less is that the US is a democracy with a strong tradition of maintaining individual rights and a reasonably strong history (over the last 80 years) of pursuing a world order where it benefits from lots of countries being pretty stable and not e.g. invading each other.

My recommendation, following Joe Carlsmith, is to use them as synonyms, and to prefer the term "schemer" over "deceptively aligned model". I do this.

Joe's issues with the term "deceptive alignment":

I think that the term "deceptive alignment" often leads to confusion between the four sorts of deception listed above. And also: if the training signal is faulty, then "deceptively aligned" models need not be behaving in aligned ways even during training (that is, "training gaming" behavior isn't always "aligned" behavior).

I was mostly thinking of misaligned but non-deceptively-aligned models.

I think it's conceivable for non-deceptively-aligned models to gradient hack, right?

I think that shifting from 15% to 20% over ten years is so plausible under the null hypothesis that it doesn't really cry out for explanation, and any proposed explanation has to somehow explain why it didn't lead to a larger effect!

  • increasing endorsement/linking of right-wing figures like Hanania and Cremieux

Idk, back in the day LessWrong had a reasonable amount of discussion of relatively right-wing figures like Moldbug and other neoreactionaries, or, on the less extreme end, people like Bryan Caplan. And there's always been an undercurrent of discussion of e.g. race and IQ.

Low confidence, but I feel like I can kind of assume that the median rat has libertarian sympathies now in a way that I couldn't before?

I feel like the median rat had strong libertarian sympathies 10 years ago.
