Buck

CEO at Redwood Research.

AI safety is a highly collaborative field: almost all the points I make were either explained to me by someone else or developed in conversation with other people. I'm noting this here because it would feel repetitive to write "these ideas were developed in collaboration with various people" in every comment, but I want it on the record that the ideas I present were almost entirely not developed by me in isolation.


Comments

Buck

Ryan agrees; by "behavioral output" he mainly means what you're saying: an actually really dangerous action.

Buck

I think we should probably say that exploration hacking is one strategy for sandbagging, rather than using the two terms as synonyms.

Buck

Isn’t the answer that the low-hanging fruit of explaining unexplained observations has already been picked?

Buck

I appeared on the 80,000 Hours podcast. I discussed a bunch of points on misalignment risk and AI control that I don't think I've heard discussed publicly before.

Transcript + links + summary here; it's also available as a podcast in many places.

Buck

I love that I can guess the infohazard from the comment.

Buck

A few months ago, I accidentally used France as an example of a small country that it wouldn't be that catastrophic for AIs to take over, while giving a talk in France 😬

Buck

No problem, my comment was pretty unclear and I can see from the other comments why you'd be on edge!

Buck

It seems extremely difficult to make a blacklist of models in a way that isn't trivially breakable. (E.g. what's supposed to happen when someone adds a tiny amount of noise to the weights of a blacklisted model, or rotates them along a gauge invariance?)
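To illustrate the gauge-invariance point: a minimal NumPy sketch (assuming a hypothetical blacklist that fingerprints models by hashing their weight bytes) showing that permuting the hidden units of a toy ReLU MLP leaves the computed function exactly unchanged while producing a completely different fingerprint.

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP: f(x) = relu(x @ W1) @ W2
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))

def forward(W1, W2, x):
    return np.maximum(x @ W1, 0.0) @ W2

def weight_hash(*weights):
    # Hypothetical blacklist fingerprint: hash of the raw weight bytes.
    h = hashlib.sha256()
    for w in weights:
        h.update(w.tobytes())
    return h.hexdigest()

# Permuting the hidden units is a gauge symmetry of the network:
# relabel W1's output columns and W2's input rows consistently.
perm = np.roll(np.arange(8), 1)
W1p, W2p = W1[:, perm], W2[perm, :]

same_function = np.allclose(forward(W1, W2, x), forward(W1p, W2p, x))
same_hash = weight_hash(W1, W2) == weight_hash(W1p, W2p)
print(same_function, same_hash)  # identical behavior, different fingerprint
```

Noise-based evasion is even cheaper, but the permutation version makes the point starkly: the "new" model is not even approximately different, yet any byte-level or hash-based blacklist misses it.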

Buck

I agree that this isn't what I'd call "direct written evidence"; I was just (somewhat jokingly) making the point that the linked articles are Bayesian evidence that Musk tries to censor, and that the articles are pieces of text.

Buck

It is definitely evidence that was literally written.
