Buck

CEO at Redwood Research.

AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.

Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.

Posts

Sorted by New

Buck's Shortform (Ω) · 12 points · 6y · 233 comments

Comments

Sorted by Newest

Anthropic's leading researchers acted as moderate accelerationists
Buck · 12d

"Despite the shift, 80,000 Hours continues to recommend talented engineers to join Anthropic."

FWIW, it looks to me like they restrict their linked roles to things that are vaguely related to safety or alignment. (I think that the 80,000 Hours job board does include some roles that don't have a plausible mechanism for improving AI outcomes except via the route of making Anthropic more powerful, e.g. the alignment fine-tuning role.)

Sam Marks's Shortform
Buck · 13d

This can matter for deployment as well as research! Back in 2021, a friend of mine made this mistake while training a customer service model, leading to the model making an extremely inappropriate sexual comment while being demoed to a potential customer; he eventually figured out that a user had said that to the model at some point.

That said, I'm not actually sure why, in general, it would be a mistake in practice to train on the combination. Models often improve when you train them on side tasks that have some relevance to what they're supposed to be doing; that's the whole point of pretraining. Are you saying it's a mistake to do this for deployment, or just that it's problematic when you're trying to experiment on generalization?
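For concreteness, here is a minimal sketch of what "training on the combination" could mean in practice: fine-tuning on a mixture of the deployment task's data and data from a related side task, rather than on the deployment data alone. The dataset contents and the mixing helper below are hypothetical illustrations, not anything from the original exchange.

```python
# Hypothetical sketch: build a mixed fine-tuning set from a main task plus a
# related side task. Names and data are made up for illustration.
import random

main_task = [  # hypothetical customer-service fine-tuning examples
    {"prompt": "Customer asks about a refund.", "completion": "Explain the refund policy politely."},
    {"prompt": "Customer reports a login problem.", "completion": "Walk them through a password reset."},
]
side_task = [  # hypothetical related side task (ticket summarization)
    {"prompt": "Summarize this support ticket: ...", "completion": "Short neutral summary."},
]

def mixed_dataset(main, side, side_fraction=0.2, seed=0):
    """Return a shuffled mixture in which roughly `side_fraction` of the
    examples come from the side task (repeated if the side task is small)."""
    rng = random.Random(seed)
    n_side = max(1, round(len(main) * side_fraction / (1 - side_fraction)))
    side_sample = [side[i % len(side)] for i in range(n_side)]
    combined = list(main) + side_sample
    rng.shuffle(combined)
    return combined

train_set = mixed_dataset(main_task, side_task)
# `train_set` would then be handed to whatever fine-tuning pipeline you use.
# The open question above is when this kind of mixing helps the deployed model
# versus when it muddies experiments about generalization.
print(len(train_set), "examples in the mixed fine-tuning set")
```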

An epistemic advantage of working as a moderate
Buck · 15d

Even though the basic ideas are kind of obvious, I think that our thinking them through and pushing on them has made a big difference in what companies are planning to do.

An epistemic advantage of working as a moderate
Buck · 15d

The bioanchors post was released in 2020. I really wish that you had bothered to get basic facts right when being so derisive about people's work.

I also think it's bad manners for you to criticize other people for making clear predictions given that you didn't make such predictions publicly yourself.

An epistemic advantage of working as a moderate
Buck · 15d

Notably, bioanchors doesn't say that we should be confident AI won't arrive before 2040! Here's Ajeya's distribution in the report (which was finished in about July 2020).

An epistemic advantage of working as a moderate
Buck · 15d

My core argument in this post isn't really relevant to anything that was happening in 2020, because people weren't really pushing on concrete changes to safety practices at AI companies yet.

Buck's Shortform
Buck · 19d (Ω)

Ugh, I think you're totally right and I was being sloppy; I totally unreasonably interpreted Eliezer as saying that he was wrong about how long/how hard/how expensive it would be to get between capability levels. (But maybe Eliezer misinterpreted himself the same way? His subsequent tweets are consistent with this interpretation.)

I totally agree with Eliezer's point in that post, though I do wish that he had been clearer about what exactly he was saying.

The Problem
Buck · 20d

Another important point here is that if there had been substantial economic incentive to build strong Go players, then powerful Go players would have been built earlier, and the time between players of those two levels would probably have been longer.

Buck's Shortform
Buck · 20d

Thanks heaps for pulling this up! I totally agree with Eliezer's point there.

Buck's Shortform
Buck · 20d

It really depends what you mean by a small amount of time. On a cosmic scale, ten years is indeed short. But I definitely interpreted Eliezer back then (for example, while I worked at MIRI) as making a way stronger claim than this: that we'd go, within e.g. a few days/weeks/months, from AI that was almost totally incapable of intellectual work to AI that could overpower humanity. And I think you need to believe that much stronger claim in order for a lot of the predictions about the future that MIRI-sphere people were making back then to make sense. I wish we had all been clearer at the time about what specifically everyone was predicting.

An epistemic advantage of working as a moderate · 210 points · 19d · 94 comments
Four places where you can put LLM monitoring (Ω) · 48 points · 1mo · 0 comments
Research Areas in AI Control (The Alignment Project by UK AISI) (Ω) · 25 points · 1mo · 0 comments
Why it's hard to make settings for high-stakes control research (Ω) · 49 points · 2mo · 6 comments
Recent Redwood Research project proposals (Ω) · 91 points · 2mo · 0 comments
Lessons from the Iraq War for AI policy · 190 points · 2mo · 25 comments
What's worse, spies or schemers? (Ω) · 51 points · 2mo · 2 comments
How much novel security-critical infrastructure do you need during the singularity? (Ω) · 56 points · 2mo · 7 comments
There are two fundamentally different constraints on schemers (Ω) · 62 points · 2mo · 0 comments
Comparing risk from internally-deployed AI to insider and outsider threats from humans (Ω) · 150 points · 2mo · 22 comments