All of ozhang's Comments + Replies

Thank you! This has been updated.

The main overlap between Modeling the impact of AI safety field-building programs and the other two posts is the disclaimers, which we believe belong in all three posts, and the main QARY definition, which seemed significant enough to include. Beyond that, the intro post is distinct from the two analysis posts.

This post does have much in common with the Cost-effectiveness of student programs for AI safety research. The two posts are structured in an incredibly similar manner. That being said, the sections are doing the same analysis to differen...

Of course!

We ask practitioners who have direct experience with these programs for their beliefs about which research avenues participants pursue before and after the program. Research relevance (before/without, during, or after the program) is given by the sum product of these probabilities with CAIS’s judgement of the relevance of different research avenues (in the sense defined here). You can find the explicit calculations for workshops at lines 28-81 of this script, and for socials at lines 28-38 of this script.
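For readers who want the sum product spelled out, here is a minimal sketch of that calculation. It is not the linked script: the avenue names, probabilities, and relevance weights below are hypothetical placeholders, and the only thing taken from the comment is the structure (practitioner probabilities over research avenues, multiplied by relevance judgements and summed).

```python
from typing import Mapping


def research_relevance(probabilities: Mapping[str, float],
                       relevance: Mapping[str, float]) -> float:
    """Sum product of P(participant pursues avenue) and that avenue's relevance weight."""
    missing = set(probabilities) - set(relevance)
    if missing:
        # Every avenue a practitioner assigns probability to needs a relevance weight.
        raise KeyError(f"no relevance weight for: {sorted(missing)}")
    return sum(p * relevance[avenue] for avenue, p in probabilities.items())


# Hypothetical relevance judgements and practitioner beliefs, for illustration only.
relevance = {"avenue_a": 10.0, "avenue_b": 1.0, "other": 0.0}
before = {"avenue_a": 0.1, "avenue_b": 0.3, "other": 0.6}
after = {"avenue_a": 0.3, "avenue_b": 0.4, "other": 0.3}

print(research_relevance(before, relevance))
print(research_relevance(after, relevance))
```

With these made-up numbers, the program's effect on research relevance would be the difference between the "after" and "before" values.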

Using workshop contenders’ research relevance without t...

DanielFilan
I guess I just don't have a strong sense of where the practitioners' numbers are coming from, or why they believe what they believe. Which is fine if you want to build a pipeline that turns some intuitions into decisions, but not obviously incredibly useful for the rest of us (beyond just telling us those intuitions). The thing you link shows that if you change the conversion ratio of both programs by the same amount, the relative cost-effectiveness doesn't change, which makes sense. But if workshops produced 100x more conversions than socials, or vice versa, presumably this must make a difference. If you say that the treatment effects only differ by a factor of 2, then fair enough, but that's just not super clear a priori (and the fact that you claim that (a) you can measure the TDC better and (b) the TDC has a different treatment effect makes me skeptical). (For the record, I couldn't really make heads or tails of the spreadsheet you linked or what the calculations in the script were supposed to be, but I didn't try super hard to understand them; perhaps I'd write something different if I really understood them.)
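The point about conversion ratios can be checked with a toy model. This sketch assumes, purely for illustration, that cost-effectiveness is conversions times value per conversion divided by cost; that simplification and all numbers are hypothetical, not the model used in the posts or scripts. Scaling both programs' conversion ratios by the same factor leaves their ratio unchanged, while scaling only one program's conversion ratio moves the ratio by that factor.

```python
from typing import Tuple

# (conversion_rate, value_per_conversion, cost): a hypothetical toy parameterization.
Program = Tuple[float, float, float]


def cost_effectiveness(program: Program) -> float:
    conversion_rate, value_per_conversion, cost = program
    return conversion_rate * value_per_conversion / cost


def relative_ce(a: Program, b: Program) -> float:
    """How many times more cost-effective program a is than program b."""
    return cost_effectiveness(a) / cost_effectiveness(b)


def scale_conversion(program: Program, k: float) -> Program:
    conversion_rate, value_per_conversion, cost = program
    return (conversion_rate * k, value_per_conversion, cost)


workshop: Program = (0.2, 5.0, 1000.0)  # made-up numbers
social: Program = (0.1, 1.0, 50.0)      # made-up numbers

base = relative_ce(workshop, social)
same_scaling = relative_ce(scale_conversion(workshop, 3), scale_conversion(social, 3))
one_sided = relative_ce(scale_conversion(workshop, 100), social)
print(base, same_scaling, one_sided)
```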

Yup! The bounty is still ongoing, now funded by a different source. We have been awarding prizes throughout the duration of the bounty and will post an update in January detailing the results.

BrianTan
I'm following up on Leon's question: have the results already been posted? If not, when will they be posted (if they will be)? I'm curious to know. Thanks!
Leon Lang
Has this already been posted? I could not find the post. 

Don't have a concrete definition off the top of my head, but I can try to give you a sense of what we're thinking about. "Alignment theory" for us refers to the class of work that reasons about alignment from first principles rather than running actual experiments. (Happy to discuss why this is our focus if that would be useful.)

Examples: Risks from learned optimization, inaccessible information, most posts in Evan's list of research artifacts.