All of Alexandra Bos's Comments + Replies

Hi Joe, thanks a lot for your thoughtful comment! We think you're making some valid points here and will take your suggestions and questions into consideration.

Hi there, thanks for bringing this up. There are a few ways we're planning to reduce the risk of us incubating orgs that end up fast-tracking capabilities research over safety research. 

Firstly, we want to select for a strong impact focus & value-alignment in participants. 

Secondly, we want to assist the founders in setting up their organization in a way that limits the potential for value drift (e.g. a charter for the forming organization that would make this legally more difficult, choosing the right legal structure, and helping them with vetting or suggestions for whom they can best take on as an investor or board member).

If you have additional ideas around this we'd be happy to hear them.

Retain an option to buy the org later for a billion dollars, reducing their incentive to become worth more than a billion dollars.

Hi, thanks for asking! We're moving forward: we've received funding from Lightspeed and plan to run our pilot in Q4 of this year. You can subscribe at the bottom of catalyze-impact.org if you want to make sure to stay in the loop about sign-ups and updates.

To share some anecdotal data: I personally have had positive experiences doing regular coaching calls with Kat this year and feel that her input has been very helpful. 

I would encourage us all to put off updating until we also get the second side of the story; that generally seems like good practice to me whenever it is possible.

(also posted this comment on the EA forum)

Adam Zerner
I'm not sure if it was intended as such, but I see this as very weak evidence for, for lack of a better phrase, the "correct judgement" being "in favor of" rather than "against" Nonlinear. I say this because people who are manipulative and unethical (and narcissistic, and sociopathic...) tend to also be capable of being very charming, likable, and helpful. So I think it is very possible that Kat both is "significantly in the wrong" about various things while also having lots of positive interactions with others, in such a way that would make you think "surely someone so nice and friendly would never do all of these other unethical things".

I strongly disagree with the premise that we haven't gotten the second side of the story.

I actually believe that the Bayesian evidence for what the second side of the story is, is quite strong.

  • As Ben explains, Nonlinear was given three hours to provide their side of the story. I would strongly expect there to be a Pareto Principle thing that applies here. In the first hour, I'd expect that -- let's just make up numbers -- 70% of the thrust (i.e. the big idea, not necessarily every little detail) of the "second side of the story" would be provided. Then in th
... (read more)

So there's a danger of: "I read the accusation, the response comes out and for whatever reason I don't see it or I put it on my to-read list and forget, and I come out believing the false accusation".

There's also a danger of: "I don't read the accusation, or read it and withhold judgment, pending the response. Then the response doesn't come out when it was promised, and I think oh, these things sometimes take longer than expected, it'll be here soon. And at some point I just forget that it never came out at all." Or: "Then when the response comes out, it's been long enough since I read the original that I don't notice it's actually a pretty weak response to it."

So I dunno what good policy is in general.

Hi, I'd encourage you to apply if you recognize yourself in the "About you" section!

"When in doubt, always apply" is my motto, personally.

I'd be curious to hear from the people who pressed the disagreement button on Evan's remark: what part of this do you disagree with or not recognize?

Thomas Kwa
I didn't hit disagree, but IMO there are way more than "few research directions" that can be accessed without cutting-edge models, especially with all the new open-source LLMs.

  • All conceptual work: agent foundations, mechanistic anomaly detection, etc.
  • Mechanistic interpretability, which when interpreted broadly could be 40% of empirical alignment work
  • Model control, like the nascent area of activation additions

I've heard that evals, debate, prosaic work into honesty, and various other schemes need cutting-edge models, but in the past few weeks of transitioning from mostly conceptual work into empirical work, I have far more questions than I have time to answer using GPT-2 or AlphaStar-sized models. If alignment is hard, we'll want to understand the small models first.

I was thinking about helping with infrastructure around access to large amounts of compute, but had not considered trying to help with access to cutting-edge models. I think it might be a very good suggestion. Thanks for sharing your thoughts!