All of philipn's Comments + Replies

Thank you for this. This is very close to what I was hoping to find!

It looks like Benjamin Hilton makes a rough guess of the proportion of workers dedicated to AI x-risk for each organization. This seems appropriate for estimating a rough percentage across all organizations, but if we want to nudge organizations to dedicate more people to alignment, then I think we want to highlight exact figures.

E.g., we ask the organizations how many people they have working on alignment and then post what they say - a sort of accountability feedback loop.

You mention the number of people at OpenAI doing alignment work. I think it would be helpful to compile a list of the different labs and the number of people who can reasonably be said to be doing alignment work. Then we could put together a chart of sorts, highlighting the gap between alignment staff and total headcount at each lab.

Highlighting gaps like this is a proven, effective strategy for driving change on other organizational-level inequities.

If people reading this comment have insight into the number of people at the various labs doing alignment work and/or the total number of people at said labs: please comment here!

6Aaron_Scher
A few people have already looked into this. See footnote 3 here. 

Could you elaborate on "For NN Model 1, the belief is encoded in the learned parameters $A$. For NN Model 2, the belief is encoded in the architecture itself"?

5philh
If $A = A^T$ (i.e. $A$ is symmetric), then $x^T A y = y^T A x$. The first model would (we suppose) learn a symmetric $A$, because in reality siblingness is symmetric. The second model uses a matrix that will always be symmetric, no matter what it's learned. (In reality the first model presumably wouldn't learn an exactly symmetric matrix, but we could talk about "close enough" and/or about behavior in the limit.)
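
For concreteness, here's a minimal PyTorch sketch of that distinction, using the $x^T A y$ scoring setup from the comment above (the class names and the $B + B^T$ parametrization are my own illustration, not from the original post):

```python
import torch
import torch.nn as nn

class Model1(nn.Module):
    """Scores sib(x, y) = x^T A y with an unconstrained matrix A.
    Symmetry of the relation can only show up in the learned parameters."""
    def __init__(self, dim):
        super().__init__()
        self.A = nn.Parameter(torch.randn(dim, dim))

    def forward(self, x, y):
        return x @ self.A @ y

class Model2(nn.Module):
    """Scores sib(x, y) = x^T (B + B^T) y, so the effective matrix is
    symmetric by construction: the belief is encoded in the architecture."""
    def __init__(self, dim):
        super().__init__()
        self.B = nn.Parameter(torch.randn(dim, dim))

    def forward(self, x, y):
        A = self.B + self.B.T  # symmetric no matter what B is learned
        return x @ A @ y

x, y = torch.randn(8), torch.randn(8)
m1, m2 = Model1(8), Model2(8)
print(m1(x, y).item(), m1(y, x).item())  # generally differ without training
print(m2(x, y).item(), m2(y, x).item())  # always equal, up to float error
```

Parametrizing the effective matrix as $B + B^T$ is just one convenient way to make symmetry architectural; the point is only that Model 2 satisfies sib(x, y) = sib(y, x) by construction, while Model 1 has to learn it (and would typically only learn it approximately).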