Recently, I've been mulling over the question of whether it was a good idea or not to join a frontier AI company's safety team for the purposes of reducing extinction risk. One of my big cons was something like:
Jay, you think the incentives are less likely to affect you compared to most people. But most AI safety people who join frontier labs probably think this. You will be affected as well.
So I decided on a partial mitigation strategy, entirely as a precautionary principle and not at all because I thought I needed to. I committed to myself and to several people I'm close to that if I were to join a frontier lab safety team, I would donate 100% of the surplus that I would gain as a result of taking that job instead of a less lucrative job somewhere else.
At this time I was applying for a few jobs, one of which was at a frontier company. Approximately immediately, my System 1 became way less interested in that job. And I didn't even have an offer in hand for a specific amount of money. I don't have good reasons to care a lot about getting more money for myself. I have enough already, and I voluntarily live well below my means. This did not stop the effect from existing, and I didn't notice the effect before. I still don't notice the effect on my thinking in a vacuum. I only notice it by doing a mental side-by-side comparison.
I now think anyone who is considering joining a frontier company in order to reduce extinction risk should make this same commitment as a basic defensive measure against perverted incentives. I am sure there exist people who are entirely indifferent to money in this way - this is at least partially a skill issue on my part. But it does seem that "Thinking you are indifferent to the money" is not a reliable signal that your thinking is unaltered by it.
This is also an opportunity to say that, if I ever do join a frontier safety team, I officially give you permission to ask me if I'm meeting this commitment of mine in conversation, even if