Update (March 12): Transformer has published on this. Their article (and comments here) note that there has already been some public discussion, which I hadn't seen. Anthropic employees, in particular, are broadly sympathetic to AI safety and have (or will have) lots of money. This is something that is being talked about a lot...
I'm writing this post to share some of my thinking about situational awareness, since I'm not sure others are thinking about it this way. For context, I think situational awareness is a critical part of the case for rogue AI and scheming-type risks. But incredibly, it seems to have been...
There are three broad types of approach I see for making interpretability rigorous. I've put them in ascending order of how much assurance I think they can provide. I think they all have pros and cons, and I'm generally in favor of rigor. 1. (weakest) Practical utility: Does this interpretability...
People want to measure and track gradual disempowerment. One issue with a lot of the proposals I've seen is that they don't distinguish between empowering and disempowering uses of AI. If everyone is using AI to write all of their code, that doesn't necessarily mean they are disempowered (in an...
AI companies are explicitly trying to build AIs that are smarter than humans, despite clear signs that it might lead to human extinction. It will be tragic and ironic if humanity’s largest project ever is an all-out race to destroy ourselves. But can we really stop building more and more...
https://evitable.com/ Our mission is to inform and organize the public to confront societal-scale risks of AI, and put an end to the reckless race to develop superintelligence. We're hiring for 3 roles:
1) Operations Associate or Head of Operations
2) Communications Associate or Head of Communications
3) Chief of Staff...