AI alignment researcher, ML engineer. Master's in Neuroscience.
I believe that cheap and broadly competent AGI is attainable and will be built soon, which leads me to timelines of around 2024-2027. Here's an interview I gave recently about my current research agenda. I think the best path forward to alignment is through safe, contained testing on models designed from the ground up for alignability and trained on censored data (simulations with no mention of humans or computer technology). I think that current mainstream ML technology is close to a threshold of competence beyond which it will be capable of recursive self-improvement, that this automated process will mine neuroscience for insights and quickly become far more effective and efficient, and that it would be quite bad for humanity if this happened in an uncontrolled, uncensored, un-sandboxed situation. So I am trying to warn the world about this possibility.
See my prediction markets here:
I also think that current AI models pose misuse risks, which may continue to worsen as models get more capable, and that this could result in catastrophic suffering if we fail to regulate it.
I now work for SecureBio on AI-Evals.
Relevant quotes:
"There is a powerful effect to making a goal into someone’s full-time job: it becomes their identity. Safety engineering became its own subdiscipline, and these engineers saw it as their professional duty to reduce injury rates. They bristled at the suggestion that accidents were largely unavoidable, coming to suspect the opposite: that almost all accidents were avoidable, given the right tools, environment, and training." https://www.lesswrong.com/posts/DQKgYhEYP86PLW7tZ/how-factories-were-made-safe
"The prospect for the human race is sombre beyond all precedent. Mankind are faced with a clear-cut alternative: either we shall all perish, or we shall have to acquire some slight degree of common sense. A great deal of new political thinking will be necessary if utter disaster is to be averted." - Bertrand Russel, The Bomb and Civilization 1945.08.18
"For progress, there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment." - John von Neumann
"I believe that the creation of greater than human intelligence will occur during the next thirty years. (Charles Platt has pointed out the AI enthusiasts have been making claims like this for the last thirty years. Just so I'm not guilty of a relative-time ambiguity, let me more specific: I'll be surprised if this event occurs before 2005 or after 2030.)" - Vernor Vinge, Singularity
Have you considered using the new PaCMAP instead of UMAP?
Good point, I should use a different term to distinguish "destructive/violent direct action" from "non-violent obstructive/inconvenient direct action".
Another thing I've noticed: after submitting a comment, sometimes the comment appears in the list of comments but the text also remains in the editing textbox. This can make it look like the first submit didn't work, leading to submitting again and double-posting the same comment.
Other times, the text is correctly removed from the textbox after hitting submit and the submitted comment appears, but the page continues to think you have unsubmitted text and warns you about it when you try to navigate away. Again, a confusing experience that can lead to double-posting.
[edit: clarifying violent/destructive direct action vs nonviolent obstructive direct action.]
I have a different point of view on this. I think PauseAI is mistaken, and that insofar as they actually succeeded at their aims, they would in fact worsen the situation by shifting research emphasis onto improving algorithmic efficiency.
On the other hand, I believe every successful nonviolent grassroots push for significant governance change has been accompanied by a disavowed activist fringe.
Look at the history of Gandhi's movement, and of MLK's. Both set their peaceful movements up in opposition to rebels who embraced more violent/destructive direct action (e.g. Malcolm X). I think this is in fact an important aid to such a movement. If the powers-that-be find themselves choosing between cooperating with the peaceful protest movement or radicalizing more opponents into the violent/destructive direct-action movement, then they are more likely to come to a compromise. No progress without discomfort.
Consider the alternative: a completely mild protest movement that carefully avoids upsetting anyone, with no hint of a radical fringe willing to take violent/destructive direct action. How far does this get? Not far, based on my read of history. The BATNA of ignoring the peaceful protest movement rather than compromising with it is that it keeps being non-upsetting. When there is also an ongoing violent/destructive protest movement trying to recruit from the peaceful one, the BATNA of ignoring the peaceful movement is that you strengthen the violent/destructive activist movement by refusing to compromise with the peaceful one.
I do think it's important for the 'sensible AI safety movement' to distance itself from the 'activist AI safety movement', and I intend to be part of the sensible side, but that doesn't mean I want the activist side to stop existing.
Firefox, Windows
A couple of UI notes:
On mobile, there's a problem with bullet points squishing text too much. I'd rather go without the indentation entirely than let the indentation smoosh the text into unreadability.
Nice work! Quite comprehensive and well explained. A lot of overlap with existing published ideas, but also some novel ideas.
I intend to give some specific suggestions for details I think are worth adding. In the meantime, I'm going to add a link to your work to the list of such works at the top of my own (much shorter) AI safety plan. I recommend you take a look at mine: it's short, but it touches on a few points I think could be valuable to your thinking, and the links and resources in it might be of interest to you if there are some you haven't seen.
Yeah, for sure. My technique is trying to prepare in advance for more capable future models, not something we need yet.
The idea is that if there's a future model so smart and competent that it's able to scheme very successfully and hide this scheming super well, then impairing the model's capabilities in a broad untargeted way should lead to some telling slip-ups in its deception.
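To make the intuition concrete, here is a minimal illustrative sketch of the general idea, not our actual implementation: the module names, noise scale, and hook-based approach below are placeholders for whatever broad, untargeted impairment one actually uses.

```python
# Illustrative sketch only: broad, untargeted noise injection into activations.
# Module names and noise scale are placeholders, not a real experimental setup.
import torch

def add_activation_noise(model, layer_names, std=0.05):
    """Register forward hooks that add Gaussian noise to the outputs of the
    named submodules, mildly impairing the model in an untargeted way."""
    def make_hook(std):
        def hook(module, inputs, output):
            if isinstance(output, tuple):  # e.g. transformer blocks returning tuples
                return (output[0] + std * torch.randn_like(output[0]),) + output[1:]
            return output + std * torch.randn_like(output)
        return hook

    handles = []
    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(make_hook(std)))
    return handles  # call handle.remove() on each to restore the clean model
```

The evaluation would then be something like: run the same prompts with and without the hooks attached and look for telling inconsistencies between the noised and clean transcripts.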
I agree with Joe Carlsmith that this seems like goal guarding.
I would be interested to see if my team's noise-injection technique interferes with these behaviors in a way that makes them easier to detect.
It's faster, has a more principled theoretical underpinning, and generally does a better job than UMAP.
Not a big deal, I just like letting people know that there's a new algorithm which seems like a solid Pareto improvement.
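For anyone curious, it's nearly a drop-in swap. A minimal sketch using the `pacmap` package (the data here is just a random placeholder):

```python
# Minimal sketch: PaCMAP as a near drop-in replacement for UMAP.
import numpy as np
import pacmap  # pip install pacmap

X = np.random.rand(1000, 50).astype(np.float32)  # placeholder data

reducer = pacmap.PaCMAP(n_components=2, n_neighbors=10)
embedding = reducer.fit_transform(X)  # shape (1000, 2)
```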