Advanced AI systems could lead to existential risks via several different pathways, some of which may not fit neatly into traditional risk forecasts. Many previous forecasts, such as the well-known report by Joe Carlsmith, decompose a failure story into a conjunction of distinct claims, and in doing so risk...
Written quickly rather than not at all: I was thinking about this a couple of days ago and decided to commit to writing something by today rather than adding the idea to my list of a million things to write up. This post describes a toy alignment problem that I'd...
Written quickly rather than not at all, as I've described this idea a few times and wanted to have something to point at when talking to people. 'Quickly' here means I was heavily aided by a language model while writing, which I want to be up-front about given recent discussion....
Epistemic status: trying to unpack a fuzzy mess of related concepts in my head into something a bit cleaner. A lot of my concern about risks from advanced AI stems from the possibility of deceptive alignment. Deceptive alignment has already been discussed in detail on this forum, so I...
I read through Eliezer’s “AGI Ruin: A List of Lethalities” post, and wrote down my reactions* as I went. I tried to track my internal responses *before* adjusting for my perception of the field’s overall views. In order to get this written and posted, I aimed to write...
This is a crosspost from the EA forum. It refers to EAs and the EA community a couple of times, but since it is essentially just about a nice norm and decision-making, it seemed worth having here too. There are a lot of things about this community that I...