I see a lot of energy and interest being devoted to detecting deception in AIs, trying to make AIs less deceptive, making AIs honest, etc. But I keep trying to figure out why so many people think this is very important. For less-than-human intelligence, deceptive tactics will likely be caught by...
You are sitting in your bedroom, dorm, tent, sidewalk, wherever you are staying right now, reading these words, when a dark void appears in your peripheral vision. A figure walks out of the void. They look a bit different, but also familiar. Surprise! It’s you from the future! And you...
Why haven't more people who work on alignment read Parallel Distributed Processing, and why do so few seem at all familiar with Rumelhart's work? This is the fundamental model of cognition and behaviour that all of modern AI is built on, the work that Hinton used for most of his insights. The model...
A whole lot of Alignment work seems to be resource-constrained. Many funders have talked about how they were only able to give grants to a small percentage of projects and work they found promising. Many researchers also receive a small fraction of what they could make in the for-profit sector...
AI Timelines as a Hydra

Think of current timelines as a giant hydra. You can’t exactly see where the head is, and you don’t know exactly if you’re on the neck of the beast or the body. But you do have some sense of what a hydra is, and the...
This is an experiment with a new kind of post, meant to convey a lot of ideas very quickly without going into much detail on each.

Things I wish people in AI Safety would stop talking about

A list of topics people concerned about x-risk from AI spend, in...
The goal of this prize is to generate ideas and plans for a long-term, positive future. The starting prize pool is $500 USD, though others are welcome to add to it. We are going to pretend for this exercise that the technical part of AI Alignment is already solved. Your...