I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not-at-all polished, but contains a lot of stuff for which I don't currently know of any good writeup. Major pieces in part one:
- Some semitechnical intuition-building for high-dimensional problem-spaces.
- Optimization compresses information "by default"
- Resources and "instrumental convergence" without any explicit reference to agents
- A frame for thinking about the alignment problem which only talks about high-dimensional problem-spaces, without reference to AI per se.
- The central challenge is to get enough bits-of-information about human values to narrow down a search-space to solutions compatible with human values.
- Details like whether an AI is a singleton, tool AI, multipolar, oracle, etc are mostly irrelevant.
- Fermi estimate: just how complex are human values?
- Coherence arguments, presented the way I think they should be done.
- Also subagents!
Note that I don't talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.
Here's the video for part one:
Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.
Cultural accumulation and google, but that's mimicking someone who's already figured it out. How about the person who first figured out eg crop growth? Could be scientific method, but also just random luck which then caught on.
Additionally, sometimes it's just applying the same hammers to different nails or finding new nails, which means that there are general patterns (hammers) that can be applied to many different situations. There's bits of information in both the patterns themselves and when to apply them, though I feel confused trying to connect these ideas here.
People specifically have inner simulations (ie you can imagine what it'd look like to drop a bowling ball off a building even if you've never seen it) from things you have lots of experience with is a way of applying different patterns to new situations.