I recently gave a two-part talk on the big picture of alignment, as I see it. The talk is not at all polished, but it covers a lot of material for which I don't currently know of any good writeup. Major pieces in part one:
- Some semitechnical intuition-building for high-dimensional problem-spaces.
- Optimization compresses information "by default"
- Resources and "instrumental convergence" without any explicit reference to agents
- A frame for thinking about the alignment problem which only talks about high-dimensional problem-spaces, without reference to AI per se.
  - The central challenge is to get enough bits-of-information about human values to narrow down a search-space to solutions compatible with human values (see the short sketch after this list).
  - Details like whether an AI is a singleton, tool AI, multipolar, oracle, etc are mostly irrelevant.
- Fermi estimate: just how complex are human values?
- Coherence arguments, presented the way I think they should be done.
  - Also subagents!
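To make the "bits-of-information" framing a bit more concrete, here's a toy sketch (mine, not from the talk): if a search space contains 2^N candidate solutions and only 2^K of them are compatible with human values, then picking out that compatible subset requires roughly N − K bits of information about values. All the numbers below are made up purely for illustration.

```python
import math

# Toy illustration (not from the talk): treat the solution space as all
# binary strings of length N, and the "human-value-compatible" solutions
# as a small subset. Narrowing the search down to that subset requires
# roughly log2(total / compatible) bits of information about values.

N = 40                      # dimensionality of the toy search space (assumed)
total = 2 ** N              # number of candidate solutions
compatible = 2 ** 10        # assumed number of value-compatible solutions

bits_needed = math.log2(total / compatible)
print(f"Search space: 2^{N} candidates")
print(f"Value-compatible subset: 2^10 candidates")
print(f"Bits of information about values needed: ~{bits_needed:.0f}")

# Each bit of information about human values cuts the candidate set roughly
# in half, so going from 2^40 candidates to a 2^10-sized target takes about
# 30 bits. Real problem-spaces and real value-specifications are vastly
# larger, which is what makes the Fermi estimate interesting.
```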
Note that I don't talk about timelines or takeoff scenarios; this talk is just about the technical problem of alignment.
Here's the video for part one:
Big thanks to Rob Miles for editing! Also, the video includes some good questions and discussion from Adam Shimi, Alex Flint, and Rob Miles.
I think a lot of the values we care about are cultural, not just genetic. A human raised without culture isn't even clearly going to be generally intelligent (in the way humans are), so why assume they'd share our values?
Estimates of the information content of this part are discussed by Eric Baum in *What is Thought?*, though I don't recall the details.
I find that plausible, a priori. It mostly doesn't affect the stuff in the talk, since culturally-transmitted values would still come from the environment, and the same principles would apply to culturally-derived values as to environment-derived values more generally. Assuming the hardwired part is figured out, we should still be able to get an estimate of human values within the typical-human-value-distribution-for-a-given-culture from data which is within the typical-human-environment-distribution-for-that-culture.