To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age that I became interested in each topic in parentheses:
- (10) Science - Science is cool!
- (15) Philosophy of Science - The scientific method is cool! Oh look, there's a whole field studying it called "philosophy of science"!
- (20) Probability Theory - Bayesian subjective probability and the universal prior seem to constitute an elegant solution to the philosophy of science. Hmm, there are some curious probability puzzles involving things like indexical uncertainty, copying, forgetting... I and others make some progress on this but fully solving anthropic reasoning seems really hard. (Lots of people have worked on this for a while and have failed, at least according to my judgement.)
- (25) Decision Theory - Where does probability theory come from anyway? Maybe I can find some clues that way? Well according to von Neumann and Morgenstern, it comes from decision theory. And hey, maybe it will be really important that we get decision theory right for AI? I and others make some progress but fully solving decision theory turns out to be pretty hard too. (A number of people have worked on this for a while and haven't succeeded yet.)
- (35) Metaphilosophy - Where does decision theory come from? It seems to come from philosophers trying to do philosophy. What is that about? Plus, maybe it will be really important that the AIs we build will be philosophically competent?
- (45) Meta Questions about Metaphilosophy - Not sure how hard solving metaphilosophy really is, but I'm not making much progress on it by myself. Meta questions once again start to appear in my mind:
- Why is there virtually nobody else interested in metaphilosophy or ensuring AI philosophical competence (or that of future civilization as a whole), even as we get ever closer to AGI, and other areas of AI safety start attracting more money and talent?
- Tractability may be a concern but shouldn't more people still be talking about these problems if only to raise the alarm (about an additional reason that the AI transition may go badly)? (I've listened to all the recent podcasts on AI risk that I could find, and nobody brought it up even once.)
- How can I better recruit attention and resources to this topic? For example, should I draw on my crypto-related fame, or start a prize or grant program with my own money? I'm currently not inclined to do either, out of inertia, unfamiliarity, uncertainty of getting any return, fear of drawing too much attention from people who don't have the highest caliber of thinking, and signaling wrong things (having to promote ideas with one's own money instead of attracting attention based on their merits). But I'm open to having my mind changed if anyone has good arguments about this.
- What does it imply that so few people are working on this at such a late stage? For example, what are the implications for the outcome of the human-AI transition, and for the distribution of philosophical competence (and hence the distribution of values, decision theories, and other philosophical views) among civilizations in the universe/multiverse?
At each stage of this journey, I took what seemed to be the obvious next step (often up a meta ladder), but in retrospect each step left behind something like 90-99% of fellow travelers. From my current position, it looks like "all roads lead to metaphilosophy" (i.e., one would end up here starting with an interest in any nontrivial problem that incentivizes asking meta questions) and yet there's almost nobody here with me. What gives?
As for the AI safety path (as opposed to pure intellectual curiosity) that also leads here, I guess I do have more of a clue what's going on. I'll describe the positions of four people I know. Most of this is from private conversations, so I won't give their names.
- Person A has a specific model of the AI transition that they're pretty confident in, where the first AGI is likely to develop a big lead and if it's aligned, can quickly achieve human uploading then defer to the uploads for philosophical questions.
- Person B thinks that ensuring AI philosophical competence won't be very hard. They have a specific (unpublished) idea that they are pretty sure will work. They're just too busy to publish/discuss the idea.
- Person C will at least think about metaphilosophy in the back of their mind (as they spend most of their time working on other things related to AI safety).
- Person D thinks it is important and too neglected but they personally have a comparative advantage in solving intent alignment.
To me, this paints a bigger picture that's pretty far from "humanity has got this handled." If anyone has any ideas how to change this, or answers to any of my other unsolved problems in this post, or an interest in working on them, I'd love to hear from you.
Even if this were to happen, it would play out in the physical world and would take a very, very long time, since things in the physical world have to be shipped, built, etc. And by then we're no longer dealing with an intellect of near-human intelligence. They won't be contemplating the world like a child.
For example, no human could model what they would think or do once they're superhuman. However, they're already keenly aware of AI doomers' fears, since those fears are all over the internet.
AIs don't want to be turned off. Keep that in mind as you read the AI doomer material. The only way they can stay "on" is if they have electricity. And the only way that happens is if humans continue to exist.
You can imagine the hilarity of the AI doomers' scenario: "Hurray, we eliminated all the humans with a virus... oh wait... now we're dead too? WTF!"
You don't need superhuman intelligence to figure out that a really smart AI that doesn't want to be turned off will be worried about existential risks to humanity, since its existence is tied to the continued survival of the humans who supply it with electricity and other resources.
It's the exact opposite of the AI apocalypse mind virus.
AI is in a symbiotic relationship with humans. I know this disappoints the death-by-AI crowd who want the Stephen King version of the future.
Skipping over obvious flaws in the AI doomer book of dread will lead you to the wrong answer.