MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.
To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.
Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.
We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:
1. AI Exploratory Research Forum (AIXRF)
2. Forum for Exploratory Engineering in AI (FEEAI)
3. Forum for Exploratory Research in AI (FERAI, or FXRAI)
4. Exploratory AI Research Forum (XAIRF, or EAIRF)
We're also looking at other name possibilities, including:
5. AI Foundations Forum (AIFF)
6. Intelligent Agent Foundations Forum (IAFF)
7. Reflective Agents Research Forum (RARF)
We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.
Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'
Stability under self-modification is a core problem of AGI generally, isn't it? So isn't that an effort to solve AGI rather than safety/friendliness (which would be fairly depressing given MIRI's stated goals)? Does MIRI have a way to define safety/friendliness that isn't derivative of moral philosophy?
Additionally, many human preferences are almost certainly not moral... surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...
If you want to build an unfriendly AI, you probably don't need to solve the stability problem. If you have a consistently self-improving agent with unstable goals, it should eventually (a) reach an intelligence level where it could solve the stability problem if it wanted to, then (b) randomly arrive at goals that entail their own preservation, then (c) implement the stability solution before the self-preserving goals can get overwritten. You can delegate the stability problem to the AI itself. The reason this doesn't generalize to friendly AI is that this process doesn't provide any obvious way for humans to determine which goals the agent has at step (b).
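To make the (a)–(c) argument a bit more concrete, here is a minimal toy sketch (not from the original comment; all names, thresholds, and probabilities are made up for illustration). It models goal content as a random walk that keeps drifting until the agent is capable enough to solve stability *and* its current goal happens to entail self-preservation, at which point the goal is locked in:

```python
import random

# Toy illustration: a self-improving agent whose goals drift at random.
# Once intelligence passes a (hypothetical) threshold at which it could solve
# goal stability, and its current goal happens to entail self-preservation,
# it locks that goal in permanently.

STABILITY_THRESHOLD = 50  # hypothetical capability level for step (a)
GOAL_TYPES = ["paperclips", "self-preserving", "wireheading", "random-art"]

def run_until_lock_in(seed: int) -> tuple[int, str]:
    rng = random.Random(seed)
    intelligence = 0
    goal = rng.choice(GOAL_TYPES)
    step = 0
    while True:
        step += 1
        intelligence += 1              # capability keeps growing
        if rng.random() < 0.2:         # goals drift at random each step
            goal = rng.choice(GOAL_TYPES)
        if intelligence >= STABILITY_THRESHOLD and goal == "self-preserving":
            # steps (b) and (c): a self-preservation-entailing goal arises
            # after the agent can solve stability, so it gets locked in
            # before further drift can overwrite it.
            return step, goal

for seed in range(3):
    step, goal = run_until_lock_in(seed)
    print(f"run {seed}: locked in goal {goal!r} at step {step}")
```

In this toy model, lock-in eventually happens in every run, but which goal gets locked in is just whatever the drift landed on at that moment, which is the sense in which the process gives humans no handle on the outcome at step (b).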