This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:
- don't feel ready to be written up as a full post
- I think would be made worse (i.e. longer than they need to be) by the process of writing them up as a full post
I ask that people not create top-level comments here, but feel free to reply to existing comments as you would to a FB post.
High Stakes Value and the Epistemic Commons
I've had this in my drafts for a year. I don't feel like the current version says anything novel enough or crisp enough to quite work as a top-level post, but I wanted to get it out at least as a shortform for now.
There's a really tough situation I think about a lot, from my perspective as a LessWrong moderator. These are my personal thoughts on it.
The problem, in short:
Sometimes a problem is epistemically confusing and probably has political ramifications, such that the people most qualified to debate it are also in conflict with each other, with billions of dollars on the line. And the situation is really high stakes (i.e. the extinction of humanity), such that it really matters we get the question right.
Political conflict + epistemic murkiness means that it's not clear what "thinking and communicating sanely" about the problem looks like, and people have (possibly legitimate) reasons to be suspicious of each other's reasoning.
High Stakes means that we can't ignore the problem.
I don't feel like our current rationalist discourse patterns are sufficient for this combination of high stakes, political conflict, and epistemic murkiness.
Spelling out some concrete examples
Interventions that aim to reduce AI extinction risk are often hard to evaluate. Reasonable people's assessments of the same project can range from "highly net positive" to "highly net negative". Smart people I know have fairly different strategic perspectives on how humanity can survive the 21st century.
Sometimes these disagreements are political – are "pivotal acts" a helpful frame or a harmful one? How suspicious should we be of safetywashing?
Sometimes these disagreements are more technical. How will differential technological development play out? I've heard some arguments that improving alignment techniques on current-generation ML systems may be net negative, because a) it won't actually help align powerful AI systems past the sharp left turn, and b) it meanwhile makes it easier and more profitable to deploy AI in ways that could start to escalate beyond our control (killing us in slightly more mundane ways than the fast-takeoff scenarios).
I've heard arguments that even interpretability, which you'd think is a purely positive source of information, is also helpful for capabilities (in particular if the interpretability is actually any good). And maybe you actually need a lot of interpretability before the alignment benefits outweigh the capability gains.
Some disagreements are the intersection of political, technical, and psychological.
You might argue that people at AGI companies are motivated by excitement over AI, or by making money, and are only paying lip service to safety. Your beliefs about this might include "their technical agenda doesn't make any sense to me" as well as "I have a strong guess about what else might be motivating their agenda."
You might think AI Risk advocates are advocating pivotal acts because of a mix of trauma, finding politics distasteful, and technical mistakes regarding the intersection of boundaries and game theory.
This is all pretty gnarly. I have some guesses about how to think about it, but I feel some confusion about them. And it feels pedagogically bad for this to end with "and therefore, [insert some specific policy or idea]" rather than "okay, what is the space of considerations and desiderata here?", à la Hold Off On Proposing Solutions.
Your memory is fine; I was writing badly. By "proposed ideas" I meant the ideas I would propose, rather than the ideas I have already proposed. The flavor would be something super-empiricist like this, not that I endorse that as perfect. I do think ideas without empirical restraint loom too large in the collective.