Consider reading this instead.
Here is some obvious advice.
I think a common failure mode when working on AI alignment[1] is failing to focus on the hard parts of the problem first. This is a problem when generating a research agenda, as well as when working on any specific research agenda. Given a research agenda, there are normally many problems that you know how to make progress on. But blindly working on whatever seems tractable is not a good idea.
Let's say we are working on a research agenda aimed at solving problems A, B, and C. We know that if we find solutions to A, B, and C, we will solve alignment. However, if we can't solve even one subproblem, the agenda is doomed. If C seems like a very hard problem that you are not sure you can solve, it would be a bad idea to flinch away from C and work on problem A instead, just because A seems so much more manageable.
If solving A takes a lot of time and effort, all of that time and effort will have been wasted if you can't solve C in the end. It's especially worrisome when A has tight feedback loops, such that you constantly feel like you are making progress, or when it is just generally fun to work on A.
Of course, it can make sense to work on A first if you expect this to help you solve C, or at least to give you more information about C's tractability. To see the general version of this, consider having a large list of problems that you need to solve. In that case, focusing on problems that will provide information helpful for solving many of the other problems can be very useful. But even then, you should not lose sight of the hard problems that might block you down the road.
The takeaway is that these two things are very different:
- Solving A as an instrumental subgoal in order to make progress on C, when C is a potential blocker.
- Avoiding C because it seems hard, and instead working on A because it seems tractable.
Though I expect this to be a general problem that comes up all over the place. ↩︎
I think this is another very good heuristic, and I agree it is good to do first.
I think that in alignment, 2 is normally not the case, and if 2 is not the case, 3 will not really help you. That's why I think it is a reasonable assumption that you need all of A, B, and C solved.
Your childcare example is weird because the goal is to make money, which is a continuous success metric. You can make money without really solving any of the problems you listed. I did not say this in the original article (and I should have), but this technique is for problems where the actual solution requires you to solve all of the subproblems. It would be as if making any money with childcare at all required you to have solved all of the problems first; you can't solve one problem a bit and then make a bit more money. If C is proven to be impossible, or to be much harder than some other problem set corresponding to a different way of making money, then you should switch to that different way of making money.
In alignment, if you don't solve the problem, you die. You can't solve alignment 90% of the way and then deploy an AI built with that 90% level of understanding, because the AI will still be approximately 0% aligned and will kill you. We can't figure out everything about a neural network, even which objective function corresponds to human values, yet remain ignorant of whether it is deceptively misaligned, and still live.
Your alignment example is very strange. C is basically "solve alignment", whereas A and B taken together do not constitute an alignment solution at all. The idea is that you have a set of subproblems that, taken together, constitute a solution to alignment. Having "solve alignment" in this set breaks everything. The set should be such that when we trim all the unnecessary elements (elements that are not required because some subset of our set already constitutes a solution), we don't remove anything, because all elements are necessary. Otherwise, I could add anything to the set and still end up with a set that solves alignment. The set should be (locally) minimal in size. If we trim a set that contains "solve alignment", we just end up with the single element "solve alignment", and we have not simplified the problem by factoring it at all.
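To make the trimming operation concrete, here is a minimal sketch in Python. The oracle `solves(subset)` is a hypothetical stand-in: in reality, judging whether a set of solved subproblems adds up to a full alignment solution is a human judgment call, not a function we can write.

```python
def trim_to_minimal(problems, solves):
    """Greedy one-pass trim to a locally minimal solving set.

    `problems`: a set of subproblem labels.
    `solves(subset)`: hypothetical oracle, True iff solving every element
    of `subset` would constitute a full alignment solution. Assumed
    monotone: any superset of a solving set also solves.
    """
    remaining = set(problems)
    for p in sorted(remaining):      # fixed order, for reproducibility
        without_p = remaining - {p}
        if solves(without_p):        # p is unnecessary; drop it
            remaining = without_p
    return remaining

# A toy oracle matching the example above: the full factorization is
# {A, B, C}, but "solve alignment" alone also counts as a solution.
toy_solves = lambda s: "solve alignment" in s or {"A", "B", "C"} <= s

print(trim_to_minimal({"A", "B", "solve alignment"}, toy_solves))
# -> {'solve alignment'}: the factored subproblems were all redundant.
```

As the toy run shows, a set containing "solve alignment" collapses to that single element under trimming, which is exactly the sense in which it fails to factor the problem.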
Or, even better, you make the set a tree instead, such that each node is a task, and you split nodes that are too large into (ideally independent) subtasks until you can see how to solve each subpart. I guess this is superior to the original formulation. Ideally, there is not even a hardest part in the end: it should be obvious what you need to do to solve every leaf node. Looking for the hardest part now happens when choosing which node to split next (the splitting itself might take a significant amount of time).
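Here is a minimal sketch of that tree refinement loop, again with the caveat that `is_solvable`, `hardness`, and `split` are hypothetical stand-ins for researcher judgment rather than anything we can actually compute:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    subtasks: list["Task"] = field(default_factory=list)

    def leaves(self):
        """Yield all leaf tasks under this node."""
        if not self.subtasks:
            yield self
        else:
            for t in self.subtasks:
                yield from t.leaves()

def refine(root, is_solvable, hardness, split):
    """Split the hardest blocked leaf until every leaf looks solvable.

    `is_solvable(task)`, `hardness(task)`, and `split(task)` encode the
    researcher's judgment. Assumes `split` always returns a non-empty
    decomposition, i.e. makes genuine progress on the node it splits.
    """
    while True:
        blocked = [t for t in root.leaves() if not is_solvable(t)]
        if not blocked:
            return root          # every leaf node is tractable
        hardest = max(blocked, key=hardness)
        hardest.subtasks = split(hardest)   # this step may itself be slow
```

The design choice is that "focus on the hard parts first" lives entirely in the `max(blocked, key=hardness)` line: effort always goes to the worst known blocker, not to whichever leaf is most fun to work on.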
Thank you for telling me about the CAP problem; I did not know about it.