Someone asked me about this, so here are my quick thoughts.
Although I've learned a lot of math over the last year and a half, it still isn't my comparative advantage. What I do instead is:
Find a problem
Look for something that seems plausibly important to AI safety (low impact), or for a phenomenon that's secretly confusing but not really explored (instrumental convergence). If you're looking for a problem, corrigibility strikes me as another thing that meets these criteria and is still mysterious.
Think about the problem
Stare at the problem on my own, ignoring any existing thinking as much as possible. Just think about what the problem is, what's confusing about it, what a solution would look like. In retrospect, this has helped me avoid anchoring myself. Also, my prior for existing work is that it's confused and unhelpful, and I can do better by just thinking hard. I think this is pretty reasonable for a field as young as AI alignment, but I wouldn't expect this to be true at all for e.g. physics or abstract algebra. I also think this is likely to be true in any field where philosophy is required, where you need to find the right formalisms instead of working from axioms.
Therefore, when thinking about whether "responsibility for outcomes" has a simple core concept, I nearly instantly concluded it didn't, without spending a second glancing over the surely countless philosophy papers wringing their hands (yup, papers have hands) over this debate. This was the right move: I just trusted my own thinking. Lit reviews are just a proxy signal for having actually comprehended the problem and come to a well-considered conclusion.
Concrete examples are helpful: at first, thinking about vases in the context of impact measurement was helpful for getting a grip on low impact, even though it was secretly a red herring. I like to be concrete because we actually need solutions - I want to learn more about the relationship between solution specifications and the task at hand.
Make simplifying assumptions wherever possible. Assume a ridiculous amount of stuff, and then pare it down.
Don't formalize your thoughts too early - you'll just get useless mathy sludge out on the other side, the product of your confusion. Don't think for a second that having math representing your thoughts means you've necessarily made progress - for the kind of problems I'm thinking about right now, the math has to sing with the elegance of the philosophical insight you're formalizing.
Forget all about whether you have the license or background to come up with a solution. When I was starting out, I was too busy being fascinated by the problem to remember that I, you know, wasn't allowed to solve it.
Obviously, there are common-sense exceptions to this, mostly revolving around trying to run without any feet. It would be pretty silly to think about logical uncertainty without even knowing propositional logic. One of the advantages of immersing myself in a lot of math isn't just knowing more, but knowing what I don't know. However, I think it's rare to secretly lack the basic skills needed to even start on the problem at hand. You'll probably know if you do, because all your thoughts will keep coming back to the same kind of confusion about a formalism, or something similar. Then you look for ways to resolve the confusion (possibly by asking a question on LW or in the MIRIx Discord), find the thing, and get back to work.
Stress-test thoughts
So you've had some novel thoughts, and an insight or two, and the outlines of a solution are coming into focus. It's important not to become enamored with what you have, because it stops you from finding the truth and winning. Therefore, think about ways in which you could be wrong, situations in which the insights don't apply or in which the solution breaks. Maybe you realize the problem is a bit ill-defined, so you refactor it.
The process here is: break the solution, deeply understand why it breaks, and repeat. Don't get stuck with patches; there's a rhythm you pick up on in AI alignment, where good solutions have a certain flavor of integrity and compactness. It's OK if you don't find it right away. The key thing to keep in mind is that you aren't trying to pass the test cases, but rather to find brick after brick of insight to build a firm foundation of deep comprehension. You aren't trying to find the right equation, you're trying to find the state of mind that makes the right equation obvious. You want to understand new pieces of the world, and maybe one day, those pieces will make the difference.
ETA: I think a lot of these skills apply more broadly. Emotional trust in one's own ability to think seems important for taking actions that aren't e.g. prescribed by an authority figure. Letting myself just think lets me be light on my mental feet, and bold in where those feet lead me.
ETA 2: Apparently simulating drop-caps (like this) isn't the greatest idea. Formatting edit.
This is a straw interpretation of what I'm trying to communicate. The argument doesn't address the actual norm I plan on enforcing, and instead seems to cast me as walling myself off from anything I might not like.
The norm I'm actually proposing is that if you see an easy way to make an abrasive thing less abrasive, you take it. If the thing still has to be abrasive, that's fine. Remember, I said:
Am I to believe that if people can't say the thing that first comes to mind in the heat of the moment, there isn't any way to express the same information? What am I plausibly losing out on by not hearing "stooping to X" instead of "resorting to X"? That Said first thought of "stooping"? That he's passionate about typography?
I don't see many reasonable situations where this kind of statement doesn't suffice (again, this isn't a "template"; it's just an example of a reasonable level of frankness).
I've been actively engaged with LW for a good while, and this issue hasn't come up until now (which makes me think it's uncommon, thankfully). Additionally, I've worked with people in the alignment research community. No one seems to have had trouble communicating efficiently in a minimally abrasive fashion, and I don't see why that would change.
I don't currently plan on commenting further on this thread, although I thank both you and Said for your contributions and certainly hope you keep commenting on my posts!