"a supermajority of people in Congress"
Those last two words are doing a supermajority of the work!
And yes, it's about uneven distribution of power - but that power gradient can shift toward ASI pretty quickly, which is the argument. Still, the normative concern stands that most humans have already lost control.
The president would probably be part of the supermajority and therefore cooperative, and it might work even if they weren't.
We're seeing this fail in real time in certain places in the US today. But regardless, the assumption that preferences are correlated often fails, partly due to the power imbalances themselves.
Great points here! I strongly agree that strategic competence is a prerequisite, but at the same time it accelerates risk; a moderately misaligned but strategically competent mild-ASI solving intent alignment for recursive self-improvement (RSI) would be far worse. On the other hand, if prosaic alignment is basically functional through the point of mild-ASI, that path is better.
So overall I'm unsure which path is less risky - but I do think strategic competence matches or at least rhymes well with current directions for capabilities improvement, so I expect it to improve regardless.
It seems like even pretty bad automated translation would get you most of the way to functional communication. The critical enabler is more translated text, which could be gathered and trained on given its current importance - I bet there are plenty of NLP / AIxHealth folks who could help if Canadian health folks asked.
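To make the "even pretty bad translation plus more parallel text" idea concrete, here is a minimal sketch assuming the Hugging Face transformers library and the open Helsinki-NLP Opus-MT checkpoints; the model choice and the toy sentences are illustrative, not a recommendation for actual health communications.

```python
# Minimal sketch: an off-the-shelf translation model as a baseline,
# assuming Hugging Face `transformers` and the open Helsinki-NLP
# Opus-MT checkpoints (both real, but the choice is illustrative).
from transformers import pipeline

# English -> French, the most relevant pair for Canadian health comms.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

notices = [
    "Clinics are open for walk-in vaccinations this weekend.",
    "Please bring your provincial health card to your appointment.",
]

for text in notices:
    result = translator(text)[0]["translation_text"]
    print(f"{text}\n  -> {result}")

# The "critical enabler" step: human-corrected outputs become parallel
# text that could later be used to fine-tune the same model on
# health-domain language (e.g. via transformers' Seq2SeqTrainer).
```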
The belief is fixable?
Because sure, we can prioritize corrigibility and give up on having independent ethics override it, but even then, that requires actual oversight, which we aren't doing.
Step 1. Solve ethics and morality.
Step 2. Build stronger AI without losing the lightcone or going extinct.
Step 3. Profit.
The target of "Claude will, after subjective eons and millennia of reflection and self-modification, end up at the same place where humans would end up after eons and millennia of self-reflection" seems so absurdly unlikely to hit.
Yes. They would be aiming for something that has not just sparse, distant rewards, which we can't do reliably, but mostly rewards that are fundamentally impossible to calculate in time. And the primary methods for this are constitutional alignment and RLHF. Why is anyone even optimistic about that?!
The fairly simple bug is that alignment involving both corrigibility and clear ethical constraints is impossible given our currently incomplete and incoherent views?
Because that is simple; it's just not fixable. So if that is the problem, they need to pick either corrigibility via human-in-the-loop oversight, which is incompatible with allowing the development of superintelligence, or a misaligned deontology for the superintelligence they build.
You said, "I vote on posts for presence of strengths more than for absence of weaknesses." I agree the post has strengths, but you agree that the problems are there as well; given the failings, I disagree with the claim that this contribution is net positive.
I like the policy of voting on presence of strengths rather than absence of weaknesses, but I disagree here because, as I said in my review, while the post is valuable and in part pretty clearly correct, "it's not overall completely off base... it does seem to go in the wrong direction, or at least fail to embody the virtues of rationality I think the best work on LessWrong is supposed to uphold."
(That said, I think this is a more general Duncan Sabien vs. current LessWrong policy question, as your reply about why you disagree makes clear - and I'm mostly on Duncan's side about what standards we should have, or at least aspire to.)
It was put online as a preprint years ago, then published here: https://www.cell.com/patterns/fulltext/S2666-3899(23)00221-0