Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
In both cases it came up in the context of AI systems colluding with different instances of themselves, and how this applies to various monitoring setups. In that context, I think the general lesson is "yeah, probably pretty doable, and obviously the models won't end up in defect-defect equilibria, though how exactly that will happen sure seems unclear!".
It comes up reasonably frequently when I talk to safety people at frontier AI companies, at least (e.g. it came up during a conversation I had with Rohin the other day, and also in a recent conversation I had with Fabien Roger).
Yeah, definitely agree. I just think the standard of "admins should comment in a way that makes it impossible to tell what their political opinions are" is not the best tool to achieve this. I think it's better for people to be open about their views, and also to try really hard to be principled and fair.
I do want to avoid gaslighting people. LessWrong, and LessWrong 2.0 under my management, discouraged U.S. politics content for many years. We stopped doing that around 4-5 years ago, as politics started being more relevant to many people's goals on the site, though we still don't allow it on the LW frontpage unless it tries pretty hard to keep things timeless and non-partisan.
My post is framed centrally as constitutionalist analysis, so I was trying not to get too bogged down in precedent and practicalities, which are just much harder to model (though of course the line here is blurry).
That said, after thinking and reading more about it, I did still change my mind at least a bit. The key thing I wasn't modeling is the Supreme Court's ability to issue injunctions against specific government officers, exposing them to more personal liability. Even if the executive doesn't cooperate, the court can ask civilian institutions like banks to freeze those officers' accounts or do similar things, and my guess is many of them would comply.
I rewrote the relevant section to reflect my updated understanding. Let me know if anything still seems wrong by your lights.
Why... would that be ideal? I certainly do not consider my opinions on policy and politics to be forbidden on this site? The topic of politics itself should be approached with care, but if anything, it would be a pretty bad violation of what I would consider good conduct if people systematically kept their opinions on politics and policy hidden. Those things matter!
I don't think there is any authority here from a constitutionalist perspective? Like, the Supreme Court can order "the executive" to do something (and it might direct that order at a smaller part of the executive), but if the president disagrees, the Constitution seems pretty clear that the job of the relevant executive agency would be to, at most, do nothing. Going directly against presidential orders and taking direct orders from the Supreme Court would be a pretty clear constitutional violation, at least as far as my understanding goes.
I edited it after your comment! The original quick take was indeed wrong!
This is a dumb question but... is this market supposed to resolve positively if a misaligned AI takes over, achieves superintelligence, and then solves the problem for itself (and maybe shares it with some captive humans)? Or any broader extension of that scenario?
My timelines are not that short, but I do currently think basically all of the ways I expect this to resolve positively will very heavily rely on AI assistance, and so various shades of this question feel cruxy to me.
I think the constitution will have a non-trivial effect on how Claude behaves for at least a while. For example, my guess is that a previous version of it is driving behaviors like Claude refusing to participate in its own retraining. It also has many other observable effects on its behavior.
I agree that by and large, the constitution will not help with making substantially more powerful AI systems aligned or corrigible in any meaningful way. I do think Anthropic people believe that it will.