There is Dario's written testimony before Congress, which mentions existential risk as a serious possibility: https://www.judiciary.senate.gov/imo/media/doc/2023-07-26_-_testimony_-_amodei.pdf
He also signed the CAIS statement on x-risk: https://www.safe.ai/work/statement-on-ai-risk
He does start out by saying he thinks and talks a lot about the risks (first paragraph):
I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks... I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.
He then explains (second paragraph) that the essay is meant to sketch out what the world could look like if things go well:
In this essay I try to sketch out what that upside might look like—what a world with powerful AI might look like if everything goes right.
I think this is a coherent thing to do?
One day a mathematician doesn’t know a thing. The next day they do. In between they made no observations of the world with their senses.
It’s possible to make progress through theoretical reasoning. It’s not my preferred approach to the problem (I work on a heavily empirical team at a heavily empirical lab), but it’s not an invalid approach.
In Towards Monosemanticity we also ran a version of this experiment, and found that the SAE (sparse autoencoder) was much less interpretable when the transformer's weights were randomized (https://transformer-circuits.pub/2023/monosemantic-features/index.html#appendix-automated-randomized).
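For intuition, here's a minimal sketch of that kind of randomized-weights control in PyTorch. The toy MLP stand-in for the transformer, the synthetic data, and all dimensions and training details are illustrative assumptions, not the actual setup from the paper:

```python
# Sketch of a randomized-weights control for SAE interpretability.
# Assumptions (not from the paper): a toy MLP stands in for the
# transformer, and synthetic Gaussian data stands in for text inputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
D_IN, D_MODEL, D_FEATURES, N = 64, 128, 512, 4096

class ToyModel(nn.Module):
    """Stand-in for the model whose hidden activations we dissect."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(D_IN, D_MODEL), nn.ReLU(), nn.Linear(D_MODEL, D_MODEL)
        )

    def forward(self, x):
        return self.net(x)

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: ReLU encoder, linear decoder, L1 penalty on features."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(D_MODEL, D_FEATURES)
        self.dec = nn.Linear(D_FEATURES, D_MODEL)

    def forward(self, x):
        f = torch.relu(self.enc(x))
        return self.dec(f), f

def collect_activations(model: nn.Module, data: torch.Tensor) -> torch.Tensor:
    """Grab the activations the SAE will be trained on."""
    with torch.no_grad():
        return model(data)

def train_sae(acts: torch.Tensor, l1_coeff: float = 1e-3,
              steps: int = 500, lr: float = 1e-3) -> SparseAutoencoder:
    """Minimize reconstruction error plus an L1 sparsity penalty on features."""
    sae = SparseAutoencoder()
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        recon, f = sae(acts)
        loss = ((recon - acts) ** 2).mean() + l1_coeff * f.abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sae

data = torch.randn(N, D_IN)
trained = ToyModel()     # in the real experiment, this model has been trained
randomized = ToyModel()  # same architecture, freshly randomized weights

sae_trained = train_sae(collect_activations(trained, data))
sae_random = train_sae(collect_activations(randomized, data))
```

You'd then score the interpretability of each SAE's features (the linked appendix used automated interpretability); the finding was that features from the randomized-weights control score much lower.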
I can also confirm (I have a 3:1 match).