Thank you for this comment. I'm curious to understand the source of disagreement between us, given that you generally agree with each of the sub-points. Do you really think that the chances of misalignment with LM-based AI systems are above 90%? What exactly do you mean by misalignment in this context, and why do you think it's the most likely result with such AI? Do you think it will happen even if humanity sticks with the paradigm I described (of chaining pure language models while avoiding training models on open-ended tasks)?
I also want to note that my argument is less "developing language models was counterfactually a good thing" and more "given that language models have been developed (which is now a historical fact), the safest path towards human-level AGI might be to stick with pure language models".
Thank you!
I don't have any concrete plans, but maybe.
Thank you for this comment!
I first want to note that your comment implies my post is saying things which I don't think it does (or at least I didn't mean it to):
- I didn't argue that language models are internally human-like, only externally. I do, however, think that compositional AI systems made of language models should be internally human-like.
- I didn't advocate for training AGI end-to-end (but rather taking the factored cognition approach).
I agree with you that a human-ish-level AGI would be VERY dangerous regardless of how aligned it is (mostly because there are 8 billion people on this planet, and it's sufficient for just a few of them to use it in stupid/evil ways to put us all in danger).