My experience with ChatGPT doesn't fully agree with your conclusion, but you're raising a good point and the distinction is helpful. I remember conversations in which the chatbot would both acknowledge and challenge my viewpoint, which I must admit I appreciate, and which is hardly systematic among humans either. On the other hand, it is indeed often fairly easy to push the chatbot to buy my arguments and adopt my stance.
In a way this is closely related to humanlike intelligence: when an LLM-based chatbot[1] is trained by reinforcement, the positive (rewarding) feedback comes both from confirming the interlocutor's beliefs and from matters like veracity, ethics, and so on. It's also what we humans experience.
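To make that concrete, here's a toy sketch, nothing like an actual learned RLHF reward model; the weights and both scoring functions are hypothetical stand-ins. It just shows how a reward that mixes approval with veracity can make sycophancy the optimal policy:

```python
# Toy illustration (not the actual RLHF objective): a scalar reward that
# mixes interlocutor approval with truthfulness. The weights and the two
# scores are hypothetical stand-ins for learned reward models.

def toy_reward(agreement_score: float, veracity_score: float,
               w_agree: float = 0.7, w_true: float = 0.3) -> float:
    """Both scores are in [0, 1]; the mixture weights decide how much
    sycophancy the optimum tolerates."""
    return w_agree * agreement_score + w_true * veracity_score

# If w_agree dominates, a reply that flatters the user but bends the truth
# outscores an accurate but contrarian one:
print(toy_reward(0.9, 0.30))  # sycophantic reply   -> 0.72
print(toy_reward(0.2, 0.95))  # accurate, contrarian -> 0.425
```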
Why and how does it rise to a whole new level when it comes to AI? I tend to think that we must understand the technologies we use, so it's our responsibility to use chatbots properly and leverage their capabilities. When talking with a child, or a young student, or generally someone we know to be a newcomer, we adapt our questions, our arguments, and the way we process their responses. It's not an exact science, for sure, but there's no reason to expect it to be one with chatbots either.
It seems more accurate than "LLMs", since those have not yet been trained to have a chat with you.
Nice work; I'm looking forward to a full-fledged paper.
Pardon me if the following comments are inaccurate; they come from non-expert curiosity.
I really like the endeavor of estimating tail risk, which you say you're excited about as well. Yet it seems that you quickly reduce tail risk to catastrophe, so the problem becomes a binary classification. Given the very low probability of triggering a catastrophe, the problem may be intractable. You might be turning it into a problem of critical-threshold detection, whereas tail risk is most likely not a binary switch.
Could you instead, say, generalize the problem and consider that the risk is not necessarily catastrophic? This might also find many more applications, as the topic is relevant to a large share of AI models, while aspiring superintelligences capable of designing doomsday machines remain a tiny proportion of them. Finally, this would also simplify your empirical investigations.
That is, could you develop a general framework that includes X-risk but is not limited to it?
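I don't know your setup, but just to make the "not a binary switch" point concrete, here's a minimal sketch of one continuous alternative: a peaks-over-threshold fit from extreme value theory using scipy's genpareto. Everything in it (the lognormal "harm scores", the 99th-percentile threshold) is an assumption for illustration, not your method:

```python
# Generic sketch of continuous tail-risk estimation via peaks over
# threshold, instead of a binary catastrophe classifier. Harm scores
# here are synthetic, purely for illustration.

import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
harm = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # hypothetical harm scores

# Keep only the excesses above a high quantile and fit a
# generalized Pareto distribution to them.
u = np.quantile(harm, 0.99)
excesses = harm[harm > u] - u
shape, loc, scale = genpareto.fit(excesses, floc=0.0)

# The fitted tail yields graded exceedance probabilities for
# arbitrarily severe outcomes, not one catastrophe probability:
# P(harm > x) = P(harm > u) * P(excess > x - u | harm > u).
p_exceed_u = (harm > u).mean()
for level in (2 * u, 5 * u, 10 * u):
    p = p_exceed_u * genpareto.sf(level - u, shape, loc=0.0, scale=scale)
    print(f"P(harm > {level:.1f}) ≈ {p:.2e}")
```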
Relatable.
Giorgio Parisi mentioned this in his book; he said that aha moments tend to spark randomly while doing something else. Bertrand Russell had a very active social life (he praised leisure) and believed it to be an active form of idleness that could prove very productive. A good balance might be the best way to leverage it.