Where can I find a post or article arguing that the internal cognitive model of contemporary LLMs is quite alien, strange, non-human, even though they are trained on human text and produce human-like answers, which are rendered "friendly" by RLHF?
To be clear, I am not asking about the following, which I am familiar with:
- The origin of the shoggoth meme and its relation to H.P. Lovecraft's shoggoths
- The notion that the space of possible minds is very large, with human minds occupying only a small part of it
- Eliezer Yudkowsky's description of evolution as Azathoth, the blind idiot god, as a way of showing that "intelligences" can be quite incomprehensible
- The difference in environments between the training and the runtime phase of an LLM
- The fact that machine-learning systems like LLMs are not really neuromorphic; they are structured differently from human brains (though that fact does not exclude the possibility of similarity on a logical level)
Rather, I am looking for a discussion of evidence that an LLM's internal "true" motivation or reasoning system is very different from a human's, despite the human-like output, and that under outlying environmental conditions, far from the training distribution, the model will behave very differently. A good argument might analyze bits of weird inhuman behavior to try to infer the internal model.
(All I found on the shoggoth idea on LessWrong is this article, which contrasts the idea of the shoggoth with the idea that there is no coherent model, but does not explain why we might think that there is an alien cognitive model. This one likewise mentions the idea but does not argue for its correctness.)
[Edit: Another user corrected my spelling: shoggoth, not shuggoth.]
Can you say more about what you mean by "Where can I find a post or article arguing that the internal cognitive model of contemporary LLMs is quite alien, strange, non-human, even though they are trained on human text and produce human-like answers, which are rendered "friendly" by RLHF?"
Like, obviously it's gonna be alien in some ways and human-like in other ways. Right? How similar does it have to be to humans, in order to count as not an alien? Surely you would agree that if we were to do a cluster analysis of the cognition of all humans alive today + all LLMs, we'd end up with two distinct clusters (the LLMs and then humanity) right?
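To make the cluster-analysis thought experiment a bit more concrete, here's a minimal sketch in Python. Everything in it is hypothetical: the `humans` and `llms` arrays are made-up stand-ins for whatever feature vectors mech-interp could actually extract from minds, and the separation between the two distributions is assumed rather than measured.

```python
# Purely illustrative: suppose each mind (human or LLM) could be summarized
# as a feature vector. The claim is that unsupervised clustering on those
# vectors would recover the human/LLM split on its own.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Hypothetical "cognition vectors": humans drawn from one distribution,
# LLMs from another. These are placeholders, not real interpretability data.
humans = rng.normal(loc=0.0, scale=1.0, size=(500, 32))
llms = rng.normal(loc=4.0, scale=1.0, size=(50, 32))

X = np.vstack([humans, llms])
true_labels = np.array([0] * len(humans) + [1] * len(llms))

# If the two kinds of mind really occupy separate regions of mind-space,
# k-means with k=2 should recover the human/LLM partition almost exactly.
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("agreement with human/LLM split:", adjusted_rand_score(true_labels, pred))
```

If the two-cluster picture is right, the agreement score should come out near 1; if LLM cognition were just a noisy variant of human cognition, k-means would carve the data along some other axis instead.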
OK, thanks.
Your answer to my first question isn't really an answer -- "they will, if sufficiently improved, be quite human--they will behave in a quite human way." What counts as "quite human"? Also, are we just talking about their external behavior now? I thought we were talking about their internal cognition.
You agree about the cluster analysis thing though -- so maybe that's a way to be more precise about this. The claim you are hoping to see argued for is "If we magically had access to the cognition of all current humans and LLMs, with mechinterp ...