Activations in LLMs are linearly mappable to activations in the human brain. Imo this is strong evidence for the idea that LLMs/NNs in general acquire extremely human-like cognitive patterns, and that the common "shoggoth with a smiley face" meme might just not be accurate.
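For anyone who hasn't read these papers: "linearly mappable" typically cashes out as an encoding-model analysis, i.e. a regularised linear regression from model activations to recorded brain activity, scored on held-out stimuli. A minimal sketch of that kind of analysis, with made-up shapes and random stand-in data (the actual studies use real LLM hidden states and fMRI/ECoG recordings):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Hypothetical setup: for each of N stimuli (words/sentences) we have an LLM
# hidden-state vector and a recorded brain-response vector (e.g. fMRI voxels).
N, d_llm, d_brain = 500, 768, 200
rng = np.random.default_rng(0)
llm_acts = rng.standard_normal((N, d_llm))      # stand-in for real model activations
brain_acts = rng.standard_normal((N, d_brain))  # stand-in for real recordings

# Fit a ridge-regularised linear map on one split, evaluate on the other.
train, test = slice(0, 400), slice(400, 500)
model = RidgeCV(alphas=np.logspace(-2, 4, 13)).fit(llm_acts[train], brain_acts[train])
pred = model.predict(llm_acts[test])

# Score per brain dimension: correlation between predicted and actual activity.
corrs = [np.corrcoef(pred[:, i], brain_acts[test, i])[0, 1] for i in range(d_brain)]
print(np.mean(corrs))  # ~0 here (random data); the studies report well-above-chance values
```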
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly non-linear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise
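The effect is easy to reproduce: average a hard non-linearity over Gaussian noise and, on a range that is small relative to the noise scale, the result looks almost perfectly straight. A toy sketch (my own illustration, not the construction from the linked snapshot):

```python
import numpy as np

# f(x) = sign(x) is as non-linear as it gets, but the noise-averaged E[f(x + eps)]
# with Gaussian eps is erf(x / (sigma * sqrt(2))), nearly linear near the origin.
rng = np.random.default_rng(0)

def noisy_average(f, x, sigma=1.0, n_samples=20_000):
    eps = rng.normal(0.0, sigma, size=n_samples)
    return np.array([f(xi + eps).mean() for xi in x])

x = np.linspace(-0.5, 0.5, 11)
smoothed = noisy_average(np.sign, x, sigma=1.0)

# Compare against the straight line with slope 2 / (sigma * sqrt(2*pi)).
slope = 2 / (1.0 * np.sqrt(2 * np.pi))
print(np.max(np.abs(smoothed - slope * x)))  # small residual: Taylor error + Monte Carlo noise
```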
I think this is a really cool research agenda. I can also try to give my "skydiver's perspective from 3000 miles in the air" overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.
In general, I think of free energy minimisation as a class of solutions for the problem of predicting the behaviour of complex systems, in line with other variational principles in physics. Thus, it is an attempt to use simple physical rules like "the ball rolls down the slope" to explain very complicated outcomes like "I decide to build a theme park with roller coasters in it". In this case, the rule is "free energy is minimised", but unlike a simple physical system whose dimensionality is very literally visible, variational free energy (VFE) is minimised in high-dimensional probability spaces.
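For reference, the standard way of writing the quantity being minimised (with $q$ the brain's approximate posterior over hidden states $s$, and $p$ its generative model over states and observations $o$) is:

$$ F = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o) $$

Minimising $F$ over beliefs $q$ is roughly perception (your beliefs approach the true posterior); the "expected" variant, roughly speaking, scores candidate actions by the free energy of the outcomes they are predicted to bring about, which is the sense used in the restaurant example below.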
Consider the concrete case below: there are five restaurants in a row and you have to pick one to go to. The intuitive physical interpretation is that you can be represented by a point particle moving to one of five coordinates, all relatively close by in three-dimensional XYZ space. However, if we assume that this is just some standard physical process, you'll end up with highly unintuitive behaviour (why does the particle keep drifting right and left in the middle of these coordinates, and then eventually go somewhere that isn't the middle?). Instead we might say, in an RL sense, that there is a five-dimensional action space and you must pick the dimension that maximises expected reward.

Free energy minimisation is a rule that says your action is the one that minimises the variation between the predicted outcome your brain produces and the final outcome that your brain observes, which can happen either if your brain is very good at predicting the future or if you act to make your prediction come true. A preference in this case is a bias in the prediction (you can see yourself going to McDonald's more, in some sense, and you feel some psychological aversion/repulsive force moving you away from Burger King) that is then satisfied by you going to the restaurant you are most attracted to. Of course this is just a single-agent interpretation; with multiple subagents you can imagine valleys and peaks in the high-dimensional probability space, which are resolved when you reach some minimum that can be satisfied by action.
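If it helps to see that picture as an actual calculation, here is a minimal sketch of just the "risk" part of expected free energy (it ignores the epistemic/ambiguity term, and all the numbers are invented for illustration):

```python
import numpy as np

# Preferences are encoded as a prior over outcomes (which restaurant I end up
# eating at). Higher prior mass = more preferred. These numbers are made up.
preferred = np.array([0.45, 0.25, 0.15, 0.10, 0.05])  # index 0 ~ McDonald's, 4 ~ Burger King

# For each candidate action ("walk to restaurant a"), the brain's predicted
# distribution over outcomes; committing to an action makes its outcome near-certain.
def predicted_outcomes(action, noise=0.02):
    q = np.full(5, noise)
    q[action] = 1.0 - 4 * noise
    return q

def kl(q, p):
    return float(np.sum(q * np.log(q / p)))

# Risk term of expected free energy: divergence between predicted and preferred
# outcomes. The action that minimises it is the one you "see yourself doing".
scores = [kl(predicted_outcomes(a), preferred) for a in range(5)]
print(scores, "-> choose restaurant", int(np.argmin(scores)))  # picks index 0
```

The "aversion" to Burger King is just low prior mass on that outcome: committing to it produces a large divergence between what you predict and what you prefer, so the minimisation pushes you elsewhere.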
It's hard to empathise with dry numbers, whereas a lively scenario creates an emotional response so more people engage. But I agree that this seems to be very well done statistical work.
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence. I think one of the frames you might find the most useful is one you've already covered: power as a coordination game. As you alluded to in your original post, people aren't in a massive hive mind/conspiracy; they mostly want to do what other successful people seem to be doing, which translates well to a coordination game and also explains the rapid "board flips" once a critical mass of support for or rejection of some proposition is reached. For example, witness the rapid switch to majority support for gay marriage amongst the general population in the 2010s.
Would also love to discuss this with you in more detail (I trained as an English student and also studied Digital Humanities). I will leave off with a few book suggestions that, while maybe not directly answering your needs, you might find interesting.
P.S. Re: the point about Yarvin being right, betting on the dominant group in society embracing a dangerous delusion is a remarkably safe bet (e.g. McCarthyism, the aforementioned Bavarian witch hunts, fascism, Lysenkoism, etc.).
Hey, really enjoyed your triple review, "Power Lies Trembling", but imo this topic has been... done to death in the humanities, and reinventing terminology ad hoc is somewhat missing the point. The idea that the dominant class in a society comes from a set of social institutions that share core ideas and modus operandi (in other words, "behaving as a single organisation") is not a shocking new phenomenon of twentieth-century mass culture, and is certainly not a "mystery". This is basically how every country has developed a ruling class/ideology since the term started to have a meaning, through academic institutions that produce similar people. Yale and Harvard are as Oxford and Cambridge, or Peking University and Renmin University. (European universities, in particular, started out as literal divinity schools, and hence are outgrowths of the literal Catholic Church, receiving literal papal bulls to establish themselves as studia generalia.) [Retracted: while the point about teaching religious law and receiving literal papal bulls is true, the origins of the universities are much more diverse. My point about the history of cultural hegemony in such institutions still stands, though.]
What Yarvin seems to be annoyed by is that the "Cathedral consensus" featured ideas that he dislikes, instead of the quasi-feudal ideology of might-makes-right that he finds more appealing. That is also not surprising. People largely don't notice when they are part of a dominant class and their ideas are treated as the default: that's just them being normal, not weird. However, when they find themselves at the edge of the Overton window, suddenly what was right and normal becomes crushing and oppressive. The natural dominance of sensible ideas and sensible people becomes a twisted hegemony of obvious lies propped up by delusional power-brokers. This perspective shift is also extremely well documented in human culture and literature.
In general, the concept that a homogeneous ruling-class culture can then be pushed into delusional consensuses which ultimately harm everyone is an idea as old as the Trojan War. The tension between maintaining a grip on power and maintaining a grip on reality is well explored in Yuval Noah Harari's book Nexus (which also has an imo pretty decent second half on AI); in particular I direct you to his account of the Bavarian witch hunts. Indeed, the unprecedented feature of modern society is the rapid divergence in ideas that is possible thanks to information technology and the cultivation of local echo chambers. Unfortunately, I have few simple answers to offer to this age-old question, but I hope that recognising the lineage of the question helps with disambiguation somewhat. I look forward to your ideas about new liberalisms.
Yeah, I'm not gonna do anything silly (I'm not in a position to do anything silly with regards to the multi-trillion-parameter frontier models anyways). Just sort of "laying the groundwork" for when AIs will cross that line, which I don't think is too far off now. The movie "Her" gives a good vibe-level sense of when that line will be crossed.
Ahh, I was slightly confused why you called it a proposal. TBH I'm not sure why it's specifically 0.1% rather than any arbitrary percentage in (0, 100]. Otherwise it makes good logical sense.
Hey, the proposal makes sense from an argument standpoint. I would refine it slightly and phrase it as "the set of cognitive computations that generates role-emulating behaviour in a given context also generates the qualia associated with that role" (sociopathy is the obvious counterargument here, and I'm really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character's emotions.
I take the two problems a bit further, and would suggest that being humane to AIs may necessarily involve abandoning the idea of control in the strict sense of the word, so yes, treating them as peers or as children we are raising as a society. It may also be that the paradigm of control necessarily means that we would as a species become more powerful (with the assistance of the AIs) but not more wise (since we are ultimately "helming the ship"), which would in my opinion be quite bad.
And as for the distinction between today and future AI systems, I think the line is blurring fast. Will check out Eleos!
This has shifted my perceptions of what is in the wild significantly. Thanks for the heads up.