One-line version of this post: What do Wittgensteinian language games and NLP word embeddings have in common?

 

Four-line version of this post: Relational, praxis-based connections between concepts, represented as “distances” in multidimensional space, capture meaning. The shorter the distance, the more related the concepts. This is how Word2vec works, what Wittgenstein was describing with “language games,” and also the way cell biologists are analyzing the peripheral blood these days. Are these relational maps the way to think about thinking?

 

Multi-line version of this post: This is my first post on LessWrong. (Hi!) I’d love to be less wrong about it.

I was sitting in a meeting that was 50% biologists and 50% computer scientists. The topic was how to process multi-parametric datasets in which each cell in the peripheral blood is tagged with multiple surface markers that relate back to its phenotype and therefore its identity. (The algorithm in question was t-Distributed Stochastic Neighbor Embedding.) Immunologists used to think a T-cell was a T-cell. But in that meeting, we were considering a smear of T-cells in a 32-dimensional T-cell space, clustered by their properties and functional status (activated or exhausted, killer or memory, etc.).
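(A toy sketch of that kind of embedding, assuming Python with NumPy and scikit-learn; the 32-marker matrix and the population labels below are invented for illustration, not data from that meeting:)

```python
# Hypothetical illustration: embed cells described by 32 surface markers into 2D with t-SNE.
# The data here is synthetic; a real analysis would start from measured marker intensities.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Three made-up cell populations, each a cluster in 32-dimensional "marker space".
populations = [rng.normal(loc=center, scale=0.5, size=(200, 32)) for center in (0.0, 2.0, 4.0)]
X = np.vstack(populations)                      # shape: (600 cells, 32 markers)
labels = np.repeat(["activated killer", "resting memory", "mystery cluster"], 200)

# t-SNE squeezes the 32-dimensional space into 2 dimensions while trying to keep
# nearby cells nearby, so phenotypically similar cells end up plotted close together.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

for name in np.unique(labels):
    center = embedding[labels == name].mean(axis=0)
    print(f"{name}: 2D cluster center ≈ {center.round(2)}")
```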

In the meeting, as I was looking at colored 2D and 3D representations that abstracted features of that higher-dimensional space (activated killer T cells on the bottom left in blue; resting memory cells on top in orange; what’s that weird purple cluster in the bottom left? and so on), it occurred to me that this technique was probably good at capturing meaning across the board.

Abstracting meaning from measured distances between mapped concepts isn’t a new idea. It’s described beautifully in The Cluster Structure of Thingspace. I just wonder if we can ride it a little further into the fog.

Wittgenstein is often quoted in the Venn-diagram overlap between meaning and computation. The Wittgensteinian concept most applicable to this particular space is his idea of a language game: a process in which words are used according to specific rules and contexts, shaping how we understand meaning and communicate. LessWrong has discussions on the relationship between language games and truth, such as Parasitic Language Games: maintaining ambiguity to hide conflict while burning the commons, but searching the site reveals less content directly connecting Wittgenstein to phase space, vector space, or thingspace than I’d expect.

Clustering of things in thingspace isn’t a direct Wittgensteinian language game (I don’t think). It seems more like what you’d get if you took a Wittgensteinian approach (praxis-based, relational) and used it to build a vector space for topologies of concepts (i.e. for “chairness” and “birdness” and “Golden Gate Bridgeness”).

Word2vec, a natural language processing model, does a simple form of this when it represents words with similar meanings close together in vector space. LLMs seem to do a version of this too, with Golden Gate Claude supporting the idea that concepts within LLMs can be topologically localized.
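For concreteness, here is a toy sketch of that “close together in vector space” behavior, using the gensim implementation of Word2vec (the corpus is a handful of made-up sentences, so the actual numbers will be noisy; the point is the mechanics, not the values):

```python
# Toy Word2vec example with gensim: words that appear in similar contexts
# end up with nearby vectors, so cosine similarity works as a proxy for relatedness.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["the", "cat", "chased", "the", "dog"],
    ["stocks", "fell", "on", "the", "market"],
    ["the", "market", "rallied", "after", "earnings"],
]

model = Word2Vec(sentences=corpus, vector_size=32, window=3, min_count=1, epochs=200, seed=0)

# Shorter distance (higher cosine similarity) = more related, at least in principle.
print(model.wv.similarity("cat", "dog"))      # words sharing contexts
print(model.wv.similarity("cat", "market"))   # words from different "language games"
print(model.wv.most_similar("cat", topn=3))   # nearest neighbours in the vector space
```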

I don’t think there’s enough understood about language processing in the brain to say with certainty that the brain also clusters concepts like this, but I’m guessing it's quite likely.

Short distances between conceptual nodes in a vast relational web seem like a good way to convey meaning. This works for an understanding of concrete words and literal T-cell properties, but it’s also a relational process that maps back to more abstract concepts. In a way, traversing such maps, building patterns within them, running patterns through them, and operating on the data they contain is probably the best operational definition of “thinking” that I can think of.

…Thoughts?

2 comments

Something related I haven't heard get much attention is the concept of hierarchical clustering, groups of groups of groups of nodes, in the context of language/concept space. I think that, and the idea of "what is the remaining error from prediction on level x? Can I resolve some of that error by predicting on a more abstract level x+1?", are two of the main organizing patterns going on in the cortex. https://en.m.wikipedia.org/wiki/Hierarchical_clustering Specifically, I think there is promise in looking at how concepts cluster into hierarchies under different randomized starting conditions, different bootstraps of the data, and different clustering algorithms. My prediction is that clusters which are robust to such permutations are more likely to represent clean cleavings of reality at the joints, and thus more likely to accurately represent natural abstractions and be found in a variety of general AI models as well as in a variety of human cultures.
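(A rough sketch of the robustness check described above, assuming Python with SciPy and scikit-learn; the synthetic data and the noise perturbation are illustrative stand-ins for "different starting conditions / bootstraps / algorithms", not the commenter's own method:)

```python
# Sketch: hierarchically cluster the same items under small perturbations and
# check how stable the resulting groupings are. High agreement across runs
# (adjusted Rand index near 1) suggests the clusters are robust to the permutations.
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Synthetic "concept space": three blobs in 10 dimensions.
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(50, 10)) for c in (0.0, 5.0, 10.0)])

def cluster_once(data, method, k=3):
    """Hierarchical clustering of the rows of `data` into k flat clusters."""
    Z = linkage(data, method=method)
    return fcluster(Z, t=k, criterion="maxclust")

# Re-cluster under different linkage methods and different noise perturbations.
runs = []
for method in ("ward", "average", "complete"):
    for _ in range(5):
        perturbed = X + rng.normal(scale=0.2, size=X.shape)
        runs.append(cluster_once(perturbed, method))

# Pairwise agreement between runs; robust structure shows up as high scores.
scores = [adjusted_rand_score(a, b) for a, b in combinations(runs, 2)]
print(f"mean adjusted Rand index across {len(scores)} pairs: {np.mean(scores):.2f}")
```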

I'm trying to figure out if what I think of as "thinking" is fundamentally different from what you describe here. You make me think that these LLMs have a more impressionistic view of the world, where "5 + 3" gets the right answer if you've related that expression to "8" tightly enough, or if you have a tight enough definition of "fiveness," "threeness," and "plus-age" that you'll arrive at "8". When I do 5 + 3, or a geometric proof, or analyze an if-then statement, I feel like I'm doing more than this impressionistic closeness ranking. Do you think I'm fooling myself, or is what I'm doing fundamentally different from what an LLM is doing?

I also feel like, given this definition of thinking, an LLM is limited to creating linear combinations of existing knowledge, some combinations of which we haven't thought of yet, so it would still be extremely valuable. But it seems like it would lack the ability to forge new ground. Then again, maybe I'm fooling myself as to what "creativity" truly is.