This post is styled after conversations we’ve had in the course of our research, put together in a way that hopefully highlights a bunch of relatively recent and (ironically) hard-to-articulate ideas around natural abstractions.
John: So we’ve been working a bit on semantics, and also separately on fluid mechanics. Our main goal for both of them is to figure out more of the higher-level natural abstract data structures. But I’m concerned that the two threads haven’t been informing each other as much as they should.
David: Okay…what do you mean by “as much as they should”? I mean, there’s the foundational natural latent framework, and that’s been useful for our thinking on both semantics and fluid mechanics. But beyond that, concretely, in what ways do (should?) semantics and fluid mechanics inform each other?
John: We should see the same types of higher-level data structures across both - e.g. the “geometry + trajectory” natural latents we used in the semantics post should, insofar as the post correctly captures the relevant concepts, generalize to recognizable “objects” in a fluid flow, like eddies (modulo adjustments for nonrigid objects).
David: Sure, I did think it was intuitive to think along those lines as a model for eddies in fluid flow. But in general, why expect to see the same types of data structures for semantics and fluid flow? Why not expect various phenomena in fluid flow to be more suited to representation in some data structures which aren’t the exact same type as those used for the referents of human words?
John: Specifically, I claim that the types of high-level data structures which are natural for fluid flow should be a subset of the types needed for semantics. If there’s a type of high-level data structure which is natural for fluid flow, but doesn’t match any of the semantic types (noun, verb, adjective, short phrases constructed from those, etc), then that pretty directly disproves at least one version of the natural abstraction hypothesis (and it’s a version which I currently think is probably true).
David: Woah, hold up, that sounds like a very different form of the natural abstraction hypothesis than our audience has heard before! It almost sounds like you’re saying that there are no “non-linguistic concepts”. But I know you actually think that much/most of human cognition routes through “non-linguistic concepts”.
John: Ok, there’s a couple different subtleties here.
First: there’s the distinction between a word or phrase or sentence vs the concept(s) to which it points. Like, the word “dog” evokes this whole concept in your head, this whole “data structure” so to speak, and that data structure is not itself linguistic. It involves visual concepts, probably some unnamed concepts, things which your “inner simulator” can use, etc. Usually when I say that “most human concepts/cognition are not linguistic”, that’s the main thing I’m pointing to.
Second: there’s concepts for which we don’t yet have names, but could assign names to. One easy way to find examples is to look for words in other languages which don’t have any equivalent in our language. The key point about those concepts is that they’re still the same “types of concepts” which we normally assign words to, i.e. they’re still nouns or adjectives or verbs or…, we just don’t happen to have given them names.
Now with both of those subtleties highlighted, I’ll once again try to state the claim: roughly speaking, all of the concepts used internally by humans fall into one of a few different “types”, and we have standard ways of describing each of those types of concept with words (again, think nouns, verbs, etc, but also think of the referents of short phrases you can construct from those blocks, like “dog fur” or “the sensation of heat on my toes”). And then one version of the Natural Abstraction Hypothesis would say: those types form a complete typology of the data structures which are natural in our world.
David: Alright, let me have a crack at it. New N.A.H. just dropped: The human mind is a sufficiently general simulator of the world, and fidelitous representations of the world “naturally” decompose into few enough basic types of data structures, that human minds operate all of the data structure types which naturally (efficiently, sufficiently accurately, …) are “found” in the world. When we use language to talk about the world, we are pointing words at these (convergent!) internal data structures. Maybe we don’t have words for certain instances of these data structures, but in principle we can make new words whenever this comes up; we don't need whole new types of structures.
I have some other issues to bring up, but first: Is this version of the N.A.H. actually true? Do humans actually wield the full set of basic data structures natural for modeling the whole world?
John: Yeah, so that’s a way in which this hypothesis could fail (which, to be clear, I don’t actually expect to be an issue): there could be whole new types of natural concepts which are alien to human minds. In principle, we could discover and analyze those types mathematically, and subjectively they’d be a real mindfuck.
That said, if those sorts of concepts are natural in our world, then it’s kinda weird that human minds weren’t already evolved to leverage them. Of course it’s hard to tell for sure, without some pretty powerful mathematical tools, but I think the evolutionary pressure argument should make us lean against. (Of course a counterargument could be that whole new concept-types have become natural, or will become natural, as a result of major changes in our environment - like e.g. humans or AI taking over the world.)
David: Second genre of objections which seems obvious: Part of the claim here is, “The internal data structures which language can invoke form a set that includes all the natural data-structure types useful/efficient/accurate for representing the world.” But how do we know whether or not our language is so deficient that a fully fleshed out Interoperable Semantics of human languages still has huge blind spots? What if we don’t yet know how to talk about many of the concepts in human cognition, even given the hypothesis that human minds contain all the basic structures relevant for modeling the world? What if nouns, adjectives, verbs, etc. are an impoverished set of semantic types?
John: That’s the second way the hypothesis could fail: maybe humans already use concepts internally which are totally un-pointable-to using language (or at least anything like current language). Probably many people who are into Eastern spiritual woo would make that claim. Mostly, I expect such woo-folk would be confused about what “pointing to a concept” normally is and how it’s supposed to work: the fact that the internal concept of a dog consists of mostly nonlinguistic stuff does not mean that the word “dog” fails to point at it. And again here, I think there’s a selection pressure argument: a lot of effort by a lot of people, along with a lot of memetic pressure, has gone into trying to linguistically point to humans’ internal concepts.
Suppose there is a whole type of concept which nobody has figured out how to point at (talk about). Then, either:
- Those concepts are not of a natural type, so interoperability doesn’t hold, and our models of semantics make no guarantee that they should be communicable.
- It is a natural type and so is communicable in the Interoperable Semantics sense and so…it’s weird and confusing that people have failed to point to it in this hypothetical?
So basically I claim that human internal concepts are natural and we have spent enough effort as a species trying to talk about them that we’ve probably nailed down pointers to all the basic types.
David: And if human internal concepts are importantly unnatural, well then the N.A.H. fails. Sounds right.
The Piraha can't count, and many of them don't appear to be able to learn to count past a critical period, not even as motivated adults. (I've heard, but haven't found a way to nail down for sure from clean eyewitness reports, that they have sometimes attended classes because they wish to be able to count the "money" they make from sex work, for example.)
Are the Piraha in some meaningful sense "not fully human" due to environmental damage or are "counting numbers" not a natural abstraction or... or what?
On the other end of the spectrum, Ithkuil is a probably-impossible-for-humans-to-master conlang whose creator sorta tried to give it EVERY feature that has shown up in at least one human language that he could find.
Does that mean that once an AI is fluent in Ithkuil (which surely will be possible soon, if it is not already) maybe the AI will turn around and see all humans sorta the way that we see the Piraha?
...
My current working model of the essential "details AND limits" of human mental existence puts a lot of practical weight and interest on valproic acid because of the paper "Valproate reopens critical-period learning of absolute pitch".
Also, it might be usable to cause us to intuitively understand (and fluently and cleanly institutionally wield, in social groups, during a political crisis) untranslatable 5!
Like, in a deep sense, I think that the "natural abstractions" line of research leads to math, both discovered, and undiscovered, especially math about economics and cooperation and agency, and it also will run into the limits of human plasticity in the face of "medicalized pedagogy".
And, as a heads up, there's a LOT of undiscovered math (probably infinitely much of it, based on Goedel's results) and a LOT of unperfected technology (that could probably change a human base model so much that the result crosses some lines of repugnance even despite being better at agency and social coordination).
...
Speaking of "the wisdom of repugnance".
In my experience, studying things where normies experience relatively unmediated disgust, I can often come up with pretty simple game theory to explain both (1) why the disgust would evolutionarily arise and also (2) why it would be "unskilled play within the game of being human in neo-modern times" to talk about it.
That is to say, I think "bringing up the wisdom of repugnance" is often a Straussian(?) strategy to point at coherent logic which, if explained, would cause even worse dogpiles than the current kerfuffle over JD Vance mentioning "cat ladies".
This leads me to two broad conclusions.
(1) The concepts of incentive compatible mechanism design and cooperative game theory in linguistics both suggest places to predictably find concepts that are missing from polite conversation that are deeply related to competition between adult humans who don't naturally experience storge (or other positive attachments) towards each other as social persons, and thus have no incentive to tell each other certain truths, and thus have no need for certain words or concepts, and thus those words don't exist in their language. (Notice: the word "storge" doesn't exist in English except as a loan word used by philosophers and theologians, but the taunt "mama's boy" does!)
(2) Maybe we should be working on "artificial storge" instead of a way to find "words that will cause AI to NOT act like a human who only has normal uses for normal human words"?
...
I've long collected "untranslatable words" and a fun "social one" is "nemawashi" which literally means "root work", and it started out as a gardening term meaning "to carefully loosen all the soil around the roots of a plant prior to transplanting it".
Then large companies in Japan (where the plutocratic culture is wildly different from that of the US) came to use nemawashi to mean something like "to go around and talk to the lowest status stakeholders about proposed process changes first, in relative confidence, so they can veto stupid ideas without threatening their own livelihood or publicly threatening the status of the managers above them, so hopefully they can tweak details of a plan before the managers synthesize various alternative plans into a reasonable way for the whole organization to improve its collective behavior towards greater Pareto efficiency"... or something?
The words I expect to not be able to find in ANY human culture are less wholesome than this.
English doesn't have "nemawashi" itself for... reasons... presumably? <3
...
Contrariwise... the word "bottom bitch" exists, which might go against my larger claim? Except in that case it involves a kind of stabilized multi-shot social "compatibility" between a pimp and a ho, that at least one of them might want to explain to third parties, so maybe it isn't a counter-example?
The only reason I know the word exists is that Chappelle had to explain what it means, to indirectly explain why he stopped wanting to work on Chappelle's Show for Comedy Central.
Oh! Here's a thing you might try! Collect some "edge-case maybe-too-horrible-to-exist" words, and then check where they are in an embedding space, and then look for more words in that part of the space?
Maybe you'll be able to find-or-construct a "verbal Loab"?
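If anyone wants to actually try the embedding-space hunt, the core loop is tiny. Here's a minimal sketch with a made-up toy embedding table standing in for a real model's word vectors (the words, the four-dimensional vectors, and the function names are all invented for illustration):

```python
import math

# Toy "embedding space": in a real experiment these would come from an
# actual embedding model; these vectors are fabricated for illustration.
embeddings = {
    "taboo":      [0.9, 0.1, 0.0, 0.2],
    "slur":       [0.8, 0.2, 0.1, 0.3],
    "euphemism":  [0.5, 0.6, 0.2, 0.1],
    "garden":     [0.0, 0.1, 0.9, 0.7],
    "transplant": [0.1, 0.2, 0.8, 0.8],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def centroid(words):
    """Average the seed words' embeddings to locate the region of interest."""
    vecs = [embeddings[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def nearest(center, k=3, exclude=()):
    """Rank all known words by similarity to the center point."""
    scored = [(w, cosine(center, v)) for w, v in embeddings.items()
              if w not in exclude]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

seeds = ["taboo", "slur"]
for word, score in nearest(centroid(seeds), k=2, exclude=seeds):
    print(word, round(score, 3))
```

With real embeddings the interesting output isn't the nearest known words but the gaps: regions near the centroid where no existing word sits, which is where a "verbal Loab" would have to live.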
(Ignoring the sense in which "Loab was discovered" and that discovery method is now part of her specific meaning in English... Loab, in content, seems to me to be a pure Jungian Vampire Mother without any attempt at redemption or social usefulness, but I didn't notice this for myself. A friend who got really into Lacan noticed it and I just think he might be right.)
And if you definitely cannot construct any "verbal Loab", then maybe that helps settle some "matters of theoretical fact" in the field of semantics? Maybe?
Ooh! Another thing you might try, based on this sort of thing, is to look for "steering vectors" where "The thing I'm trying to explain, in a nutshell, is..." completes (at low temperature) in very very long phrases? The longer the phrase required to "use up" a given vector, the more "socially circumlocutionary" the semantics might be? This method might be called "dowsing for verbal Loabs".
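The scoring half of that "dowsing" procedure is easy to mock up. This is only a sketch of the ranking machinery: the language model is replaced by a canned stub (`complete`, the vector names, and the completions are all hypothetical), and with a real LLM you'd swap in low-temperature generation under actual activation steering:

```python
PROMPT = "The thing I'm trying to explain, in a nutshell, is"

def complete(prompt, steering_vector):
    # Hypothetical stub standing in for steered low-temperature generation.
    canned = {
        "v_dog": "a dog.",
        "v_nemawashi": ("the practice of quietly consulting every low-status "
                        "stakeholder before a plan is announced, so they can "
                        "veto bad ideas without anyone losing face."),
    }
    return canned[steering_vector]

def circumlocution_score(steering_vector):
    """Longer completions suggest more 'socially circumlocutionary' semantics."""
    return len(complete(PROMPT, steering_vector).split())

# Rank candidate vectors by how much language it takes to "use them up".
ranked = sorted(["v_dog", "v_nemawashi"], key=circumlocution_score, reverse=True)
print(ranked)  # → ['v_nemawashi', 'v_dog']
```

Token count is a crude proxy, of course; one could instead score by the perplexity or length of the *shortest* adequate paraphrase, but the ranking idea is the same.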
Alright! I'm going to try to stick to "biology flavored responses" and "big picture stuff" here, maybe? And see if something conversational happens? <3
(I attempted several responses in the last few days and each sketch turned into a sprawling mess that became a "parallel comment". Links and summaries at the bottom.)
The thing that I think unifies these two attempts at comments is a strong hunch that "human language itself is on the borderland of being anti-epistemic".
Like... like I think humans evolved. I think we are animals. I think we individually g...