This post is styled after conversations we’ve had in the course of our research, put together in a way that hopefully highlights a bunch of relatively recent and (ironically) hard-to-articulate ideas around natural abstractions.
John: So we’ve been working a bit on semantics, and also separately on fluid mechanics. Our main goal for both of them is to figure out more of the higher-level natural abstract data structures. But I’m concerned that the two threads haven’t been informing each other as much as they should.
David: Okay…what do you mean by “as much as they should”? I mean, there’s the foundational natural latent framework, and that’s been useful for our thinking on both semantics and fluid mechanics. But beyond that, concretely, in what ways do (should?) semantics and fluid mechanics inform each other?
John: We should see the same types of higher-level data structures across both - e.g. the “geometry + trajectory” natural latents we used in the semantics post should, insofar as the post correctly captures the relevant concepts, generalize to recognizable “objects” in a fluid flow, like eddies (modulo adjustments for nonrigid objects).
David: Sure, I did think it was intuitive to think along those lines as a model for eddies in fluid flow. But in general, why expect to see the same types of data structures for semantics and fluid flow? Why not expect various phenomena in fluid flow to be better suited to representation in data structures which aren’t the exact same types as those used for the referents of human words?
John: Specifically, I claim that the types of high-level data structures which are natural for fluid flow should be a subset of the types needed for semantics. If there’s a type of high-level data structure which is natural for fluid flow, but doesn’t match any of the semantic types (noun, verb, adjective, short phrases constructed from those, etc), then that pretty directly disproves at least one version of the natural abstraction hypothesis (and it’s a version which I currently think is probably true).
David: Woah, hold up, that sounds like a very different form of the natural abstraction hypothesis than our audience has heard before! It almost sounds like you’re saying that there are no “non-linguistic concepts”. But I know you actually think that much/most of human cognition routes through “non-linguistic concepts”.
John: Ok, there’s a couple different subtleties here.
First: there’s the distinction between a word or phrase or sentence vs the concept(s) to which it points. Like, the word “dog” evokes this whole concept in your head, this whole “data structure” so to speak, and that data structure is not itself linguistic. It involves visual concepts, probably some unnamed concepts, things which your “inner simulator” can use, etc. Usually when I say that “most human concepts/cognition are not linguistic”, that’s the main thing I’m pointing to.
Second: there’s concepts for which we don’t yet have names, but could assign names to. One easy way to find examples is to look for words in other languages which don’t have any equivalent in our language. The key point about those concepts is that they’re still the same “types of concepts” which we normally assign words to, i.e. they’re still nouns or adjectives or verbs or…, we just don’t happen to have given them names.
Now with both of those subtleties highlighted, I’ll once again try to state the claim: roughly speaking, all of the concepts used internally by humans fall into one of a few different “types”, and we have standard ways of describing each of those types of concept with words (again, think nouns, verbs, etc, but also think of the referents of short phrases you can construct from those blocks, like “dog fur” or “the sensation of heat on my toes”). And then one version of the Natural Abstraction Hypothesis would say: those types form a complete typology of the data structures which are natural in our world.
David: Alright, let me have a crack at it. New N.A.H. just dropped: The human mind is a sufficiently general simulator of the world, and faithful representations of the world “naturally” decompose into few enough basic types of data structures, that human minds employ all of the data-structure types which are naturally (efficiently, sufficiently accurately, …) “found” in the world. When we use language to talk about the world, we are pointing words at these (convergent!) internal data structures. Maybe we don’t have words for certain instances of these data structures, but in principle we can coin new words whenever that comes up; we don't need whole new types of structures.
I have some other issues to bring up, but first: Is this version of the N.A.H. actually true? Do humans actually wield the full set of basic data structures natural for modeling the whole world?
John: Yeah, so that’s a way in which this hypothesis could fail (which, to be clear, I don’t actually expect to be an issue): there could be whole new types of natural concepts which are alien to human minds. In principle, we could discover and analyze those types mathematically, and subjectively they’d be a real mindfuck.
That said, if those sorts of concepts are natural in our world, then it’s kinda weird that human minds weren’t already evolved to leverage them. Of course it’s hard to tell for sure, without some pretty powerful mathematical tools, but I think the evolutionary pressure argument should make us lean against it. (Of course a counterargument could be that whole new concept-types have become natural, or will become natural, as a result of major changes in our environment - like e.g. humans or AI taking over the world.)
David: Second genre of objections which seem obvious: Part of the claim here is, “The internal data structures which language can invoke form a set that includes all the natural data-structure types useful/efficient/accurate for representing the world.” But how do we know whether or not our language is so deficient that a fully fleshed out Interoperable Semantics of human languages still has huge blind spots? What if we don’t yet know how to talk about many of the concepts in human cognition, even given the hypothesis that human minds contain all the basic structures relevant for modeling the world? What if nouns, adjectives, verbs, etc. are an impoverished set of semantic types?
John: That’s the second way the hypothesis could fail: maybe humans already use concepts internally which are totally un-pointable-to using language (or at least anything like current language). Probably many people who are into Eastern spiritual woo would make that claim. Mostly, I expect such woo-folk would be confused about what “pointing to a concept” normally is and how it’s supposed to work: the fact that the internal concept of a dog consists of mostly nonlinguistic stuff does not mean that the word “dog” fails to point at it. And again here, I think there’s a selection pressure argument: a lot of effort by a lot of people, along with a lot of memetic pressure, has gone into trying to linguistically point to humans’ internal concepts.
Suppose there is a whole type of concept which nobody has figured out how to point at (talk about). Then, either:
- Those concepts are not of a natural type, so interoperability doesn’t hold and our models of semantics make no guarantee that they should be communicable.
- It is a natural type and so is communicable in the Interoperable Semantics sense and so…it’s weird and confusing that people have failed to point to it in this hypothetical?
So basically I claim that human internal concepts are natural and we have spent enough effort as a species trying to talk about them that we’ve probably nailed down pointers to all the basic types.
David: And if human internal concepts are importantly unnatural, well then the N.A.H. fails. Sounds right.
Alright! I'm going to try to stick to "biology flavored responses" and "big picture stuff" here, maybe? And see if something conversational happens? <3
(I attempted several responses in the last few days, and each sketch turned into a sprawling mess that became a "parallel comment". Links and summaries at the bottom.)
The thing that I think unifies these two attempts at comments is a strong hunch that "human language itself is on the borderland of being anti-epistemic".
Like... like I think humans evolved. I think we are animals. I think we individually grope towards learning the language around us and always fail. We never "get to 100%". I think we're facing a "streams of invective" situation by default.
I think prairie dogs have some kind of chord-based chirp system that works like human natural-language noun phrases do, because noun phrases are convergently useful. And the system is flexible and learned enough for them to have regional dialects.
I think elephants have personal names to help them manage moral issues and bad-actor-detection that arise in their fission-fusion social systems, roughly as humans do, because personal names are convergently useful for managing reputation and tracking loyalty stuff in very high K family systems.
I think humans evolved under Malthusian conditions, that there's lots of cannibalism in our history, and that we use social instincts to manage groups that manage food shortages (groups which semi-reliably go to war when hungry). If you're not tracking such latent conflict somehow, then you're missing something big.
I think human languages evolve ON TOP of human speech capacities, and I follow McWhorter in thinking that some languages are objectively easy (because of being learned by many as a second language (for trade or slavery or due to migration away from the horrors of history or whatever)) and others are objectively hard (because of isolation and due to languages naturally becoming more difficult over time, after a disruption-caused-simplification).
Like it isn't just that we never 100% learn our own language. It is also that adults make up new stuff a lot, and it catches on, and it becomes default, and the accretion of innovation only stabilizes when humans hit their teens and refuse to learn "the new and/or weird shit" of "the older generation".
Maybe there can be language super-geniuses who can learn "all the languages" very easily and fast, but languages are defined, in a deep sense, by a sort of "20th percentile of linguistic competence" among the people everyone wants to be understood by.
And the 20th percentile "ain't got the time" to learn 100% of their OWN language.
But also: the 90th percentile is not that much better! There's a ground floor where human beings who can't speak "aren't actually people" and they're weeded out, just like the fetuses with 5 or 3 heart chambers are weeded out, and the humans who'd grow to be 2 feet tall or 12 feet tall die pretty fast, and so on.
On the "language instincts" question, I think: probably yes? If Neanderthals spoke, it was probably with a very high pitch, but they had Sapiens-like FOXP2 I think? But even in modern times there are probably non-zero alleles to help recognize tones in regions where tonal languages are common.
Tracking McWhorter again, there are quite a few languages spoken in mountain villages or tiny islands with maybe 500 speakers (and the village IQ is going to be pretty stable, and outliers don't matter much), where children simply can't speak properly until they are maybe 12.
(This isn't something McWhorter talks about at all, but usually puberty kicks in, and teens refuse to learn any more arbitrary bullshit... but also accents tend to freeze around age 12 (especially in boys, maybe?) which might have something to do with shibboleths and "immutable sides" in tribal wars?)
Those languages where 11 year olds are just barely fluent are at the limit of isolated learnable complexity.
For an example of a seriously tricky language, my understanding (not something I can cite, just gossip from having friends in Northern Wisconsin and a Chippewa chromosome or two) is that in Anishinaabemowin they are kinda maybe giving up on retaining all the conjugations and irregularities that only show up very much in philosophic or theological or political discussions by adults, even as they do their best to retain as much as they can in tribal schools that also use English (for economic rather than cultural reasons)?
So there are still Ojibwe grandparents who can "talk fancy", but the language might be simplifying because it somewhat overshot the limits of modern learnability!
Then there's languages like nearly all the famous ones including English, where almost everyone masters it by age 7 or 8 or maybe 9 for Russian (which is "one of the famous ones" that might have kept more of the "weird decorative shit" that presumably existed in Indo-European)?
...and we kinda know which features in these "easy well-known languages" are hard based on which features become "nearly universal" last. For example, rhotics arrive late for many kids in America (with quite a few kindergartners missing an "R" that the teacher talks to their parents about, and maybe they go to speech therapy), but rhotics are also just missing in many dialects, like the classic accents of Boston, New York City, and London... because "curling your tongue back for that R sound" is just kinda objectively difficult.
In my comment laying out a hypothetical language like "Lyapunese", all the reasons that it would never be a real language relate not to philosophy, or ethics, or ontics, or epistemology, but to language pragmatics. Chaos theory is important, and yet it's not in language, and it's the fault of humans having short lives (and being generally shit at math because of nearly zero selective pressure on being good at it), I think?
In my comment talking about the layers and layers of difficulty in trying (and failing!) to invent modal auxiliary verbs for all the moods one finds in Nenets, I personally felt like I was running up against the wall of my own ability to learn enough about "those objects over there" (i.e. weird mood stuff in other languages, and even weird mood stuff in my own) to grok the things they took for granted, go meta on each thing, and become able to wield them as familiar tools that I could put onto some kind of proper formal (mathematical) footing. I suspect that if it were easy for an adult to learn that stuff, the language itself would have gotten more complex, and for this reason the task was hard in the way that finding mispricings in a market is hard.
Humans simply aren't that smart, when it comes to serial thinking. Almost all of our intelligence is cached.