Where do these crisp ontologies come from, if (under the signalling theory of meaning) symbols only have probabilistic meanings?
There are two things here which are at least potentially distinct: the meaning of symbols in thinking, and their meaning in communication. I'd expect these mechanisms to have a fair bit in common, but the problem of aligning speakers which is addressed here doesn't seem to apply to the former. So I don't think we need to wonder here where those crisp ontologies came from.
This is the type of thinking that can't tell the difference between "a implies b" and "a, and also b" -- because people almost always endorse both "a" and "b" when they say "a implies b".
One way to eliminate this particular problem is to focus on whether the speaker would agree with a sentence if asked, rather than on spontaneous assertions. This fails when the speaker is systematically wrong about something, or when Cartesian boundaries are broken, but other than that it seems to take out a lot of the "association" problems.
None of this is literally said, but a cloud of conversational implicature surrounds the literal text. The signalling analysis can't distinguish this cloud from the literal meaning.
Even what we would consider literal speech can depend on implicature. Consider: "Why don't we have bacon?" "The cat stole it". Working out which cat "the cat" refers to requires Gricean reasoning, and the phrase isn't compositional, either.
To hint at my opinion, I think it relates to learning normativity.
I think one criterion of adequacy for explanations of level 1 is to explain why it is sometimes rational to interpret people literally. Why would you throw away all that associated information? Your proposal in that post is quite abstract; could you outline how it would address this?
Interestingly, I did think of norms when you drew up the problem, but in a different way, related to enforcement. We hold each other responsible for our assertions, and this means we need an idea of when a sentence is properly said. Now, such norms can't require speakers to be faithful to all the probabilistic associations of a sentence. That would leave us with too few sentences to describe all situations, and if the norms are to be responsive to changing expectations, it could never reach equilibrium. So we have to pick some subset of the associations to enforce, and that would then be the "literal meaning". We can see why it would be useful for this to incorporate some compositionality: assertions are much more useful when you can combine multiple, possibly from different sources, into one chain of reasoning.
Good points! I'll have to think on this.
If Alice says something to Bob, Alice (in general) has a plan, where Alice's own speech act is part of the plan, and Bob interpreting the speech in a certain way is also part of the plan. If the plan concludes "...and then Bob concludes that I am in his in-group", then that's level 3. If the plan concludes "...and then Bob knows not to swim in the shark-infested river", then that's level 1. Etc.
Without the speaker modeling the listener and incorporating the listener's expected interpretation into a plan, you can't have different levels, I don't think.
I'll grant that this approximately works, but I don't think it works in detail.
For example, I've operated at level 1 in cases where I have a low expectation of being believed -- I simply say true things. My "plan" is more "Alice will know that I believe X" or something. I might be doing it because voicing thoughts aloud helps me to think things through, or because I value speaking the truth as a policy. These seem like they can be level 1 concerns. (Valuing truth as a policy could be a level 3 concern in some cases, if I value it for identity-type reasons. But this is also a counterexample to your pattern.)
Hmm, I think levels 2-4 absolutely require simulating the person you're talking to, almost by definition, so if you're just talking without thinking about how it will be understood by the person you're talking to, I'd say that's either level 1 or "none of the above". (Like, singing in the shower or muttering under your breath are "none of the above", probably.)
Also, speech acts (like all our other actions) have a messy mixture of various motivations. So, like, if you're talking to rationalists, saying true and profound things about the world presumably works on both level 3 and level 1. I don't think there's an answer to "what level is it really?" It can be mostly level 1 or mostly level 3, but it's unlikely to be 100% pure one or the other, at least for neurotypical speakers, I think.
I believe the last section of this post is pointing to something central and important which is really difficult to articulate. Which is ironic, since "how does articulating concepts work?" is kinda part of it.
To me, it feels like Bayesianism is missing an API. Getting embeddedness and reflection and communication right all require the model talking about its own API, and that in turn requires figuring out what the API is supposed to be - like how the literal meanings of things passed in and out actually get tied to the world.
I agree, it's important to create, or at least detect, well-aligned agents. You suggest we need an honesty API.
Nope, that is not what I'm talking about here. At least I don't think so. The thing I'm talking about applies even when there's only one agent; it's a question of how that agent's own internal symbols end up connected to physical things in the world, for purposes of the agent's own reasoning. Honesty when communicating with other agents is related, but sort of tangential.
Aren't the symbols hardcoded to mean something? Your parents keep using "apple" to refer to an apple, and you hardcode that symbol to stand for apples. Of course, the devil is in the details, but I think developmental linguistics probably has some existing literature, and the question doesn't seem that mysterious to me.
You have a concept of apples before learning the word (otherwise you wouldn't know which thing in our very-high-dimensional world to tie the word to; word-learning does not require nearly enough examples to narrow down the concept space without some pre-existing concept). Whatever data structure your brain uses to represent the concept is separate from the word itself, and that's the thing I'm talking about here.
Well, really I'm talking about the idealized theoretical Bayesian version of that thing. Point is, it should not require other agents in the picture, including your parents.
You have a concept of apples before learning the word (otherwise you wouldn't know which thing in our very-high-dimensional world to tie the word to; word-learning does not require nearly enough examples to narrow down the concept space without some pre-existing concept).
That doesn't seem right, intuitively. People (humans) have pre-existing capabilities ('instincts'), by the time they're learning words, and one of them is the ability to 'follow pointing', i.e. look at something someone else is pointing at. In practice, that can involve considerable iteration, e.g. 'no not that other round red (or green) thing; this one right here'.
The parts of our minds that learn words also seem to have access to an API for analyzing and then later recognizing specific visual patterns, e.g. shapes, colors, materials, and faces. The internals of that visual-system API are pretty sophisticated too.
Well, really I'm talking about the idealized theoretical Bayesian version of that thing. Point is, it should not require other agents in the picture, including your parents.
Learning language must require other agents, at least indirectly, tho – right? It only exists because some agents use (or used) it.
But I'm skeptical that an 'idealized theoretical Bayesian agent' could learn language on its own – there is no such thing as "an ideal philosophy student of perfect emptiness".
I'm not talking about learning language, I'm talking about how we chunk the world into objects. It's not about learning the word "tree", it's about recognizing the category-of-things which we happen to call trees. It's about thinking that maybe the things I know about one of the things-we-call-trees are likely to generalize to other things-I-call-trees. We must do that before attaching the word "tree" to the concept, because otherwise it would take millions of examples to hone in on which concept the word is trying to point to.
I agree that chunking precedes naming – historically. But I think most (a lot?) of people learn the name first and have to (try to) reverse engineer the chunking. Some of this definitely happens iteratively and interactively, e.g. when teaching children.
And I'm very unsure that there is one simple way for "how we chunk the world into objects". I think that might explain why some people chunk the same words so differently: there's no (obvious) unique best way to chunk some ideas for everyone.
I know that people who are relatively competent at chess reliably chunk board states in a way that I don't (as I'm not at all good at chess).
Similarly, people who already know a variety of different plants (at least) seem to chunk them in a way that I don't.
We must do that before attaching the word "tree" to the concept, because otherwise it would take millions of examples to hone in on which concept the word is trying to point to.
I don't think this is true. If anything, some ideas/concepts seem to start with very coarse chunking based on a very small number of prototypical examples, and then it does take 'millions' of subsequent examples to refine the chunking. And that is definitely sometimes mediated directly via language.
I think there is a lot of pre-verbal or non-verbal chunking involved in thinking.
But I also think it's very common to not have a chunk ("concept") before learning the word, even of something like apples.
Tho I also think the opposite is pretty common – 'Oh, that's the word for those!'.
There's an attention component to chunking. I could chunk some set of things into neat categories – if I examined it closely for a sufficient duration. But I mostly don't – relative to all possible things I could be examining.
I think you're not getting something about why the question is an interesting one.
The meaning of "meaning" is a contentious philosophical issue, and although developmental psychology could provide some inspiration, I highly doubt they'd have provided a rigorous formal answer. Saying the word "hardcoded" hardly sheds any light (especially since "hardcoded" usually contrasts with "learned", and you're somehow suggesting that we learn hardcoded answers...).
Are you saying that a probabilistic analysis of communication that treats communication as evidence for hidden variables/information cannot deal with the meaning of words? If so, why can't the meaning itself be such a hidden information revealed by the message?
What does "the meaning itself" mean, in order for us to do probabilistic inference on it? In order to do reliable probabilistic inference, we need to have a sufficient starting theory of the thing (or we need gold-standard data so that it's not inaccessible information; or we need a working theory of inaccessible information which allows us to extract it from learning systems).
That is to say: humans succeed at doing probabilistic inference about this, but in order to construct a machine that would do it, we need more information than just "do probabilistic inference".
One such theory is that the meaning of an utterance is just the probabilistic inferences you can make from it; but, I'm rejecting that theory.
So your criticism is that you are looking for the underlying assumption/theory that helps humans do this probabilistic inference, and the signalling analysis of meaning tells you there is no such theory?
No, my criticism is that the signalling theory isn't very good. It doesn't allow for our intuitive concept of lying, and it doesn't have an account of literal meaning vs implicature.
Communication of meaning, signaling of truth. I'm not sure what essential difficulty remains if we merely make sure to distinguish between communicating ideas (which in this role are to be made clear, not yet compared against the world), and providing evidence for their relevance to the discussion or for their correspondence-to-the-world truth. Fireflies won't be able to communicate ideas, only signal truths, so this analysis doesn't naturally apply to them. But language can communicate ideas without implication of their truth, and at this point signaling helps if truth is to be extracted somewhat directly from other actors and not evaluated in other ways.
we assume the remark is relevant to the conversation
For example, in this case the assumption is part of how meaning is guessed, but is in general unrelated to how its truth (or truth of its relevance) is to be evaluated. The intermingling of the two aspects of communication is mostly shorthand, it can be teased apart.
Are you making a hypothetical claim that if we could differentiate between communication of ideas vs their truth, then we could distinguish connotation from denotation? Or are you claiming that the implication holds and we can distinguish clearly between those two things, so we can in fact distinguish between connotation and denotation?
I don't (currently) see the argument for either part -- I don't get why the implication would be hypothetically true, and I also don't see how the signalling analysis of meaning helps us establish the distinction for your premise.
I don't see the distinction between connotation and denotation as an effective way of carving this muddle. The problem with the signaling theory of meaning is that it explains communication of meaning as communication of truth, mixing up these different things. But meaning is often communicated using the whole palette of tools also used for signaling truth. In particular, communicating meaning (that is, the kinds of things used to construct models and hypotheses) with utterances that look like vague reasoning by association shouldn't in itself make it more difficult to reason lawfully and clearly about that meaning.
So the method I'm proposing is to consider any utterance in either capacity in turn, with separate questions like "Which idea is this drawing attention to?" and "What weight is implied for relevant assertions about this idea?" But at this point I'm not sure what the essential difficulty is that remains, because I don't perceive the motivation for the post clearly enough.
I bet this is a side effect of having a large pool of bounded rational agents that all need to communicate with each other, but not necessarily frequently. When two agents only interact briefly, neither agent has enough data to work out the "meaning" of the other's words. Each word could mean too many different things. So you can probably show that under the right circumstances, it's beneficial for agents in a pool to have a protocol that maps speech-acts to inferences the other party should make about reality (amongst other things, such as other actions). For instance, if all agents have shared interests, but only interact briefly with limited bandwidth, both agents would have an incentive to implement either side of the protocol. Furthermore, it makes sense for this protocol to be standardized, because the more standard the protocol, the less bandwidth and resources the agents will need to spend working out the quirks of each other's protocol.
This is my model of what languages are.
Now that you have a well-defined map from speech-acts to inferences, the notion of lying becomes meaningful. Lying is just when you use speech acts and the current protocol to shift another agent's map of reality in a direction that does not correspond to your own map of reality.
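As a minimal sketch of that definition (a toy formalization of my own, with made-up protocol entries, not anything from the signalling literature): fix a shared protocol mapping speech-acts to licensed inferences, and call an utterance a lie when the inference it licenses conflicts with the speaker's own beliefs.

```python
# Toy formalization (purely illustrative; the protocol entries are made up)
# of "lying under a shared protocol".

# The protocol maps speech-acts to the inference the listener is supposed to make.
protocol = {
    "the river is safe": {"sharks_in_river": False},
    "the river has sharks": {"sharks_in_river": True},
}

def is_lie(utterance, speaker_beliefs):
    """An utterance is a lie if the inference the protocol licenses
    conflicts with what the speaker actually believes."""
    licensed = protocol[utterance]
    return any(speaker_beliefs.get(key) != value for key, value in licensed.items())

alice_beliefs = {"sharks_in_river": True}
print(is_lie("the river is safe", alice_beliefs))     # True  (a lie)
print(is_lie("the river has sharks", alice_beliefs))  # False (honest)
```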
I'm with you on the deficiency of the signalling frame when talking about human communication and communication more generally. Skyrms and others who developed the signalling frame explicitly tried to avoid having a notion of intentionality in order to explore questions like "how could the simplest things that still make sense to call 'communication' develop in systems that don't have human level intelligence?", which means the model has a gaping hole when trying to talk about what people do.
I wrote a post about the interplay between the intentional aspects of meaning and what you're calling the probabilistic information. It doesn't get too into the weeds, but might provoke more ideas in you.
I think part of the story is that language is compositional. If someone utters the words "maroon hexagon", you can make a large update in favor of a specific hypothesis even if you haven't previously seen a maroon hexagon, or heard those words together, or judged there to be anything special about that hypothesis. "Maroon" has been sufficiently linked to a specific narrow range of colors, and "hexagon" to a specific shape, so you get to put those inferences together without needing additional coordination with the speaker.
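Here's a minimal sketch of that compositional point (purely illustrative; the hypotheses and probabilities are made up): the listener combines word-level likelihoods learned separately for "maroon" and for "hexagon", so the phrase picks out a hypothesis it never needed to learn as a unit.

```python
# Illustrative sketch only: compositional inference from two independently
# learned word-feature associations (all numbers are made up).

hypotheses = [
    ("maroon", "hexagon"), ("maroon", "square"),
    ("teal", "hexagon"), ("teal", "square"),
]

# Likelihood of uttering each word given each feature value, learned separately.
p_word_given_color = {"maroon": {"maroon": 0.95, "teal": 0.01},
                      "teal":   {"maroon": 0.01, "teal": 0.95}}
p_word_given_shape = {"hexagon": {"hexagon": 0.95, "square": 0.01},
                      "square":  {"hexagon": 0.01, "square": 0.95}}

def posterior(utterance):
    """Combine word-level likelihoods (uniform prior over hypotheses)."""
    color_word, shape_word = utterance.split()
    scores = {h: p_word_given_color[color_word][h[0]] *
                 p_word_given_shape[shape_word][h[1]]
              for h in hypotheses}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

print(posterior("maroon hexagon"))
# ("maroon", "hexagon") dominates, even though the listener never needed a
# learned association for the phrase as a whole.
```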
This seems related to the denotation/connotation distinction, where compositional inferences are (typically?) denotations. Although the distinction seems kind of fuzzy, as it seems that connotations can (gradually?) become denotations over time, e.g. "goodbye" to mean that a departure is imminent, or an image of a red octagon to mean "stop" (although I'd say that the words "red octagon" still only have the connotation of "stop"). And "We should get together more often" is interesting because the inferences you can draw from it aren't that related to the inferences you typically draw from the phrases "get together" and "more often".
This is the type of thinking that can't tell the difference between "a implies b" and "a, and also b" -- because people almost always endorse both "a" and "b" when they say "a implies b".
This is the type of thinking where disagreement tends to be regarded as a social attack, because disagreement is associated with social attack.
This is the type of thinking where we can't ever have a phrase meaning "honestly" or "literally" or "no really, I'm not bullshitting you on this one" because if such a phrase existed then it would immediately be co-opted by everyone else as a mere intensifier.
This "type of thinking" sure seems very accurate to me.
In particular, the third paragraph quoted above seems spectacularly accurate, e.g. the euphemism treadmill.
Alice: "I just don't understand why I don't see Cedrick any more."
Bob: "He's married now."
We infer from this that the marriage creates some kind of obstacle. Perhaps Cedrick is too busy to come over. Or Bob is implying that it would be inappropriate for Cedrick to frequently visit Alice, a single woman. None of this is literally said, but a cloud of conversational implicature surrounds the literal text. The signalling analysis can't distinguish this cloud from the literal meaning.
I'm not sure this is quite true. Just because every utterance produces a 'cloud of implicature' doesn't mean 'literal meaning' isn't also a component of the signal.
And, in practice, it doesn't seem like there is any general way to distinguish the cloud from the literal meaning. One problem being which literal meaning should be considered the literal meaning?
Like logical uncertainty, I see this as a challenge in the integration of logic and probability. In some sense, the signalling theory only allows for reasoning by association rather than structured logical reasoning, because the meaning of any particular thing is just its probabilistic associations.
I'm confused why this is a 'challenge' – or a surprising one anyways. It certainly seems (again!) astonishingly accurate to describe most people as "reasoning by association".
Where do these crisp ontologies come from, if (under the signalling theory of meaning) symbols only have probabilistic meanings?
Wouldn't they come from mostly (or 'almost perfectly') certain meanings? Practically, words seem to almost never correspond to a particularly crisp ontology (compare to, e.g. the elements or subject of a mathematical theory). I don't think there's any word that would – under all circumstances or in all situations – have a (unique) 'literal meaning'.
The explanation of how communication can (reliably) convey 'literal meanings' seems to boil down to 'with great effort, arbitrary depths of circumlocution, and (still) only ever approximately'.
This "type of thinking" sure seems very accurate to me.
In particular, the third paragraph quoted above seems spectacularly accurate, e.g. the euphemism treadmill.
OK, but I claim there is a difference between "literally" and a mere intensifier.
Also, people who can't tell the difference between "A->B" and "A, and also B" are pretty frustrating to talk to.
I'm not sure this is quite true. Just because every utterance produces a 'cloud of implicature' doesn't mean 'literal meaning' isn't also a component of the signal.
And, in practice, it doesn't seem like there is any general way to distinguish the cloud from the literal meaning. One problem being which literal meaning should be considered the literal meaning?
I totally agree that the literal meaning is a component.
I agree that there isn't some general method to distinguish the cloud from the literal meaning, or pick out which literal meaning, but I claim people do anyway, sometimes making quite a strong distinction.
OK, but I claim there is a difference between "literally" and a mere intensifier.
I'm confused. Perhaps we're writing past each other!
There is a meaning or sense of 'literally' that is not an intensifier – I believe this is true.
In most cases, for myself personally (and subject to all of the limitations of this kind of memory and for myself personally), I seem to be able to interpret specific uses of "literally" unambiguously.
There are occasional exceptions tho!
Also, people who can't tell the difference between "A->B" and "A, and also B" are pretty frustrating to talk to.
I agree!
I agree that there isn't some general method to distinguish the cloud from the literal meaning, or pick out which literal meaning, but I claim people do anyway, sometimes making quite a strong distinction.
I agree – people do it (pretty reliably) anyway and there can be arbitrarily strong distinctions maintained.
In some sense, the signalling theory only allows for reasoning by association rather than structured logical reasoning, because the meaning of any particular thing is just its probabilistic associations.
Epistemic status: uncertain.
To properly assess the probabilistic associations that a certain set of symbols has, we humans need to first unpack the set to its literal/usual meaning. So when I say "A -> B; Not B.", this first gets parsed and its logical meaning extracted, then this meaning plus the symbols themselves get used to find the probabilistic meaning.
Of course, this process doesn't happen neatly, and some people might use more heuristic methods and skip parsing the symbols partially (i.e., they pattern-match on the current utterance and previous utterances, and directly use the nearest cached Bayesian meaning available). This seems to be pretty common among normal people, and a constant source of friction with intellectuals.
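A crude sketch of that two-stage picture (my own illustration, not a claim about how brains actually parse): stage 1 extracts the literal/logical content of "A -> B; Not B." as the set of consistent truth-assignments; a stage 2 would then layer probabilistic associations on top.

```python
# Crude, illustrative two-stage sketch: stage 1 is the literal/logical meaning,
# taken as the set of truth-assignments consistent with the utterance's form.
from itertools import product

def literal_meaning(constraints, atoms=("A", "B")):
    worlds = [dict(zip(atoms, values))
              for values in product([True, False], repeat=len(atoms))]
    return [w for w in worlds if all(c(w) for c in constraints)]

# "A -> B; Not B."  parsed into its logical form:
utterance = [lambda w: (not w["A"]) or w["B"],   # A -> B
             lambda w: not w["B"]]               # Not B

print(literal_meaning(utterance))
# -> [{'A': False, 'B': False}]
# The literal content already pins down "not A" (modus tollens) before any
# associative/probabilistic reading (stage 2) is applied on top.
```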
A common Bayesian account of communication analyzes signalling games: games in which there is hidden information, and some actions can serve to communicate that information between players. The meaning of a signal is precisely the probabilistic information one can infer from it.
I'll call this the signalling analysis of meaning. (Apparently, it's also been called Gricean communication.)
In Maybe Lying Can't Exist, Zack Davis points out that the signalling analysis has some counterintuitive features. In particular, it's not clear how to define "lying"!
Either agents have sufficiently aligned interests, in which case the agents find a signalling system (an equilibrium of the game in which symbols bear a useful relationship with hidden states, so that information is communicated) or interests are misaligned, in which case no such equilibrium can develop.
We can have partially aligned interests, in which case a partial signalling system develops (symbols carry some information, but not as much as you might want). Zack gives the example of predatory fireflies who imitate a mating signal. The mating signal still carries some information, but it now signals danger as well as a mating opportunity, making the world more difficult to navigate.
But the signalling analysis can't call the predator a liar, because the "meaning" of the signal includes the possibility of danger.
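As a minimal sketch of why (a toy model with made-up numbers in a firefly-like setup, not anything from Zack's post): under the signalling analysis, the meaning of a signal is just the posterior over hidden states a receiver can infer from it given the sender's policy, so a pooled signal already "means" both the mating opportunity and the danger.

```python
# Toy model (made-up numbers, firefly-like setup) of the signalling analysis:
# the "meaning" of a signal is the posterior over hidden states a receiver
# can infer from it, given the sender's policy.

states = ["mate_nearby", "predator_nearby"]
prior = {"mate_nearby": 0.9, "predator_nearby": 0.1}

def sender(state):
    # Both the honest firefly and the predatory mimic emit the same flash,
    # so the signal pools the two states.
    return "flash"

def meaning_of(signal):
    """P(state | signal), assuming the receiver knows the sender policy."""
    joint = {s: prior[s] * (1.0 if sender(s) == signal else 0.0) for s in states}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

print(meaning_of("flash"))
# -> {'mate_nearby': 0.9, 'predator_nearby': 0.1}
# On this analysis the mimic isn't "lying": the possibility of danger is
# already part of what the flash means.
```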
Zack concludes: Deception is an ontologically parasitic concept. It requires a pre-existing notion of truthfulness. One possibility is given by Skyrms and Barrett: we consider only the subgame where sender and receiver have common goals. This gives us our standard of truth by which to judge lies.
I conclude: The suggested solution seems OK to me, but maybe we want to throw out the signalling analysis of meaning altogether. Maybe words don't just mean what they probabilistically imply. Intuitively, there is a distinction between connotation and denotation. Prefacing something with "literally" is more than just an intensifier.
The signalling analysis of meaning seems to match up rather nicely with simulacrum level 3, where the idea that words have meaning has been lost, and everyone is vibing.
Level 3 and Signalling
Out of the several Simulacra definitions, my understanding mainly comes from Simulacra Levels and their Interactions. Despite the risk of writing yet-another-attempt-to-explain-simulacra-levels, here's a quick summary of my understanding:
Here are some facts about the signalling analysis of meaning.
This sounds an awful lot like level-3 thinking to me.
I'm not saying that signalling theory can only analyze level-three phenomena! On the contrary, I still think signalling theory includes honest communication as a special case. I still think it's a theory of what information can be conveyed through communication, when incentives are not necessarily aligned. After all, signalling theory can examine cases of perfectly aligned incentives, where there's no reason to lie or manipulate.
What I don't think is that signalling theory captures everything that's going on with truthfulness and deceit.
Signalling theory now strikes me as a level 3 understanding of language. It can watch levels 1 and 2 and come to some understanding of what's going on. It can even participate. It just doesn't understand the difference between levels 1 and 2. It doesn't see that words have meanings beyond their associations.
This is the type of thinking that can't tell the difference between "a implies b" and "a, and also b" -- because people almost always endorse both "a" and "b" when they say "a implies b".
This is the type of thinking where disagreement tends to be regarded as a social attack, because disagreement is associated with social attack.
This is the type of thinking where we can't ever have a phrase meaning "honestly" or "literally" or "no really, I'm not bullshitting you on this one" because if such a phrase existed then it would immediately be co-opted by everyone else as a mere intensifier.
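To make the first of these concrete, here is a small illustrative sketch (mine, with arbitrary propositions): "a implies b" and "a, and also b" rule out different sets of possible worlds, a distinction that a purely associative reading of assertions collapses.

```python
# Small illustrative sketch: the two sentences rule out different possible
# worlds, even if speakers who assert either one usually endorse both a and b.
from itertools import product

worlds = [dict(zip(("a", "b"), values)) for values in product([True, False], repeat=2)]

implies_worlds = [w for w in worlds if (not w["a"]) or w["b"]]   # "a implies b"
conjunction_worlds = [w for w in worlds if w["a"] and w["b"]]    # "a, and also b"

print(len(implies_worlds), len(conjunction_worlds))  # 3 vs 1
# "a implies b" is still compatible with a being false; "a, and also b" is not.
# A reading that only tracks "asserting either sentence correlates with
# believing a and believing b" collapses exactly this distinction.
```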
The Skyrms & Barrett Proposal
What about the proposal that Zack Davis mentioned, via Skyrms and Barrett: consider only the subgame where sender and receiver have common goals, and take that as the standard of truth by which to judge lies?
This is the sort of proposal I'm looking for. It's promising. But I don't think it's quite right.
First of all, it might be difficult to define the hypothetical scenario in which all interests are aligned, so that communication is honest. Taking an extreme example, how would we then assign meaning to statements such as "our interests are not aligned"?
More importantly, though, it still doesn't make sense of the denotation/connotation distinction. Even in cases where interests align, we can still see all sorts of probabilistic implications of language, such as Grice's maxims. If someone says "frogs can't fly" in the middle of a conversation, we assume the remark is relevant to the conversation, and form all kinds of tacit conclusions based on this. To be more concrete, here's an example conversation:
Alice: "I just don't understand why I don't see Cedrick any more."
Bob: "He's married now."
We infer from this that the marriage creates some kind of obstacle. Perhaps Cedrick is too busy to come over. Or Bob is implying that it would be inappropriate for Cedrick to frequently visit Alice, a single woman. None of this is literally said, but a cloud of conversational implicature surrounds the literal text. The signalling analysis can't distinguish this cloud from the literal meaning.
The Challenge for Bayesians
Zack's post (Maybe Lying Can't Exist, which I opened with) feels to me like one of the biggest challenges to classical Bayesian thinking that's appeared on LessWrong in recent months. Something like the signalling theory of meaning has underpinned discussions about language among rationalists since before the Sequences.
Like logical uncertainty, I see this as a challenge in the integration of logic and probability. In some sense, the signalling theory only allows for reasoning by association rather than structured logical reasoning, because the meaning of any particular thing is just its probabilistic associations.
Worked examples in the signalling theory of meaning (such as Alice and Bob communicating about colored shapes) tend to assume that the agents have a pre-existing meaningful ontology for thinking about the world ("square", "triangle" etc). Where do these crisp ontologies come from, if (under the signalling theory of meaning) symbols only have probabilistic meanings?
How can we avoid begging the question like that? Where does meaning come from? What theory of meaning can account for terms with definite definitions, strict logical relationships, and such, all alongside probabilistic implications?
To hint at my opinion, I think it relates to learning normativity.