Fly-by comment, but I claim this paper (and synchronisation in computational mechanics in general) is relevant to reference maintenance and teleosemantics, and also to how two people speaking two different languages will learn to communicate without a dictionary.
Upvoted because there's something interesting here, but my reaction to most of the points in the post was either "this seems obvious, why is it interesting?" or "I don't get this at all", so I know I didn't really get it, but I trust that if you find this worthwhile then it likely is. In light of that, I would like a more detailed, in-depth post, so I could understand what this is about.
If it's helpful, the idea of teleosemantics is, in my judgement, the same thing I'm trying to point people at when I write about the problem of the criterion.
I go into some depth about the stuff around purpose in this post. I think that's the deepest and trickiest part to grok, and failing to grok it I think the stuff about truth being contingent will fall flat.
I'm also writing a book about this topic. The chapter I hope to post this week tackles the problem of the criterion more head on, and the next chapter I'll write is about purpose (telos). You might find that helpful in understanding the idea, but alas it's not done yet.
ok, I reread the essay. I no longer feel like there's a bunch of things I don't understand. One point I still don't understand is why the map/territory distinction commits a Homunculus Fallacy (even after reading your Homunculus Problem post). But I also don't feel like I understand the notion of teleosemantics yet, or why it's important/special. So by the end of the post I don't feel like I truly understand this sentence (or why it's significant):
Teleosemantics identifies the semantics of a symbolic construct as what the symbolic construct has been optimized to accurately reflect.
Also "reflect" seems to do some heavy lifting in that sentence which I wouldn't usually object to, but it seems similar to "correspondence" which this essay has some objections to.
Later in the essay, I don't understand why all the distinctions you make require teleosemantics.
My current understanding of the argument is something like this:
The map reflects the territory. How does it reflect the territory? Because it can be interpreted by somebody/something as reflecting the territory/conveying information about the territory. But the act of interpretation itself instantiates a belief that refers to (reflects) the territory. So we're back to square one unless the kind of reference relationship between a literal map (or more generally, an information-about-territory-carrier external to the mind) and the territory is importantly different than that between a belief and the territory. In that case, the belief-as-map analogy/way of thinking doesn't make sense.
What does it mean to optimize for the map to fit the territory, but not the other way around? (After all: we can improve fit between map and territory by changing either map or territory.) Maybe it's complicated, but primarily what it means is that the map is the part that's being selected in the optimization. When communicating, I'm not using my full agency to make my claims true; rather, I'm specifically selecting the claims to be true.
I don't know whether you are familiar with it, but most speech acts or writing acts are considered to have either a "word-to-world" direction of fit, e.g. statements, or a "world-to-word" direction of fit, e.g. commands. Only in the former case do agents optimize the speech act ("word") to fit the world; in the latter case they optimize the world to fit the speech act. The fit would be truth in the case of a statement, execution in the case of a command.
There is an analogous but more basic distinction for intentional states ("propositional attitudes"), where the "intentionality" of a mental state is its aboutness. Some have a mind-to-world direction of fit, e.g. beliefs, while others have a world-to-mind direction of fit, e.g. desires or intentions. The former are satisfied when the mind is optimized to fit the world, the latter when the world is optimized to fit the mind.
(Speech acts seem to be honest only insofar as the speaker/writer holds an analogous intentional state. So someone who states that snow is white is only honest if they believe that snow is white. For lying the speaker would, apart from being dishonest, also need a deceptive intention with the speech act, i.e. intending the listener to believe that the speaker believes that snow is white.)
So it seems in the above paragraph you are only considering the word-to-world / mind-to-world direction of fit?
It's a good point. I suppose I was anchored by the map/territory analogy to focus on word-to-world fit. The part about Communicative Action and Rational Choice at the very end is supposed to gesture at the other direction.
Intuitively, I expect it's going to be a bit easier to analyze word-to-world fit first. But I agree that a full picture should address both.
I think this post is important because it brings old insights from cybernetics into a modern frame that relates to how folks are thinking about AI safety today. I strongly suspect that the big idea in this post, that ontology is shaped by usefulness, matters greatly to addressing fundamental problems in AI alignment.
If you pin down what a thing refers to according to what that thing was optimized to refer to, then don't you have to look at the structure of the one who did the optimizing in order to work out what a given thing refers to? That is, to work out what the concept "thermodynamics" refers to, it may not be enough to look at the time evolution of the concept "thermodynamics" on its own, I may instead need to know something about the humans who were driving those changes, and the goals held within their minds. But, if this is correct, then doesn't it raise another kind of homunculus-like regression where we were trying to directly explain semantics, but we ended up needing to inquire into yet another mind, the complete understanding of which would require further unpacking of the frames and concepts held in that mind, and the complete understanding of those frames and concepts requiring even further inquiry into a yet earlier mind that was responsible for doing the optimization of those frames and concepts?
The trick is that for some of the optimisations, a mind is not necessary. There is a sense perhaps in which the whole history of the universe (or life on earth, or evolution, or whatever is appropriate) will become implicated for some questions, though.
On your analysis, do you think it would be fair to say that it's at least reasonably possible, and perhaps even probable, that nothing whatsoever which human concepts entertain exists in reality?
For example, I take you as implying that tables, chairs, and centers of gravity do not exist, and are only convenient abstractions.
But I think this same attitude suggests that cells and molecules do not exist, and even atoms, since in reality there are only particles.
But the particles may really be vibrating strings, or other such things; and certainly particle-wave duality casts some doubt on the reality of particles as opposed to, say, disturbances in the quantum-mechanical wave-function.
So it seems plausible to me that this way of using "exists" implies that only very few of human concepts point at things that exist in reality, and perhaps even none, depending on how much fundamental physics there is left to learn.
This is eliminative materialism.
My only objection here is that this seems like a not-so-useful way to decide how the word "exists" works. Plausibly eliminative materialism does point to a scientifically and pragmatically important truth, something roughly like "whenever lower-level calculations conflict with higher-level, it's the lower-level that you should expect to be correct". But this applies on all levels; EG, chemistry supersedes biology, even though both of these are high-level abstractions relative to quantum mechanics. So my personal take is that the insistence that tables and chairs don't exist is an ultimately unrefined attempt to communicate a very important empirical fact about the sort of universe we're in (IE, a reductive universe).
But I think this same attitude suggests that cells and molecules do not exist, and even atoms, since in reality there are only particles.
That's mereological nihilism, not eliminative materialism.
This is eliminative materialism.
Kinda nitpicky, but it seems to me that the term "eliminative materialism" is policed/was optimized by the philosophical linguistic community to refer to something like "deep skepticism about mind-related folk concepts being very reality-at-joint-carving" (SEP, Wikipedia).
Interesting comparison!
To spell it out a little,
Silmer's thesis is that desire more or less covers the simple notion of subjective beauty, IE, liking what you see. But when a second player enters the game, optimizing for desirability, things get much more interesting; this phenomenon has its own special indicators, such as bright colors and symmetry. Often, "beauty" is much more about this special phenomenon.
My thesis is that mutual information captures a simple, practical notion of "aboutness"; but optimizing for mutual information carries its own special signature, such as language (IE, codified mappings). Often, "aboutness" is much more about this special phenomenon.
I just read an article that reminded me of this post. The relevant section starts with "Bender and Manning’s biggest disagreement is over how meaning is created". Bender's position seems to have some similarities with the thesis you present here, especially when viewed in contrast to what Manning claims is the currently more popular position that meaning can arise purely from distributional properties of language.
This got me wondering: if Bender is correct, then there is a fundamental limitation in how well (pure) language models can understand the world; are there ways to test this hypothesis, and what does it mean for alignment?
Thoughts?
This got me wondering: if Bender is correct, then there is a fundamental limitation in how well (pure) language models can understand the world; are there ways to test this hypothesis, and what does it mean for alignment?
Well, obviously, there's a huge problem right now with LLMs having no truth-grounding, IE not being able to distinguish between making stuff up vs trying to figure things out. I think that's a direct consequence of only having a 'correlational' picture (IE the 'Manning' view).
I very much agree and really like the coining of the term "teleosemantics". I might steal it! :-)
I'm not sure how much you've read my work on this topic or how much it influenced you, but in case you're not very aware of it I think it's worth pointing out some things I've been working on in this space for a while that you might find interesting.
I got nervous about how truth works when I tried to tackle the alignment problem head on. I ended up having to write a sequence of posts to sort out my ideas. At the time, I really failed to appreciate how deep telos ran.
My later work on alignment led me down the path of trying to understand human values as a prerequisite for verifying if an alignment scheme could work in theory or if a particular AI was aligned, which took me on a side quest into metaethical uncertainty. During that side quest I learned about epistemic circularity and the problem of the criterion.
I've tried to explain these ideas but they seem tricky for folks to grok. They either bounce off because it seems obvious and like they have nothing to learn (I think this is not true: while it's obvious in the sense that it's just how things are, understanding why things are the way they are in detail is where all the value is) or they have a worldview that strongly rejects the idea that things like truth might be teleological. So I'm now writing a book about the subject to help people both appreciate the depth of the teleology of truth and argue that it's how the world is by breaking down the ideas that prop up the worldview that disagrees. I hope to publish the first draft of the chapter on the problem of the criterion this week, and then I'll get started on writing the chapter about telos.
I think understanding these ideas is essential for solving alignment because we can't even agree on what we're fundamentally trying to align to if we don't adequately consider this problem. Maybe we get lucky and schemes we attempt to keep AI aligned end up accounting for the inherently teleological nature of knowledge, but I'm not willing to leave it to chance.
One thing I see as different between your perspective and (my understanding of) teleosemantics, so far:
You make a general case that values underlie beliefs.
Teleosemantics makes a specific claim that the meaning of semantic constructs (such as beliefs and messages) is pinned down by what it is trying to correspond to.
Your picture seems very compatible with, EG, the old LW claim that UDT's probabilities are really a measure of caring - how much you care about doing well in a variety of scenarios.
Teleosemantics might fail to analyze such probabilities as beliefs at all; certainly not beliefs about the world. (Perhaps beliefs about how important different scenarios are, where "importance" gets some further analysis...)
The teleosemantic picture is that epistemic accuracy is a common, instrumentally convergent subgoal; and "meaning" (in the sense of semantic content) arises precisely where this subgoal is being optimized.
That's my guess at the biggest difference between our two pictures, anyway.
The teleosemantic picture is that epistemic accuracy is a common, instrumentally convergent subgoal; and "meaning" (in the sense of semantic content) arises precisely where this subgoal is being optimized.
I think this is exactly right. I often say things like "accurate maps are extremely useful to things like survival, so you and every other living thing have strong incentives to draw accurate maps, but this is contingent on the extent to which you care about e.g. survival".
So to see if I have this right, the difference is I'm trying to point at a larger phenomenon and you mean teleosemantics to point just at the way beliefs get constrained to be useful.
So to see if I have this right, the difference is I'm trying to point at a larger phenomenon and you mean teleosemantics to point just at the way beliefs get constrained to be useful.
This doesn't sound quite right to me. Teleosemantics is a purported definition of belief. So according to the teleosemantic picture, it isn't a belief if it's not trying to accurately reflect something.
The additional statement I prefaced this with, that accuracy is an instrumentally convergent subgoal, was intended to be an explanation of why this sort of "belief" is a common phenomenon, rather than part of the definition of "belief".
In principle, there could be a process which only optimizes accuracy and doesn't serve any larger goal. This would still be creating and maintaining beliefs according to the definition of teleosemantics, although it would be an oddity. (How did it get there? How did a non-agentic process end up creating it?)
(Following some links...) What's the deal with Holons?
Your linked article on epistemic circularity doesn't really try to explain itself, but rather links to this article, which LOUDLY doesn't explain itself.
I haven't read much else yet, but here is what I think I get:
Not something you wrote, but Viliam trying to explain you:
There is an "everything of everythings", exceeding all systems, something like the highest level Tegmark multiverse only much more awesome, which is called "holon", or God, or Buddha. We cannot approach it in far mode, but we can... somehow... fruitfully interact with it in near mode. Rationalists deny it because their preferred far-mode approach is fruitless here. But you can still "get it" without necessarily being able to explain it by words. Maybe it is actually inexplicable by words in principle, because the only sufficiently good explanation for holon/God/Buddha is the holon/God/Buddha itself. If you "get it", you become the Kegan-level-5 meta-rationalist, and everything will start making sense. If you don't "get it", you will probably construct some Kegan-level-4 rationalist verbal argument for why it doesn't make sense at all.
I'm curious whether you see any similarity between holons and object oriented ontology (if you're at all familiar with that).
I was vibing with object oriented ontology when I wrote this, particularly the "nontrivial implication" at the end.
Here's my terrible summary of OOO:
I find OOO to be an odd mix of interesting ideas and very weird ideas.
Feel free to ignore the OOO comparison if it's not a terribly useful comparison for holons.
Oh man I kind of wish I could go back in time and wipe out all the cringe stuff I wrote when I was trying to figure things out (like why did I need to pull in Godel or reify my confusion?). With that said, here's some updated thoughts on holons. I'm not really familiar with OOO, so I'll be going off your summary here.
I think I started out really not getting what the holon idea points at, but I understood enough to get myself confused in new ways for a while. So first off there's only ~1 holon, such that it doesn't make sense to talk about it as anything other than the whole world. Maybe you could make some case for many overlapping holons centered around each point in the universe expanding out to its Hubble volume, but I think that's probably not helpful. Better to think of the holon as just the whole world, so really it's just a weird cybernetics term for talking about the world.
The trouble was I really didn't fully grasp the way that relative and absolute truth are not one and the same. So I was actually still fully trapped within my ontology, but holons seemed like a way to pull pre-ontological reality existing on its own inside of ontology.
OOO mostly sounds like being confused about ontology, specifically a kind of reification of the confusion that comes from not realizing that it's maps all the way down, i.e. you only experience the world through maps, and it's only through experiencing non-experience that you get to taste reality, which is an extremely mysterious answer trying to point at a thing that happens all the time but we literally can't notice it because noticing it destroys it.
OK. So far it seems to me like we share a similar overall take, but I disagree with some of your specific framings and such. I guess I'll try and comment on the relevant posts, even though this might imply commenting on some old stuff that you'll end up disclaiming.
Cool. For what it's worth, I also disagree with many of my old framings. Basically anything written more than ~1 year ago is probably vaguely but not specifically endorsed.
One advantage of this over map-territory correspondence is that it explains the asymmetry between map and territory. Mutual information is symmetric. So why is the map about the territory, but not the other way around? Because the map has been optimized to fit the territory, not the other way around. ("Fit" in the sense of carrying high mutual information, which can be decoded via some specific intended correspondence - a symbolic language.)
loose thought: I wonder what would be the active inference crowd's take on this
In this example, teleosemantics vs probabilistic implication matches fairly well with literal meaning vs connotation. However, there are some cases which present more difficulties:
- A more socially savvy communicator will understand the connotations of their speech, and optimize for these as well.
- One of my main criticisms of the probabilistic-implication account of meaning was its inability to properly define lying. However, this also appears to be a problem for the current account!
My thoughts on this, which (I think) (at least partially) align with yours but also take a different perspective:
Re (2) lying:
Say sender S is sending a message M to receiver R and M is a lie in the sense that S sends M to R because S believes that it will sway R's beliefs away from (what S sees as) truth. Here, M denotes different things to S and to R. To S, M denotes that R will start believing some untrue proposition P (or allocate more probability mass on P) upon receiving M. S has some M-related metadata/background assumptions that prevent them from taking M at face value. R does not have those assumptions, so they do take M at face value.
S optimized M to mean (to S themself) "R will start believing proposition P more upon receiving M". This also means that (if S's assumptions that caused them to send M to R are correct) it will mean something else to R (namely "P").
Re (1) connotation optimization:
I think there are (at least) two ways we can disentangle this.
First, if the sender optimizes the connotations but the receiver assumes that only the denotations were optimized, then we have again the distinction between what the signal means to the sender and to the receiver.
Second, if we have communication where both sides assume there may be some non-trivial optimization of connotations for (honest) communicative purposes, we have the distinction between the communication channels that are typically optimized to convey the information (as assumed by the society/linguistic community) and the communication channels that are actually optimized to convey the message in the case of these two speakers.
(This brings to my mind the distinction between genetic inheritance and other inheritance systems. In particular, both words/literal meaning/denotation and genes seem to be more unambiguously decodable than the other stuff. If the literal/denotationary communication channel is symbolic (as is the case with natural language), then we also have the similarity in that both genetic inheritance and language are ~discrete, which not only makes them easier to analyze but also makes them naturally preferable channels for communication (inheritance is kinda communication between generations).)
I wanted to write a long, detailed, analytic post about this, somewhat like my Radical Probabilism post (to me, this is a similarly large update). However, I haven't gotten around to it for a long while. And perhaps it is better as a short, informal post in any case.
I think my biggest update over the past year has been a conversion to teleosemantics. Teleosemantics is a theory of semantics -- that is, "meaning" or "aboutness" or "reference".[1]
To briefly state the punchline: Teleosemantics identifies the semantics of a symbolic construct as what the symbolic construct has been optimized to accurately reflect.
Previously, something seemed mysterious about the map/territory relationship. What could possibly imbue 'symbols' with 'meaning'? The map/territory analogy seems inadequate to answer this question. Indeed, to analogize "belief" with "map" and "the subject of belief" with "territory" commits a homunculus fallacy! The meaning-makers are the map-readers and map-writers; but they can only make meaning by virtue of the beliefs within their own heads. So the map/territory analogy seems to suggest that an infinite regress of meaning-makers would be required.
You probably won't believe me at first. Perhaps you'll say that the lesson of the map/territory analogy is the correspondence between the map and the territory, which exists independently of the map-reader who uses the correspondence to evaluate the map.
I have several objections.
But my point here isn't to denounce the map/territory picture! I still think it is a good framework. Rather, I wanted to gesture at how I still felt confused, despite having the map/territory picture.
I needed a different analogy, something more like a self-drawing map, to get rid of the homunculus. A picture which included the meaning-maker, rather than just having meaning come from nowhere.
Teleosemantics reduces meaning-making to optimization. Aboutness becomes a type of purpose a thing can have.
One advantage of this over map-territory correspondence is that it explains the asymmetry between map and territory. Mutual information is symmetric. So why is the map about the territory, but not the other way around? Because the map has been optimized to fit the territory, not the other way around. ("Fit" in the sense of carrying high mutual information, which can be decoded via some specific intended correspondence - a symbolic language.)
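To make the symmetry claim concrete, here is a minimal sketch that computes mutual information, I(X;Y) = sum over x,y of p(x,y) log2[ p(x,y) / (p(x) p(y)) ], for a small joint distribution and then again with the two variables swapped. The joint table and the reading of rows as "territory state" and columns as "map symbol" are illustrative assumptions, not something taken from the post.

```python
import math

def mutual_information(joint):
    """Mutual information in bits for a joint distribution given as joint[x][y]."""
    px = [sum(row) for row in joint]            # marginal over rows
    py = [sum(col) for col in zip(*joint)]      # marginal over columns
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:                           # zero cells contribute nothing
                mi += p * math.log2(p / (px[x] * py[y]))
    return mi

def transpose(joint):
    """Swap the roles of the two variables."""
    return [list(col) for col in zip(*joint)]

# Hypothetical joint distribution: rows = territory state, columns = map symbol.
joint = [
    [0.40, 0.10],
    [0.05, 0.45],
]

mi_map_about_territory = mutual_information(joint)
mi_territory_about_map = mutual_information(transpose(joint))
```

The two numbers agree up to floating-point rounding, so mutual information alone cannot say which side is the "map". Whatever breaks the tie has to come from somewhere else, which is where the optimization story enters.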
What does it mean to optimize for the map to fit the territory, but not the other way around? (After all: we can improve fit between map and territory by changing either map or territory.) Maybe it's complicated, but primarily what it means is that the map is the part that's being selected in the optimization. When communicating, I'm not using my full agency to make my claims true; rather, I'm specifically selecting the claims to be true.
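One toy way to picture "the map is the part being selected": the same fit score can be raised by searching over maps with the territory held fixed, or by searching over territories with the map held fixed. The sketch below is only an illustration under made-up assumptions (tuples of bits standing in for maps and territories, a hypothetical `fit` score); it is not a formal definition of teleosemantic content.

```python
def fit(map_, territory):
    """Fraction of positions where the map agrees with the territory."""
    return sum(m == t for m, t in zip(map_, territory)) / len(territory)

def optimize_map(candidate_maps, territory):
    """Territory held fixed; the map is what gets selected."""
    return max(candidate_maps, key=lambda m: fit(m, territory))

def optimize_territory(map_, candidate_territories):
    """Map held fixed; the territory is what gets selected."""
    return max(candidate_territories, key=lambda t: fit(map_, t))

territory = (1, 0, 1, 1)
candidates = [(0, 0, 0, 0), (1, 0, 1, 0), (1, 0, 1, 1)]

# Selecting among maps: the result is "about" the fixed territory.
best_map = optimize_map(candidates, territory)

# Selecting among worlds to match a fixed plan: closer to a command than a claim.
best_world = optimize_territory((0, 0, 0, 0), candidates)
```

Both searches end with a perfect fit score, so the score by itself does not distinguish them. The difference is which argument the optimizer was free to vary.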
I take Teleosemantics to be the same idea as 'reference maintenance', and in general, highly compatible with the ideas laid out in On the Origin of Objects by Brian Cantwell Smith.
I think a further good feature of a language is that claims are individually optimized to be true. To get an accurate answer to a specific question, I want that answer to be optimized to be accurate; I don't want the whole set of possible answers to have been jointly optimized. Unfortunately, in realistic communication, we do somewhat optimize to present a simple, coherent view, rather than only optimizing each individual statement to be accurate -- doing so helps our perspective to be understood easily by the listener/reader. But my intuition is that this does violate some ideal form of honesty. (This is one of the ways I think the concept of optimizing fit may be complicated, as I mentioned earlier.)
Connotation vs Denotation
I've previously argued that the standard Bayesian world-view lacks a sufficient distinction between connotation (the probabilistic implications of a communication) and denotation (the literal meaning). Teleosemantics provides something close, since we can distinguish between what a communication probabilistically implies vs what the communication was optimized to correspond to.
For example, I might notice that someone's hair is a mess and spontaneously tell them so, out of a simple drive to comment on notable-seeming facts. I chose my words in an attempt to accurately reflect the state of affairs, based on my own observations. So the teleosemantic-meaning of my words is simply the literal: "your hair is a mess".
However, the listener will try to work out the probabilistic implications of the utterance. Why might I tell them that their hair is a mess? Perhaps I dislike them and took the opportunity to insult them. Or perhaps I consider them a close enough friend that I think a gentle insult will be taken in a sporting way. These are possible conversational implications.
In this example, teleosemantics vs probabilistic implication matches fairly well with literal meaning vs connotation. However, there are some cases which present more difficulties:
I think the best way to handle these issues is to give up on a single account of the "meaning" of an utterance, and instead invent some useful distinctions.
Obviously, often what we care about most is the raw informational content. For example, it seems plausible that in the case of ELK, that's what we care about.
Another thing we often care about is what something has been optimized for. Understanding the "purpose" of something is generally very useful for understanding and manipulating our environment, even though it's not a "physical" fact. Intended meaning is a subspecies of purpose - a symbol is supposed to represent something. This notion of meaning can include both denotation and connotation, depending on the author's intent.
But, we can further split up authorial intent:
This accounts for lying, more or less. Lying means you're optimizing for a different belief in the audience than the belief you have. But we still haven't completely pinned down what "denotation" could mean.
Another important type of intent is the intended meaning of a word in a broader, societal context. A language is meaningful in the context of a linguistic community. To a large extent, a linguistic community is setting about the business of creating a shared map of reality, and the language is the medium for the map.
This makes a linguistic community into a sort of super-agent. The common subgoal of accurate beliefs is being pooled into a group resource, which can be collectively optimized.
Obviously, the "intended meaning" of a word in this collective sense will always be somewhat vague. However, I think humans very often concern ourselves with this sort of "meaning". A linguistic community has to police its intended map-territory correspondence. This includes rooting out lies, but it also includes pedantry - policing word-meanings to keep the broader language coherent, even when there's no local intelligibility problem (so pedantry seems pointless in the moment).
One way of looking at the goal of ELK, and AI transparency more generally, is that we need an answer to the question: how can we integrate AIs into our linguistic community?
The book Communicative Action and Rational Choice discusses how the behavior of a linguistic community is hard to analyze in a traditional rational-agent framework (particularly the selfish rationality of economics). Within a consequentialist framework, it seems as if communicative acts would always be optimized for their consequences, so, never be optimized for accuracy (hence, would lack meaning in the teleosemantic sense).[3] This mirrors many of the concerns for AI -- why wouldn't a highly capable AI be deceptive when it suited the AI's goals?
Even humans are often deceptive when we can get away with it. (So, eg, "raise the AI like a human" solutions don't seem very reassuring.) But humans are also honest much more often than naive consequentialism would suggest. Indeed, I think humans often communicate in the teleosemantic sense, IE optimizing accuracy.
A linguistic community also tends to become a super-agent in a stronger sense (discussed in Communicative Action and Rational Choice): coordinating actions. A member of a linguistic community is able to give and receive reasons for taking specific actions (eg, following and enforcing specific norms), rather than only swapping reasons for beliefs.
Allowing AIs to participate fully in a linguistic community in this broader sense could also be an interesting framework for thinking about alignment.
Thanks to Steve Petersen for telling me about it.
The proponent of a goodness-of-fit theory would, I think, have to argue that false beliefs harm the correspondence only a little. If you imagine holding a semi-transparent map over the territory, and rotating/sliding it into the best-fit location, the "false beliefs" would be the features which still don't fit even after we've found the best fit.
This theory implies that beliefs lose meaning at the point where accumulated errors stop us from locating a satisfying best-fit.
I think this is not quite true. For example, a blindfolded person who believes they are in London when they are in fact in Paris could have a very detailed mental map of their surroundings which is entirely wrong. You might reasonably insist that the best-fit interpretation of those beliefs is nailed down by years of more accurate beliefs about their surroundings. I'm skeptical that the balance needs to work out that way.
Moreover, it seems unfortunate for the analysis to be so dependent on global facts. Because a goodness-of-fit theory ascribes semantics based on an overall best fit, interpreting the semantics of one corner of the map depends on all corners. Perhaps some of this is unavoidable; but I think teleosemantics provides a somewhat more local theory, in which the meaning of an individual symbol depends only on what that symbol was optimized to reflect.
(For example, imagine someone writing a love letter on a map, for lack of blank paper. I think this creates somewhat more difficulty for a goodness-of-fit theory than for teleosemantics.)
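The sliding-map picture of a goodness-of-fit theory described above can be sketched in one dimension. This is a toy illustration under invented assumptions: the territory and the map are strings of landmark characters, and "fit" just counts matching positions. The features that still disagree at the best alignment play the role of "false beliefs".

```python
def best_fit(map_, territory):
    """Slide map_ along territory; return (best offset, mismatched map indices there)."""
    best_offset, best_matches = 0, -1
    for offset in range(len(territory) - len(map_) + 1):
        matches = sum(m == territory[offset + i] for i, m in enumerate(map_))
        if matches > best_matches:              # keep the first strictly best alignment
            best_offset, best_matches = offset, matches
    mismatches = [i for i, m in enumerate(map_)
                  if m != territory[best_offset + i]]
    return best_offset, mismatches

# Hypothetical landmarks: R = river, H = house, T = tower, . = empty ground.
territory = "..R.H..T.R..H.."
map_ = "H..T.H"   # mostly accurate, but the last landmark is wrong

offset, false_beliefs = best_fit(map_, territory)
```

Note that the verdict on any single map feature depends on the alignment chosen from the whole map, which is the global dependence the footnote complains about. Add enough wrong features and the best alignment itself can shift or become ambiguous.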
To be clear, I'm not currently on board with this conclusion - I think consequentialist agents can engage in cooperative behavior, and coordinate to collectively optimize common subgoals. Just because you are, in some sense, "optimizing your utility" at all times, doesn't mean you aren't optimizing statements for accuracy (as a robust subgoal in specific circumstances).
However, it would be nice to have a more detailed picture of how this works.