The problem is that the types of hypotheses considered by Solomonoff induction are not explanations, but rather computer programs which output predictions.
Indeed. Solomonoff Inductors contain computer programmes, not explanations, not hypotheses and not beliefs. That makes it quite hard to understand the sense in which they are dealing with probability.
Probability of what? Hypotheses and beliefs have a probability of being true, of succeeding in corresponding. What does it mean to say that one programme is more probable than another? That it is short? A shorter bitstring is more likely to be found in a random sequence, but what has that to do with constructing a true model of the universe?
If you are dealing with propositions instead of programmes, it is easy to explain the relationship between simplicity and probability-of-corresponding-to-reality: the probability of a small conjunction of propositions is generally higher than that of a large one.
What does it mean to say that one programme is more probable than another? That it is short?
For a Solomonoff inductor, yes. [Basically you have "programs that have failed to predict the past" and "programs that have predicted the past", and all of the latter group that are of equal length are equally probable, and it must be the case that longer programs are less probable than shorter programs for the total probability to sum to 1, though you could have a limited number of exceptions if you wanted.]
That said, in the SI paradigm, the probability of individual programs normally isn't very interesting; what's interesting is the probability of the next token, and the probability of 'equivalence classes' of programs. You might, for example, have programs that are pairs of system dynamics and random seeds (or boundary conditions or so on), and so care about "what's the aggregate remaining probability of dynamical systems of type A vs type B?", where perhaps 99% of the random seeds for type A have been ruled out, but 99.99% of the random seeds for type B have been ruled out, meaning that you think it's 100x more likely that type A describes the data source than type B.
[SI, in its full version, doesn't need to do any compression tricks like thinking about 'types of dynamical systems', instead just running all of them, but this is the sort of thing you might expect from an SI approximator.]
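As a purely illustrative aside, here is a toy numerical sketch of that aggregation (not real SI: the seed count and the ruled-out fractions are made-up numbers chosen to match the 99% / 99.99% example above):

```python
# Toy sketch of aggregating probability over 'equivalence classes' of programs.
# Numbers are illustrative only: two families of (dynamics, random seed) programs,
# with different fractions of seeds already ruled out by past observations.

n_seeds = 10_000                      # hypothetical seeds per family
ruled_out = {"A": 0.99, "B": 0.9999}  # fraction of each family contradicted so far

# Remaining (unnormalised) probability mass per family, assuming equal
# per-program weight within a family.
surviving_mass = {family: n_seeds * (1 - frac) for family, frac in ruled_out.items()}

odds_A_vs_B = surviving_mass["A"] / surviving_mass["B"]
print(surviving_mass)  # roughly {'A': 100, 'B': 1}
print(odds_A_vs_B)     # ~100: type A is about 100x more likely than type B
```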
So... someone pointed out the shortcomings of a core LW belief 10 years ago... and nothing much happened. In accordance with the usual pattern. As I keep saying. But nothing much happens when I do that, either.
I'm sympathetic, as this has happened to me too. Have you considered writing a post laying out your beliefs? I expect that this will be easier than comments for people to substantively engage with.
Ok, but that isn't answering the question. I know that shortness is the criterion for saying that a programme is probable. The question is about the upshot, what that means... other than shortness. If the upshot is that a short programme is more likely to correspond to reality, then SI is indeed formalised epistemology. But why should an SI have the ability to correspond to reality, when the only thing it is designed to do is predict observations? And how can a programme correspond when it is not semantically interpretable?
Maybe it's a category error to say of programmes that they have some level of probability.
If the upshot is that a short programme is more likely to correspond to reality, then SI is indeed formalised epistemology.
I think there are two different things going on:
First, if you want to use probabilities, then you need your total probability to sum to one, and there's not a way to make that happen unless you assign higher probabilities to shorter programs than longer programs.
Second, programs that don't correspond to observations get their probability zeroed out.
All of SI's ability to 'locate the underlying program of reality' comes from the second point. The first point is basically an accounting convenience / necessity.
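As an illustrative aside, a minimal finite sketch of those two points (the 'programs', their lengths, and the observation string are all made up; real SI runs over every program for a universal machine):

```python
# Minimal finite stand-in for SI's bookkeeping (illustrative only).
# Each "program" is reduced to (length_in_bits, output_string); both are made up.
programs = [
    (3, "0101"),  # short program that matches the observations below
    (5, "0101"),  # longer program with the same output: lower prior weight
    (4, "0011"),  # program that contradicts the observations
]

observed = "0101"

# Point 1: prior weight falls off with length (2^-length), which is what lets
# the total mass over all programs stay bounded.
weights = {i: 2.0 ** -length for i, (length, _) in enumerate(programs)}

# Point 2: programs whose output contradicts the observations are zeroed out.
for i, (_, output) in enumerate(programs):
    if output[: len(observed)] != observed:
        weights[i] = 0.0

# Renormalise over the survivors to get a posterior over programs.
total = sum(weights.values())
posterior = {i: w / total for i, w in weights.items()}
print(posterior)  # roughly {0: 0.8, 1: 0.2, 2: 0.0}
```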
Incidentally, it seems worth pointing out that I think critical rationalists and bayesian rationalists are mostly talking past each other when it comes to Solomonoff Induction. There's an interesting question of how you manage your stable of hypotheses under consideration, and the critrats seem to think it's philosophically untenable, and so induction is suspect, whereas the bayesians seem to think that it's philosophically trivial, as you can just assume away the problem by making your stable of hypotheses infinitely large. From the perspective of psychology / computer science, the details seem very interesting! But from the perspective of philosophy, I think the bayesians mostly have it right, and in fact induction in the abstract is pretty simple.
But why should an SI have the ability to correspond to reality, when the only thing it is designed to do is predict observations?
By assumption, your observations are generated by reality. What grounds that assumption out is... more complicated, but I'm guessing not the thing you're interested in?
Maybe it's a category error to say of programmes that they have some level of probability.
I mean, I roughly agree with this in my second paragraph, but I think 'category error' is too harsh. Like, there are lots of equivalent programs, right? [That is, if I do SI by considering all text strings interpreted as python source code by a hypercomputer, then for any python-computable mathematical function, there's an infinite class of text strings that implement that function.] And so actually what we care about is closer to "the integrated probability of an upcoming token across all programs", and if you looked at your surviving programs for a sufficiently complicated world, you would likely notice that they have some deep structural similarities that suggest they're implementing roughly the same function.
[I believe it's possible to basically ignore this, and only consider the simplest implementation of each function, but this gets you into difficulty with enumerating all possible functions which you could ignore when enumerating all possible text strings. The accounting convenience pays off, at the price of making the base unit not very interesting; you want the formula underneath physics, not an infinite family of text strings that all implement the formula underneath physics, perhaps in an opaque way.]
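To make "the integrated probability of an upcoming token across all programs" concrete, here is a small illustrative sketch with made-up posterior weights for a handful of surviving programs:

```python
# Illustrative only: the interesting quantity is not any single program's weight
# but the posterior-weighted vote of all surviving programs on the next token.
surviving = [
    (0.80, "1"),  # (posterior weight, next token this program would output)
    (0.15, "1"),
    (0.05, "0"),
]

def next_token_probability(token: str) -> float:
    """Total posterior weight of surviving programs that output `token` next."""
    return sum(w for w, t in surviving if t == token)

print(next_token_probability("1"))  # ~0.95
print(next_token_probability("0"))  # ~0.05
```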
To summarise, I interpret TAG as saying something like "when SI assigns a probability of x to a program P, what does that mean; how can we cash that out in terms of reality?" And Vaniver is saying "It means that, if you sum up the probabilities assigned to all programs which implement roughly the same function, then you get the probability that this function is 'the underlying program of reality'".
I think there are three key issues with this response (if I've understood it correctly):
I agree with those issues.
I think the way you expressed issue 3 makes it too much of a clone of issue 1; if I tell you the bounds for the question in terms of programs, then I think there is a general way to apply SI to get a sensible bounded answer. If I tell you the bounds in terms of functions, then there would be a general way to incorporate that info into SI, if you knew how to move between functions and programs.
The way I think about those issues that (I think?) separates them more cleanly is that we both have to figure out the 'compression' problem of how to consider 'models' as families of programs (at some level of abstraction, at least) and the 'elaboration' problem of how to repopulate our stable of candidates when we rule out too many of the existing ones. SI bypasses the first and gives a trivial answer to the second, but a realistic intelligence will have interesting answers to both.
There’s no general way to apply SI to answer a bounded question with a sensible bounded answer. Hence, when you say “you can make your stable of hypotheses infinitely large”, this is misleading: programs aren’t hypotheses, or explanations, in the normal sense of the word, for almost all of the questions we’d like to understand.
And it's also unclear, to say the least, that the criterion that an SI uses to prefer and discard hypotheses/programmes actually is a probability, despite being labelled as such.
I think there are two different things going on:
First, if you want to use probabilities, then you need your total probability to sum to one, and there’s not a way to make that happen unless you assign higher probabilities to shorter programs than longer programs.
Second, programs that don’t correspond to observations get their probability zeroed out.
You still haven't answered the question "probability of what?".
You have a process that assigns a quantity to a thing. The details of how the quantity gets assigned are not the issue. The issues are whether the quantity, which you have called a probability, actually is a probability, and whether the thing you are treating as a model of reality actually is such a model, in the sense of scientific realism, or merely something that churns out predictions, in the sense of instrumentalism.
Labelling isn't enough.
All of SI’s ability to ‘locate the underlying program of reality’ comes from the second point. The first point is basically an accounting convenience / necessity.
You haven't shown that it has any such ability. Prediction is not correspondence.
the bayesians seem to think that it’s philosophically trivial, as you can just assume away the problem by making your stable of hypotheses infinitely large.
...and casually equating programmes and hypotheses and casually equating prediction and correspondence...
The fact that bayesians don't have a stable containing every possible hypothesis, combined with the fact that they also don't have a method of hypothesis formation, is a problem... but it's not the problem I am talking about today.
But why should an SI have the ability to correspond to reality, when the only thing it is designed to do is predict observations?
By assumption, your observations are generated by reality.
That doesn't answer the question. The issue is not whether reality exists; the question is which theories correspond to it. What reality is, not whether reality is.
What grounds that assumption out is… more complicated, but I’m guessing not the thing you’re interested in?
" why should an SI have the ability to correspond to reality, when the only thing it is is designed to do is predict observations?"
You still haven't told me. It's possible for a predictive theory to fail to correspond, so there is no link of necessity between prediction and correspondence.
Maybe it’s a category error to say of programmes that they have some level of probability.
I mean, I roughly agree with this in my second paragraph, but I think ‘category error’ is too harsh. Like, there are lots of equivalent programs, right? [That is, if I do SI by considering all text strings interpreted as python source code by a hypercomputer, then for any python-computable mathematical function, there’s an infinite class of text strings that implement that function.] And so actually what we care about is closer to “the integrated probability of an upcoming token across all programs”,
What I care about is finding a correct ontological model of reality. Caring about which programmes predict upcoming tokens is a means to that end. There is a well defined and conventional sense in which upcoming tokens have a certain probability, because the arrival of a token is an event, and conventional probability theory deals with events.
But the question is about the probability of the programme itself.
Even if programmes are actually making distinct claims about reality, which has not been shown, some "integration" of different programmes is not going to be a clear model!
and if you looked at your surviving programs for a sufficiently complicated world, you would likely notice that they have some deep structural similarities that suggest they’re implementing roughly the same function.
No. In general it's possible for completely different algorithms to produce equivalent results.
But are you saying that the "deep structure" is the ontological content?
The issues are whether the quantity, which you have called a probability, actually is a probability, and whether the thing you are treating as a model of reality actually is such a model, in the sense of scientific realism, or merely something that churns out predictions, in the sense of instrumentalism.
I'm not quite sure how to respond to this; like, I think you're right that SI is not solving the hard problem, but I think you're wrong that SI is not solving the easy problem. Quite possibly we're in violent agreement on both points, and disagree on whether or not the easy problem is worth solving?
For example, I think the quantity actually is a probability, in that it satisfies all of the desiderata that probability theory places on quantities. Do I think it's the probability that the actual source code of the universe is that particular implementation? Well, it sure seems shaky to have as an axiom that God prefers shorter variable names, but since probability is in the mind, I don't want to rule out any programs a priori, and since there are more programs with longer variable names than programs with shorter variable names, I don't see any other way to express what my state of ignorance would be given infinite cognitive resources.
Also, I'm not really sure what a model of reality in the sense of scientific realism is, whereas I'm pretty sure I know what python programs are. So SI doesn't make the problem of finding the scientific realism models any easier if you're confident that those models are not just programs.
But are you saying that the "deep structure" is the ontological content?
I think the answer to this is "yes." That is, I generated an example to try to highlight what I think SI can do and what its limitations are, and that those limitations are fundamental to not being able to observe all of the world, and then realized I had written the example in terms of models instead of in terms of programs. (For this toy example, switching between them was easy, but obviously that doesn't generalize.)
My current suspicion is that we're having this discussion, actually; it feels to me like if you were the hypercomputer running SI, you wouldn't see the point of the ontological content; you could just integrate across all the hypotheses and have perfectly expressed your uncertainty about the world. But if you're running an algorithm that uses caching or otherwise clumps things together, those intermediate variables feel like they're necessary objects that need to be explained and generated somehow.
[Like, it might be interesting to look at the list of outputs that a model in the sense of scientific realism could give you, and ask if SI could also give you those outputs with minimal adjustment.]
The issues are whether the quantity, which you have called a probability, actually is a probability, and whether the thing you are treating as a model of reality actually is such a model, in the sense of scientific realism, or merely something that churns out predictions, in the sense of instrumentalism.
I’m not quite sure how to respond to this; like, I think you’re right that SI is not solving the hard problem, but I think you’re wrong that SI is not solving the easy problem.
What are the hard and easy problems? Realism and instrumentalism? I haven't said that SI is incapable of instrumentalism (prediction). Indeed, that might be the only thing it can do.
For example, I think the quantity actually is a probability, in that it satisfies all of the desiderata that probability theory places on quantities.
I think the mathematical constraints are clearly insufficient to show that something is a probability, even if they are necessary. If I have a cake of 1 m^2 and I cut it up, then the pieces sum to 1. But pieces of cake aren't probabilities.
Do I think it’s the probability that the actual source code of the universe is that particular implementation? Well, it sure seems shaky to have as an axiom that God prefers shorter variable names, but since probability is in the mind, I don’t want to rule out any programs a priori, and since there are more programs with longer variable names than programs with shorter variable names, I don’t see any other way to express what my state of ignorance would be given infinite cognitive resources.
So every hypothesis has the same probability of "not impossible". Well, no, several times over. You haven't shown that programmes are hypotheses, and what an SI is doing is assigning different non-zero probabilities, not a uniform one, and it is doing so based on programme length, although we don't know that reality is a programme, and so on.
Also, I’m not really sure what a model of reality in the sense of scientific realism is,
Do you think scientists are equally troubled?
But are you saying that the “deep structure” is the ontological content?
My current suspicion is that we’re having this discussion, actually; it feels to me like if you were the hypercomputer running SI, you wouldn’t see the point of the ontological content; you could just integrate across all the hypotheses and have perfectly expressed your uncertainty about the world.
Even if I no longer have an instrumental need for something, I can terminally value it.
But it isn't about me.
The rational sphere in general values realism, and makes realist claims. Yudkowsky has made claims about God not existing and MWI being true that are explicitly based on SI-style reasoning. So the cat is out of the bag... SI cannot be defended as something that was only ever intended as an instrumentalist predictor without walking back those claims.
But if you’re running an algorithm that uses caching or otherwise clumps things together, those intermediate variables feel like they’re necessary objects that need to be explained and generated somehow.
You're saying realism is an illusion? Maybe that's your philosophy, but it's not the Less Wrong philosophy.
[Like, it might be interesting to look at the list of outputs that a model in the sense of scientific realism could give you, and ask if SI could also give you those outputs with minimal adjustment.]
It's obvious that it could, but so what?
You haven't shown that programmes are hypotheses, and what an SI is doing is assigning different non-zero probabilities, not a uniform one, and it is doing so based on programme length, although we don't know that reality is a programme, and so on.
SI only works for computable universes; otherwise you're out of luck. If you're in an uncomputable universe... I'm not sure what your options are, actually. [If you are in a computable universe, then there must be a program that corresponds to it, because otherwise it would be uncomputable!]
You can't assign a uniform probability to all the programs, because there are infinitely many, and while there is a mathematically well-defined "infinitely tall function" there isn't a mathematically well-defined "infinitely flat function."
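(A standard way to make the accounting point precise, stated here as an aside: if every one of the countably many programs received the same weight c, the total would be 0 for c = 0 and infinite for any c > 0, so no uniform distribution exists. Length-discounted weights, by contrast, can be normalised: for a prefix-free set of programs P, Kraft's inequality gives

$$\sum_{p \in P} 2^{-\ell(p)} \le 1,$$

so assigning each program p the weight 2^{-\ell(p)} and rescaling yields a genuine probability distribution.)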
[Like, I keep getting the sense you want SI to justify the assumption that God prefers shorter variable names, and not liking the boring answer that in a condition of ignorance, our guess has to be that way because there are more ways to have long variable names than short variable names, and that our guess isn't really related to the implementation details of the actual program God wrote.]
Do you think scientists are equally troubled?
I mean, I don't know how muscles work, outside of the barest details, and yet I can use mine just fine. If I had to build some without being able to order them from a catalog, I'd be in trouble. I think AI researchers trying to make artificial scientists are equally troubled, and that's the standard I'm trying to hold myself to.
You're saying realism is an illusion?
I don't know what you mean by realism or illusion? Like, there's a way in which "my hand" is an illusion, in that it's a concept that exists in my mind, only somewhat related to physical reality. When I probe the edges of the concept, I can see the seams and how it fails to correspond to the underlying reality.
Like, the slogan that seems relevant here is "the map is not the territory." If realism means "there's a territory out there", I'm sold on there being a territory out there. If realism means "there is a map in the territory", I agree in the boring sense that a map exists in your head which exists in the territory, but think that makes some classes of arguments about the map confused.
For example, I could try to get worked up over whether, when I put hand sanitizer on my hands, the sanitizer becomes "part of my hand", but it seems wiser to swap out the coarse "hand" model with a finer "molecules and permeable tissue" model, where the "part of my hand" question no longer feels sensible, and "where are these molecules?" question does feel sensible.
One of the ways in which SI seems conceptually useful is that it lets me ask questions like "would different programs say 'yes' and 'no' to the question of 'is the sanitizer part of my hand'?". If I can't ground out the question in some deep territorial way, then it feels like the question isn't really about the territory.
SI only works for computable universes; otherwise you’re out of luck
SI cannot generate realistic hypotheses about uncomputable universes, but it doesn't follow that it can generate realistic hypotheses about computable universes.
You can’t assign a uniform probability to all the programs, because there are infinitely many, and while there is a mathematically well-defined “infinitely tall function” there isn’t a mathematically well-defined “infinitely flat function.”
The fact that an SI must sort and filter candidate functions does not mean it is doing so according to probability.
our guess has to be that way because there are more ways to have long variables
Given the assumptions that you have an infinite number of programmes, and that you need to come to a determinate result in finite time, you need to favour shorter programmes. That's a reasonable justification for the operation of an SI which happens to have nothing to do with truth or probability or reference or realism. (You lapsed into describing the quantity an SI sorts programmes by as "probability"... that has not, of course, been established.)
If I can’t ground out the question in some deep territorial way, then it feels like the question isn’t really about the territory.
You haven't shown that an SI is capable of anything deep and territorial. After all, it's only trying to predict observations.
Given the assumptions that you have an infinite number of programmes, and that you need to come to a determinate result in finite time, you need to favour shorter programmes.
If you need to come to a determinate result in a finite number of computational steps (my replacement for 'time'), then SI isn't the tool for you. It's the most general and data-efficient predictor possible, at the cost of totally exploding the computational budget.
I think if you are trying to evaluate a finite set of programs in finite time, it's not obvious that program length is the thing to sort them by; I think the speed prior makes more sense, and I think actual humans are doing something meaningfully different.
---
I currently don't see all that much value in responding to "You haven't shown / established" claims; like, SI is what it is, you seem to have strong opinions about how it should label particular things, and I don't think those opinions are about the part of SI that's interesting, or about why it's only useful as a hypothetical model (I think attacks from this angle are more compelling on that front). If you're getting value out of this exchange, I can give responding to your comments another go, but I'm not sure I have new things to say about the association between observations and underlying reality or aggregation of possibilities through the use of probabilities. (Maybe I have elaborations that would either more clearly convey my point, or expose the mistakes I'm making?)
If you need to come to a determinate result in a finite number of computational steps (my replacement for ‘time’), then SI isn’t the tool for you
It isn't a tool for anybody because it's uncomputable. Whatever interest it has must be theoretical.
I'm responding to claims that SI can solve long standing philosophical puzzles such as the existence of God or the correct interpretation of quantum mechanics. The claims have been made, and they have been made here, but they may not have been made by you.
I'm responding to claims that SI can solve long standing philosophical puzzles such as the existence of God or the correct interpretation of quantum mechanics.
Ah, I see. I'm not sure I would describe SI as 'solving' those puzzles, rather than recasting them in a clearer light.
Like, a program which contains Zeus and Hera will give rather different observations than a program which doesn't. On the other hand, when we look at programs that give the same observations, one of which also simulates a causally disconnected God and the other of which doesn't, then it should be clear that those programs look the same from our stream of observations (by definition!) and so we can't learn anything about them through empirical investigation (like with p-zombies).
So in my mind, the interesting "theism vs. atheism?" question is the question of whether there are activist gods out there; if Ares actually exists, then you (probably) profit by not displeasing him. Beliefs should pay rent in anticipated experiences, which feels like a very SI position to have.
Of course, it's possible to have a causally disconnected afterlife downstream of us, where things that we do now can affect it and nothing we do there can affect us now. [This relationship should already be familiar from the relationship between the past and the present.] SI doesn't rule that out--it can't until you get relevant observations!--but the underlying intuition notes that the causal disconnection makes it pretty hard to figure out which afterlife. [This is the response to Pascal's Wager where you say "well, but what about anti-God, who sends you to hell for being a Christian and sends you to heaven for not being one?", and then you get into how likely it is that you have an activist God that then steps back, and arguments between Christians as to whether or not miracles happen in the present day.]
But I think the actual productive path, once you're moderately confident Zeus isn't on Olympus, is not trying to figure out if invisi-Zeus is in causally-disconnected-Olympus, but looking at humans to figure out why they would have thought Zeus was intuitively likely in the first place; this is the dissolving the question approach.
With regard to QM, when I read through this post, it is relying pretty heavily on Occam's Razor, which (for Eliezer at least) I assume is backed by SI. But it does so in the normal way where, if you want to postulate something other than the simplest hypothesis, you have to make additional choices, and each choice that could have been different loses you points in the game of Follow-The-Improbability. But a thing that I hadn't noticed before this conversation, which seems pretty interesting to me, is that whether you prefer MWI might depend on whether you use the simplicity prior or the speed prior, and then I think the real argument for MWI rests more on the arguments here than on Occam's Razor grounds (except for the way in which you think a physics that follows all the same principles is more likely because of Occam's Razor on principles, which might be people's justification for that?).
Ah, I see. I’m not sure I would describe SI as ‘solving’ those puzzles, rather than recasting them in a clearer light
The claim has been made, even if you don't believe it.
Beliefs should pay rent in anticipated experiences, which feels like a very SI position to have
Rationalists don't consistently believe that, because if they did, they would be indifferent about MW versus Copenhagen, since all interpretations make the same predictions. Lesswrongian epistemology isn't even consistent.
But I think the actual productive path, once you’re moderately confident Zeus isn’t on Olympus, is not trying to figure out if invisi-Zeus is in causally-disconnected-Olympus, but looking at humans to figure out why they would have thought Zeus was intuitively likely in the first place; this is the dissolving the question approach
If you can have a non-empirical reason to believe in non-interacting branches of the universal wave function, your theist opponents can have a non-empirical reason to believe in non-interacting gods.
With regard to QM, when I read through this post, it is relying pretty heavily on Occam’s Razor, which (for Eliezer at least) I assume is backed by SI
Of course not. SI can't tell you why simplicity matters, epistemologically. At the same time, it is clear that simplicity is no additional help in making predictions. Once you have filtered out the non-predictive programmes, the remaining ones are all equally predictive... so whatever simplicity is supplying, it isn't extra predictiveness. The obvious answer is that it's some ability to show that, out of N equally predictive theories, one corresponds to reality.
That's a standard defence of Occam's razor. It isn't given by SI, as we have seen. SI just needs the simplicity criterion in order to be able to spit something out.
But there are other defenses of Occam's razor.
And the traditional versions don't settle everything in favour of MWI and against (sophisticated versions of) God... those are open questions.
And SI isn't a new improved version of Occam's razor. In fact, it is unable to relate simplicity to truth.
But a thing that I hadn’t noticed before this conversation, which seems pretty interesting to me, is that whether you prefer MWI might depend on whether you use the simplicity prior or the speed prior
These old problems are open problems because we can't agree on which kind of simplicity is relevant. SI doesn't help because it introduces yet another simplicity measure. Or maybe two: the speed prior and the space prior.
I think the real argument for MWI rests more on the arguments here
Wrongly conflates Copenhagen with Objective Reduction.
Wrongly assumes MW is the only alternative to "Copenhagen".
Science aims to come up with good theories about the world - but what makes a theory good? The standard view is that the key traits are predictive accuracy and simplicity. Deutsch focuses instead on the concepts of explanation and understanding: a good theory is an explanation which enhances our understanding of the world. This is already a substantive claim, because various schools of instrumentalism have been fairly influential in the philosophy of science. I do think that this perspective has a lot of potential, and later in this essay explore some ways to extend it. First, though, I discuss a few of Deutsch's arguments which I don't think succeed, in particular when compared to the bayesian rationalist position defended by Yudkowsky.
To start, Deutsch says that good explanations are “hard to vary”, because every part of the explanation is playing a role. But this seems very similar to the standard criterion of simplicity. Deutsch rejects simplicity as a criterion because he claims that theories like “The gods did it” are simple. Yet I’m persuaded by Yudkowsky’s argument that a version of “The gods did it” theory which could actually predict a given set of data would essentially need to encode all that data, making it very complex. I’m not sold on Yudkowsky’s definition of simplicity in terms of Kolmogorov complexity (for reasons I’ll explain later on) but re-encoding a lot of data should give rise to a complex hypothesis by any reasonable definition. So it seems most parsimonious to interpret the “hard to vary” criterion as an implication of the simplicity criterion.
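To make the point concrete, here is a toy sketch (purely illustrative, not a real complexity measure): a "gods did it" theory that actually reproduces the observations must carry them verbatim, whereas a short generative rule compresses them.

```python
# Toy illustration: compare the description length of a short generative rule
# with a "the gods did it" theory that, to actually reproduce the observations,
# must embed them verbatim. The "theories" are just strings standing in for
# descriptions; nothing here is a real complexity measure.

data = "01" * 500  # 1000 observed bits with an obvious regularity

rule_theory = 'print("01" * 500)'                          # short generative description
gods_theory = '# the gods did it\nprint("' + data + '")'   # the data re-encoded wholesale

print(len(rule_theory))  # under 20 characters
print(len(gods_theory))  # over 1000 characters, dominated by the copied data
```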
Secondly, Deutsch says that good explanations aren’t just predictive, but rather tell us about the underlying mechanisms which generate those predictions. As an illustration, he argues that even if we can predict the outcome of a magic trick, what we really want to know is how the trick works. But this argument doesn’t help very much in adjudicating between scientific theories - in practice, it’s often valuable to accept purely predictive theories as stepping-stones to more complete theories. For example, Newton’s inverse square law of gravity was a great theory despite not attempting to explain why gravity worked that way; instead it paved the way for future theories which did so (and which also made better predictions). If Deutsch is just arguing that eventually science should aim to identify all the relevant underlying mechanisms, then I think that most scientific realists would agree with him. The main exception would be in the context of foundational physics. Yet that’s a domain in which it’s very unclear what it means for an underlying mechanism to “really exist”; it’s so far removed from our everyday intuitions that Deutsch’s magician analogy doesn’t seem very applicable.
Thirdly, Deutsch says that we can understand the importance of testability in terms of the difference between good and bad explanations:
But this doesn’t help us distinguish between explanations which have themselves been tested, versus explanations which were formulated afterwards to match the data from those same tests. Both are equally constrained by existing knowledge - why should we be more confident in the former? Without filling in this step of the argument, it’s hard to understand the central role of testability in science. I think, again, that Yudkowsky provides the best explanation: that the human tendency towards hindsight bias means we dramatically overestimate how well our theories explain observed data, unless we’re forced to make predictions in advance.
Having said all this, I do think that Deutsch’s perspective is valuable in other ways. I was particularly struck by his argument that the “theory of everything” which fundamental physicists search for would be less interesting than a high-level “theory of everything” which forges deep links between ideas from many disciplines (although I wish he’d say a bit more about what it means for a theory to be “deep”). This argument (along with the rest of Deutsch’s framework) pushes back against the longstanding bias in philosophy of science towards treating physics as the central example of science. In particular, thinking of theories as sets of equations is often appropriate for physics, but much less so for fields which are less formalism-based - i.e. almost all of them.[0] For example, the theory of evolution is one of the greatest scientific breakthroughs, and yet its key insights can’t be captured by a formal model. In Chapman’s terminology, evolution and most other theories are somewhat nebulous. This fits well with Deutsch’s focus on science as a means of understanding the world - because even though formalisms don’t deal well with nebulosity, our minds do.
Another implication of the nebulosity of scientific theories is that we should move beyond the true-false dichotomy when discussing them. Bayesian philosophy of science is based on our credences about how likely theories are to be true. But it’s almost never the case that high-level theories are totally true or totally false; they can explain our observations pretty well even if they don’t account for everything, or are built on somewhat leaky abstractions. And so assigning probabilities only to the two outcomes “true” and “false” seems simplistic. I still consider probabilistic thinking about science to be valuable, but I expect that thinking in terms of degrees of truth is just as valuable. And the latter comes naturally from thinking of theories as explanations, because we intuitively understand that the quality of explanations should be evaluated in a continuous rather than binary way.[1]
Lastly, Deutsch provides a good critique of philosophical positions which emphasise prediction over explanation. He asks us to imagine an “experiment oracle” which is able to tell us exactly what the outcome of any specified experiment would be:
Although I assume it isn’t intended as such, this is a strong critique of Solomonoff induction, a framework which Yudkowsky defends as an idealised model for how to reason. The problem is that the types of hypotheses considered by Solomonoff induction are not explanations, but rather computer programs which output predictions. This means that even a hypothesis which is assigned very high credence by Solomonoff induction might be nearly as incomprehensible as the world itself, or more so - for example, if it merely consists of a simulation of our world. So I agree with Deutsch: even idealised Solomonoff induction (with infinite compute) would lack some crucial properties of explanatory science.[2]
Extending the view of science as explanation
How could Deutsch’s identification of the role of science as producing human-comprehensible explanations actually improve science in practice? One way is by making use of the social science literature on explanations. Miller identifies four overarching lessons:
We can apply some of these lessons to improve scientific explanations. Consider that scientific theories are usually formulated in terms of existing phenomena. But to formulate properly contrastive explanations, science will need to refer to counterfactuals. For example, in order to fully explain the anatomy of an animal species, we’ll need to understand other possible anatomical structures, and the reasons why those didn’t evolve instead. Geoffrey West’s work on scaling laws in biology provides a good example of this type of explanation. Similarly, we shouldn’t think of fundamental physics as complete until we understand not only how our universe works, but also which counterfactual laws of physics could have generated other universes as interesting as ours.
A second way we can try to use Deutsch’s framework to improve science: what does it mean for a human to understand an explanation? Can we use findings from cognitive science, psychology or neuroscience to make suggestions for the types of theories scientists work towards? This seems rather difficult, but I’m optimistic that there’s some progress to be made. For example, analogies and metaphors play an extensive role in everyday human cognition, as highlighted by Lakoff’s Metaphors we live by. So instead of thinking about analogies as useful ways to communicate a scientific theory, perhaps we should consider them (in some cases) to be a core part of the theory itself. Focusing on analogies may slightly reduce those theories’ predictive power (because it’s hard to cash out analogies in terms of predictions) while nevertheless increasing the extent to which they allow us to actually understand the world. I’m reminded of the elaborate comparison between self-reference in mathematics and self-replication in biology drawn by Hofstadter in Godel, Escher, Bach - if we prioritise a vision of science as understanding, then this sort of work should be much more common. However, the human tendency towards hindsight bias is a formidable opponent, and so we should always demand that such theories also provide novel predictions, in order to prevent ourselves from generating an illusion of understanding.
[0]. As an example of this bias, see the first two perspectives on scientific theories discussed here; my position is closest to the third, the pragmatic view.
[1]. Work on logical induction and embedded agency may partly address this issue; I’m not sure.
[2]. I was originally planning to go on to discuss Deutsch’s broader critiques of empiricism and induction. But Deutsch makes it hard to do this, because he doesn’t refer very much to the philosophical literature, or specific people whose views he disagrees with. It seems to me that this leads to a lot of linguistic disagreements. For example, when he critiques the idea of knowledge being “derived” from experience, or scientific theories being “justified” by empirical experience, I feel like he’s using definitions of these terms which diverge both from what most people take them to mean, and also from what most philosophers take them to mean. Nor do I think that his characterisation of observation as theory-laden is inconsistent with standard inductivism; he seems to think it is, but doesn’t provide evidence for that. So I’ve decided not to go deeper on these issues, except to note my skepticism about his position.