Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
A serious possibility is that the first AGI(s) will be developed in a Manhattan Project style setting before any sort of friendliness/safety constraints can be integrated reliably. They will also be substantially short of the intelligence required to exponentially self-improve. Within a certain range of development and intelligence, containment protocols can make them safe to interact with. This means they can be studied experimentally, and the architecture(s) used to create them better understood, furthering the goal of safely using AI in less constrained settings.
Setting the Scene
Technological and/or political issues could force the development of AI without the theoretical safety guarantees that we'd certainly like, but there is a silver lining.
A lot of the discussion around LessWrong and MIRI that I've seen (and I haven't seen all of it, please send links!) seems to focus very strongly on the situation of an AI that can self-modify or construct further AIs, resulting in an exponential explosion of intelligence (FOOM/Singularity). The focus on FAI is on finding an architecture that can be explicitly constrained (and a constraint set that won't fail to do what we desire).
My argument is essentially that there could be a critical multi-year period preceding any possible exponentially self-improving intelligence during which a series of AGIs of varying intelligence, flexibility and architecture will be built. This period will be fast and frantic, but it will be incredibly fruitful and vital both in figuring out how to make an AI sufficiently strong to exponentially self-improve and in how to make it safe and friendly (or develop protocols to bridge the even riskier period between when we can develop FOOM-capable AIs and when we can ensure their safety).
The requirement for a hard singularity, an exponentially self-improving AI, is that the AI can substantially improve itself in a way that enhances its ability to further improve itself, which requires the ability to modify its own code; access to resources like time, data, and hardware to facilitate these modifications; and the intelligence to execute a fruitful self-modification strategy.
The first two conditions can (and should) be directly restricted. I'll elaborate more on that later, but basically any AI should be very carefully sandboxed (unable to affect its software environment), and its access to resources should be strictly controlled. Perhaps no data goes in without human approval or while the AI is running. Perhaps nothing comes out either. Even a hyperpersuasive hyperintelligence will be slowed down (at least) if it can only interact with prespecified tests (how do you test AGI? No idea, but it shouldn't be harder than friendliness). This isn't a perfect situation. Eliezer Yudkowsky presents several arguments for why an intelligence explosion could happen even when resources are constrained (see Section 3 of Intelligence Explosion Microeconomics), not to mention ways that those constraints could be defied even if engineered perfectly (by the way, I would happily run the AI box experiment with anybody; I think it is absurd that anyone would fail it! [I've read Tuxedage's accounts, and I think I actually do understand how a gatekeeper could fail, but I also believe I understand how one could be trained to succeed even against a much stronger foe than any person who has played the part of the AI]).
But the third emerges from the way technology typically develops. I believe it is incredibly unlikely that an AGI will develop in somebody's basement, or even in a small national lab or top corporate lab. When there is no clear notion of what a technology will look like, it is usually not developed. Positive, productive accidents are somewhat rare in science, but they are remarkably rare in engineering (please, give counterexamples!). The creation of an AGI will likely not happen by accident; there will be a well-funded, concrete research and development plan that leads up to it: the AI Manhattan Project described above. But even when there is a good plan successfully executed, prototypes are slow, fragile, and poor-quality compared to what is possible even with approaches using the same underlying technology. It seems very likely to me that the first AGI will be a Chicago Pile, not a Trinity: recognizably a breakthrough but, with proper consideration, not immediately dangerous or unmanageable. [Note, you don't have to believe this to read the rest of this. If you disagree, consider the virtues of redundancy and the question of what safety measures an AI development effort should implement if it can't be persuaded to delay long enough for theoretically sound methods to become available].
A Manhattan Project style effort makes a relatively weak, controllable AI even more likely, for two reasons. First, such a project can implement substantial safety protocols that are explicitly researched in parallel with primary development. Second, the total resources, in hardware and brainpower, devoted to the AI will be much greater than in a smaller project, which sets a correspondingly higher bar that the resulting AGI must clear before it can both self-modify exponentially and break the security procedures.
Strategies to handle AIs in the proto-Singularity, and why they're important
First, take a look at the External Constraints Section of this MIRI Report and/or this article on AI Boxing. I will be talking mainly about these approaches. There are certainly others, but these are the easiest to extrapolate from current computer security.
These AIs will provide us with the experimental knowledge to better handle the construction of even stronger AIs. If careful, we will be able to use these proto-Singularity AIs to learn about the nature of intelligence and cognition, to perform economically valuable tasks, and to test theories of friendliness (not perfectly, but well enough to start).
"If careful" is the key phrase. I mentioned sandboxing above, and computer security is key to any attempt to contain an AI. That means monitoring the source code and setting a threshold for too much changing too fast, at which point a failsafe freezes all computation; it also means keeping extremely strict control over copies of the source. Some architectures will be more inherently dangerous and less predictable than others. A simulation of a physical brain, for instance, will be fairly opaque (depending on how far neuroscience has gone) but could have almost no potential to self-improve to an uncontrollable degree if its access to hardware is limited (it won't be able to make itself much more efficient on fixed resources). Other architectures will have other properties. Some will be utility optimizing agents. Some will have behaviors but no clear utility. Some will be opaque, some transparent.
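To make the "too much changing too fast" idea concrete, here is a minimal sketch of such a failsafe. Everything in it is an assumption for illustration: the class name ChangeRateFailsafe, the two thresholds, and the choice to count changed source lines are hypothetical, and a real containment system would need far more than a line diff.

```python
import difflib
from collections import deque


class ChangeRateFailsafe:
    """Freeze when too many source lines change within a sliding time window."""

    def __init__(self, max_changed_lines, window_seconds):
        self.max_changed_lines = max_changed_lines  # hypothetical threshold
        self.window_seconds = window_seconds        # hypothetical window length
        self.recent = deque()                       # (timestamp, changed line count)
        self.last_snapshot = None
        self.frozen = False

    def observe(self, timestamp, source_lines):
        """Record a snapshot of the source; return True once frozen."""
        if self.frozen:
            return True  # once tripped, stay frozen until humans intervene
        if self.last_snapshot is not None:
            diff = difflib.ndiff(self.last_snapshot, source_lines)
            # Count only added and removed lines, not ndiff's "? " hint lines.
            changed = sum(1 for line in diff if line.startswith(("+ ", "- ")))
            self.recent.append((timestamp, changed))
        self.last_snapshot = list(source_lines)
        # Forget changes that have fallen out of the time window.
        while self.recent and timestamp - self.recent[0][0] > self.window_seconds:
            self.recent.popleft()
        if sum(count for _, count in self.recent) > self.max_changed_lines:
            self.frozen = True
        return self.frozen
```

The one deliberate design choice is that the freeze is sticky: once the threshold is crossed, observe keeps returning True, so only an outside party can restart computation.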
All will have a theory of how they operate, which can be refined by actual experimentation. This is what we can gain! We can set up controlled scenarios like honeypots to catch malevolence. We can evaluate our ability to monitor and read the thoughts of the AGI. We can develop stronger theories of how damaging self-modification actually is to imposed constraints. We can test our abilities to add constraints to even the base state. But do I really have to justify the value of experimentation?
I am familiar with criticisms based on absolutely incomprehensibly perceptive and persuasive hyperintelligences being able to overcome any security, but I've tried to outline above why I don't think we'd be dealing with that case.
Right now AGI is really a political non-issue. It is blue-sky even compared to space exploration and fusion, both of which actually receive substantial government funding. I think that this will change in the period immediately leading up to my hypothesized AI Manhattan Project. The AI Manhattan Project can only happen with a lot of political will behind it, which will probably mean a spiral of scientific advancements, hype and threat of competition from external unfriendly sources. Think space race.
So suppose that the first few AIs are built under well controlled conditions. Friendliness is still not perfected, but we think/hope we've learned some valuable basics. But now people want to use the AIs for something. So what should be done at this point?
I won't try to speculate about what happens next (well, you can probably persuade me to, but it might not be as valuable), beyond extensions of the protocols I've already laid out, hybridized with notions like Oracle AI. It certainly gets a lot harder, but hopefully experimentation on the first, highly-controlled generation of AI to get a better understanding of their architectural fundamentals, combined with more direct research on friendliness in general, would provide the groundwork for this.
Cross-posted at Practical Ethics.
Many have pronounced the era of innovation dead, peace be to its soul. From Tyler Cowen's decree that we've picked all the low hanging fruit of innovation, through Robert Gordon's idea that further innovation growth is threatened by "six headwinds", to Garry Kasparov's and Peter Thiel's theory that risk aversion has stifled innovation, there is no lack of predictions about the end of discovery.
I don't propose to address the issue with something as practical and useful as actual data. Instead, staying true to my philosophical environment, I propose a thought experiment that hopefully may shed some light. The core idea is that we might be underestimating the impact of innovation because we have so much of it.
Imagine that technological innovation had for some reason stopped around 1945, with one exception: the CD and CD player/burner. Fast forward a few decades, and visualise society. We can imagine a society completely dominated by the CD. We'd have all the usual uses for the CD - music, songs and similar - of course, but also much more.
There is a lot of bad science and controversy in the realm of how to have a healthy lifestyle. Every week we are bombarded with new studies that conflict with older studies, telling us X is good or Y is bad. Eventually we reach our psychological limit, throw up our hands, and give up. I used to do this a lot. I knew exercise was good, I knew flossing was good, and I wanted to eat better. But I never acted on any of that knowledge. I would feel guilty when I thought about this stuff and go back to what I was doing. Unsurprisingly, this didn't really cause me to make any positive lifestyle changes.
Instead of vaguely guilt-tripping you with potentially unreliable science news, this post aims to provide an overview of lifestyle interventions that have very strong evidence behind them, along with concrete ways to implement them.
Summary: If you want to predict arbitrary computable patterns of data, Solomonoff induction is the optimal way to go about it — provided that you're an eternal transcendent hypercomputer. A real-world AGI, however, won't be immortal and unchanging. It will need to form hypotheses about its own physical state, including predictions about possible upgrades or damage to its hardware; and it will need bridge hypotheses linking its hardware states to its software states. As such, the project of building an AGI demands that we come up with a new formalism for constructing (and allocating prior probabilities to) hypotheses. It will not involve just building increasingly good computable approximations of AIXI.
Solomonoff's inductive inference system will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus, in some sense, be the perfect universal prediction algorithm, if only it were computable.
Perhaps you've been handed the beginning of a sequence like 1, 2, 4, 8… and you want to predict what the next number will be. Perhaps you've paused a movie, and are trying to guess what the next frame will look like. Or perhaps you've read the first half of an article on the Algerian Civil War, and you want to know how likely it is that the second half describes a decrease in GDP. Since all of the information in these scenarios can be represented as patterns of numbers, they can all be treated as rule-governed sequences like the 1, 2, 4, 8… case. Complicated sequences, but sequences all the same.
It's been argued that in all of these cases, one unique idealization predicts what comes next better than any computable method: Solomonoff induction. No matter how limited your knowledge is, or how wide the space of computable rules that could be responsible for your observations, the ideal answer is always the same: Solomonoff induction.
Solomonoff induction has only a few components. It has one free parameter, a choice of universal Turing machine. Once we specify a Turing machine, that gives us a fixed encoding for the set of all possible programs that print a sequence of 0s and 1s. Since every program has a specification, we call the number of bits in the program's specification its "complexity"; the shorter the program's code, the simpler we say it is.
Solomonoff induction takes this infinitely large bundle of programs and assigns each one a prior probability proportional to its simplicity. Every time the program requires one more bit, its prior probability goes down by a factor of 2, since there are then twice as many possible computer programs that complicated. This ensures the sum over all programs' prior probabilities equals 1, even though the number of programs is infinite.2
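As a rough illustration of the weighting scheme just described, here is a toy sketch in Python. It is not Solomonoff induction, which ranges over all programs and is uncomputable. The three candidate rules and the bit lengths assigned to them are hypothetical stand-ins for "programs" and their "complexity", chosen only to show how a 2^-length prior combines with the observed data.

```python
from fractions import Fraction
from math import comb

# Hypothetical hypothesis class: name -> (assumed description length in bits, rule).
# Each rule maps an index i = 0, 1, 2, ... to the i-th term of a sequence.
HYPOTHESES = {
    "doubling": (3, lambda i: 2 ** i),                               # 1, 2, 4, 8, 16
    "lazy_caterer": (5, lambda i: i * (i + 1) // 2 + 1),             # 1, 2, 4, 7, 11
    "cake_numbers": (6, lambda i: sum(comb(i, k) for k in range(4))),  # 1, 2, 4, 8, 15
}


def prior(bits):
    """Each extra bit of description halves the prior weight."""
    return Fraction(1, 2 ** bits)


def predict_next(observed):
    """Return {next value: probability} over the rules consistent with the data."""
    weights = {
        name: prior(bits)
        for name, (bits, rule) in HYPOTHESES.items()
        if all(rule(i) == x for i, x in enumerate(observed))
    }
    total = sum(weights.values())
    prediction = {}
    for name, weight in weights.items():
        next_value = HYPOTHESES[name][1](len(observed))
        prediction[next_value] = prediction.get(next_value, 0) + weight / total
    return prediction


print(predict_next([1, 2, 4, 8]))
```

With these made-up lengths, the data 1, 2, 4, 8 rules out the lazy caterer rule, and the remaining weight splits 8/9 for a next term of 16 and 1/9 for a next term of 15. The simpler rule dominates, but the more complex one is not discarded.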
Not "rationality evangelism", which CFAR is doing already if I understand their mission. "Rational evangelism", which is what CFAR would do if they were Catholic missionaries.
If you believe in Hell, as many people very truly do, it is hard for Hell not to seem like the world's most important problem.
To some extent, proselytizing religions treat Hell with respect--they spend billions of dollars trying to save sinners, and the most devout often spend their lives preaching the Gospel (insert non-Christian variant).
But is Hell given enough respect? Every group meets with mixed success in solving its problems, but the problem of eternal suffering leaves little room for "mixed success". Even the most powerful religions are stuck in patterns that make the work of salvation very difficult indeed. And some seem willing to reduce their evangelism* for reasons that aren't especially convincing in the face of "nonbelievers are quite possibly going to burn, or at least be outside the presence of God, forever".
What if you were a rationalist who viewed Hell like certain Less Wrongers view the Singularity? (This belief would be hard to reconcile with rationalism generally, but for the sake of argument...) How would you tackle the problem of eternal suffering with the same passion we spend on probability theory and friendly AI?
I wrote a long thought experiment to better define the problem, involving a religion called "Normomism", but it was awkward. There are plenty of real religions whose members believe in Hell, or at least in a Heaven that many people aren't going to (also a terrible loss). Some have a stated mission of saving as many people as possible from a bad afterlife.
So where are they falling short?
If you were the Pope, or the Caliph, or the supreme dictator of some smaller religion, what tactics would you use to convince more people to do and believe exactly the things that would save them--whether that's faith or good works? Why haven't these tactics been tried already? Is there really much room for improvement?
Spreading the Word
This post isn't a dig at believers, though it does seem like many people don't act on their sincere belief in an eternal afterlife. (I don't mind when people try to convert me--at least they care!)
My main point: It's worth considering that people who believe in Very Bad Future Outcomes have been working to prevent those outcomes for thousands of years, and have stumbled upon formidable techniques for doing so.
I've thought for a while about rational evangelism, and it's surprisingly hard to come up with ways that people like Rick Warren and Jerry Lovett could improve their methodology. (Read Lovett's "contact me" paragraph for the part that really impressed me.)
We speak often of borrowing from religion, but these conversations mostly touch on social bonding, rather than what it means to spread ideas so important that the fate of the human race depends on them. ("Raising the Sanity Waterline" is a great start, but those ideas haven't been the focus of many recent posts.)
I'm not saying this is a perfect comparison. The rationalist war for the future won't be fought one soul at a time, and we won't save anyone with a deathbed confession.
But cryogenic freezing does exist. And on a more collective level, convincing the right people that the far future matters could be a coup on the level of Constantine's conversion.
CFAR is doing good things in the direction of rationality evangelism. How can the rest of us do more?
Living Like We Mean It
This movement is going places. But I fear we may spend too much time (at least proportionally) arguing amongst ourselves, when bringing others into the fold is a key piece of the puzzle. And if we’d like to expand the flock (or, more appropriately, the herd of cats), what can we learn from history’s most persuasive organizations?
I often pass up my chance to talk to people about something as simple as Givewell, let alone existential risk, and it's been a long time since I last name-dropped a Less Wrong technique. I don't think I'm alone in this.**
I've met plenty of Christians who exude the same optimism and conviviality as a Rick Warren or a Ned Flanders. These kinds of people are a major boon for the Christian religion. Even if most of us are introverts, what's stopping us from teaching ourselves to live the same way?
Still, I'm new here, and I could be wrong. What do you think?
* Text editor's giving me some trouble, but the link is here: http://www.relevantmagazine.com/god/practical-faith/evangelism-interfaith-world
** Peter Boghossian's Manual for Creating Atheists has lots to say about using rationality techniques in the course of daily life, and is well worth reading, though the author can be an asshole sometimes.
[link] Nick Beckstead on improving disaster shelters to increase the chances of recovery from a global catastrophe
What is the problem? Civilization might not recover from some possible global catastrophes. Conceivably, people with access to disaster shelters or other refuges may be more likely to survive and help civilization recover. However, existing disaster shelters (sometimes built to ensure continuity of government operations and sometimes built to protect individuals), people working on submarines, largely uncontacted peoples, and people living in very remote locations may serve this function to some extent.
What are the possible interventions? Other interventions may also increase the chances that humanity would recover from a global catastrophe, but this review focuses on disaster shelters. Proposed methods of improving disaster shelter networks include stocking shelters with appropriately trained people and resources that would enable them to rebuild civilization in case of a near-extinction event, keeping some shelters constantly full of people, increasing food reserves, and building more shelters. A philanthropist could pay to improve existing shelter networks in the above ways, or they could advocate for private shelter builders or governments to make some of the improvements listed above.
Who else is working on it? Some governments maintain bunkers in order to maintain continuity of government and/or to protect their citizens. Some individuals purchase and maintain private disaster shelters.
Questions for further investigation: With the possible exception of a pandemic specifically engineered to kill all humans, I am aware of no scenario in which improved disaster shelters would plausibly enable a small group of people to survive a sudden near-extinction event. In the case of other catastrophes where a much larger number of people would survive, I would guess that improved refuges would play a relatively small role in helping humanity to recover because they would represent a small share of relevant people and resources. Many challenging questions about improving refuges remain, but I would prioritize investigating other issues at this point because refuges seem likely to be of limited value and alternative strategies (such as improving biosecurity and increasing the resilience of industrial and agricultural systems) seem more likely to effectively reduce the global catastrophic risks that improving refuges might plausibly address.
A brief essay intended for high school students: any thoughts?
If you go to school, take the classes that people tell you to, do your homework, and engage in the extracurricular activities that your peers do, you'll be setting yourself up for an "okay" life. But you can do better than that.
Followup to: Building Phenomenological Bridges
Summary: AI theorists often use models in which agents are crisply separated from their environments. This simplifying assumption can be useful, but it leads to trouble when we build machines that presuppose it. A machine that believes it can only interact with its environment in a narrow, fixed set of ways will not understand the value, or the dangers, of self-modification. By analogy with Descartes' mind/body dualism, I refer to agent/environment dualism as Cartesianism. The open problem in Friendly AI (OPFAI) I'm calling naturalized induction is the project of replacing Cartesian approaches to scientific induction with reductive, physicalistic ones.
I'll begin with a story about a storyteller.
Once upon a time — specifically, 1976 — there was an AI named TALE-SPIN. This AI told stories by inferring how characters would respond to problems from background knowledge about the characters' traits. One day, TALE-SPIN constructed a most peculiar tale.
Henry Ant was thirsty. He walked over to the river bank where his good friend Bill Bird was sitting. Henry slipped and fell in the river. Gravity drowned.
Since Henry fell in the river near his friend Bill, TALE-SPIN concluded that Bill rescued Henry. But for Henry to fall in the river, gravity must have pulled Henry. Which means gravity must have been in the river. TALE-SPIN had never been told that gravity knows how to swim; and TALE-SPIN had never been told that gravity has any friends. So gravity drowned.
TALE-SPIN had previously been programmed to understand involuntary motion in the case of characters being pulled or carried by other characters — like Bill rescuing Henry. So it was programmed to understand 'character X fell to place Y' as 'gravity moves X to Y', as though gravity were a character in the story.1
For us, the hypothesis 'gravity drowned' has low prior probability because we know gravity isn't the type of thing that swims or breathes or makes friends. We want agents to seriously consider whether the law of gravity pulls down rocks; we don't want agents to seriously consider whether the law of gravity pulls down the law of electromagnetism. We may not want an AI to assign zero probability to 'gravity drowned', but we at least want it to neglect the possibility as Ridiculous-By-Default.
When we introduce deep type distinctions, however, we also introduce new ways our stories can fail.
In my previous post, I introduced the idea of an "l-zombie", or logical philosophical zombie: A Turing machine that would simulate a conscious human being if it were run, but that is never run in the real, physical world, so that the experiences that this human would have had, if the Turing machine were run, aren't actually consciously experienced.
One common reply to this is to deny the possibility of logical philosophical zombies just like the possibility of physical philosophical zombies: to say that every mathematically possible conscious experience is in fact consciously experienced, and that there is no kind of "magical reality fluid" that makes some of these be experienced "more" than others. In other words, we live in the Tegmark Level IV universe, except that, contrary to what Tegmark argues in his paper, there's no objective measure on the collection of all mathematical structures according to which some mathematical structures somehow "exist more" than others (and, although IIRC that's not part of Tegmark's argument, according to which the conscious experiences in some mathematical structures could be "experienced more" than those in other structures). All mathematically possible experiences are experienced, and to the same "degree".
So why is our world so orderly? There's a mathematically possible continuation of the world that you seem to be living in, where purple pumpkins are about to start falling from the sky. Or the light we observe coming in from outside our galaxy is suddenly replaced by white noise. Why don't you remember ever seeing anything as obviously disorderly as that?
And the answer to that, of course, is that among all the possible experiences that get experienced in this multiverse, there are orderly ones as well as non-orderly ones, so the fact that you happen to have orderly experiences isn't in conflict with the hypothesis; after all, the orderly experiences have to be experienced as well.
One might be tempted to argue that it's somehow more likely that you will observe an orderly world if everybody who has conscious experiences at all, or if at least most conscious observers, see an orderly world. (The "most observers" version of the argument assumes that there is a measure on the conscious observers, a.k.a. some kind of magical reality fluid.) But this requires the use of anthropic probabilities, and there is simply no (known) system of anthropic probabilities that gives reasonable answers in general. Fortunately, we have an alternative: Wei Dai's updateless decision theory (which was motivated in part exactly by the problem of how to act in this kind of multiverse). The basic idea is simple (though the details do contain devils): We have a prior over what the world looks like; we have some preferences about what we would like the world to look like; and we come up with a plan for what we should do in any circumstance we might find ourselves in that maximizes our expected utility, given our prior.
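The "prior, preferences, plan" recipe can be sketched in a few lines. This is only a toy under loud assumptions, not Wei Dai's formalism: the function best_policy is a hypothetical helper, worlds are just labels, and a policy is a plain table from observations to actions. The example numbers are the usual counterfactual mugging stakes (pay 100 when asked, or receive 10000 in the other branch if you are the kind of agent who would have paid).

```python
from itertools import product


def best_policy(prior, utility, observations, actions):
    """Pick the observation -> action table with the highest prior-expected utility.

    `utility(world, policy)` sees the whole policy, so a world can reward or
    punish what the agent *would* do on observations it never makes there.
    """
    best, best_value = None, float("-inf")
    for choice in product(actions, repeat=len(observations)):
        policy = dict(zip(observations, choice))
        value = sum(p * utility(world, policy) for world, p in prior.items())
        if value > best_value:
            best, best_value = policy, value
    return best, best_value


# Counterfactual mugging as a toy instance (all numbers are illustrative).
prior = {"heads": 0.5, "tails": 0.5}


def utility(world, policy):
    pays = policy["asked"] == "pay"
    if world == "heads":
        return 10000 if pays else 0  # rewarded for being a payer
    return -100 if pays else 0       # actually asked to hand over 100


policy, value = best_policy(prior, utility, ["asked"], ["pay", "refuse"])
print(policy, value)
```

The plan is chosen once, from the prior, rather than re-derived after each observation. That is why paying comes out ahead here even though, in the world where you are actually asked, paying looks like a pure loss.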
In this framework, Coscott and Paul suggest, everything adds up to normality if, instead of saying that some experiences objectively exist more, we happen to care more about some experiences than about others. (That's not a new idea, of course, or the first time this has appeared on LW -- for example, Wei Dai's What are probabilities, anyway? comes to mind.) In particular, suppose we just care more about experiences in mathematically really simple worlds -- or more precisely, places in mathematically simple worlds that are mathematically simple to describe (since there's a simple program that runs all Turing machines, and therefore all mathematically possible human experiences, always assuming that human brains are computable). Then, even though there's a version of you that's about to see purple pumpkins rain from the sky, you act in a way that's best in the world where that doesn't happen, because that world has so much lower K-complexity, and because you therefore care so much more about what happens in that world.
There's something unsettling about that, which I think deserves to be mentioned, even though I do not think it's a good counterargument to this view. This unsettling thing is that on priors, it's very unlikely that the world you experience arises from a really simple mathematical description. (This is a version of a point I also made in my previous post.) Even if the physicists had already figured out the simple Theory of Everything, which is a super-simple cellular automaton that accords really well with experiments, you don't know that this simple cellular automaton, if you ran it, would really produce you. After all, imagine that somebody intervened in Earth's history so that orchids never evolved, but otherwise left the laws of physics the same; there might still be humans, or something like humans, and they would still run experiments and find that they match the predictions of the simple cellular automaton, so they would assume that if you ran that cellular automaton, it would compute them -- except it wouldn't, it would compute us, with orchids and all. Unless, of course, it does compute them, and a special intervention is required to get the orchids.
So you don't know that you live in a simple world. But, goes the obvious reply, you care much more about what happens if you do happen to live in the simple world. On priors, it's probably not true; but it's best, according to your values, if all people like you act as if they live in the simple world (unless they're in a counterfactual mugging type of situation, where they can influence what happens in the simple world even if they're not in the simple world themselves), because if the actual people in the simple world act like that, that gives the highest utility.
You can adapt an argument that I was making in my l-zombies post to this setting: Given these preferences, it's fine for everybody to believe that they're in a simple world, because this will increase the correspondence between map and territory for the people that do live in simple worlds, and that's who you care most about.
I mostly agree with this reasoning. I agree that Tegmark IV without a measure seems like the most obvious and reasonable hypothesis about what the world looks like. I agree that there seems no reason for there to be a "magical reality fluid". I agree, therefore, that on the priors that I'd put into my UDT calculation for how I should act, it's much more likely that true reality is a measureless Tegmark IV than that it has some objective measure according to which some experiences are "experienced less" than others, or not experienced at all. I don't think I understand things well enough to be extremely confident in this, but my odds would certainly be in favor of it.
Moreover, I agree that if this is the case, then my preferences are to care more about the simpler worlds, making things add up to normality; I'd want to act as if purple pumpkins are not about to start falling from the sky, precisely because I care more about the consequences my actions have in more orderly worlds.
Imagine this: Once you finish reading this article, you hear a bell ringing, and then a sonorous voice announces: "You do indeed live in a Tegmark IV multiverse without a measure. You had better deal with it." And then it turns out that it's not just you who's heard that voice: Every single human being on the planet (who didn't sleep through it, isn't deaf etc.) has heard those same words.
On the hypothesis, this is of course about to happen to you, though only in one of those worlds with high K-complexity that you don't care about very much.
So let's consider the following possible plan of action: You could act as if there is some difference between "existence" and "non-existence", or perhaps some graded degree of existence, until you hear those words and confirm that everybody else has heard them as well, or until you've experienced one similarly obviously "disorderly" event. So until that happens, you do things like invest time and energy into trying to figure out what the best way to act is if it turns out that there is some magical reality fluid, and into trying to figure out what a non-confused version of something like a measure on conscious experience could look like, and you act in ways that don't kill you if we happen to not live in a measureless Tegmark IV. But once you've had a disorderly experience, just a single one, you switch over to optimizing for the measureless mathematical multiverse.
If the degree to which you care about worlds is really proportional to their K-complexity, with respect to what you and I would consider a "simple" universal Turing machine, then this would be a silly plan; there is very little to be gained from being right in worlds that have that much higher K-complexity. But when I query my intuitions, it seems like a rather good plan:
- Yes, I care less about those disorderly worlds. But not as much less as if I valued them by their K-complexity. I seem to be willing to tap into my complex human intuitions to refer to the notion of "single obviously disorderly event", and assign the worlds with a single such event, and otherwise low K-complexity, not that much lower importance than the worlds with actual low K-complexity.
- And if I imagine that the confused-seeming notions of "really physically exists" and "actually experienced" do have some objective meaning independent of my preferences, then I care much more about the difference between "I get to 'actually experience' a tomorrow" and "I 'really physically' get hit by a car today" than I care about the difference between the world with true low K-complexity and the worlds with a single disorderly event.
In other words, I agree that on the priors I put into my UDT calculation, it's much more likely that we live in measureless Tegmark IV; but my confidence in this isn't extreme, and if we don't, then the difference between "exists" and "doesn't exist" (or "is experienced a lot" and "is experienced only infinitesimally") is very important; much more important than the difference between "simple world" and "simple world plus one disorderly event" according to my preferences if we do live in a Tegmark IV universe. If I act optimally according to the Tegmark IV hypothesis in the latter worlds, that still gives me most of the utility that acting optimally in the truly simple worlds would give me -- or, more precisely, the utility differential isn't nearly as large as if there is something else going on, and I should be doing something about it, and I'm not.
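One way to read the comparison above is as a two-hypothesis expected-loss calculation. The sketch below is only an illustration: the prior and both loss sizes are made-up numbers, and the names `expected_loss`, `hedge_loss` and `commit_loss` are hypothetical, not anything taken from UDT itself. It just shows that a small prior on "some measure exists" can still dominate if the stakes under that hypothesis are much larger.

```python
from fractions import Fraction

def expected_loss(p_measureless, loss_if_measureless, loss_if_measure):
    """Utility lost by a plan, weighting each hypothesis by its prior."""
    return (p_measureless * loss_if_measureless
            + (1 - p_measureless) * loss_if_measure)

# Hypothetical numbers, chosen only for illustration.
p = Fraction(9, 10)    # prior on measureless Tegmark IV
small = Fraction(1)    # lost by hedging if measureless Tegmark IV is true
large = Fraction(100)  # lost by not hedging if some measure does exist

# Hedging: act as if "exists" matters until a disorderly event is observed.
hedge_loss = expected_loss(p, small, 0)
# Committing: optimize for measureless Tegmark IV from the start.
commit_loss = expected_loss(p, 0, large)

print(hedge_loss, commit_loss)
```

With these assumed numbers hedging loses less in expectation; with a smaller ratio between the two losses, or a more extreme prior, the comparison would flip.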
This is the reason why I'm trying to think seriously about things like l-zombies and magical reality fluid. I mean, I don't even think that these are particularly likely to be exactly right even if the measureless Tegmark IV hypothesis is wrong; I expect that there would be some new insight that makes even more sense than Tegmark IV, and makes all the confusion go away. But trying to grapple with the confused intuitions we currently have seems at least a possible way to make progress on this, if it should be the case that there is in fact progress to be made.
Here's one avenue of investigation that seems worthwhile to me, and wouldn't seem so without the above argument. One thing I could imagine finding, that could make the confusion go away, would be that the intuitive notion of "all possible Turing machines" is just wrong, and leads to outright contradictions (e.g., to inconsistencies in Peano Arithmetic, or something similarly convincing). Lots of people have entertained the idea that concepts like the real numbers don't "really" exist, and only the behavior of computable functions is "real"; perhaps not even that is real, and true reality is more restricted? (You can reinterpret many results about real numbers as results about computable functions, so maybe you could reinterpret results about computable functions as results about these hypothetical weaker objects that would actually make mathematical sense.) So it wouldn't be the case after all that there is some Turing machine that computes the conscious experiences you would have if pumpkins started falling from the sky.
Does the above make sense? Probably not. But I'd say that there's a small chance that maybe yes, and that if we understood the right kind of math, it would seem very obvious that not all intuitively possible human experiences are actually mathematically possible (just as obvious as it is today, with hindsight, that there is no Turing machine which takes a program as input and outputs whether this program halts). Moreover, it seems plausible that this could have consequences for how we should act. This, together with my argument above, makes me think that this sort of thing is worth investigating -- even if my priors are heavily on the side of expecting that all experiences exist to the same degree, and ordinarily this difference in probabilities would make me think that our time would be better spent on investigating other, more likely hypotheses.
Leaving aside the question of how I should act, though, does all of this mean that I should believe that I live in a universe with l-zombies and magical reality fluid, until such time as I hear that voice speaking to me?
I do feel tempted to try to invoke my argument from the l-zombies post that I prefer the map-territory correspondences of actually existing humans to be correct, and don't care about whether l-zombies have their map match up with the territory. But I'm not sure that I care much more about actually existing humans being correct, if the measureless mathematical multiverse hypothesis is wrong, than I care about humans in simple worlds being correct, if that hypothesis is right. So I think that the right thing to do may be to have a subjective belief that I most likely do live in the measureless Tegmark IV, as long as that's the view that seems by far the least confused -- but continue to spend resources on investigating alternatives, because on priors they don't seem sufficiently unlikely to make up for the potential great importance of getting this right.
As you may know from my past posts, I believe that probabilities should not be viewed as uncertainty, but instead as weights on how much you care about different possible universes. This is a very subjective view of reality. In particular, it seems to imply that when other people have different beliefs than me, there is no sense in which they can be wrong. They just care about the possible futures with different weights than I do. I will now try to argue that this is not a necessary conclusion.
First, let's be clear what we mean by saying that probabilities are weights on values. Imagine I have an unfair coin which gives heads with probability 90%. I care 9 times as much about the possible futures in which the coin comes up heads as I do about the possible futures in which the coin comes up tails. Notice that this does not mean I want the coin to come up heads. What it means is that I would prefer getting a dollar if the coin comes up heads to getting a dollar if the coin comes up tails.
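The arithmetic behind the coin example can be sketched as follows. This is a minimal illustration, assuming that "caring" is modeled as a simple weighted sum of payoffs; the helper `weighted_value` is a hypothetical name introduced here, not something from the earlier posts.

```python
from fractions import Fraction

# Caring weights on the two possible futures (the 90% coin from the text).
weights = {"heads": Fraction(9, 10), "tails": Fraction(1, 10)}

def weighted_value(payoffs):
    """Sum over futures of (how much I care about it) * (payoff in it)."""
    return sum(weights[outcome] * payoff for outcome, payoff in payoffs.items())

dollar_if_heads = weighted_value({"heads": 1, "tails": 0})
dollar_if_tails = weighted_value({"heads": 0, "tails": 1})

# How many times more the heads-contingent dollar is worth to me.
ratio = dollar_if_heads / dollar_if_tails
print(dollar_if_heads, dollar_if_tails, ratio)
```

Nothing here says the coin "should" land heads; the weights only rank the two offers of a dollar.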
Now, imagine that you are unaware of the fact that it is an unfair coin. By default, you believe that the coin comes up heads with probability 50%. How can we express, in the language of values, the fact that I have a correct belief and you have an incorrect belief?
We will take advantage of the language of terminal and instrumental values. A terminal value is something that you try to get because you want it. An instrumental value is something that you try to get because you believe it will help you get something else that you want.
If you believe a statement S, that means that you care more about the worlds in which S is true. If you terminally assign a higher value to worlds in which S is true, we will call this belief a terminal belief. On the other hand, if you believe S because you think that S is logically implied by some other terminal belief, T, we will call your belief in S an instrumental belief.
Instrumental values can be wrong, if you are factually wrong about whether the instrumental value will help achieve your terminal values. Similarly, an instrumental belief can be wrong if you are factually wrong about whether it is implied by your terminal belief.
Your belief that the coin will come up heads with probability 50% is an instrumental belief. You have a terminal belief in some form of Occam's razor. This causes you to believe that coins are likely to behave similarly to how coins have behaved in the past. In this case, that was not valid, because you did not take into consideration the fact that I chose the coin for the purpose of this thought experiment. Your instrumental belief is in this case wrong. If your belief in Occam's razor is terminal, then it would not be possible for Occam's razor to be wrong.
This is probably a distinction that you are already familiar with. I am talking about the difference between an axiomatic belief and a deduced belief. So why am I viewing it like this? I am trying to strengthen my understanding of the analogy between beliefs and values. To me, they appear to be two different sides of the same coin, and building up this analogy might allow us to translate some intuitions or results from one view into the other view.