A Semitechnical Introductory Dialogue on Solomonoff Induction

I think there's something a bit fishy about the Poe/Shannon analogy.

In order to understand what's wrong with Poe's argument, what you need (or at least what would have been helpful) is an understanding of how having an absurdly large amount of computing power would enable you to solve the problem.

Solomonoff induction assumes not merely an absurdly large amount, but an infinite amount. And simple approximations to Solomonoff assume much more absurdly large amounts of compute than e.g. Shannon.

Poe says "no mechanical device can possibly play a decent game of chess, because it would need to look at a variety of different possible sequences of moves, and that's not a thing that can be done mechanically". Shannon says two relevant things: first, "here is how, with an unreasonably large amount of computation, we could play a perfect game of chess"; second, "here is how, with a large but not so unreasonably large amount of computation, we could play a pretty decent game of chess". Neither of these requires a hypercomputer; even computing the whole search tree is a finite amount of work. And the second was something that could be done to some extent using hardware Shannon could have built. To play grandmaster-level chess using exactly Shannon's method would require an infeasibly large amount of computing power (which we have since dealt with (1) by discovering alpha-beta and other sorts of pruning and (2) by making then-infeasibly powerful computers). But Shannon did show that machinery of a kind that can exist within the universe can play a not-so-awful game of chess; he genuinely refuted Poe's argument. And he didn't appeal to hypercomputers to do it.
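Shannon's second proposal, searching a fixed number of moves ahead and then stopping, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the game is a toy stand-in for chess (take 1 to 3 stones, taking the last stone wins), the horizon evaluation is simply "unknown", and the names `legal_moves`, `negamax` and `best_move` are hypothetical rather than anything from Shannon's paper.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Toy game standing in for chess: remove 1, 2 or 3 stones; taking the last stone wins."""
    return [k for k in (1, 2, 3) if k <= stones]


def negamax(stones: int, depth: int) -> int:
    """Score for the player to move: +1 win, -1 loss, 0 if the search horizon hides the result."""
    if stones == 0:
        return -1  # the previous player took the last stone, so the player to move has lost
    if depth == 0:
        return 0   # horizon reached; a real engine would call a static evaluation here
    return max(-negamax(stones - k, depth - 1) for k in legal_moves(stones))


def best_move(stones: int, depth: int) -> int:
    """Choose the move that leaves the opponent in the worst position seen within `depth` plies."""
    return max(legal_moves(stones), key=lambda k: -negamax(stones - k, depth - 1))
```

With enough depth the search plays this toy game perfectly; with too little depth it returns "unknown" for positions that are in fact lost, which is the small-scale analogue of playing badly but mechanically.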

Whereas someone skeptical of the very idea of mechanized epistemology can respond to Solomonoff by saying "yeah, but your algorithm requires a literally infinite amount of computation. It's not just that present technology falls short of what it needs, but that no possible technology could do it". A Poe-alike couldn't make such an argument for chess.

You might reply: yeah, sure, but we can "truncate" a Solomonoff inductor so that it only considers programs of, say, some finite maximum size; then it's no longer infinite and so long as reality isn't too complicated it'll still give good results. Except that unless that finite maximum is so tiny that Solomonoff does nothing useful, your truncated Solomonoff inductor is still too resource-hungry to do anything useful even if we turn the whole observable universe into computronium and let it run for the entire lifetime of that universe so far. Again, a Poe-alike couldn't have made such an argument for chess; if you really had to, you could build a Babbage-like contraption that plays chess using some sort of brute-force search to a couple of ply. It would play badly but still better than some beginners, and that's already enough to refute Poe.

That doesn't mean that Solomonoff induction isn't a useful notion, or that it can't provide insight into what's going on when we interpret the evidence of our senses, or that it can't be a useful inspiration for more practical attempts at mechanized epistemology. But I think the situation is sufficiently different from that of Poe contemplating the Mechanical Turk to make the analogy not very helpful.

Extending OP's argument, I would put it like this. Suppose you start off having no idea whatsoever how you might do a thing, and end up having a practical algorithm you can implement on real hardware. You can (in principle) split that up into the following stages: (0) no idea whatsoever, (1) can do it with literally infinite compute, (2) can do it with superexponentially-huge compute, (3) can do it with tile-the-universe-with-computronium compute, (4) can do it with several orders of magnitude more than we have, (5) can do it with something like realistic resources. I suggest that each of those stages may correspond to a comparable advance in understanding. Solomonoff induction takes us from 0 to 1, but pace Peter Thiel this is not always the most important step and I don't think it's as big a step as from Poe to Shannon.

[-]NunoSempere4y30

Except that unless that finite maximum is so tiny that Solomonoff does nothing useful, your truncated Solomonoff inductor is still too resource-hungry to do anything useful even if we turn the whole observable universe into computronium and let it run for the entire lifetime of that universe so far

Not the case!!! The OEIS can be viewed as an abridged Solomonoff inductor, and it is useful.

[-]acgt4y20

I think the point is even stronger than that - Solomonoff induction requires not just infinite compute/time but doing something literally logically impossible - the prior is straight up uncomputable, not in any real-world tractability sense but as uncomputable as the Halting problem is. There’s a huge qualitative gulf between “we can’t solve this problem without idealised computers with unbounded time” and “we can’t solve this on a computer by definition”. Makes a huge difference to how much use the approach is for “crispening” ideas IMO

[-]gjm4y20

Yup, actual Solomonoff induction is uncomputable. I'm not sure what you mean by "not just infinite compute/time", though; given truly infinite computation you absolutely could do it. (Though in a world where that was possible, you'd really want your Solomonoff inductor to consider possible explanations that likewise require an infinite amount of computation, and then you'd be back with the same problem again.) I guess the distinction you're making is between "requires a finite but absurdly large amount of computation" and "requires literally infinite computation", and I agree that the latter is what Solomonoff induction requires.

I think that reduces the credibility of the claim "Solomonoff induction is the One True Way to do inference, at least ideally speaking". But I think the following weaker proposal is intact:

Given two rival explanations of our observations, in the happy (and pretty much unheard-of) case where they have both been expressed precisely in terms of computer programs, all else equal we should consider the shorter program "simpler", "better", and "more likely right".
One way for all else not to be equal is if one program is shorter than the other just because it's been more highly optimized in some way that doesn't have much to do with the actual theory it embodies. So rather than "shorter program better" we should really say something more like "shorter program, after making it as small as possible, better".
Obviously, coming up with an actual program that perfectly explains all our observations is unrealistic; those observations include reading scientific papers, so it seems like the program would need to include a complete Theory Of Everything in physics; those observations include interactions with other humans, so it seems like the program would need to include at least as much intelligence as any person we encounter; these are both famously hard problems that the human race has not yet cracked.
But given two proposals for explaining the same subset of our experience, if we are credibly able to reason about which corresponds (after optimization) to the shorter program, we should prefer the proposal that seems like it has the shorter program.

In many real cases, it will be completely unclear which of two proposals corresponds to the shorter program after optimization, and in that case we'll have to use some other heuristic. And even in cases where it seems pretty clear which of two proposals corresponds to the shorter program, we need to be aware that we could be very wrong because what I've blithely called "optimization" is uncomputable and we can never be sure there isn't a shorter program. And it's not like the gods have handed down to us conclusive reason to think that simpler programs really are more likely to be right; it's just that heuristics along those lines have worked out pretty well for us, and concepts of "simpler" nearer to "shorter program" and further from "seems simpler to naive human brain" generally seem to work better. So (unless there's some clever point I'm missing, which there might be) some of EY's more dramatic claims about how anything other than Solomonoff induction is Wrong and Biased and Irrational seem overblown. But if you treat it as a very promising heuristic rather than a definitely known truth I'm still on board.

Also, you totally can do something that seems to me sufficiently along the lines of Solomonoff induction with an amount of computation that's merely absurdly large, rather than literally infinite. At step N you try out all programs of length up to N running for up to N steps, and see whether they produce output consistent with your observations to date. Once N gets very large you have discovered things like "among programs shorter than a gigabyte running for less than 10^100 cycles and producing output consistent with these observations, these are the shortest ones, and if we weight them SI-style then the probability distribution for our next observation is this". And if (as it seems to me kinda plausible that you actually should) you also somehow regard more expensive computations as less probable, then once N gets large enough that you have any candidate programs that produce the right predictions you can start putting upper bounds on the errors in your predictions compared with an ideal no-longer-exactly-Solomonoff inductor, and those errors tend to 0 as the amount of computation you're willing to do increases.
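The "all programs of length up to N running for up to N steps" procedure can be sketched as follows. This is a toy under loud assumptions, not real Solomonoff induction: the interpreter `run` is not universal (a program is a bit string whose output is that string repeated, one bit per step), programs are not prefix-free, and the names `run`, `programs_up_to` and `predict_next` are made up for this example. Only the shape of the loop and the 2^-length weighting are taken from the description above.

```python
from fractions import Fraction
from itertools import product
from typing import Iterator, Optional, Tuple

Bits = Tuple[int, ...]


def run(program: Bits, max_steps: int) -> Bits:
    """Toy interpreter (NOT universal): emit the program's bits cyclically, one bit per step."""
    return tuple(program[i % len(program)] for i in range(max_steps))


def programs_up_to(n: int) -> Iterator[Bits]:
    """All non-empty bit strings of length at most n, shortest first."""
    for length in range(1, n + 1):
        yield from product((0, 1), repeat=length)


def predict_next(observed: Bits, n: int) -> Optional[Fraction]:
    """Stage-n estimate of P(next bit = 1), weighting each consistent program by 2^-length."""
    need = len(observed) + 1
    weight_one = total = Fraction(0)
    for prog in programs_up_to(n):
        out = run(prog, n)
        if len(out) < need or out[:len(observed)] != observed:
            continue  # not enough steps yet, or contradicted by the observations
        w = Fraction(1, 2 ** len(prog))
        total += w
        weight_one += w * out[len(observed)]
    # None means no candidate program has produced a prediction at this stage.
    return weight_one / total if total else None
```

For the observations 0,1,0,1,0,1 the function returns `None` at stage 6 (no program has run long enough to predict a seventh bit) and a small probability of a 1 from stage 7 onward, dominated by the two-bit program `01`. The enumeration costs on the order of 2^N programs per stage, which is the impracticality being discussed.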

To be clear, the amount of computation required for any version of this is absurdly impractical in the real world, and if your ideal is actual Solomonoff induction then you don't get any error bounds because you can't rule out the possibility that some shorter program that hasn't generated any output yet might do a good job eventually. But it's not as if you literally can't do anything Solomonoff-induction-shaped without literally infinite amounts of computation.

[-]acgt4y10

I think there’s a sense in which some problems can be uncomputable even with infinite compute, no? For example, if the Halting problem were computable even with literally infinite time, then we could construct a machine that halted when given its own description iff it ran forever when given its own description. I do think there’s a distinction beyond just “arbitrarily large finite compute vs. infinite compute”. It seems like either some problems have to be uncomputable even by a hyper-computer, or else the concept of infinite compute time is less straightforward than it seems.

I totally agree on your other points though, I think the concept of bounded Solomonoff induction could be interesting in itself, although I presume with it you lose all the theoretical guarantees around bounded error. Would definitely be interested to see if there’s literature on this

[-]gjm4y20

The halting problem is computable with literally-infinite time. But, to be precise, what this means is that a hypercomputer could determine whether a (nonhyper)computer halts; in a universe containing hypercomputers, we would not be very interested in that, and we'd be asking for something that determines whether a given hypercomputer halts (or something like that; I haven't given much thought to what corresponds to "halting" for any given model of hypercomputation...) which would be impossible for the same sort of reasons as the ordinary halting problem is impossible for ordinary computers.

But I think it's only fair to describe this by saying "the halting problem is impossible even with infinite computational resources" if you acknowledge that then "the halting problem" isn't a single problem, it's a thing that varies according to what computational resources you've got, getting harder when you have more resources to throw at it.

[-]Alexei5y100

Does anyone know of 3rd-party libraries that implement something like this?

[-]Pongo5y90

Was this actually cross posted by EY, or by Rob or Ben? I prefer it being mentioned in the latter case

[-]Rob Bensinger5y120

I posted this, and I'll make a note that I did so for any future Eliezer content where I hit the 'submit' button.

The causal process for this article looked like this:

Eliezer wrote a lot of AI alignment content for Arbital. (This continues to be a good resource and I encourage you to browse it as-is, but it's pretty disorganized and a lot of it has notes like 'todo: fix X'.)
Eliezer delegated to me the task of cleaning up and cross-posting his Arbital stuff. This isn't at the top of my priority list yet (partly because the content is already available on the public Internet, albeit unpolished and out-of-the-way), so I've only ported over stuff in dribs and drabs so far.
Richard Ngo posted about the Solomonoff induction dialogue, which reminded me of how much I wanted to cross-post that one (and made it more timely to do so). So I decided to cross-post now (after running the timing by Eliezer and Nate for approval), and I pushed the 'submit' button.

I'm also the one who proposed organizing these particular posts into a sequence ("Concepts in Formal Epistemology"), and who decided to cross-post the rest of the sequence when I did (rather than in a different order, or six months later, etc.)

[-]Pongo5y30

Thanks!

[-]Rob Bensinger5yΩ360

Previously linked here: https://www.alignmentforum.org/posts/wsBpJn7HWEPCJxYai/excerpt-from-arbital-solomonoff-induction-dialogue

[-]TAG4y50

"ASHLEY: Uh, but you didn’t actually use the notion of computational simplicity to get that conclusion; you just required that the supply of probability mass is finite and the supply of potential complications is infinite. Any way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears.

"BLAINE: Well, maybe. But it so happens that Yudkowsky did invent or reinvent that argument after pondering Solomonoff induction, and if it predates him (or Solomonoff) then Yudkowsky doesn’t know the source. Concrete inspiration for simplified arguments is also a credit to a theory, especially if the simplified argument didn’t exist before that.

"ASHLEY: Fair enough."

I think Ashley deserves an answer to the objection "[a]ny way of counting discrete complications would imply that conclusion, even if it went by surface wheels and gears", not a claim about who invented what first!

[-]Ruby4y50

Curated. Solomonoff Induction is idealized induction, and as the post asserts, sometimes we learn about the non-idealized cases (get much less confused) by studying the idealized case. For that reason, I think this accessible albeit incredibly long dialogue is worth reading. Heck, it helps ground out Occam's razor.

[-]Bunthut5y50

When we don't know how to solve a problem even given infinite computing power, the very work we are trying to do is in some sense murky to us.

I wonder where this goes with questions about infinite domains. It seems to me that I understand what it means to argmax a generic bounded function on a generic domain, but I don't know an algorithm for it and as far as I know there can't be one. So it seems taking this very seriously would lead us to some form of constructionism.

[-]Optimization Process5y10

Hmm. If we're trying to argmax some function over the real numbers, then the simplest algorithm would be something like "iterate over all mathematical expressions $e$; for each one, check whether the program 'iterate over all provable theorems, halting when you find one that says $e = \operatorname{argmax} f$' halts; if it does, return $e$."

...but I guess that's not guaranteed to ever halt, since there could conceivably be an infinite procession of ever-more-complex expressions, eking out ever-smaller gains on $f$. It seems possible that no matter what (reasonably powerful) mathematical language you choose, there are function-expressions with finite maxima at values not expressible in your language. Which is maybe what you meant by "as far as I know there can't be [an algorithm for it]."

(I'm assuming our mathematical language doesn't have the word $\operatorname{argmax}$, since in that case we'd pretty quickly stumble on the expression $\operatorname{argmax} f$, verify that $\operatorname{argmax} f = \operatorname{argmax} f$, and return it, which is obviously a cop-out.)

[-]johnlawrenceaspden4y40

We can also consider it as a probability distribution over infinite sequences

Surely, 'over finite sequences'?

[-]Alexei5y40

I had a related question I'm still looking for a good answer to: https://www.lesswrong.com/posts/QCSEFxtNPXr5vsZyf/what-tools-exist-to-compute-all-possible-programs

[-]John_Maxwell5yΩ240

...When we can state code that would solve the problem given a hypercomputer, we have become less confused. Once we have the unbounded solution we understand, in some basic sense, the kind of work we are trying to perform, and then we can try to figure out how to do it efficiently.

ASHLEY: Which may well require new insights into the structure of the problem, or even a conceptual revolution in how we imagine the work we're trying to do.

I'm not convinced your chess example, where the practical solution resembles the hypercomputer one, is representative. One way to sort a list using a hypercomputer is to try every possible permutation of the list until we discover one which is sorted. I tend to see Solomonoff induction as being cartoonishly wasteful in a similar way.
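The "try every possible permutation" sort is short enough to write out. A minimal sketch, with hypothetical function names, using the standard library's `itertools.permutations`:

```python
from itertools import permutations
from typing import List, Sequence


def is_sorted(xs: Sequence[int]) -> bool:
    """True if every element is <= its successor."""
    return all(a <= b for a, b in zip(xs, xs[1:]))


def permutation_sort(xs: Sequence[int]) -> List[int]:
    """Try orderings one by one until a sorted one turns up: up to n! candidates."""
    for candidate in permutations(xs):
        if is_sorted(candidate):
            return list(candidate)
    raise AssertionError("unreachable: some permutation is always sorted")
```

It is correct and about as wasteful as a sorting procedure can be: ten elements already mean up to 3,628,800 candidate orderings.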

[-]Optimization Process5y30

The understanding I came away with: there are (at least) three stages of understanding a problem:

You can't write a program to solve it.
You can write a cartoonishly wasteful program to solve it.
You can write a computationally feasible program to solve it.

"Shuffle-sort" achieves the second level of knowledge re: sorting lists. Yeah, it's cartoonishly wasteful, and it doesn't even resemble any computationally feasible sorting algorithm (that I'm aware of) -- but, y'know, viewed through this lens, it's still a huge step up from not even understanding "sorting" well enough to sort a list at all.

(Hmm, only marginally related but entertaining: if you reframe the problem of epistemology not as sequence prediction, but as "deduce what program is running your environment," then a Solomonoff inductor can be pretty fairly described as "consider every possible object of type EnvironmentProgram; update its probability based on the sensory input; return the posterior PDF over EnvironmentProgram-space." The equivalent program for list-sorting is "consider every possible object of type List<Int>; check if (a) it's sorted, and (b) it matches the element-counts of the input-list; if so, return it." Which is even more cartoonishly wasteful than shuffle-sort. Ooh, and if you want to generalize to cases where the list-elements are real numbers, I think you get/have to include something that looks a lot like Solomonoff induction, forcing countability on the reals by iterating over all possible programs that evaluate to real numbers (and hoping to God that whatever process generated the input list, your mathematical-expression-language is powerful enough to describe all the elements).)
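The "consider every possible object of type List<Int>" sort can be sketched too. The enumeration order below is an assumption chosen only so that every finite integer list is eventually reached (entries in [-b, b] and length at most b, for b = 0, 1, 2, ...); the name `enumerate_sort` is hypothetical.

```python
from collections import Counter
from itertools import count, product
from typing import List, Sequence


def enumerate_sort(xs: Sequence[int]) -> List[int]:
    """Walk through all integer lists until one is sorted and has the input's element counts."""
    target = Counter(xs)
    for bound in count(0):
        values = range(-bound, bound + 1)
        for length in range(bound + 1):
            for candidate in product(values, repeat=length):
                in_order = all(a <= b for a, b in zip(candidate, candidate[1:]))
                if in_order and Counter(candidate) == target:
                    return list(candidate)
    raise AssertionError("unreachable: the enumeration covers every finite integer list")
```

Unlike permutation sort, this never looks at the input's ordering at all; it only tests candidates against the two conditions, which mirrors how the inductor tests candidate environment programs against observations.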

[-]CronoDAS4y30

Solomonoff induction does have a blind spot: it assigns probability zero to the existence of halting oracles or other uncomputable sequences. Of course, every other computable prediction algorithm is just as incapable of predicting the output of a halting oracle and there don't seem to be any uncomputable functions in the actual laws of physics, but it's still a blind spot!

[-]drocta4y10

Is this something that the infra-bayesianism idea could address? So, would an infra-bayesian version of AIXI be able to handle worlds that include halting oracles, even though they aren't exactly in its hypothesis class?

[-]Ian Televan5y*30

Could someone explain why this doesn't degenerate into an entirely circular concept when we postulate a stronger compiler; or why it doesn't become entirely dependent on the choice of the compiler?

There are many programs that output identical sequences. That's a waste. Make it so that no two different programs have the same output.
There are many sequences that when fed into the compiler don't result in valid programs. That's a waste. Make it so that every binary sequence represents a valid program.

Now we have a set of sequences that we'd like to encode: S = { $ε$, 0, 1, 00, 01, ... }, a set of sequences that are interpreted by the compiler as programs: P = { $ε$, 0, 1, 00, 01, ... } and the compiler which is a bijection from P to S. It better not turn out to be the identity function. And that's with the best possible compiler. If we postulate a reasonable but much weaker compiler then the programs that encode the sequences become on average longer than the sequences themselves!
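The reason no compiler can shorten everything is a standard counting argument, written out here as a sketch; the symbols $p$, $s$ and $n$ are introduced only for this illustration.

```latex
\#\{\, p \in \{0,1\}^{*} : |p| < n \,\}
  \;=\; \sum_{k=0}^{n-1} 2^{k}
  \;=\; 2^{n} - 1
  \;<\; 2^{n}
  \;=\; \#\{\, s \in \{0,1\}^{*} : |s| = n \,\}
```

There are fewer programs shorter than $n$ bits than there are sequences of exactly $n$ bits, so for every $n$ at least one length-$n$ sequence has no shorter program, whatever the compiler. A compiler can only trade length between sequences, making some shorter at the cost of making others longer.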

The only way out of this that I see is to weight elements of S by their frequencies in our universe and/or by how much we care about them, and then let the compiler be a function that minimizes this frequency-importance score. In fact, this compiler starts looking more and more like an encoder (?!). The difficult part then seems to me to be the choice of the optimal encoder, and not the Solomonoff induction itself.

Edit: Of course, when there's a 1 to 1 mapping, then selecting the shortest program is trivial. So in a way, if we make the Solomonoff induction trivial then the only thing that's left is the choice of the compiler. But why isn't this still a problem with weaker, traditional compilers?

[-]Pattern5y30

Contents

1. What credit is Poe due? (Without doing lots of research.)

2. "Rationality" and Neural networks

3. Poe's revenge

4. Where does "Rationality" lie?

5. The Way

6. A basic statistic question

7. Not-learned knowns, and bodies versus heuristics

8. Is physics hard, if we're good at moving?

9. Is Occam's Razor a retcon? (And far too much else.)

10. A problem with Solomonoff Induction

11. Does anyone have the code for Solomonoff Induction or AIXI?

1. What credit is Poe due? (Without doing lots of research.)

On Poe's argument (as presented here):

Poe may be right about solving chess. His opinions concerning randomness are interesting - while those issues may have been better worked out since to show that deterministic algorithms may do as well as algorithms using randomness, I think this was not known at the time.

Information theorist Claude Shannon argued in 1951 that it is not feasible for any computer to actually solve chess, since it would either need to compare some 10^120 possible game variations, or have a "dictionary" denoting an optimal move for each of the about 10^43 possible board positions.[4]*

-Wikipedia on Solving Chess

Arguably his failure is conflating 'finding the/an optimal solution' (which proceeds from the rules), with 'being good'. (Saying 'an automaton can never do this' seems obviously accurate if you note that the necessary computer would be too big to be an automaton. Shannon wrote in a time with better computers. While holding that 'people follow deterministic rules therefore the necessary computation can fit inside a human body' - which he might not have** - Poe might have maintained that your 'Frankenstein monster' (perhaps not the words of the time) is clearly a different type of thing than an automaton. And today, we are presently interested in neural networks - even the idea of simulating a person on 'Babbage's machine' might not have occurred to Poe.**)

2. "Rationality" and Neural networks

Once we have the unbounded solution we understand, in some basic sense, the kind of work we are trying to perform, and then we can try to figure out how to do it efficiently.

ASHLEY: Which may well require new insights into the structure of the problem, or even a conceptual revolution in how we imagine the work we're trying to do.

EY once argued against neural networks (possibly in the context of friendly AI?) - the disagreement may be concerning 'solving problems magically'. (And as a means for beating people at chess, they did come later.) Today it would appear you might not need to possess:

Correct knowledge of how brains (and minds) of people actually work
Complete knowledge of how to play chess (find a chess engine)

in order to come up with a solution, if you have enough resources and the formalism/algorithms of neural networks. Just train something that's good enough. Interestingly, this might mean a 'rational approach' i.e. one with a good theory might not be necessary for technically well-specified problems (like chess), though it may be important for friendly AI (which remains to be seen).

3. Poe's revenge

and inventing good hypotheses from scratch.

So Solomonoff Induction includes randomness?

4. Where does "Rationality" lie?

You had to notice the resemblance to the Fibonacci rule to guess the next number.

Not consciously.

5. The Way

We just have no idea how Terence Tao works, so we can't duplicate his abilities in a formal rule, no matter how much computing power that rule gets...

Simulate Tao's brain. (Did the OP really resist this pun, or just not see it? It doesn't fit with 'figure out how to solve the problem'...)

Yes, as a real world solution, there would be issues - ethics, how do you even do that, are computers powerful enough, etc.

6. A basic statistic question

ASHLEY: But what if you can do better by forgetting more?

So you don't overfit?

for one thing, you can always just do the same policy you would have used if you hadn't seen that evidence.

This is great. (It also seems wrong for people, to some extent.)

With unlimited computing power, nothing goes wrong as a result of trying to process 4 gigabits per second; every extra bit just produces a better expected future prediction.

A little handwaving, but it's clear it's handwaving. (If I told you processing that amount of info at that speed would destroy the world (it probably doesn't)***, you might disagree with 'just produces a better prediction'. This is nitpicking at the level of 'watch your wishes', but unlimited computing power might be very destructive.)

7. Not-learned knowns, and bodies versus heuristics

ASHLEY: I note that there are some things I know that don't come from my sensory inputs at all. Chimpanzees learn to be afraid of skulls and snakes much faster than they learn to be afraid of other arbitrary shapes. I was probably better at learning to walk in Earth gravity than I would have been at navigating in zero G. Those are heuristics I'm born with, based on how my brain was wired, which ultimately stems from my DNA specifying the way that proteins should fold to form neurons—not from any photons that entered my eyes later.

Swimming without having learned how is also an example, until it goes away. Learning to navigate better on Earth than in zero G (is an empirical claim), which might have more to do with the shape of the body, and the environment. That's not '"heuristics" in thinking' - that's body design, etc.

8. Is physics hard, if we're good at moving?

ASHLEY: Part of my mind feels like the laws of physics are quite complicated compared to going outside and watching a sunset. Like, I realize that's false, but I'm not sure how to say out loud exactly why it's false...

Perhaps our brains run a useful approximation? Neural networks may be more adapted to conditions than, well, running such general formulas.

The language of physics is differential equations, and it turns out that this is something difficult to beat into some human brains,

Then how are we 'good' at moving? Like, at a level that seems hard to train/program 'robots' to do?

If pi is normal, then somewhere in its digits is a copy of Shakespeare's Hamlet—but the number saying which particular digit of pi to start looking at, will be just about exactly as large as Hamlet itself.

It seems like the number would be longer. Like n^2 at least. (Unless you have a way of compressing it, which seems like it'd be hard to do.)
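A rough heuristic check of the quoted claim, under the assumption (stronger than normality, which by itself says nothing about where the first occurrence falls) that the digits behave like independent uniform random digits. Here $n$ is the number of decimal digits in some encoding of Hamlet, a symbol introduced only for this sketch.

```latex
\Pr[\text{the block starts at a given position}] = 10^{-n}
\;\Longrightarrow\;
\mathbb{E}[\text{index of first occurrence}] \approx 10^{n}
\;\Longrightarrow\;
\text{digits needed to write the index} \approx \log_{10} 10^{n} = n
```

Under that assumption the index has about $n$ digits, roughly the length of Hamlet itself, rather than something like $n^2$.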

Similarly, the world Earth is much more algorithmically complex than the laws of physics.

Because it also includes constants?

ASHLEY: A probability distribution over possible 66-megabit frames? Like, a table with 2^66,000,000 entries, summing to 1?

Which is implicitly a model of the entire universe. (Sort of.)

9. Is Occam's Razor a retcon? (And far too much else.)

The "entities" of a theory are its types, not its objects.

Did Occam mean that, or is this a retcon?

And Solomonoff induction tells us that this invocation of Occam's Razor is flatly misguided because Occam's Razor does not work like that.

This is a circular argument. That's like saying 'The world is round hypothesis tells us the world is round', when that is part of (in fact the whole of) the hypothesis itself.

Some people like Levin search more than Solomonoff induction because it's more computable. I dislike Levin search because (a) it has no fundamental epistemic justification and (b) it assigns probability zero to quantum mechanics.

People want the world to be simple. (The Simple World Fallacy, or The world I can understand (easily) fallacy?)

BLAINE: For example two, that Solomonoff induction outperforms even Terence Tao,

I'm glad this was eventually addressed, although I feel like this has the interpretability problem, except worse.

ASHLEY: So your basic argument is, "Never mind Terence Tao, Solomonoff induction dominates God."

More like "Is God." There might be some work on the flaws of this approach (SI/AIXI), even in theory, which seem immaterial prior to a switch being made to an approximation.

smarter entities can extract more info than is immediately apparent on the surface of things.

smarter, better calibrated, experts in the domain.... Arguably, Solomonoff Induction is rather stupid/low information. It generates all hypotheses (which tells you nothing), then it does work on those hypotheses (most of the information/'smart' in it), and it learns the rest. Shipping something with information about this universe seems more efficient. SI is supposed to get the most out of that information - that's why it's 'an ideal' - but it costs an infinite amount of energy, takes forever, etc.

you could look at which agents were seeing exact data like the data you got

Takes some work to find those agents.

In fact, you're probably pointing at some particular shortcut and claiming nobody can ever figure that out using a reasonable amount of computing power

an unreasonable amount of computing power. Infinitely unreasonable.

just so that their mental simulation of the ideal answer isn't running up against stupidity assertions.

There's no reason in principle, that all types of minds will agree with you. What reason do you have to suppose humans will? (What reasonability guarantee is there?)

It sounds like "Jehovah placed rainbows in the sky as a sign that the Great Flood would never come again" is a 'simple' explanation; you can explain it to a child in nothing flat.

Because we don't have SI/AIXI's 'flaw' - it can never imagine a being such as itself.

and it sounds more alien and less intuitive than Jehovah.

Might just be a matter of the childhood. Would it be more intuitive to adults if it was explained to them as kids? (I'm going to call this 'The Rainbow Religion'.)

but that doesn't mean I should look at the historical role supposedly filled by Abraham Lincoln, and look for simple mechanical rules that would account for the things Lincoln is said to have done.

a) Evolution

b) No, you should look for simple mechanical rules that would generate the story - why do you believe the person telling you the story? It's the P(Observing A) not P(A).

to predict the modified-human entity that is Jehovah.

The supposed infinities involved might do the job. If AIXI/SI cannot imagine itself, then that's probably handled. (I could be wrong about this, but maybe 'Machines don't believe in infinities.')

it shouldn't cost as much to postulate a similar kind of thing elsewhere!

Because the thing hasn't been postulated in isolation; in SI it showed up in a universe, with a cause (ultimately the beginning of the universe). Easy reuse just requires the right sort of cause - an engineer, evolution, duplication, etc.

BLAINE: Well, but even if I was wrong that Solomonoff induction should make Jehovah seem very improbable, it's still Solomonoff induction that says that the alternative hypothesis of 'diffraction' shouldn't itself be seen as burdensome—even though diffraction might require a longer time to explain to a human, it's still at heart a simple program.

ASHLEY: Hmm.

So this is a time-spent-computing problem? We spend too much time thinking about humans, not enough time thinking about rainbows? (Insufficient Rainbow Contemplation.) Arguably this is rational - which is more likely to kill you, a rainbow or a human?

ASHLEY: Got a list of the good advice you think is derivable?

BLAINE: Um. Not really, but off the top of my head:

Sounds like stuff learned from experience.

People were wrong about galaxies being a priori improbable because that's not how Occam's Razor works.

Or they assumed the universe was small.

If something seems "weird" to you but would be a consequence of simple rules that fit the evidence so far, well, there's nothing in these explicit laws of epistemology that adds an extra penalty term for weirdness.

I think noticing confusion is important. (Retraining your intuitions might be useful though.)

Your epistemology shouldn't have extra rules in it that aren't needed to do Solomonoff induction or something like it, including rules like "science is not allowed to examine this particular part of reality"—

I've considered 'infinities are impossible' myself. The only problem is 'What happens if you find an infinity?' (That being said, I still think it might be a useful tool - if you figure out the limit of a function as the input goes to infinity, and nothing short of infinity will reach the limit, then you've got bounds on that function even when you don't know the input.)

10. A problem with Solomonoff Induction

BLAINE: Well, it wouldn't bite you in the form of repeatedly making wrong experimental predictions.

But it requires infinite resources to run, and can only simulate finite programs, whereas a universe where it could be run would be a universe it couldn't simulate.

Which brings up that dangling question from before about modeling the effect that my actions and choices have on the environment, and whether, say, an agent that used Solomonoff induction would be able to correctly predict "If I drop an anvil on my head, my sequence of sensory observations will end."

In theory that's an empirical question, but without a hypercomputer it seems untestable.

11. Does anyone have the code for Solomonoff Induction or AIXI?

Solomonoff induction is the best formalized epistemology we have right now—

Does anyone have the code for Solomonoff Induction or AIXI? One with bounds, that actually runs on computers?
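Not an answer to the question as asked, but a toy sketch may make "one with bounds" concrete. It is not Solomonoff induction or AIXI: the hypothesis class here is only "this bit pattern repeated forever", there is no universal Turing machine, and the weight `4**-len(pattern)` is an assumed stand-in for a `2**-(program length)` prior. The names `hypotheses`, `generates`, `predict_next`, and `max_len` are made up for illustration.

```python
from fractions import Fraction
from itertools import product


def hypotheses(max_len):
    """Yield (pattern, prior) for every bit pattern of 1..max_len bits.

    Each pattern stands for the hypothesis "the sequence is this pattern
    repeated forever". The weight 4**-len(pattern) is an assumed stand-in
    for 2**-(program length); shorter patterns get more weight.
    """
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits), Fraction(1, 4 ** n)


def generates(pattern, observed):
    """True if repeating `pattern` reproduces every observed bit."""
    return all(bit == pattern[i % len(pattern)] for i, bit in enumerate(observed))


def predict_next(observed, max_len=8):
    """Return P(next bit is '1') under the surviving hypotheses, or None
    if no pattern of up to max_len bits fits the data."""
    weight = {"0": Fraction(0), "1": Fraction(0)}
    for pattern, prior in hypotheses(max_len):
        if generates(pattern, observed):
            next_bit = pattern[len(observed) % len(pattern)]
            weight[next_bit] += prior
    total = weight["0"] + weight["1"]
    return weight["1"] / total if total else None
```

For example, `predict_next("0", max_len=2)` gives `Fraction(1, 6)`: the surviving patterns are "0", "00", and "01", and only "01" predicts a 1 next. Even this toy enumerates `2**(max_len + 1) - 2` patterns, so the cost is exponential in the bound, which hints at why bounded approximations over richer hypothesis classes are expensive.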

Footnotes (from 1.)

*From the original paper:

This is conservative for our calculation since the machine would calculate out to checkmate, not resignation. However, even at this figure there will be 10^120 variations to be calculated from the initial position. A machine operating at the rate of one variation per micro-second would require over 10^90 years to calculate the first move!
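As a quick arithmetic check on the quoted figures (a sketch that assumes a 365-day year; the variable names are mine): taking 10^120 variations at one per microsecond literally gives roughly 3 × 10^106 years, so "over 10^90 years" holds with a lot of room to spare.

```python
VARIATIONS = 10 ** 120                 # figure quoted from Shannon above
VARIATIONS_PER_SECOND = 10 ** 6        # one variation per microsecond
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumed 365-day year

# Integer division keeps the result exact enough for an order-of-magnitude check.
years = VARIATIONS // (VARIATIONS_PER_SECOND * SECONDS_PER_YEAR)
order_of_magnitude = len(str(years)) - 1
print(f"about 10^{order_of_magnitude} years")  # prints "about 10^106 years"
```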

** Believing in souls might have tripped Poe up.

***Because eyeballs and brains seem to do fine. But how do they work?

[-]supposedlyfun5y30

The language of physics is differential equations, and it turns out that this is something difficult to beat into some human brains

You rang?

I'm not sure why, but I find these dialogues easier to learn from than an article expressing the same ideas in the same order, even in an explicitly Q&A format.

[-]lc5y30

Nice to have you back Eliezer.

[-]Pattern5y40

Comments further up the thread suggest this is old content, from Arbital, posted by Rob.

[-]Crackatook4y10

ASHLEY: Good evening, Msr. Blaine.
BLAINE: Good evening, Msr. Ashley.

Is this a typo? I've never heard of "Msr."; however, it is used twice, as if it is not a typo.

[-]Unnamed4y110

https://en.wikipedia.org/wiki/Gender-neutral_title#Other_titles

[-]Ben Pace4y70

I read it as "Messr" as in

"Messrs Moony, Wormtail, Padfoot, and Prongs
Purveyors of Aids to Magical Mischief-Makers
are proud to present
THE MARAUDER'S MAP"

[-]Crackatook4y10

Thanks for the reply! But I still want to know more; I am confused, because the Internet says it is the plural form of "Mr.", which isn't the case here.

[-]Zack_M_Davis4y30

My guess is that "Msr." was intended as an abbreviation for monsieur (French for mister), in ignorance of the fact that the standard abbreviation is actually just M.

[This comment is no longer endorsed by its author]

[-]reallyeli5y10

Okay. Though in the real world, it's quite likely that an unknown frequency is exactly $0$, $1$, or $1/2$

Should the text read "unlikely" instead of "likely"?

