Followup to: Building Something Smarter, Say Not "Complexity", That Alien Message

One of the Godel-inspired challenges to the idea of self-improving minds is based on the notion of "complexity".

Now "complexity", as I've previously mentioned, is a dangerous sort of word.  "Complexity" conjures up the image of a huge machine with incomprehensibly many gears inside - an impressive sort of image.  Thanks to this impressiveness, "complexity" sounds like it could be explaining all sorts of things - that all sorts of phenomena could be happening because of "complexity".

It so happens that "complexity" also has a second, strict and mathematical meaning: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.
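In symbols (the standard formulation, stated here for reference): fixing a universal Turing machine U,

$$K_U(x) \;=\; \min \{\, |p| \;:\; U(p) = x \,\}$$

where |p| is the length of program p in bits. Changing the choice of U shifts this value by at most a machine-dependent constant, which is why one can speak loosely of "the" Kolmogorov complexity of a pattern.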

I immediately note that this mathematical meaning is not the same as the intuitive image that comes to mind when you say "complexity".  The vast impressive-looking collection of wheels and gears?  That's not what the math term means.

Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe - not just the region of space we see around us, but all regions of space and all quantum branches.  (There are strong indications that our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)

Then the "Kolmogorov complexity" of that entire universe - throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...

...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.

The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the "complexity" of the entire universe that contains it.

"Eh?" you say.  "What's this?" you say.  "How can a single planet contain more wheels and gears, more complexity, than the whole vast turning universe that embeds it?  Wouldn't one planet contain fewer books, fewer brains, fewer species?"

But the new meaning that certain mathematicians have formulated and attached to the English word "complexity", is not like the visually impressive complexity of a confusing system of wheels and gears.

It is, rather, the size of the smallest Turing machine that unfolds into a certain pattern.

If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.

If you want to print out just Earth by itself, then it's not enough to specify the laws of physics.  You also have to point to just Earth within the universe.  You have to narrow it down somehow.  And, in a big universe, that's going to take a lot of information.  It's like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city.  Indeed, with all those quantum branches existing side-by-side, it's going to take around as much information to find Earth in the universe, as to just directly specify the positions of all the atoms.
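As a back-of-the-envelope illustration of the cost of pointing (my arithmetic, not the post's): singling out one item among N equally likely candidates takes about log2 N bits, so even ignoring quantum branches,

$$\log_2\!\left(10^{80}\right) \approx 266 \text{ bits}$$

just to index one atom among the roughly 10^80 in the observable universe - and indexing a single decoherent branch costs astronomically more than that.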

Kolmogorov complexity is the sense in which we zoom into the endless swirls of the Mandelbrot fractal and think, not "How complicated", but rather, "How simple", because the Mandelbrot set is defined using a very simple equation.  But if you wanted to find a subset of the Mandelbrot set, one particular swirl, you would have to specify where to look, and that would take more bits.
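A minimal sketch of the point: the rule that generates the entire Mandelbrot set fits in a few lines, while locating any particular swirl requires extra coordinate bits that this program does not contain.

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Return True if c appears to stay bounded under z -> z*z + c."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:  # once |z| exceeds 2, the orbit provably escapes
            return False
    return True

# The whole set unfolds from the one-line rule above; finding one
# particular swirl means supplying coordinates, e.g. the neighborhood
# of c = -0.743 + 0.131j, and those coordinates are the extra bits.
print(in_mandelbrot(0j), in_mandelbrot(1 + 1j))  # True False
```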

That's why we use the Kolmogorov complexity in Occam's Razor to determine how "complicated" or "simple" something is.  So that, when we think of the entire universe, all the stars we can see and all the implied stars we can't see, and hypothesize laws of physics standing behind it, we will think "what a simple universe" and not "what a complicated universe" - just like looking into the Mandelbrot fractal and thinking "how simple".  We could never accept a theory of physics as probably true, or even remotely possible, if it got an Occam penalty the size of the universe.

As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity.  (At least, no closed system without a 'true randomness' source.)  A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase "the size of the smallest computer program that unfolds into it", by definition.
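To make the "by definition" step explicit (a standard Kolmogorov-complexity inequality, not original to this post): if x_t is the state a deterministic closed system reaches after t steps from initial state x_0, then a description of x_0, the fixed update rule, and the step count already describes x_t, so

$$K(x_t) \;\le\; K(x_0) + O(\log t)$$

where the O(log t) term only pays for writing down t itself.  No amount of internal churning buys more than that logarithmic overhead.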

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system.  And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition.  And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out.  In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved.  But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more "complex" than the program code it takes to specify the tournament and the criterion of competition.  The organism that wins is implicit in the specification of the tournament, so it can't be any more "complex" than that.
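A toy sketch of this argument (not the post's hypothetical 10^44-organism run; the parameters and fitness criterion here are invented for illustration): a deterministic tournament whose champion, however intricate it looks, is pinned down entirely by this short program.

```python
import random

def run_tournament(seed: int = 0, pop: int = 50, generations: int = 100):
    rng = random.Random(seed)  # fixed seed: the whole run is deterministic
    genomes = [[rng.randint(0, 1) for _ in range(64)] for _ in range(pop)]

    def score(g):
        return sum(g)  # stand-in fitness criterion

    for _ in range(generations):
        genomes.sort(key=score, reverse=True)
        survivors = genomes[: pop // 2]
        children = [[bit ^ (rng.random() < 0.01) for bit in g]  # rare bit-flips
                    for g in survivors]
        genomes = survivors + children
    return max(genomes, key=score)

# Anyone holding this source code can reproduce the champion exactly, so
# its Kolmogorov complexity is bounded by the length of this program --
# no matter how many generations it "evolves" for.
champion = run_tournament()
```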

But if, on the other hand, you reached into the biology and made a few million random changes here and there, the Kolmogorov complexity of the whole system would shoot way up: anyone wanting to specify it exactly, would have to describe the random changes that you made.

I specify "random" changes, because if you made the changes with beneficence aforethought - according to some criterion of goodness - then I could just talk about the compact criterion you used to make the changes.  Only random information is incompressible on average, so you have to make purposeless changes if you want to increase the Kolmogorov complexity as fast as possible.

So!  As you've probably guessed, the argument against self-improvement is that since closed systems cannot increase their "complexity", the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.

If, that is, you believe that "increasing Kolmogorov complexity" is prerequisite to increasing real-world effectiveness.

(We will dispense with the idea that if system A builds system B, then system A must be "by definition" as smart as system B.  This is the "Einstein's mother must have been one heck of a physicist" sophistry.  Even if a future planetary ecology is in some sense "implicit" in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)

Now, how does one argue that "increasing Kolmogorov complexity" has something to do with increasing intelligence?  Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?

One of the other things that a closed computable system provably can't do, is solve the general halting problem - the problem of telling whether any Turing machine halts.

A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for all machines of that size or less, then this omniscience is bounded by at most the solver's own complexity plus a small constant.
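Sketched slightly more formally (the standard diagonalization, paraphrased): from any solver S one can build a "spite" machine Z_S, only a constant c larger than S, that asks S whether Z_S halts and then does the opposite.  S must answer wrongly on Z_S, so if S answers correctly for every machine of size at most N, then Z_S must be bigger than N, giving

$$N \;<\; |S| + c.$$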

So... isn't increasing your Kolmogorov complexity through outside sensory inputs, the key to learning to solve the halting problem for ever-larger systems?

And doesn't this show that no closed system can "self-improve"?

In a word, no.

I mean, if you were to try to write it out as logic, you'd find that one of the steps involves saying, "If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof).  Therefore, by increasing your complexity, you increase the range of systems you can solve."  This is formally a non-sequitur.

It's also a non-sequitur in practice.

I mean, sure, if we're not dealing with a closed system, you can't prove that it won't solve the halting problem.  You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.

But unless you already have that kind of mathematical ability yourself, you won't know just from looking at the light that it's giving you true solutions to the halting problem.  You must have just been constructed with faith in the light, and the light must just happen to work.

(And in any computable universe, any light in the sky that you see won't happen to solve the halting problem.)

It's not easy for "sensory information" to give you justified new mathematical knowledge that you could not in principle obtain with your eyes closed.

It's not a matter, for example, of seeing a brilliant proof written in the sky that you would never have thought of on your own.  A closed system with infinite RAM can close its eyes, and write out every possible sensory experience that could occur within some bounded number of steps, along with its own reactions to them.  Doing this does not increase its Kolmogorov complexity.

So the notion can't be that the environment tells you something that you recognize as a proof, but didn't think of on your own.  Somehow, having that sensory experience in particular, has to increase your mathematical ability even after you perfectly imagined that experience and your own reaction to it in advance.

Could it be the healing powers of having a larger universe to live in, or other people to talk to?  But you can simulate incredibly large universes - vastly larger than anything we see in our telescopes, up-arrow large - within a closed system without increasing its Kolmogorov complexity.  Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.

What justified novel mathematical knowledge - about the halting problem in particular - could you gain from a sensory experience, that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?

Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.

But what's actually going on is that the non-sequitur is coming back to bite:  Increasing your Kolmogorov complexity doesn't necessarily increase your ability to solve math problems.  Swallowing a bucket of random bits will increase your Kolmogorov complexity too.

You aren't likely to learn any mathematics by gazing up at the external stars, that you couldn't learn from "navel-gazing" into an equally large simulated universe.  Looking at the actual stars around you is good for figuring out where in the universe you are (the extra information that specifies your location) but not much good for learning new math that you couldn't learn by navel-gazing into a simulation as large as our universe.

In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system.  So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant, is misleading.  That's an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver.  In reality, you'd expect a 500-state Turing machine to get nowhere near solving the halting problem up to 500.  I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states.  The vast majority of 500-state Turing machines that implement something that looks like a "halting problem solver" will solve the halting problem only for machines nowhere near 500 states (but see this comment).

Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...

...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...

...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn't pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...

...and finally - after a long, long time - your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...

...then this intrauniversal civilization - and everything it's ever learned by theory or experiment over the last up-arrow years - is said to contain 400 bits of complexity, or however long the original program was.

But where did it get its math powers, from inside the closed system?

If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math, came from updating on some observed event.  That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind's brain... but in all cases, the update will occur because of a logic that says, "If I see this, it means that".  Before you can learn, you must have the prior that structures learning.  If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y.  That's how it would be for superintelligences, I expect.

If you keep tracing back through that simulated universe, you arrive at something before the dawn of superintelligence - the first intelligent beings, produced by evolution.  These first minds are the ones who'll look at Peano Arithmetic and think, "This has never produced a contradiction, so it probably never will - I'll go ahead and program that into my AI."  These first minds are the ones who'll think, "Induction seems like it works pretty well, but how do I formalize the notion of induction?"  And these first minds are the ones who'll think, "If I build a self-improving AI, how should it update itself - including changing its own updating process - from the results of observation?"

And how did the first minds get the ability to think those thoughts?  From natural selection, that generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: "keep what works".

And in turn, natural selection in this universe popped out of the laws of physics.

So everything that this hypergalactic civilization ever believes about math, is really just induction in one form or another.  All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations.  It would all just unfold out of the inductive principle...

...running in a box sealed as tightly shut as our own universe appears to be.

And I don't see how we, in our own closed universe, are going to do any better.  Even if we have the ability to look up at the stars, it's not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.

If you wrote that 400-bit simulated universe over the course of a couple of months using human-level intelligence and some mysterious source of unlimited computing power, then you are much more complex than that hypergalactic civilization.  You take much more than 400 bits to find within the space of possibilities, because you are only one particular brain.

But y'know, I think that your mind, and the up-arrow mind of that inconceivable civilization, would still be worth distinguishing as Powers.  Even if you can figure out how to ask them questions.  And even if you're asking them questions by running an internal simulation, which makes it all part of your own "complexity" as defined in the math.

To locate an up-arrow-sized mind within an up-arrow-sized civilization would require up-arrow bits - even if the entire civilization unfolded out of a 400-bit machine as compact as our own laws of physics.  But which would be more powerful: that one "complex" mind, or the "simple" civilization it was part of?

None of this violates Godelian limitations. You can transmit to the hypergalactic civilization a similar Turing machine to the one that built it, and ask it how that Turing machine behaves.  If you can fold a hypergalactic civilization into a 400-bit Turing machine, then even a hypergalactic civilization can confront questions about the behavior of 400-bit Turing machines that are real stumpers.

And 400 bits is going to be an overestimate.  I bet there's at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that.  If that seems unreasonable, you are not acquainted with the Busy-Beaver problem.
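For reference, the Busy Beaver function invoked here, in the step-counting convention this discussion uses (standard definition):

$$\mathrm{BB}(n) \;=\; \max \{\, \mathrm{steps}(M) \;:\; M \text{ an } n\text{-state Turing machine that halts on blank input} \,\}$$

It grows faster than any computable function, which is what lets such tiny machines encode such enormous unfoldings.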

You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations.  But unfortunately, there are other small Turing machines that are hellishly difficult problems - perhaps unfolding into hypercivilizations themselves, or things even less comprehensible.  So even the tremendous mathematical minds that can unfold out of small Turing machines, won't be able to solve all Turing machines up to a larger size bound.  Hence no Godelian contradiction.

(I wonder:  If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon?  15?  Maybe even as large as 20?  Surely not 100 - you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)

Olie Lamb said:  "I don't see anything good about complexity.  There's nothing artful about complexity.  There's nothing mystical about complexity.  It's just complex."  This is true even when you're talking about wheels and gears, never mind Kolmogorov complexity.  It's simplicity, not complexity, that is the admirable virtue.

The real force behind this whole debate is that the word "complexity" sounds impressive and can act as an explanation for anything you don't understand.  Then the word gets captured by a mathematical property that's spelled using the same letters, which happens to be provably constant for closed systems.

That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.

This argument is promulgated even by some people who can write proofs about complexity - but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life.  That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs?  I don't think they do that.  On the last step of interpretation, they just make a quick appeal to the sound of the words.

But I will agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs shouldn't be able to improve its mathematical abilities beyond the level of an up-arrow-sized civilization - for example, it shouldn't be able to solve Busy-Beaver(100).

It might perhaps be more limited than this in mere practice, if it's just running on a laptop computer or something.  But if theoretical mathematical arguments about closed systems show anything, that is what they show.

Comments (78):
Roko:

Can you give an example of the kind of argument that you are trying to refute here?

The argument:

"a (real-world) agent such as a human or AI cannot self improve because the K-complexity of a closed system is constant"

seems obviously silly. A human or AI is not a closed system, it receives sensory inputs, and no-one would argue that self-improvement has to happen without any sensory inputs.

Aside: I would caution, echoing Eliezer, that theoretical results about what is computable and what isn't, and about Turing machines with infinite memory and how... (read more)

what exactly do you mean by the phrase "the 'complexity' of that entire universe... would be 500 bits."

He means 500 bits plus log2 the size of the initial conditions, plus log2 the number of instants elapsed since then - which is the number of bits you would need to specify a current multiverse state under the specified assumptions.

Tim, I already specified I was talking about the whole universe throughout time and space (and inflation and decoherence). The laws probably don't have any initial conditions under that specification, even if different things happen in different places.

Henry, follow the link to the Mandelbrot set, then look up how the Mandelbrot set is computed (a single equation).

Peterdjones:

You're right about what you said about the instants. However, we don't know how big the initial conditions were - and I'm not clear on what it would mean for them not to exist. Maybe you are talking about eternal return? Even then there's a specification of which path you are on.

Roko:

@Nick Tarleton:

Thanks, Nick. I can see I'm going to have to spend an exciting evening or 10 alone with the SL4 archives...

Perhaps one could consider Eliezer's OB posts as an "edited and sanitized summary of the morass of SL4"?

Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting ... (read more)

The "500 bits" only works if you take a hidden variable or Bohmian position on quantum mechanics. If (as the current consensus would say) non-linear dynamics can amplify quantum noise then enormous amounts of new information are being "produced" locally everywhere all the time. The current state of the universe incorporates much or all of that information. (Someone who understands the debates about black holes and the holographic principle should chime in with more precise analysis.)

I couldn't follow the whole argument so I'm not sure how this affects it, but given that Eliezer keeps referring to this claim I guess it is important.

Tim: The initial wavefunction may be completely determined by the laws, and under MWI that's all you need.

Roko: I actually don't notice much duplication between OB and SL4. It's definitely worth reading.

Will: Why don't you take it seriously? You know the Mandelbrot set is generated from very few bits, and I don't imagine you expect there to be a simple derivation of its structures. Taken literally, your request is highly unreasonable because it involves so many levels of poorly understood processes - inflation, baryogenesis, galaxy formation... and the ans... (read more)

Will Pearson: Show us simple laws which predict how many galaxies worth of atoms there would be in a universe (why not one?), and I will take 500 bits for generating the universe somewhat seriously...

Been done. It didn't work out, and Eddington's theory is now only history, but the example shows that it is not out of the question that there be such laws.

It might be worthwhile to note that cogent critiques of the proposition that a machine intelligence might very suddenly "become a singleton Power" do not deny the inefficacies of the human cognitive architecture offering improvement via recursive introspection and recoding, nor do they deny the improvements easily available via hardware substitution and expansion of more capable hardware and I/O.

They do, however, highlight the distinction between a vastly powerful machine madly exploring vast reaches of a much vaster "up-arrow" space of ... (read more)

Boris:

I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states.
There exists a constant C so that if you want to solve the halting problem for all Turing machines up to N states, you can use a particular Turing machine with N+C states. Moreover, this constant C is small (definitely less than 100). In other words, there exists a 500-state Turing machine that solves the halting problem for all the Turing machines up to 400 states.

Here's the algorithm: given as input a Turing machine M with less than 400 states,

  1. Compute a number k greater than BB(400).
  2. Simulate M for k steps.
  3. If it halts by then, then output "M halts". If it doesn't, then output "M doesn't halt".

Correctness proof: If it halts in less than k steps, then obviously it halts. If it doesn't, then by the definition of BB(400), it never will.

Analysis of number of states: This is possible with 400+C states because you use about 400 states for step 1, and a constant number of overhead states for simulating Turing machines. Because there exist universal Turing machines with less than 25 states, you can arrange for C to b... (read more)

I thought you had some reason for giving us extra detail - why the number 500?  Yeah, it could be as short as 500 bits, but it could equally be very long.

I will like a shorter explanation for the universe better than a longer one, if I am presented with one, but there is no reason to expect one of a certain length.  Since we don't have any that actually manage to do so, why attach any number to it?

[anonymous]:

I don't think there is a Theory of Everything for the universe.  Just as there isn't one for Mathematics.

What if the universe is not continuous, the simulation does lazy evaluation, and the laws of nature can only be described by an infinite number of bits?  How would this change your reasoning?

*drops dead of shock*

Oh... well, at least there's no reasonable way for a superintelligence that can perform any finite computation, to obtain that particular 500-bit Turing machine, through any amount of internal thinking or induction from sensory experience, without using a reasoning method guaranteed at some point to make a mistake.

PK:

I don't understand. Am I too dumb or is this gibberish?

@pk I don't understand. Am I too dumb or is this gibberish?

It's not so complicated; it's just that we're so formal...

Oleg:

Kolmogorov complexity of the universe is more than just a description of its physical laws.

First of all, besides the physical laws, K should include the initial state of the universe. It may be large or even infinite. And it should be properly expressed in qubits.

Second, you'd need to pick the "present state", whatever that means, out of all possible future and past states.  In a classical universe it may be only 100 bits or so; in a quantum multiverse, it's at least 1 bit for every branch.

Third, it is not at all obvious that the laws of universe are... (read more)

In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

Eliezer, you're making an important error here. I don’t think it affects the main argument you're making in this article (that considerations of "complexity" doesn't rule out self-improving minds), but this error may have grave consequences elsewhere. The error... (read more)

Wei, essentially the black box has to be such that it seems more likely for it to be a halting oracle than something only slightly smarter than you are - since when it says "doesn't halt", and you don't know if it halts or not, it's not like you can easily check.

Suppose, though, that the black box has the apparent qualities of a closed timelike curve in a universe that permits CTCs to be constructed with perfection (even if our own universe permitted CTCs they would break down well before up-arrow time). Then it would seem pretty plausible, afte... (read more)

Good point, but when the box says "doesn't halt", how do I know it's correct?

If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon?

Suppose they run a BB evaluator for all of time. They would, indeed, have no way at any point of being certain that the current champion 100-bit program is the actual champion that produces BB(100). However, if they decide to anthropically reason that "for any time t, I am probably alive after time t, even though I have no direct eviden... (read more)

Boris: There's a small amount of subtlety in actually doing step 1.

Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable.

Nick wrote: Good point, but when the box says "doesn't halt", how do I know it's correct?

A halting-problem oracle can be used for all kinds of things besides just checking whether an individual Turing machine will halt or not. For example you can use it to answer various mathematical questions and produce proofs of the answers, and then verify the proofs yourself. You should be able to obtain enough proofs to convince yourself that the black box is not just giving random answers or just being slightly smarter than you are.

If P!=NP, you should be ... (read more)

+1 to Anatoly Vorobey. Using K-complexity to capture the human notion of complexity seems to be even worse than using game-theoretic rationality to capture human rationality - something that's been attacked to death already.

Boris: There's a small amount of subtlety in actually doing step 1.

Douglas: Isn't it simply impossible? That doesn't interfere with your claim that such a Turing machine exists, but step 1 claims that it's computable.

It's impossible to bound BB(n) computably in n, but that leaves open the following question, on which Boris's argument depends.

Let BBB(n), the Busy Beaver Bound function, be the size of the smallest Turing machine that, starting from a blank tape, prints out an upper bound for BB(n). Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.

Richard: Boris's step (1) claims that BBB(n) is bounded by n+c for some constant c. It is not obvious to me either that this is true or that this is false.

c is the size of a universal Turing machine which, in addition to simulating its argument, keeps track of how many steps it simulates. BB(n) is the maximum number of steps that any halting TM of size n performs, therefore there is some particular TM of size n that performs BB(n) steps. Give that TM as input to the aforementioned UTM, and it will compute BB(n). The uncomputability of BB prevents you from finding the right TM, and from proving it correct if it's handed to you by a magic light in the sky. But the TM still exists even if you'll never know which it is.

Eliezer, I'm not quite sure why you are obsessed with the ability to find mathematical truths. You can get different sets of mathematical truths depending on whether you take the parallel postulate as true or not. Which ones are applicable in the real world depends on where you are.

Unless you know where you are in the universe, you cannot know what it is friendly for you to do (or even useful).

In fact, it's just bloody hard to fundamentally increase your ability to solve math problems in a way that "no closed system can do" just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.

As Wei mentioned, P≠NP is basically the conjecture that this isn't true: i.e., that you can exponentially increase your ability to solve math problems by your environment being magic and your not being born with faith in that fact. So for example, if your environment im... (read more)

I would drop dead of shock

Eliezer, just as it was interesting to ask what probability estimate 'Nuts!' amounted to, I think it would be very useful for the forum of Overcoming Bias to ask what your implicit probability estimate was for a 500-state TM being able to solve the halting problem for all TMs of up to 50 states.

I imagine that 'I would drop dead of shock' was intended to convey a probability of less than 1 in 10,000, or maybe 1 in 1,000,000?

@Toby: Why, yes, I was feeling rather grateful at that point that I hadn't quantified the probability. It's fair to assume that it would have been low enough that I couldn't plausibly recover from the calibration hit, like 1% or something.

@Scott: This entire discussion assumes unlimited finite internal computing power, so P and NP cut no dice here. Otherwise, of course a larger environment can outsmart you mathematically.

@Will: Mathematical truths are about which axioms imply which theorems.

@Wei: A halting oracle is usually said to output 1s or 0s, n... (read more)

Let me expand my point.  Assume you run your hyper civilisation for 1 googol years, and you ask it, "Do the angles in a shape with three straight sides add up to 180?"

This can be answered yes or no, depending upon which axiom scheme you adopt: Cartesian, hyperbolic, or something more obscure.  Having a system that has thought of all mathematical truths doesn't get you a Powerful system, unless it knows which truths apply to solving the problems you ask.

jb:

Beautiful post, Eli. Seriously, one of the better --- and more useful, at least pedagogically --- ones. Should be required reading for, well, just about everyone.

Boris:

Oops, I mis-stated the result. If we fix a universal Turing machine (equivalently, a programming language), then there exists a constant C so that deciding the halting problem for programs of length less than N can be done by a program of length less than N+C. Pengvado's solution works here.

Unfortunately, as Richard observed, you can't do this with Turing machines, essentially because Turing machines can not inspect their own finite control, unlike the stored program on a von Neumann machine. In fact, it appears that my claim---in Richard's notation, that BBB(N) is N+O(1)---is open. It is known that BBB(N) is O(N), and moreover, a 2002 paper "Improved Bounds for Functions Related to Busy Beavers" by Ben-Amram and Petersen showed that BBB(N) is N+o(N). I haven't found better results, but on the off-chance that I do, I'll post another comment here.

Now, because my claim is open, we still have to check if Eliezer's original intuition (that 500-state Turing machines can't solve the halting problem for 50-state ones) holds up. Luckily for me, it doesn't. It's known that BBB(N) is less than 3N+6 (that's the O(N) result above), so at worst, there exists a 500-state Turing ... (read more)

Otherwise, of course a larger environment can outsmart you mathematically.

No, not of course. For example, suppose P were equal to PSPACE. Then even though a larger environment could fundamentally outsmart you mathematically (say by solving the halting problem), it couldn't prove to you that it was doing so. In other words, the situation with polynomial-time computation would be more-or-less the same as it is with unlimited computation: superintelligent machines could only prove their superintelligence to other superintelligent machines.

That the situatio... (read more)

Scott, while I don't mean to deflate in any sense the shattering import of your wisdom...

...there's a pretty large segment of the planetary population upon whom the distinction between "polynomially more computing power" and "exponentially more computing power" would be lost.

These unenlightened fools, if they had computing power N, would be just as impressed by a superintelligence that had N^googol computing power as a superintelligence that had 2^N computing power. They'd just lump both of those together as "the environment is smarter than me, and can prove things I can't and then exhibit the proof". Poor bastards.

Um, except that we also don't know whether there are computations that can be checked in N time but only performed in N^googol time. The situation is qualitatively the same as for N versus 2^N.

The initial wavefunction may be completely determined by the laws, and under MWI that's all you need.

Bundling the initial state into the laws of physics seems like a confused way of looking at things to me. The state and the laws are different things - if you muddle them together conceptually, your accounting is likely to get equally muddled. Also, most of the rationale for the size of the "laws of physics" being small evaporates. The initial state might be enormous, for all anyone knows. The laws of physics could be infinite too - but there the hope that they are small and finite seems slightly more reasonable.

Henry: "Both of those concepts seem completely apt for describing perfectly deterministic systems. But, in describing the "complexity" of the universe even in something as simple as the 'pattern of stars that exists' one would still have to take into account potential non-deterministic factors such as human behavior. [...] [A]re you saying that you are a strict determinist?"

I'll take this one. Yes, we're presuming determinism here, although the determinism of the Many Worlds Interpretation is a little different from the single-world det... (read more)

POSTSCRIPT-- Strike that stray does in the penultimate sentence. And re compatibilism, the short version is that no, you don't have free will, but you shouldn't feel bad about it.

A halting oracle is usually said to output 1s or 0s, not proofs or halting times, right?

It's easy to use such an oracle to produce proofs and halting times. The following assumes that the oracle outputs 1 if the input TM halts, and 0 otherwise.

For proofs: Write a program p which on inputs x and i, enumerates all proofs. If it finds a proof for x, and the i-th bit of that proof is 1, then it halts, otherwise it loops forever. Now query the oracle with (p,x,0), (p,x,1), ..., and you get a proof for x if it has a proof.
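A sketch of the construction just described, in Python-flavored pseudocode. Everything named here is an assumed primitive - `halts` is the uncomputable oracle itself, and `proofs`, `proves`, `bit`, and `is_complete_proof` are stand-ins - so this illustrates the bit-by-bit extraction trick rather than being runnable code:

```python
# Pseudocode sketch: extracting a proof bit-by-bit from a halting oracle.
# All helper names below are hypothetical, assumed primitives.

def p(x, i):
    """Halts iff the first proof found for x has i-th bit equal to 1."""
    for candidate in proofs():            # assumed: enumerate all proofs
        if proves(candidate, x):          # assumed: proof checker
            if bit(candidate, i) == 1:    # assumed: i-th bit of the proof
                return                    # halt: bit i is 1
            while True:                   # loop forever: bit i is 0
                pass

def extract_proof(x):
    """Recover a proof of x with one oracle query per bit."""
    bits, i = [], 0
    while not is_complete_proof(bits, x):        # assumed termination check
        bits.append(1 if halts(p, (x, i)) else 0)  # one oracle call per bit
        i += 1
    return bits
```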

Halting times: Write a program p which o... (read more)

I don't recall if I've mentioned this before, but Solomonoff induction in the mixture form makes no mention of the truth of its models. It just says that any computable probability distribution is in the mixture somewhere, so you can do as well as any computable form of cognitive uncertainty up to a constant. Eliezer, if what you say is true, then it shouldn't be possible for anyone, using just a Turing machine, to beat Solomonoff Induction at a pure prediction game (by more than a constant), even if the input sequence is uncomputable. But here is a count... (read more)

Good question, Eliezer. If the human mathematician is computable, why isn't it already incorporated into Solomonoff Induction? It seems to me that the human mathematician does not behave like a Bayesian. Let H be the hypothesis that the input sequence is the unary encodings of Busy Beaver numbers. The mathematician will try to estimate, as best as he can, P(next symbol is 0|H). But when the next symbol turns out to be 1, he doesn't do a Bayesian update and decrease P(H), but instead says "Ok, so I was wrong. The next Busy Beaver number is bigger than ... (read more)

@ Wei. Thanks for the response. I will look at the refs but haven't yet done so. I'm skeptical about whether they'll change my mind on the subject, but I'll take a look.

It seems to me that the belief in pure determinism is axiomatic (I'm not even sure what it means to be a Bayesian in a purely deterministic system!), so most of this discussion appears to be pure conjecture. I'm not sure that it has any meaningful application.

@Wei: p(n) will approach arbitrarily close to 0 as you increase n.

This doesn't seem right. A sequence that requires knowledge of BB(k) has O(2^-k) probability according to our Solomonoff Inductor. If the inductor compares a BB(k)-based model with a BB(k+1)-based model, then BB(k+1) will on average be about half as probable as BB(k).

In other words, P(a particular model of K-complexity k is correct) goes to 0 as k goes to infinity, but the conditional probability, P(a particular model of K-complexity k is correct | a sub-model of that particular model with K-complexity k-1 is correct), does not go to 0 as k goes to infinity.

After further thought, I need to retract my last comment. Consider P(next symbol is 0|H) again, and suppose you've seen 100 0's so far, so essentially you're trying to predict BB(101). The human mathematician knows that any non-zero number he writes down for this probability would be way too big, unless he resorts to non-constructive notation like 1/BB(101). If you force him to answer "over and over, what their probability of the next symbol being 0 is" and don't allow him to use notation like 1/BB(101) then he'd be forced to write down an inconsistent probability distribution. But in fact the distribution he has in mind is not computable, and that explains how he can beat Solomonoff Induction.

Rolf, I was implicitly assuming that even knowing BB(k), it still takes O(k) bits to learn BB(k+1). But if this assumption is incorrect, then I need to change the setup of my prediction game so that the input sequence consists of the unary encodings of BB(1), BB(2), BB(4), BB(8), …, instead. This shouldn’t affect my overall point, I think.

Since our whole universe may be defined by something as frugal as hundreds of bits of complexity, the fact that there's an upper bound on speed is interesting: it makes locality a very tangible thing, and not just a question of adding some bits of information to specify where in the universe that "locality" is.

"If you want to print out just Earth by itself, then it's not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that's going to take a lot of information."

Peterdjones:

Yes. To put it another way: it takes much more than 500 bits of information to predict a lifetime's experiences. You have to know where you are in the universe/multiverse. The counterintuitive fact is that a set can contain much less information (down to 0 bits) [*] than one of its subsets... and what we see is always a subset.

[*] "A skeptic worries about all the information necessary to specify all those unseen worlds. But an entire ensemble is often much simpler than one of its members. This principle can be stated more formally using the notion of algorithmic information content. The algorithmic information content in a number is, roughly speaking, the length of the shortest computer program that will produce that number as output. For example, consider the set of all integers. Which is simpler, the whole set or just one number? Naively, you might think that a single number is simpler, but the entire set can be generated by quite a trivial computer program, whereas a single number can be hugely long. Therefore, the whole set is actually simpler. Similarly, the set of all solutions to Einstein's field equations is simpler than a specific solution. The former is described by a few equations, whereas the latter requires the specification of vast amounts of initial data on some hypersurface. The lesson is that complexity increases when we restrict our attention to one particular element in an ensemble, thereby losing the symmetry and simplicity that were inherent in the totality of all the elements taken together. In this sense, the higher-level multiverses are simpler. Going from our universe to the Level I multiverse eliminates the need to specify initial conditions, upgrading to Level II eliminates the need to specify physical constants, and the Level IV multiverse eliminates the need to specify anything at all." -- Tegmark

"produced by inductive natural selection;"

Natural selection is not an inductive process.

In practice, natural selection produces intelligent agents, who can do induction, and then they make selective choices that affect who lives, who dies and who reproduces. What that means is that any idea that evolution (or natural selection) in the real world does not involve induction is not correct.

Nice try Tyler. What individuals "do" does not define what natural selection does.

One could also say: "In practice, natural selection produces intelligent agents, who can predict, and then they make selective choices that affect who lives, who dies and who reproduces."

That does NOT mean that natural selection operates via a predictive process. Ask any good biologist and they will tell you that natural selection is not predictive. That's why species go extinct all the time.

Natural selection is a non-inductive process that can pro... (read more)

That does NOT mean that natural selection operates via a predictive process. [...]

It most certainly does - if we are talking about natural selection in evolution and nature.

Examples of failure to predict do not prove evolution does not predict - just that it does not predict perfectly.

The process of natural selection is "Trial and error" which is quite literally analogous to Poppers theory of falsification.

Heh. Not even science works via Popper's theory of falsification.

it is not at all clear the intelligent agent operates primarily using ... (read more)

You cite the same rhetoric by Dawkins that I do. Dawkins is one of those whom I am arguing against. The issue is simple enough not to require an appeal to authority to resolve. Natural selection does not involve prediction if it acts on a system which does not make predictions. It does involve prediction when it acts on a system which does make predictions. If the selecting agent is intelligent, and thinks about the future, you cannot fully model the resulting process without modelling that predictive process. Tree leaves do not make predictions - and I n... (read more)

Tim,

I think you are confusing an emergent property created by a system with how the system operates on its own. That architects draft blueprints that result in public housing projects that lead to forced living relationships that make it hard to evict drug dealers doesn't mean that the discipline of architecture runs on drug dealing. Even if that drug dealing impacts the architects.

You seem to have written some simulations of natural selection. In writing those algorithms did you have to code in the ability to predict? Of course not... (read more)

Since the dolphin has evolved flippers would you therefore say that “natural selection operates on flippers?”

Yes - provided we were talking about a process that included dolphin evolution. The process of natural selection in general does not necessarily involve prediction - but from the beginning I have been discussing natural selection in evolution - which does involve prediction.

Regarding Popper, I won't say much - since I would be preaching to the choir on this blog. I recommend you start by going and reading the bit beginning: "Previously,... (read more)

"Since the dolphin has evolved flippers would you therefore say that “natural selection operates on flippers?"

"Yes - provided we were talking about a process that included dolphin evolution."

I'm flabbergasted by this response. There is nothing inherent about natural selection that is going to give you flippers. That's dependent on the environment, the mutations, and other factors. The process itself has no "flippers". It's a process that works fine without flippers, yet you insist that natural selection "opera... (read more)

Re: the philosophy of science - you don't seem to get it. Oh well, I tried.

Re: evolution and natural selection - yes there is "the class of all things which involve inheritance, variation and selection". However, there is also the instance of evolution via natural selection running on our planet. You have consistently talked about the former. I have consistently talked about the latter. I figure that once you stop attempting to apply my comments to the former notion - which they simply do not apply to - our disagreement here will evaporate.

T... (read more)

Tim,

The problem is that there is no mechanism in the process of natural selection for stuffing that foresight generated by brains back into the genome. Learn all you want but it isn't passed on to your kids via your genes. That's the rub. That's why natural selection is blind to the future. The idea that natural selection is blind is perfectly accurate.

That's also true for the small minority of organisms on the planet that actually have brains that predict the future. So I am talking about the instance of natural selection operating on this planet. Y... (read more)

The problem is that there is no mechanism in the process of natural selection for stuffing that foresight generated by brains back into the genome. Learn all you want but it isn't passed on to your kids via your genes. That's the rub.

There most certainly are such mechanisms. Most obviously sexual selection and the Baldwin effect - but also via vanilla natural selection. I explain this issue clearly enough in my essay.

No, sexual selection does not determine which mutations occur. It's merely a reinforcing feedback loop that is actually considered an example of how evolution can run off the rails, so to speak.

Sure females might by accident happen to pick a feature in males that might prove adaptive. Unfortunately for your argument it is not based on prediction, but is happenstance. Even were the "correct" feature chosen initially there is then the tendency of sexual selection to over select for the feature merely because other females find the feature att... (read more)

To me, this is all as silly as the Chinese Room. Naive intuition says that the Chinese Room doesn't understand chinese in the way we do... because it doesn't, running as it does on exponentially slower timescales. Similarly, I can close the box on my turing machine and it can, over mega-exponential timescales and using mega-exponential amounts of tape, get just as much smarter as it could by looking around and talking to people, but there are still several reasons why the latter strategy is superior.

Of course, I may just not understand what is supposed to ... (read more)

I hardly know where to begin - and I'm not sure I should bother, since we are not really very on topic, and this has gone on long enough already. Very briefly:

1. Nobody ever claimed that sexual selection determines which mutations occur. Sexual selection provides examples of selection by intelligent agents affecting the course of evolution;

2. Sexual selection is not "happenstance". Females tend to pick traits that - among other things - reflect male health, fitness, and resistance to parasite infestations;

3. The cases where sexual selection drive... (read more)

Associating Kolmogorov complexity with intelligence causes brain damage: Example.

"...the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory."

I am confused. If each pattern gets its own shortest program to describe it, why can't complexity numbers be computed (wikipedia says they can't be)?

I had thought that the reason they could not be was that each string has its own best program as a prefix that minimizes the number of bits needed to express the prefix plus the string, and there was no neutral way t... (read more)

shokwave:
If you could compute them, you could use that computation (in a complicated way) to produce strings that could only be produced by much longer programs. This is a contradiction, so it's not possible to compute complexity.
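The contradiction shokwave gestures at can be made concrete with a Berry-paradox-style sketch. Here `K` is the assumed-computable complexity function and `all_strings` enumerates strings in length order; both are hypothetical, which is the point:

```python
# Berry-paradox sketch: assume, for contradiction, that K is computable.

def first_complex_string(threshold):
    """Return the first string whose complexity exceeds threshold."""
    for s in all_strings():          # assumed: "", "0", "1", "00", "01", ...
        if K(s) > threshold:         # the assumed computation of complexity
            return s

# This program's length is some fixed c plus about log2(threshold) bits
# for the number, yet its output has K > threshold by construction. For
# large threshold the program is shorter than the complexity of the
# string it prints -- a contradiction, so K cannot be computable.
```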
[anonymous]:
Is this right: For every string there is a program that gives it its smallest k, s+p=K. For every s there is a k bigger than s but no bigger than any other k, yet "For some s_n there is a K_n bigger than s_n but no bigger than any other K" < p_n? So computing K for every s means there is a paradox for some s. Can we make our lives easy by picking a s, accepting it is a paradox, and moving on to compute all other Ks? Or more intensely, by randomly picking a representative sample of strings to be paradoxes, calculate each (non-paradox) string's K in each paradox world, and then average out the Ks? It seems like Eliezer only affirmatively says one K is bigger than another when they are of vastly different orders of magnitude. If this is obviously wrong, sorry but I have no math background!
[anonymous]:
Allow me to try again. Any string can be made by some programs, each program has a mathematical complexity of K, we say each string has a K of the program with the smallest K that makes that string. If knowing a string gives us a precise value for K, knowing a precise value for K lets us generate some strings that are more complex than K but no more complex than any other strings higher than K. Some strings are random, and have Ks that are strings more complex than the strings they describe. For some string the shortest program that makes it will be "The shortest string with K greater than K1", and "The shortest string with K greater than K1" is a program with a low K. Yet that string was chosen by the program because it had a higher K than "the shortest string with complexity greater than K" does, so the string thus delineated "really" has a very small K, not a high one, so it must be unselected, and thus unselected it "really" has a high K again, falsifying that program's selection of anything else. The problem is that programs can be made out of self referencing words that select their components and targets based on what other programs are doing. Can't we still calculate K in a mathematical universe forbidding self referencing sentences? In our world we are comfortable saying K1 is greater than K2 if and only if we know it is orders of magnitude greater. What would be wrong with saying K1 is greater than K2 because K1=100 and K2=10, where K is complexity in the mathematical world with the extra axiom?
shokwave:
I'm not mathematician enough to say yes, but I think so. I don't know how well/simple a no-self-reference version of some theories stacks up against the self-referencing versions (I am in the mind of lisp's recursion features; without that ability lisp programs would be ... longer and more complicated).
nshepperd:
And I guess you can't just enumerate all programs and take the length of the shortest (=first) one that produces the string because of the halting problem? I wonder if I could turn that around and spin a computation of complexity into a halting oracle...