
SilasBarta comments on Taking Occam Seriously - Less Wrong

22 points · Post author: steven0461 · 29 May 2009 05:31PM


Comment author: SilasBarta 29 May 2009 09:31:50PM *  7 points [-]

Okay, now for a more substantive comment (ETA see note 1). I read the essay on what it means for a mind to be implemented, and Almond talks about a "problem" presented by Searle that says, "Can you call a wall a mind (or word processor program) on the grounds that you can find some isomorphism between the molecular motion of the wall, and some more interesting program?" and thus, "Why is the isomorphism to the program somehow less of a valid interpretation than that which we apply to an actual computer running the known program?"

I really don't see what the problem is here. The argument relies on the possibility of finding an isomorphism between an arbitrary "interesting" algorithm, and something completely random. Yes, you can do it: but only by applying an interpreter of such complexity that it is itself the algorithm, and the random process is just background noise.

The reason that we call a PC (or a domino setup) a "computer" is that its internal dynamics are consistently isomorphic to the abstract calculation procedure the user wants it to do. In a random environment, there is no such consistency, and as time progresses you must keep expanding your interpretation so that it continues to output what WordStar does. Which, again, makes you the algorithm, not the wall's random molecular motions.

(Edit to add: By contrast, a PC's interpreter (the graphics card, monitor, mouse, keyboard, etc.) does not change in complexity, nor does the mapping it performs from the CPU/memory to me.)

Surely, the above differences show how you can meaningfully differentiate between true programs/minds and random processes, yet Almond doesn't mention this possibility (or I don't understand him).
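The asymmetry can be sketched in a few lines of toy code (all names and sizes here are illustrative, not anything from Almond's essay). A real computer needs one fixed decoder from machine state to output; an "interpretation" of random wall states has to memorize the mapping step by step, so the interpretation itself ends up carrying the whole algorithm:

```python
import random

random.seed(0)

def target_algorithm(t):
    """The 'interesting' program -- here, just a counter."""
    return t

# A real computer: one fixed decoder maps machine state to output,
# and it never grows as the program runs.
FIXED_DECODER_SIZE = 1

# The 'wall': random states. To claim it computes the counter, the
# interpretation must record, step by step, which random state maps
# to which output -- a lookup table that grows with the runtime.
wall_interpreter = {}
for t in range(1000):
    wall_state = random.getrandbits(32)
    wall_interpreter[wall_state] = target_algorithm(t)

print(len(wall_interpreter))  # grows roughly linearly with runtime
```

The table is doing all the computational work; the wall contributes nothing.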

1 By this remark, I was absolutely not meaning to trivialize the other comments here. Rather, at the time I posted this, there were few comments, and I had just made a comment with no substance. The remark compares to my other comment, not to any other commenter.

Comment author: PaulUK 30 May 2009 02:16:26AM 4 points [-]

As the author of this article, I will reply to this, though it is hard to make much of a reply here. (I actually got here out of curiosity when I saw the site logs.) I am, however, always pleased to discuss issues like this with people.

One issue with this reply is that it is not just randomness we have to worry about. If we are basing a computational interpretation on randomness, yes, we may need to make the computational interpretation progressively more extreme, but Searle's famous WordStar-running-in-a-wall example is just one example. The computational interpretation may not even be based on randomness: it could conceivably be based on structure in something else, even though that structure would not be considered to be running the computer program except under a very forced interpretation. Where would we draw the line?

Another point: why should it matter if we use a progressively more extreme interpretation? We might, for example, just want to say that a computation ran for 10 seconds, which relies on a fixed interpretation (if a complex one), and what happens after that may not interest us. Where would we draw the line?

Another issue is that the main argument had been about statistical issues with combining computers when considering probability issues - the whole thing had not been based on Searle, who would not take me any more seriously, by the way.

Comment author: SilasBarta 30 May 2009 04:36:13AM 4 points [-]

We may not even have the computational interpretation based on randomness: it could conceivably be based on structure in something else, even though that structure would not be considered to be running the computer program except under a very forced interpretation. Where would we draw the line?

We would draw the line where our good old friend mutual information comes in. If learning the results of the other phenomenon tells you something about the results of the algorithm you want to run, then there is mutual information, and the phenomenon counts as a (partial) implementation of the algorithm.
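A minimal sketch of this criterion (the parity "algorithm" and sequence lengths are illustrative choices): a faithful implementation shares its full entropy with the algorithm's output, while a random "wall" shares essentially none.

```python
import random
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / (px[x] / n * py[y] / n))
               for (x, y), c in pxy.items())

algorithm_output = [t % 2 for t in range(1000)]  # a simple parity algorithm
faithful = list(algorithm_output)                # a genuine implementation

random.seed(1)
wall = [random.randint(0, 1) for _ in range(1000)]  # random 'wall' states

mi_real = mutual_information(algorithm_output, faithful)  # 1.0 bit
mi_wall = mutual_information(algorithm_output, wall)      # near 0 bits
```

Learning the faithful system's state tells you everything about the algorithm's output; learning the wall's state tells you (almost) nothing.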

Comment author: PaulUK 30 May 2009 05:23:34AM 5 points [-]

This is an approach I considered back in 1990-something, and at the time I considered it correct. I get the idea. We say that the "finding algorithm" somehow detracts from what is running. The problem is, this does not leave a clearly defined algorithm as the one being found. If X is found by F, you might say that all that runs is a "partial version of X" and that X only exists when found by F. This, however, would not just apply to deeply hidden algorithms. I could equally well apply it to your brain. I would have to run some sort of algorithm, F, on your brain to work out that some algorithm corresponding to you, X, is running. Clearly, that would be nothing like as severe as the extreme situations discussed in that article, but what does it mean for your status? Does it mean that the X corresponding to you does not exist? Are you "not all there" in some sense?

Here is a thought experiment:

A mind running in a VR system (suppose the two are one software package to make this easier) gradually encrypts itself. By this I mean that it goes through a series of steps, each intended to make it slightly more difficult to realize that the mind is there. There is no end to this. When does the mind cease to exist? When it is so hard to find that you would need a program as long as the one being hidden to find it? I say that is arbitrary.

You suggest that maybe the program running the mind just exists "partially" in some way, which I fully understand. What would the experience be like for the mind as the encryption gets more and more extreme? I say this causes issues, which are readily resolved if we simply say that the mind's measure decreases.

I can also add a statistical issue to this, which I have not written up yet. (I have a lot to add on this subject. It may be obvious that I need to argue that this applies to everything, and not just minds, to avoid some weird kind of dualism.).

Suppose we have two simulations of you, running in VRs. One is about to look in a box and see a red ball. The other will see a blue ball. We subject the version that will see the blue ball to some process that makes it slightly harder to find. You don't know which version you are. How much will you expect to see a blue ball when you look in the box? Do you say it is 50/50 that you will see a red ball or a blue ball? We keep increasing the "encryption" a bit each time I ask the question. If your idea that somehow the mind is only "partial" by needing the finding algorithm to find it is right, I suggest we end up with statistical incoherency. We can only say that the probability is 50/50 when the situations are exactly the same, but that will never be the case in any real situation. For any situation, one mind will need a bit more finding than the other.

In other words, if you think the length of the finding algorithm makes the algorithm running a mind somehow "partial", in a statistical question in which you had two possibilities, one in which your mind was harder to find than the other, and you don't know which situation you are in, when would you eliminate the "partial" mind as a possibility? If you say, "Never. As the encryption increases I would just say I am less and less likely to be in that situation" you have effectively agreed with me by adopting an approach where each mind is as valid as the other (you accept either as a candidate for your situation but treat them differently with regard to statistics - which is what I do). If you say that one mind cannot be a candidate for your situation then you have the issue of cut-off point. What cut-off point? When would you say, “This mind is real. This mind is only partial so cannot be a candidate for my experience. Therefore, I am the first mind?”

I would point out that I do not ignore these issues. I address them by using measure. I take the view that a mind which takes more finding exists with less measure, because a smaller proportion of the set of all possible algorithms that could be used to find something like it will find something like it.
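As a toy numerical sketch of how measure answers the red-ball/blue-ball question (the 2**-K weighting is purely illustrative, not the precise definition from the articles): weight each candidate situation by how easily it is found, and the probabilities stay coherent at every level of encryption.

```python
# Hypothetical weighting: a mind needing a K-bit finding algorithm
# gets measure proportional to 2**-K.
def measure(find_bits):
    return 2.0 ** -find_bits

m_plain = measure(10)   # lightly hidden copy of you
m_hidden = measure(30)  # heavily 'encrypted' copy of you

# Probability that you are the plain copy, weighting by measure --
# no cut-off point is ever needed, the hidden copy just counts less.
p_plain = m_plain / (m_plain + m_hidden)
print(p_plain)
```

As the encryption deepens, p_plain smoothly approaches 1 without the hidden mind ever being eliminated as a candidate.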

Finally, this only deals with one issue. There is also the issue of combining computers in the statistical thought experiments that I mentioned in the first article of that series. My intention in that series is to try to show that these various issues demand that we take a particular view about minds and reality to maintain statistical coherency.

Comment author: loqi 30 May 2009 06:38:35PM 1 point [-]

When does the mind cease to exist? [...] I take the view that a mind which takes more finding exists with less measure, because a smaller proportion of the set of all possible algorithms that could be used to find something like it will find something like it.

I'm running into trouble with the concept of "existence" as it's being applied here. Surely existence of abstract information and processes must be relative to a chosen reference frame? The "possible algorithms" need to be specified relative to a chosen data set and initial condition, like "observable physical properties of Searle's wall given sufficient locality". Clearly an observer outside of our light cone couldn't discern anything about the wall, regardless of algorithm.

An encrypted mind "existing less" doesn't seem to carry any subjective consequences for the mind itself. What if a mind encrypts itself but shares the key with a few others? Wouldn't its "existence" depend on whether or not the reference frame has access to the key?

If you've read it, I'm curious to know what you think of the "dust hypothesis" from Egan's Permutation City in this context.

Comment author: PaulUK 30 May 2009 06:57:14PM 2 points [-]

"Less measure" is only meant to be of significance statistically, not subjectively. For example, if you could exist in one of two ways, one with measure X and one with measure of 0.001X, I would say you should think it more likely you are in the first situation. In other words, I am agreeing (if you are arguing for this) that there should be no subjective difference for the mind in the extreme situation. I just think we should think that that situation corresponds to "less" observers in some way.

My own argument is actually a justification of something a bit like the dust hypothesis in "Permutation City". However, there are some significant differences, so the analogy should not be pushed too far. I would say that the characters in Greg Egan's novel undergo a huge decrease in measure, which could cause philosophical issues - though it would not feel different after it had happened to you.

I think we should consider this in terms of measure because there are "more ways to find you" in some situations than in others. It is almost like you have more minds in one situation than another - though there are no absolute numbers and really it should be considered in terms of density. If you want to see why I think measure is important, this first article may help: http://www.paul-almond.com/Substrate1.htm.

Comment author: loqi 30 May 2009 08:16:56PM *  2 points [-]

For example, if you could exist in one of two ways, one with measure X and one with measure of 0.001X, I would say you should think it more likely you are in the first situation. [...] I just think we should think that that situation corresponds to "less" observers in some way.

This seems tautological to me. Your measure needs to be defined relative to a given set of observers.

I think we should consider this in terms of measure because there are "more ways to find you" in some situations than in others.

More ways for who to find you?

If you want to see why I think measure is important, this first article may help

Very interesting piece. I'll be thinking about the Mars colony scenario for a while. I do have a couple of immediate responses.

How likely is it that you are in Computer A, B or C?

As long as the simulations are identical and interact identically (from the simulation's point of view) with the external world, I don't think the above question is meaningful. A mind doesn't have a geographical location, only implementations of it embedded in a coordinate space do. So A, B, and C are not disjoint possibilities, which means probability mass isn't split between them.

The more redundancy in a particular implementation of a version you, then the more likely it is that that implementation is causing your experiences.

I see this the other way around. The more redundancy in a particular implementation, the more encodings of your own experiences you will expect to find embedded within your accessible reality, assuming you have causal access to the implementation-space. If you are causally disconnected from your implementation (e.g., run on hypothetical tamper-proof hardware without access to I/O), do you exist with measure zero? If you share your virtual environment with millions of other simulated minds with whom you can interact, do they all still exist with measure zero?

Comment author: PaulUK 30 May 2009 08:31:54PM 1 point [-]

"As long as the simulations are identical and interact identically (from the simulation's point of view) with the external world, I don't think the above question is meaningful. A mind doesn't have a geographical location, only implementations of it embedded in a coordinate space do. So A, B, and C are not disjoint possibilities, which means probability mass isn't split between them."

I dealt with this objection in the second article of the series. It would be easy to say that there are two simulations, in which slightly different things are going to happen. For example, we could have one simulation in which you are going to see a red ball when you open a box and one where you are going to see a blue ball. We could have lots of computers running the red ball situation and then combine them and discuss how this affects probability (if at all).

"The more redundancy in a particular implementation of a version you, then the more likely it is that that implementation is causing your experiences."

Does this mean that if we had a billion identical simulations of you in a VR where you were about to see a red ball and one (different) simulation of you in a VR where you are about to see a blue ball, and all these were running on separate computers, and you did not know which situation you were in, you would not think it more likely you were going to see a red ball? (and I know a common answer here is that it is still 50/50 - that copies don't count - which I can answer if you say that and which is addressed in the second article - I am just curious what you would say about that.)
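In toy numbers, the two rival answers to this question (assuming, per the copies-count view, that each identical run carries equal weight) look like this:

```python
# A billion identical red-ball simulations vs. one blue-ball simulation.
n_red, n_blue = 10**9, 1

# If identical copies each count, the anticipated probability of red:
p_red = n_red / (n_red + n_blue)
print(p_red)

# The rival '50/50, copies don't count' view collapses the billion
# identical runs into a single possibility:
p_red_no_copies = 1 / 2
```

The disagreement is entirely about whether the billion runs are one possibility or a billion.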

" see this the other way around. The more redundancy in a particular implementation, the more encodings of your own experiences you will expect to find embedded within your accessible reality, assuming you have causal access to the implementation-space. If you are causally disconnected from your implementation (e.g., run on hypothetical tamper-proof hardware without access to I/O), do you exist with measure zero? If you share your virtual environment with millions of other simulated minds with whom you can interact, do they all still exist with measure zero?"

I am not making any suggestion that there is any connection between measure, redundancy and whether or not you are connected to I/O. Whether you are connected to I/O does not interest me much. However, some particularly low measure situations may be hard to connect to I/O if they are associated with very extreme interpretations.

Comment author: loqi 31 May 2009 12:02:19AM *  0 points [-]

I dealt with this objection in the second article of the series. It would be easy to say that there are two simulations, in which slightly different things are going to happen.

While this is also a valid and interesting scenario to consider, I don't think it "deals with the objection". The idea that "which computer am I running on?" is a meaningful question for someone whose experiences have multiple encodings in an environment seems pretty central to the discussion.

Does this mean that if we had a billion identical simulations of you in a VR where you were about to see a red ball and one (different) simulation of you in a VR where you are about to see a blue ball, and all these were running on separate computers, and you did not know which situation you were in, you would not think it more likely you were going to see a red ball?

I actually don't have a good answer to this, and the flavor of my confusion leads me to suspect the definitions involved. I think the word "you" in this context denotes something of an unnatural category. To consider the question of anticipating different experiences, I have to assume a specific self exists prior to copying. Are the subsequent experiences of the copies "mine" relative to this self? If so, then it is certain that "I" will experience both drawing a red ball and drawing a blue ball, and the question seems meaningless. I feel that I may be missing a simple counter-example here.

I know a common answer here is that it is still 50/50 - that copies don't count - which I can answer if you say that and which is addressed in the second article

50/50 makes sense to me only as far as it represents a default state of belief about a pair of mutually exclusive possibilities in the absence of any relevant information, but the exclusivity troubles me. I read objection 9, and I'm not bothered by the "strange" conclusion of sensitivity to minor alterations (perhaps this leads to contradictions elsewhere that I haven't perceived?). I agree that counting algorithms is just a dressed-up version of counting machines, because the entire question is predicated on the algorithms being subjectively isomorphic (they're only different in that some underlying physical or virtual machine is behaving differently to encode the same experience).

Of course, this leads to the problem of interpretation, which suggests to me that "information" and "algorithm" may be ill-defined concepts except in terms of one another. This is why I think I/O is important, because a mind may depend on a subjective environment to function. If this is the case, removal of the environment is basically removal of the mind. A mind of this sort, subjectively dependent on its own substrate, can be "destroyed" relative to observers of the environment, as they now have evidence for the following reasoning:

  • Mind M cannot logically exist except as self-observably embedded in environment E. So if E lacks such an encoding, M cannot exist.
  • I have observed E, and have sound reasons (local to E) to doubt the existence of a suitable encoding of M.
  • Therefore, M does not exist.

So far, this is the only substrate dependence argument I find convincing, but it requires the explicit dependence of M on E, which requires I/O.

Comment author: PaulUK 31 May 2009 12:33:50AM 3 points [-]

"Are the subsequent experiences of the copies "mine" relative to this self? If so, then it is certain that "I" will experience both drawing a red ball and drawing a blue ball, and the question seems meaningless. I feel that I may be missing a simple counter-example here."

No. Assume you have already been copied and you know you are one of the software versions. (Some proof of this has been provided). What you don't know is whether you are in a red ball simulation or a blue ball simulation. You do know that there are a lot of (identical - in the digital sense) red ball simulations and one blue ball simulation. My view on this is that you should presume yourself more likely to be in the red ball simulation.

Some people say that the probability is 50/50 because copies don't count. I would make these points:

  1. sensitivity, which you clearly know about.
  2. it is hard to say where each program starts and ends. For example, we could say that the room with each red-ball simulation computer in it is a simulation of a room with a red-ball simulation computer in it - in other words, the physical environment around the computer could validly be considered part of the program. It is trivial to argue that a physical system is a valid simulation of itself. As each computer is going to be in a slightly different physical environment, it could be argued that this means that all the programs are different, even if the digital representation put into the box by the humans is the same. The natural tendency of humans is just to focus on the 1s and 0s - which is just a preferred interpretation.
  3. Humans may say that each program is "digitally" the same but we might interpret the data slightly differently. For example, one program run may have a voltage of 11.964V in a certain switch at a certain time. Another program run may have a voltage of 11.985V to represent the same binary value. It could be argued that this makes them different programs, each of which is simulating a computer with an uploaded mind on it with different voltages in the switches (again, using the idea that a thing is also a computer simulation of that thing if we are going to start counting simulations).

I just think that when we try to go for 50/50 (copies don't count) we can get into a huge mess that a lot of people can miss. While I don't think you agree with me, I think maybe you can see this mess.

"While this is also a valid and interesting scenario to consider, I don't think it "deals with the objection". The idea that "which computer am I running on?" is a meaningful question for someone whose experiences have multiple encodings in an environment seems pretty central to the discussion."

I think the suggested scenario makes it meaningful. There is also the issue of turning off some of the machines. If you know you are running on a billion identical machines, and that 90% of them are about to be turned off, then it could become an important issue for you. It would make things very similar to what is regarded as "quantum suicide".

We can also consider another situation:

You have a number of computers, all running the same program, and something in the external world is going to affect these computers, for example a visitor from the outside world will "login" and visit you - we could discuss the probability of meeting the visitor while the simulations are all identical.

"This is why I think I/O is important, because a mind may depend on a subjective environment to function. If this is the case, removal of the environment is basically removal of the mind."

I don't know if I fully understood that - are you suggesting that a reclusive AI or uploaded brain simulation would not exist as a conscious entity?

As you asked me about Permutation City (Greg Egan's novel) before, I will elaborate on that a bit.

The "dust hypothesis" in Permutation City was the idea that all the bits of reality could be stuck together in different ways, to get different universe. The idea here is that every interpretation of an object, or part of an object, that can be made, in principle, by an interpretative algorithm, exists as an object in its own right. This argument applies it to minds, but I would clearly have to claim it applies to everything to avoid being some kind of weird dualist. It is therefore a somewhat more general view. Egan's cosmology requires a universe to exist to get scrambled up in different ways. With a view like this, you don't need to assume anything exists. While a lot of people would find this counter-intuitive, if you accept that interpretations that produce objects produce real objects, there is nothing stopping you producing an object by interpreting very little data, or no data at all. In this kind of view, even if you had nothing except logic, interpretation algorithms that could be applied in principle with no input - on nothing at all - would still describe objects, which this kind of cosmology would say would have to exist as abstractions of nothing. Further objects would exist that would be abstractions of these. In other words, if we take the view that every abstraction of any object physically exists as a definition of the idea of physical existence, it makes the existence of a physical reality mandatory.

"Of course, this leads to the problem of interpretation, which suggests to me that "information" and "algorithm" may be ill-defined concepts except in terms of one another. This is why I think I/O is important, because a mind may depend on a subjective environment to function."

I simply take universal realizability at face value. That is my response to this kind of issue. It frees me totally from any concerns about consistency - and the use of measure even makes things statistically predictable.

Comment author: gjm 30 May 2009 11:40:53PM 3 points [-]

One picky remark: Paul Almond ascribes this argument to Searle, and indeed it appears in a work of Searle's from 1990; but Hilary Putnam published a clearer and more rigorous presentation of it, two years earlier, in his book "Representation and reality".

(Putnam also demolished the rather silly Goedelian argument against artificial intelligence that's commonly attributed to J R Lucas before Lucas even published it. Oh, and he was one of the key players in solving Hilbert's 10th problem. Quite a clever chap.)

Comment author: steven0461 29 May 2009 10:01:09PM 0 points [-]

That's his "objection 4", if I'm not mistaken. Complexity of interpretation comes in degrees. How much of the complexity needs to be in the interpretation and not in the computer, before you can say the algorithm isn't really being implemented?

Incidentally, I only linked to part 3 because it has links to part 1 and part 2. I should probably have made this clear.

Comment author: SilasBarta 29 May 2009 10:26:03PM *  0 points [-]

Objection 4 (and the response) treat it as an issue of the absolute length (or complexity) of the interpreter. That's not the same as my point, which is that the interpreter must be continually expanded in order to map random data onto an algorithm's output. That's why I conclude you can distinguish them: some interpretations necessarily expand as time progresses, others don't.

Also, Almond's response frames it as a problem of how to say "Length L or greater is impermissible". He doesn't address the alternative of asking if the interpreter is longer than the algorithm it's finding, focusing instead on the absolute length.

Comment author: steven0461 29 May 2009 10:31:37PM *  0 points [-]

the alternative of asking if the interpreter is longer than the algorithm it's finding

still sounds to me like it involves an unreasonable discontinuous jump at a certain complexity level.

I haven't read these articles recently, by the way, so I'm not committing to defend their content. (I don't think of that as a sufficient reason not to have posted the links, but I may be wrong on that.)

Comment author: PaulUK 30 May 2009 02:21:27AM 1 point [-]

I hope it is okay for me to reply to all these. Right, yes, that is my position, steven. When the interpreter algorithm length hits the length of the algorithm it is finding, nothing of any import happens. Would we seriously say, for example, that a mind corresponding to a 10^21 bit computer program would be fine, and enjoying a conscious existence, if it was "findable" by a 10^21 bit program, but would suddenly cease to exist if it was findable only by a (10^21)+1 bit program? I would say no. However, I can understand that this is how people often see it. For some reason, the point at which one algorithmic length exceeds the other is the point at which people think things are going too far.
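The continuity point can be sketched numerically (the 2**-k weighting and the toy program length are illustrative assumptions, not formulas from the articles): if measure falls smoothly with the length k of the finding algorithm, nothing special happens when k crosses the program's own length L.

```python
L = 50  # bits in the hidden program (toy scale, not 10**21)

def measure(k):
    """Illustrative smooth weighting: halve the measure for each
    extra bit of finding algorithm."""
    return 2.0 ** -k

# The drop in measure when the finder grows from L to L+1 bits is the
# same factor as anywhere else on the curve -- no special threshold.
ratio_at_threshold = measure(L + 1) / measure(L)
ratio_elsewhere = measure(10) / measure(9)
print(ratio_at_threshold, ratio_elsewhere)  # 0.5 0.5
```

On this view a mind never "suddenly ceases to exist" at k = L; its measure just keeps declining by the same factor per bit.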

Comment author: SilasBarta 30 May 2009 04:27:28AM 0 points [-]

Thanks for joining the discussion, PaulUK/Paul Almond. (I'll refer to you with the former.)

Would we seriously say, for example, that a mind corresponding to a 10^21 bit computer program would be fine, any enjoying a conscious existence, if it was "findable" by a 10^21 bit program, but would suddenly cease to exist if it was findable by only a 10^21+1 bit program? I would say, no.

Well, then I'm going to apply Occam's razor back onto this. If you require a 10^21+1 bit program to extract a known 10^21 bit program, we should prefer the explanation:

a) "You wrote a program one bit too long."

rather than,

b) "You found a naturally occurring instance of a 10^21 bit algorithm that just happens to need a 10^21+1 bit algorithm in order to map it to the known 10^21 bit algorithm."

See the problem?

The whole point of explaining a phenomenon as implementing an algorithm is that, given the phenomenon, we don't need to do the whole algorithm separately. What if I sold you a "computer" with the proviso that "you have to manually check each answer it gives you"?

Comment author: PaulUK 30 May 2009 04:54:26AM 4 points [-]

Either name is fine (since it is hardly a secret who I am here).

Yes, I see the problem, but this was very much in my mind when I wrote all this; I could hardly have missed the issue. I would have to accept it or deny it, and in fact I considered it a great deal. It is the first thing you would need to consider. I still maintain that there is nothing special about this algorithm length. I actually think your practical example of buying the computer, if anything, counts against it. Suppose you sold me a computer and it "allegedly" ran a program 10^21 bits long, but I had to use another computer running a program that was (10^21)+1 bits long to analyze what it was doing and get any useful output. Would I want my money back? Of course I would. However, I would also want my money back if I needed a (10^21)-1 bit program to analyze the computer – and so would you. As a consumer, the thing would be practically useless anyway. In one case I am having to do all the computer's job, and a tiny bit more, just to get any output. In the other case I am having to do a tiny bit less than the computer's job to get any output: it would hardly make a practical difference. There is no sudden point at which I would want my money back: I would want it back long before we got near 10^21 bits. Can you show that 10^21 bits is special? I would say that to have it as special you pretty much have to postulate it, and I want to work with a minimum of postulates: it is my whole approach, though it causes some conclusions I hardly find comfortable.

You have mentioned Occam's razor, but we may disagree on how it should be applied. What Occam originally said was probably too vague to help much in these matters, so we should go with what seems a reasonable "modernization" of Occam's razor. I do not think Occam's razor tells us to reduce the amount of stuff we accept. Rather, I think it tells us to reduce the amount of stuff we accept as intrinsically existing. I would not, for example, regard Occam's razor as arguing against the many-worlds interpretation of quantum mechanics, as many people would. I would say that Occam's razor would argue against having some arbitrary wavefunction collapse mechanism if we need not assume one.

I would also say that this does not resolve the issue of combining computers and probability that I raised in the first article. My intention was to put a number of such issues together and show that we need to do the sort of thing I said to get around these difficult issues.