In "Where Recursive Justification Hits Bottom", I concluded that it's okay to use induction to reason about the probability that induction will work in the future, given that it's worked in the past; or to use Occam's Razor to conclude that the simplest explanation for why Occam's Razor works is that the universe itself is fundamentally simple.

    Now I am far from the first person to consider reflective application of reasoning principles.  Chris Hibbert compared my view to Bartley's Pan-Critical Rationalism (I was wondering whether that would happen).  So it seems worthwhile to state what I see as the distinguishing features of my view of reflection, which may or may not happen to be shared by any other philosopher's view of reflection.

    • All of my philosophy here actually comes from trying to figure out how to build a self-modifying AI that applies its own reasoning principles to itself in the process of rewriting its own source code.  So whenever I talk about using induction to license induction, I'm really thinking about an inductive AI considering a rewrite of the part of itself that performs induction.  If you wouldn't want the AI to rewrite its source code to not use induction, your philosophy had better not label induction as unjustifiable.

    • One of the most powerful general principles I know for AI in general, is that the true Way generally turns out to be naturalistic—which for reflective reasoning, means treating transistors inside the AI, just as if they were transistors found in the environment; not an ad-hoc special case.  This is the real source of my insistence in "Recursive Justification" that questions like "How well does my version of Occam's Razor work?" should be considered just like an ordinary question—or at least an ordinary very deep question.  I strongly suspect that a correctly built AI, in pondering modifications to the part of its source code that implements Occamian reasoning, will not have to do anything special as it ponders—in particular, it shouldn't have to make a special effort to avoid using Occamian reasoning.

    • I don't think that "reflective coherence" or "reflective consistency" should be considered as a desideratum in itself.  As I said in the Twelve Virtues and the Simple Truth, if you make five accurate maps of the same city, then the maps will necessarily be consistent with each other; but if you draw one map by fantasy and then make four copies, the five will be consistent but not accurate.  In the same way, no one is deliberately pursuing reflective consistency, and reflective consistency is not a special warrant of trustworthiness; the goal is to win.  But anyone who pursues the goal of winning, using their current notion of winning, and modifying their own source code, will end up reflectively consistent as a side effect—just like someone continually striving to improve their map of the world should find the parts becoming more consistent among themselves, as a side effect.  If you put on your AI goggles, then the AI, rewriting its own source code, is not trying to make itself "reflectively consistent"—it is trying to optimize the expected utility of its source code, and it happens to be doing this using its current mind's anticipation of the consequences.

    • One of the ways I license using induction and Occam's Razor to consider "induction" and "Occam's Razor", is by appealing to E. T. Jaynes's principle that we should always use all the information available to us (computing power permitting) in a calculation.  If you think induction works, then you should use it in order to use your maximum power, including when you're thinking about induction.

    • In general, I think it's valuable to distinguish a defensive posture where you're imagining how to justify your philosophy to a philosopher that questions you, from an aggressive posture where you're trying to get as close to the truth as possible.  So it's not that being suspicious of Occam's Razor, but using your current mind and intelligence to inspect it, shows that you're being fair and defensible by questioning your foundational beliefs.  Rather, the reason why you would inspect Occam's Razor is to see if you could improve your application of it, or if you're worried it might really be wrong.  I tend to deprecate mere dutiful doubts.

    • If you run around inspecting your foundations, I expect you to actually improve them, not just dutifully investigate.  Our brains are built to assess "simplicity" in a certain intuitive way that makes Thor sound simpler than Maxwell's Equations as an explanation for lightning.  But, having gotten a better look at the way the universe really works, we've concluded that differential equations (which few humans master) are actually simpler (in an information-theoretic sense) than heroic mythology (which is how most tribes explain the universe).  This being the case, we've tried to import our notions of Occam's Razor into math as well.

    • On the other hand, the improved foundations should still add up to normality; 2 + 2 should still end up equalling 4, not something new and amazing and exciting like "fish".

    • I think it's very important to distinguish between the questions "Why does induction work?" and "Does induction work?"  The reason why the universe itself is regular is still a mysterious question unto us, for now.  Strange speculations here may be temporarily needful.  But on the other hand, if you start claiming that the universe isn't actually regular, that the answer to "Does induction work?" is "No!", then you're wandering into 2 + 2 = 3 territory.  You're trying too hard to make your philosophy interesting, instead of correct.  An inductive AI asking what probability assignment to make on the next round is asking "Does induction work?", and this is the question that it may answer by inductive reasoning.  If you ask "Why does induction work?" then answering "Because induction works" is circular logic, and answering "Because I believe induction works" is magical thinking.

    • I don't think that going around in a loop of justifications through the meta-level is the same thing as circular logic.  I think the notion of "circular logic" applies within the object level, and is something that is definitely bad and forbidden, on the object level.  Forbidding reflective coherence doesn't sound like a good idea.  But I haven't yet sat down and formalized the exact difference—my reflective theory is something I'm trying to work out, not something I have in hand.
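
    As a concrete illustration of the "Does induction work?" bullet above, here is a minimal sketch of how an inductive reasoner might assign a probability that its inductive rule succeeds on the next round, treating its own track record as ordinary evidence. The binary success/failure scoring and the uniform Beta(1,1) prior are illustrative assumptions, not anything from the post:

        # Minimal sketch: treat "did my inductive rule succeed on past rounds?"
        # as ordinary data, then assign a probability that it succeeds on the
        # next round (Laplace's rule of succession, i.e. the posterior mean of
        # a Beta(1,1)-Bernoulli model).

        def p_next_success(past_outcomes):
            """Probability the inductive rule works on the next round,
            given past successes (True) and failures (False)."""
            successes = sum(past_outcomes)
            n = len(past_outcomes)
            return (successes + 1) / (n + 2)  # Laplace's rule of succession

        # Example: the rule has worked on 97 of the last 100 rounds.
        track_record = [True] * 97 + [False] * 3
        print(p_next_success(track_record))  # ~0.96

    Nothing in the calculation treats the machinery that produced the track record as a special case; the data about "induction" enter exactly as any other data would.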

    23 comments

    Re. your last remark, wouldn't a distinction between premise-circularity and rule-circularity do the trick?

    What do you consider evidence for a positive answer to "Does induction work?" I can think of lots of applications of induction that don't work very well.

    Lake, Occam's Razor could be described as a "premise" instead of a "rule", since it can be viewed as a Bayesian prior - so can many kinds of induction.

    Will, just as we've refined our understanding of Occam's Razor, we've also refined our understanding of induction. In particular, we've observed that the fundamental laws appear to be absolutely stable, universal, and precise; and whenever we observe a seeming exception, it turns out that there's a deeper and absolutely universal fundamental rule. More perfectly stable rules at basic levels of organization, of course, take priority, for purposes of induction, over surface levels of organization. We now have a justification for why surface induction works imperfectly, in terms of the apparent perfect regularity of the fundamental level.

    Perhaps I'm being dim, but a prior is a probability distribution, isn't it? Whereas Occam's Razor and induction aren't: they're rules for how to estimate prior probability. Or have I lost you somewhere?

    In particular, we've observed that the fundamental laws appear to be absolutely stable, universal, and precise; and whenever we observe a seeming exception, it turns out that there's a deeper and absolutely universal fundamental rule.

    [SNARK DELETED. Caledonian, I don't have time to edit your individual comments, keep it up and I'll start deleting them even if they contain meat. -- EY]

    Our 'absolutely universal' laws can be shown to have predictive power over an infinitesimal speck of the cosmos. Our ability to observe even natural experiments in the rest of the universe is extremely limited.

    We've found lots of exceptions to our fundamental laws - but once we did that, we no longer considered them fundamental laws. Your statement is accurate only in a trivial sense, and that's if we make the extraordinary presumption that our beliefs are actually right. Experience teaches us that, at any given time, the majority of our beliefs will be wrong, and the only thing that makes even approximate correctness possible is precisely what we cannot apply to the universe as a whole.

    Perhaps I'm being dim, but a prior is a probability distribution, isn't it? Whereas Occam's Razor and induction aren't: they're rules for how to estimate prior probability.

    But we can think about the probability that Occam's Razor produces correct answers; this probability is a prior.
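
    One way to make the "rule versus distribution" point concrete: applying an Occam-style rule to a hypothesis space induces a prior distribution, for instance by weighting each hypothesis by 2^(-description length) and normalizing. A minimal sketch, where the two hypothesis strings and the use of raw character count as "description length" are purely illustrative:

        # Toy sketch: turning an Occam-style "rule" into an actual prior
        # distribution by weighting hypotheses by 2^(-description length)
        # and normalizing. The hypotheses and the length measure are made up.

        hypotheses = {
            "maxwell": "dE/dt = curl B; dB/dt = -curl E",                    # compact laws
            "thor": "an agent with humanlike moods, goals, memories, ...",   # long to specify fully
        }

        weights = {name: 2.0 ** (-len(desc)) for name, desc in hypotheses.items()}
        total = sum(weights.values())
        prior = {name: w / total for name, w in weights.items()}

        print(prior)  # a genuine probability distribution summing to 1

    The rule by itself is not a distribution, but applying it to a hypothesis space yields one.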

    Our 'absolutely universal' laws can be shown to have predictive power over an infinitesimal speck of the cosmos. Our ability to observe even natural experiments in the rest of the universe is extremely limited. ... Experience teaches us that, at any given time, the majority of our beliefs will be wrong, and the only thing that makes even approximate correctness possible is precisely what we cannot apply to the universe as a whole.

    Our ability to observe natural experiments even on Earth is extremely limited (e.g. we surely haven't seen most of the elementary particles that could be produced here on Earth given sufficient energy). But what's the problem with that? Experience teaches us that most of our beliefs have a limited domain of validity, rather than being wrong. Newtonian physics is not "wrong"; it is still predictive and useful, even now that we have better theories valid in a larger set of situations.

    Perhaps it's better to reformulate Eliezer's statement about universal laws as something like: "all phenomena we have encountered can be described by a relatively simple set of laws; every newly discovered phenomenon makes the laws more precise, instead of totally discarding them". I think this is not a completely trivial statement, as I can imagine a world where the laws were as complicated as the phenomena themselves, and thus a world where nothing was predictable.

    Quote: An inductive AI asking what probability assignment to make on the next round is asking "Does induction work?", and this is the question that it may answer by inductive reasoning. If you ask "Why does induction work?" then answering "Because induction works" is circular logic, and answering "Because I believe induction works" is magical thinking.

    My view (IMBW) is that the inductive AI is asking the different question "Is induction a good choice of strategy for this class of problem?" Your follow-up question is "Why did you choose induction for that class of problem?" and the answer is "Because induction has proved a good choice of strategy in other, similar classes of problem, or for a significant subset of problems attempted in this class".

    Generalising, I suggest that self-optimising systems start on particulars and gradually become more general, rather than starting at generalities.

    "if you make five accurate maps of the same city, then the maps will necessarily be consistent with each other; but if you draw one map by fantasy and then make four copies, the five will be consistent but not accurate. "

    This reminds me of one of my major points about Aumann Agreement: in actuality, if two people have been trying for any substantial amount of time to reach true beliefs, they won't just agree after encountering one another and exchanging information; in most cases they will, to a very close approximation, agree BEFORE encountering one another. When you find someone who disagrees with you, this is very strong evidence that either you or the other person or both HAVE NOT BEEN TRYING to reach true beliefs in the relevant domain. If you have not been trying, why should you start now by changing your belief? If they have not been trying and you are trying, you should NOT change your beliefs in a manner that prevents you from being able to predict disagreement with them.

    Example: I not only don't persist in disagreement with people about whether the sun is hot and ice is cold, I don't even enter into disagreements with people about these questions. When I think that gravity is due to a "force of attraction" and someone else thinks it's due to "curvature of space-time", it turns out, predictably, that upon reflection we agreed to a very close approximation before exchanging information. When I was in high school and believed that the singularity was centuries away and that I knew cryonics wouldn't work, it turned out, upon reflection, that I had not been trying to reach a realistic model of the future, but rather to reach a model that explained and justified the behaviors of the people around me under a model of them as rational agents, which I had arrived at not by trying to predict their behavior or statements but by trying to justify my beliefs that

    a) I should 'respect' the people I encountered unless I observed on an individual level that a person wasn't 'worthy' of 'respect'; and b) I should only 'respect' people who I believed to be rational moral agents in something like a Kantian sense.

    Those beliefs had been absorbed on the basis of argument from authority in the moral domain, which was accepted because I had been told to be skeptical of factual claims but not of moral claims (though I examined both my model of the world and my model of morality for internal consistency to a fairly high degree).

    Thinking about your declaration "If you run around inspecting your foundations, I expect you to actually improve them", I now see that I've been using "PCR" to refer to the reasoning trick that Bartley introduced (use all the tools at your disposal to evaluate your foundational approaches) to make Pan-Critical Rationalism an improvement over Popper's Critical Rationalism. But, for Bartley, PCR was just a better foundation for the rest of Popper's epistemology, and you would replace that epistemology with something more sophisticated. For me, the point of emphasizing PCR is that you should want Bartley's trick as the unchangeable foundation below everything else.

    If an AI is going to inspect its foundations occasionally, and expect to be able to improve on them, you'd better program it to use all the tools at its disposal to evaluate the results before making changes. This rule seems more fundamental than guidelines on when to apply Occam, induction, or Bayes rule.

    If Bartley's trick is the starting point, I don't know whether it would be necessary or useful to make that part of the code immutable. In terms of software simplicity, not having a core that follows different principles would be an improvement. But if there's any chance that the AI could back itself into a corner that would lead it to conclude that there were a better rule to decide what tools to rely on, everything might be lost. Hard-coding Bartley's trick might provide the only platform to stand on that would give the AI a way to rebuild after a catastrophe.

    I now understand the reluctance to call the result PCR: it's not the whole edifice that Bartley (& Popper) constructed; you only use the foundation Bartley invented.

    @michael vassar: When you find someone who disagrees with you this is very strong evidence that either you or that other person or both HAVE NOT BEEN TRYING to reach true beliefs in the relevant domain.

    How about the case where you have been trying hard but simply went down the wrong way because of undetected reasoning errors?

    Roland: I don't think that this is at all common, at least for highly intelligent people and important practical questions. Even for a machine as complex as a space shuttle, Feynman was able to point out that the Challenger explosion was due to defects in the deliberative process being used, not simply due to honest mistakes. Deep mistakes, such as the ones that prevented me from really seriously orienting my efforts around transhumanist concerns at the age Eliezer did, are not, in my experience, honest mistakes.

    michael vassar:

    I would agree with you if there were no cognitive biases, but alas, there are, and I think they are one of the main causes of reasoning errors. In fact this is why this blog exists.

    When I look at my past, such reasoning errors abound, and they are the result of a biased human mind. According to your definition I don't think those were "honest" mistakes; at the same time I think it is unfair to label them "dishonest". The biases simply reflect the way human minds work.

    But I haven't yet sat down and formalized the exact difference - my reflective theory is something I'm trying to work out, not something I have in hand.

    "The principle of induction is true" is a statement that cannot be justified. "You should use the principle of induction when thinking about the future" can be justified along the lines of Pascal's wager. Assuming that it works in a universe where it does in fact work, one will make predictions that are more accurate than predictions chosen at random. Assuming that it works in a universe where it doesn't work, one will not make predictions that are less accurate than predictions chosen at random. But I don't think you can construct a Pascal-style argument in favor of "you should use induction when thinking about induction." It would be interesting if you came up with something.

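    The Pascal-style argument in the comment above can be written out as a toy decision matrix; the numeric "accuracy" scores below are made up, and only the ordering within each possible world matters:

        # Dominance argument sketch: "use induction" is never worse than
        # predicting at random, and is better in the world where induction works.

        accuracy = {
            # (policy, which kind of universe we are in): expected predictive accuracy
            ("use induction", "induction works"): 0.9,   # better than chance
            ("use induction", "induction fails"): 0.5,   # no worse than chance
            ("predict randomly", "induction works"): 0.5,
            ("predict randomly", "induction fails"): 0.5,
        }

        for world in ("induction works", "induction fails"):
            assert accuracy[("use induction", world)] >= accuracy[("predict randomly", world)]

    Whether a parallel matrix can be built for "use induction when thinking about induction" is exactly the open question the commenter raises.
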
    All of my philosophy here actually comes from trying to figure out how to build a self-modifying AI that applies its own reasoning principles to itself in the process of rewriting its own source code.

    So it's not that being suspicious of Occam's Razor, but using your current mind and intelligence to inspect it, shows that you're being fair and defensible by questioning your foundational beliefs.

    Eliezer, let's step back a moment and look at your approach to AI research. It looks to me like you are trying to first clarify your philosophy, and then you hope that the algorithms will follow from the philosophy. I have a PhD in philosophy and I've been doing AI research for many years. For me, it's a two-way street. My philosophy guides my AI research and my experiments with AI feed back into my philosophy.

    I started my AI research with the belief that Occam's Razor is right. In a sense, I still believe it is right. But trying to implement Occam's Razor in code has changed my philosophy. The problem is taking the informal, intuitive, vague, contradictory concept of Occam's Razor that is in my mind and converting it into an algorithm that works in a computer. There are many different formalizations of Occam's Razor, and they don't all agree with each other. I now think that none of them are quite right.

    I agree that introspection suggests that we use something like Occam's Razor when we think, and I agree that it is likely that evolution has shaped our minds so that our intuitive concept of Occam's Razor captures something about how the universe is structured. What I doubt is that any of our formalizations of Occam's Razor are correct. This is why I insist that any formalizations of Occam's Razor require experimental validation.

    I am not "being suspicious of Occam's Razor" in order to be "fair and defensible by questioning [my] foundational beliefs". I am suspicious of formalizations of Occam's Razor because I doubt that they really capture how our minds work, so I would like to see evidence that these formalizations work. I am suspicious of informal thinking about Occam's Razor, because I have learned that introspection is misleading, and because my informal notion of Occam's Razor becomes fuzzier and fuzzier the longer I stare at it.
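
    To make the point that formalizations of Occam's Razor "don't all agree with each other" concrete, here is a deliberately toy sketch in which two candidate simplicity measures rank the same pair of models differently. Both models and both measures are invented for illustration and are not from the comment:

        # Two toy "Occam's Razor" formalizations disagreeing about which model is simpler.
        # Model A: short to write down, but three free parameters.
        # Model B: longer to write down, but only one free parameter.

        models = {
            "A": {"formula": "a*x + b*sin(c*x)", "n_params": 3},
            "B": {"formula": "exp(-k*x) * (1 + x + x**2/2 + x**3/6)", "n_params": 1},
        }

        by_description_length = min(models, key=lambda m: len(models[m]["formula"]))
        by_parameter_count = min(models, key=lambda m: models[m]["n_params"])

        print(by_description_length)  # "A": simpler if simplicity means a shorter formula
        print(by_parameter_count)     # "B": simpler if simplicity means fewer free parameters

    Which measure deserves to be called the Razor is the kind of question the comment says needs experimental validation.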

    With respect to reflective decision theory: a few weeks ago I saw a talk by economist Jason Potts on the "economics of identity". Apparently there is a small literature now - Nobel laureate George Akerlof was mentioned - examining the effects of identity-dependent utility functions, where one's "identity" is something like "one's currently dominant self-concept". Jason described the existing work as static, and said he had a paper coming out which would introduce a dynamic account - I got the impression of something like Tom McCabe's self-referential agent.

    michael vassar, I don't think "dishonest" is such a great choice to describe indoctrinated behavior.

    Michael Vassar: instead of arguing about the meaning of "honest" or "dishonest", do you think it is possible for a person to know by introspection whether or not he has "really been trying" to get at the truth about something?

    If it is, then people still shouldn't disagree: the one who knows that he hasn't been trying to get at the truth should just admit it, and accept the position of the other guy as more reasonable.

    If it isn't, then your account does not supply an argument against Robin Hanson (which I take it you thought it does).

    « I don't think that "reflective coherence" or "reflective consistency" should be considered as a desideratum in itself. » It is not a terminal value, but I do consider it a very useful "intermediate value". The reason is that interacting with reality is often costly (in terms of time, resources, energy, risks, ...), so doing an internal consistency check before going to experiment is a very useful heuristic. If your hypothesis/theory is not coherent or consistent with itself, it is very likely not true. If it is coherent, then it may be true or not, and you have to check against reality.

    If a map of a city includes an Escher-like always-ascending staircase, I don't even need to go to the place to say "hey, there is a problem". If a designer claims to have made a perpetual motion machine, I don't even need to build it to say "it won't work". So it would appear a good thing to add to an initial AI that it won't perform costly/dangerous checks on a hypothesis that just doesn't have reflective coherence.
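
    A minimal sketch of that heuristic: run a cheap internal-consistency check before spending real resources on a test. The hypothesis representation and the energy-balance check below are made up purely for illustration:

        # "Check coherence before paying for the experiment": reject hypotheses
        # that fail a cheap armchair consistency check, and only send coherent
        # ones on to costly real-world testing.

        def energy_balance_ok(claim):
            """Cheap check: claimed output cannot exceed input minus losses."""
            return claim["energy_out"] <= claim["energy_in"] - claim["losses"]

        def worth_testing(claim, is_self_consistent):
            return is_self_consistent(claim)  # incoherent claims never reach the lab

        # Example: a "perpetual motion machine" claim fails the armchair check.
        perpetual_motion = {"energy_in": 5.0, "energy_out": 10.0, "losses": 0.0}
        print(worth_testing(perpetual_motion, energy_balance_ok))  # False: don't build it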

    I think Korzybski's concept of Order of Abstraction would be helpful here. Any evaluation occurs at some level of abstraction. An evaluation of the evaluation is a higher-order abstraction. Believing that your abstraction applies to itself is just a confusion. Abstracting on an abstraction always creates a higher-order abstraction.

    But the question "Can I prove that induction/Occam's Razor works?" is different from both "Does it work?" and "Why does it work?", and we can easily find ourselves in the world where the first question ends up with an irreducible NO and the second one ends up with YES (as you said yourself, |- P and |- []P are different).

    I'm trying to find the article in which Eliezer explains the last paragraph (strange loops through the meta level). I remember reading it, but now I can't find it. Does anyone remember which one it is?


    All of my philosophy here actually comes from trying to figure out how to build a self-modifying AI that applies its own reasoning principles to itself in the process of rewriting its own source code.  So whenever I talk about using induction to license induction, I'm really thinking about an inductive AI considering a rewrite of the part of itself that performs induction.  If you wouldn't want the AI to rewrite its source code to not use induction, your philosophy had better not label induction as unjustifiable.

    This changes the way I see Rationality A-Z: no longer thinking of it as simply you communicating your philosophy when you made these posts, but as you actually using that philosophy to make something even greater than a post could ever be.