Realism about rationality

Richard_Ngo


by Richard_Ngo

16th Sep 2018

AI Alignment Forum

Linkpost from thinkingcomplete.blogspot.com


Epistemic status: trying to vaguely gesture at vague intuitions. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post. As of 2020, I consider this follow-up comment to be a better summary of the thing I was trying to convey with this post than the post itself. The core disagreement is about how much we expect the limiting case of arbitrarily high intelligence to tell us about the AGIs whose behaviour we're worried about.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural:

The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology).
The idea that there is an “ideal” decision theory.
The idea that AGI will very likely be an “agent”.
The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.
The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.
The idea that Aumann's agreement theorem is relevant to humans.
The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.
The idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding Dutch books and money pumps and so on).

To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I’m quite optimistic about using maths to describe things in general. But starting from that historical baseline, I’m inclined to adjust downwards on questions related to formalising intelligent thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It's true that "messy" human intelligence is able to generalise to a wide variety of domains it hadn't evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism, CEV), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky's dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined? I claim that the latter perspective is just as sensible as the former, and perhaps even more so - see, for example, Paul Christiano's model of the mind, which leads him to conclude that "imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion."

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.

O (a friend): You can’t just accept a contradiction! It’s like saying “I have an intuition that 51 is prime, so I’ll just accept that as an axiom.”

R: Morality isn’t like maths. It’s more like having tastes in food, and then having preferences that the tastes have certain consistency properties - but if your tastes are strong enough, you might just ignore some of those preferences.

O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn’t allow contradictions) are so much stronger than my object-level preferences that this wouldn’t happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.

R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having “beliefs” is still a very lossy approximation. And it’s plausible that there’s no canonical way to “scale me up”.

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I’m not. This is not a problem, per se: I'm happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it's a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn't necessarily follow from the community's founding thesis (that humans can and should be more rational).


[-]abramdemski6y*Ω24710Review for 2018 Review

I didn't like this post. At the time, I didn't engage with it very much. I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise) but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it's trying to point at.

The main problem is the word "realism". It isn't clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.

I agree that there's something kind of like rationality realism. I just don't think this post successfully points at it.

Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether intelligence is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful a... (read more)

[-]Rohin Shah6y*Ω5132

ETA: The original version of this comment conflated "evolution" and "reproductive fitness", I've updated it now (see also my reply to Ben Pace's comment).

Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality.

MIRI in general and you in particular seem unusually (to me) confident that:

1. We can learn more than we already know about rationality of "ideal" agents (or perhaps arbitrary agents?).

2. This understanding will allow us to build AI systems that we understand better than the ones we build today.

3. We will be able to do this in time for it to affect real AI systems. (This could be either because it is unusually tractable and can be solved very quickly, or because timelines are very long.)

This is primarily based on what research you and MIRI do, some of MIRI's strategy writing, writing like the Rocket Alignment problem and law thinking, and an assumption that you are choosing to do this research because you think it is an effective way to reduce AI risk (given your skills).

(Another possibility is that you think that

... (read more)

[-]Ben Pace6yΩ5160

Huh? A lot of these points about evolution register to me as straightforwardly false. Understanding the theory of evolution moved us from "Why are there all these weird living things? Why do they exist? What is going on?" to "Each part of these organisms has been designed by a local hill-climbing process to maximise reproduction." If I looked into it, I expect I'd find out that early medicine found it very helpful to understand how the system was built. This is like the difference between me handing you a massive amount of code that has a bunch of weird outputs and telling you to make it work better and more efficiently, and me handing you the same thing but also telling you what company made the code, why they made it, and how they made it, and giving you loads of examples of other pieces of code they made in this fashion.

If I knew how to operationalise it I would take a pretty strong bet that the theory of natural selection has been revolutionary in the history of medicine.

3Rohin Shah6y

I don't know which particular points you mean. The only one that it sounds like you're arguing against is Were there others? I think the mathematical theory of natural selection + the theory of DNA / genes were probably very influential in both medicine and biology, because they make very precise predictions and the real world is a very good fit for the models they propose. (That is, they are "real", in the sense that "real" is meant in the OP.) I don't think that an improved mathematical understanding of what makes particular animals more fit has had that much of an impact on anything. Separately, I also think the general insight of "each part of these organisms has been designed by a local hill-climbing process to maximise reproduction" would not have been very influential in either medicine or biology, had it not been accompanied by the math (and assuming no one ever developed the math). On reflection, my original comment was quite unclear about this, I'll add a note to it to clarify. I do still stand by the thing that I meant in my original comment, which is that to the extent that you think rationality is like reproductive fitness (the claim made in the OP that Abram seems to agree with), where it is a very complicated mess of a function that we don't hope to capture in a simple equation; I don't think that improved understanding of that sort of thing has made much of an impact on our ability to do "big things" (as a proxy, things that affect normal people). Within evolution, the claim would be that there has not been much impact from gaining an improved mathematical understanding of the reproductive fitness of some organism, or the "reproductive fitness" of some meme for memetic evolution.

2DanielFilan6y

But surely you wouldn't get the mathematics of natural selection without the general insight, and so I think the general insight deserves to get a bunch of the credit. And both the mathematics of natural selection and the general insight seem pretty tied up to the notion of 'reproductive fitness'.

1Rohin Shah6y

Here is my understanding of what Abram thinks: Rationality is like "reproductive fitness", in that it is hard to formalize and turn into hard math. Regardless of how much theoretical progress we make on understanding rationality, it is never going to turn into something that can make very precise, accurate predictions about real systems. Nonetheless, qualitative understanding of rationality, of the sort that can make rough predictions about real systems, is useful for AI safety. Hopefully that makes it clear why I'm trying to imagine a counterfactual where the math was never developed. It's possible that I'm misunderstanding Abram and he actually thinks that we will be able to make precise, accurate predictions about real systems; but if that's the case I think he in fact is "realist about rationality" and this post is in fact pointing at a crux between him and Richard (or him and me), though not as well as he would like.

[-]abramdemski6yΩ4110

(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to make capabilities "blindly" -- basically the processing-power heavy type of AI progress (applying enough tricks so you're not lit... (read more)

5Rohin Shah6y

I think we disagree primarily on 2 (and also how doomy the default case is, but let's set that aside).

I think that's a crux between you and me. I'm no longer sure if it's a crux between you and Richard. (ETA: I shouldn't call this a crux, I wouldn't change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)

Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the "unreal rationality" world to be similar to what Daniel mentions below:

Yeah, I'm going to try to give a different explanation that doesn't involve "realness".

When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are "levers", "gears", "nails", etc.

A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write "x + y", I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don't have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don't have to think about how to implement addition at the bitwise level.

This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don't need to communicate all the caveats and intuitions that would accompany a leaky abstraction.

One way to o
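The "x + y" contrast between Python and C in the comment above can be made concrete with a small sketch. The function names are hypothetical, and the wrapping function only simulates 32-bit two's-complement wraparound, which is one common hardware outcome; in C itself, signed integer overflow is formally undefined behaviour, so this is an illustration of the leak rather than a model of what any given C compiler does.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def add_python(x: int, y: int) -> int:
    """Python ints are arbitrary precision, so this is the mathematical sum."""
    return x + y


def add_int32_wrapping(x: int, y: int) -> int:
    """Simulate addition in a 32-bit two's-complement machine word.

    Assumption for illustration: overflow wraps around silently.
    """
    total = (x + y) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as the sign bit.
    return total - 0x100000000 if total >= 0x80000000 else total


if __name__ == "__main__":
    print(add_python(INT32_MAX, 1))          # 2147483648
    print(add_int32_wrapping(INT32_MAX, 1))  # -2147483648
```

In the first function the caller never has to think about word size; in the second, the lower-level detail leaks into the result, which is the sense of "leaky" used in the comment.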

4abramdemski6y

I generally like the re-framing here, and agree with the proposed crux. I may try to reply more at the object level later.

2edoarad6y

Abram, did you reply to that crux somewhere?

9abramdemski6y

Just a quick reply to this part for now (but thanks for the extensive comment, I'll try to get to it at some point). It makes sense. My recent series on myopia also fits this theme. But I don't get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of "agency" into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of "optimization". (This applies to people at MIRI and also people outside MIRI.) *The one person who has given me push-back is Scott.

7DanielFilan6y

For what it's worth, I think I disagree with this even when "non-real" means "as real as the theory of liberalism". One example is companies - my understanding is that people have fake theories about how companies should be arranged, that these theories can be better or worse (and evaluated as so without looking at how their implementations turn out), that one can maybe learn these theories in business school, and that implementing them creates more valuable companies (at least in expectation). At the very least, my understanding is that providing management advice to companies in developing countries significantly raises their productivity, and found this study to support this half-baked memory.

(next paragraph is super political, but it's important to my point)

I live in what I honestly, straightforwardly believe is the greatest country in the world (where greatness doesn't exactly mean 'moral goodness' but does imply the ability to support moral goodness - think some combination of wealth and geo-strategic dominance), whose government was founded after a long series of discussions about how best to use the state to secure individual liberty. If I think about other wealthy countries, it seems to me that ones whose governments built upon this tradition of the interaction between liberty and governance are over-represented (e.g. Switzerland, Singapore, Hong Kong). The theory of liberalism wasn't complete or real enough to build a perfect government, or even a government reliable enough to keep to its founding principles (see complaints American constitutionalists have about how things are done today), but it was something that can be built upon.

At any rate, I think it's the case that the things that can be built off of these fake theories aren't reliable enough to satisfy a strict Yudkowsky-style security mindset. But I do think it's possible to productively build off of them.

2Rohin Shah6y

On the model proposed in this comment, I think of these as examples of using things / abstractions / theories with imprecise predictions to reason about things that are "directly relevant". If I agreed with the political example (and while I wouldn't say that myself, it's within the realm of plausibility), I'd consider that a particularly impressive version of this.

2DanielFilan6y

I'm confused how my examples don't count as 'building on' the relevant theories - it sure seems like people reasoned in the relevant theories and then built things in the real world based on the results of that reasoning, and if that's true (and if the things in the real world actually successfully fulfilled their purpose), then I'd think that spending time and effort developing the relevant theories was worth it. This argument has some weak points (the US government is not highly reliable at preserving liberty, very few individual businesses are highly reliable at delivering their products, the theories of management and liberalism were informed by a lot of experimentation), but you seem to be pointing at something else.

6Rohin Shah6y

Agreed. I'd say they built things in the real world that were "one level above" their theories. Agreed. Agreed. Overall I think these relatively-imprecise theories let you build things "one level above", which I think your examples fit into. My claim is that it's very hard to use them to build things "2+ levels above".

Separately, I claim that:

* "real AGI systems" are "2+ levels above" the sorts of theories that MIRI works on.
* MIRI's theories will always be the relatively-imprecise theories that can't scale to "2+ levels above".

(All of this with weak confidence.)

I think you disagree with the underlying model, but assuming you granted that, you would disagree with the second claim; I don't know what you'd think of the first.

4DanielFilan6y

OK, I think I understand you now. I think that I sort of agree if 'levels above' means levels of abstraction, where one system uses an abstraction of another and requires the mesa-system to satisfy some properties. In this case, the more layers of abstraction you have, the more requirements you're demanding which can independently break, which exponentially reduces the chance that you'll have no failure. But also, to the extent that your theory is mathematisable and comes with 'error bars', you have a shot at coming up with a theory of abstractions that is robust to failure of your base-level theory. So some transistors on my computer can fail, evidencing the imprecision of the simple theory of logic gates, but my computer can still work fine because the abstractions on top of logic gates accounted for some amount of failure of logic gates. Similarly, even if you have some uncorrelated failures of individual economic rationality, you can still potentially have a pretty good model of a market. I'd say that the lesson is that the more levels of abstraction you have to go up, the more difficult it is to make each level robust to failures of the previous level, and as such the more you'd prefer the initial levels be 'exact'. I'd say that they're some number of levels above (of abstraction) and also levels below (of implementation). So for an unrealistic example, if you develop logical induction decision theory, you have your theory of logical induction, then you depend on that theory to have your decision theory (first level of abstraction), and then you depend on your decision theory to have multiple LIDT agents behave well together (second level of abstraction). Separately, you need to actually implement your logical inductor by some machine learning algorithm (first level of implementation), which is going to depend on numpy and floating point arithmetic and such (second and third (?) levels of implementation), which depends on computing hardware and firmware (I d
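The claim above that stacking layers "exponentially reduces the chance that you'll have no failure" can be sketched numerically. This is a toy model only: it assumes every layer fails independently with the same probability, and the function name and the 5% figure are hypothetical choices for illustration, not anything stated in the thread.

```python
def prob_no_failure(per_layer_failure: float, layers: int) -> float:
    """Chance that none of `layers` independent requirements breaks.

    Assumption for illustration: each layer fails independently with
    the same probability `per_layer_failure`.
    """
    return (1.0 - per_layer_failure) ** layers


if __name__ == "__main__":
    # Hypothetical 5% failure chance per layer of abstraction.
    for n in (1, 2, 5, 10):
        print(f"{n:2d} layers: {prob_no_failure(0.05, n):.3f}")
```

Under these assumptions the survival probability shrinks geometrically with depth, which is why the comment prefers the lowest levels to be as close to exact as possible, or to come with error bars that higher levels can be designed around.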

2Rohin Shah6y

Yeah, I think this is the sense in which realism about rationality is an important disagreement. Yeah, I agree that this would make it easier to build multiple levels of abstractions "on top". I also would be surprised if mathematical theories of embedded rationality came with tight error bounds (where "tight" means "not so wide as to be useless"). For example, current theories of generalization in deep learning do not provide tight error bounds to my knowledge, except in special cases that don't apply to the main successes of deep learning. Agreed. I am basically only concerned about machine learning, when I say that you can't build on the theories. My understanding of MIRI's mainline story of impact is that they develop some theory that AI researchers use to change the way they do machine learning that leads to safe AI. This sounds to me like there are multiple levels of inference: "MIRI's theory" -> "machine learning" -> "AGI". This isn't exactly layers of abstraction, but I think the same principle applies, and this seems like too many layers. You could imagine other stories of impact, and I'd have other questions about those, e.g. if the story was "MIRI's theory will tell us how to build aligned AGI without machine learning", I'd be asking when the theory was going to include computational complexity.

4DanielFilan6y

I'm not sure what exactly you mean, but examples that come to mind:

* Crops and domestic animals that have been artificially selected for various qualities.
* The medical community encouraging people to not use antibiotics unnecessarily.
* [Inheritance but not selection] The fact that your kids will probably turn out like you without specific intervention on your part to make that happen.

4Rohin Shah6y

I feel fairly confident this was done before we understood evolution. Also seems like a thing we knew before we understood evolution. That one seems plausible; though I'd want to know more about the history of how this came up. It also seems like the sort of thing that we'd have figured out even if we didn't understand evolution, though it would have taken longer, and would have involved more deaths. Going back to the AI case, my takeaway from this example is that understanding non-real things can still help if you need to get everything right the first time. And in fact, I do think that if you posit a discontinuity, such that we have to get everything right before that discontinuity, then the non-MIRI strategy looks worse because you can't gather as much empirical evidence (though I still wouldn't be convinced that the MIRI strategy is the right one).

2DanielFilan6y

Ah, I didn't quite realise you meant to talk about "human understanding of the theory of evolution" rather than evolution itself. I still suspect that the theory of evolution is so fundamental to our understanding of biology, and our understanding of biology so useful to humanity, that if human understanding of evolution doesn't contribute much to human welfare it's just because most applications deal with pretty long time-scales. (Also I don't get why this discussion is treating evolution as 'non-real': stuff like the Price equation seems pretty formal to me. To me it seems like a pretty mathematisable theory with some hard-to-specify inputs like fitness.)
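For readers who haven't seen it, the Price equation mentioned above is usually written in the following form. The notation here is an assumption (it is not given in the comment): $w_i$ is the fitness of type $i$, $z_i$ its trait value, $\bar{w}$ the mean fitness, and $\Delta\bar{z}$ the change in the population's mean trait value over one generation.

```latex
\bar{w}\,\Delta\bar{z}
  \;=\; \operatorname{Cov}(w_i, z_i)
  \;+\; \operatorname{E}\!\left(w_i\,\Delta z_i\right)
```

The first term captures selection (the covariance between fitness and the trait) and the second captures transmission effects. This matches the point being made: the structure of the theory is cleanly mathematical, while the fitness values $w_i$ that feed into it remain the hard-to-specify input.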

4Rohin Shah6y

Yeah, I agree, see my edits to the original comment and also my reply to Ben. Abram's comment was talking about reproductive fitness the entire time and then suddenly switched to evolution at the end; I didn't notice this and kept thinking of evolution as reproductive fitness in my head, and then wrote a comment based on that where I used the word evolution despite thinking about reproductive fitness and the general idea of "there is a local hill-climbing search on reproductive fitness" while ignoring the hard math.

4Raemon6y

The most obvious thing is understanding why overuse of antibiotics might weaken the effect of antibiotics.

2Rohin Shah6y

See response to Daniel below; I find this one a little compelling (but not that much).

4Zack_M_Davis6y

Evolutionary psychology?

2Matthew Barnett6y

How does evolutionary psychology help us during our everyday life? We already know that people like having sex and that they execute all these sorts of weird social behaviors. Why does providing the ultimate explanation for our behavior provide more than a satisfaction of our curiosity?

2Rohin Shah6y

+1, it seems like some people with direct knowledge of evolutionary psychology get something out of it, but not everyone else.

2DanielFilan6y

Sorry, how is this not saying "people who don't know evo-psych don't get anything out of knowing evo-psych"?

[-]Richard_Ngo6yΩ6120

I like this review and think it was very helpful in understanding your (Abram's) perspective, as well as highlighting some flaws in the original post, and ways that I'd been unclear in communicating my intuitions. In the rest of my comment I'll try to write a synthesis of my intentions for the original post with your comments; I'd be interested in the extent to which you agree or disagree.

We can distinguish between two ways to understand a concept X. For lack of better terminology, I'll call them "understanding how X functions" and "understanding the nature of X". I conflated these in the original post in a confusing way.

For example, I'd say that studying how fitness functions would involve looking into the ways in which different components are important for the fitness of existing organisms (e.g. internal organs; circulatory systems; etc). Sometimes you can generalise that knowledge to organisms that don't yet exist, or even prove things about those components (e.g. there's probably useful maths connecting graph theory with optimal nerve wiring), but it's still very grounded in concrete examples. If we thought that we ... (read more)

8abramdemski6y

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

* The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
* Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and

2Richard_Ngo6y

I'll try to respond properly later this week, but I like the point that embedded agency is about boundedness. Nevertheless, I think we probably disagree about how promising it is "to start with idealized rationality and try to drag it down to Earth rather than the other way around". If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl). I appreciate that this isn't an argument that I've made in a thorough or compelling way yet - I'm working on a post which does so.

6abramdemski6y

Hm. I already think the starting point of Bayesian decision theory (which is even "further up" than AIXI in how I am thinking about it) is fairly useful.

* In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as 'utility' (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn't always serve very well (e.g. one might prefer Kelly betting), but it was kind of the starting point (probability theory getting its starting point from gambling games) and the idea seems like a useful decision-making mechanism in a lot of situations.
* Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.

AIXI adds to all this the idea of quantifying Occam's razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we're going to disagree on.

As for AIXItl, I think it's sort of taking the wrong approach to "dragging things down to earth". Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.
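To make the contrast between naive expected-value betting and Kelly betting concrete, here is a minimal sketch. It assumes a repeated binary bet with win probability `p` and net odds `b` (win `b` per unit staked); the function names are illustrative and not taken from the comment.

```python
import math


def expected_wealth(fraction, p, b):
    """Expected wealth multiplier for one bet staking `fraction` of the bankroll."""
    return p * (1 + fraction * b) + (1 - p) * (1 - fraction)


def expected_log_growth(fraction, p, b):
    """Expected log-growth per bet; -inf if a single loss wipes out the bankroll."""
    if fraction >= 1:
        return float("-inf")
    return p * math.log(1 + fraction * b) + (1 - p) * math.log(1 - fraction)


def kelly_fraction(p, b):
    """Stake that maximizes expected log-growth (clamped at 0 for unfavourable bets)."""
    return max(0.0, p - (1 - p) / b)


if __name__ == "__main__":
    p, b = 0.6, 1.0  # assumed example: 60% chance to win an even-money bet
    f_kelly = kelly_fraction(p, b)
    for f in (f_kelly, 0.5, 1.0):
        print(f, expected_wealth(f, p, b), expected_log_growth(f, p, b))
```

Under these assumptions, staking everything maximizes expected money on each bet but gives a log-growth of minus infinity, while the Kelly stake gives up some one-shot expectation for positive long-run growth. That is one way to read "this doesn't always serve very well".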

6Raemon6y

Hmm, I am interested in some debate between you and Daniel Filan (just naming someone who seemed to describe himself as endorsing rationality realism as a crux, although I'm not sure he qualifies as a "miri person")

8DanielFilan6y

* I believe in some form of rationality realism: that is, that there's a neat mathematical theory of ideal rationality that's in practice relevant for how to build rational agents and be rational. I expect there to be a theory of bounded rationality about as mathematically specifiable and neat as electromagnetism (which after all in the real world requires a bunch of materials science to tell you about the permittivity of things).
* If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.
* Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:
    * I'd be more interested in empirical understanding of deep learning and less interested in an understanding of learning theory.
    * I'd be less interested in probabilistic forecasting of things.
    * I'd want to find some higher-level thing that was more 'real'/mathematically characterisable, and study that instead.
    * I'd be less optimistic about the prospects for an 'ideal' decision and reasoning theory.
* My research depends on the belief that rational agents in the real world are likely to have some kind of ordered internal structure that is comprehensible to people. This belief is informed by rationality realism but distinct from it.

2abramdemski6y

How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don't see why the distinction should be so cruxy. My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren't "momentum vs reproductive fitness", but rather, "momentum vs the bystander effect" (ie, physics vs social psychology). Reproductive fitness implies something that's quite mathematizable, but with relatively "fake" models -- e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it's possible to get closer and closer. I think that's more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI describes a heavy dependence on programming language. Any theory of bounded rationality which doesn't ignore poly-time differences (ie, anything "closer to the ground" than logical induction) has to be hardware-dependent as well. What alternative world are you imagining, though?
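As an illustration of the kind of "quite mathematizable but fake" evolutionary model described above, here is a minimal sketch of a textbook-style haploid selection recursion. It assumes exactly the simplifications mentioned (perfectly separated generations, perfect mixing) plus an effectively infinite population; the function names and fitness values are illustrative.

```python
def next_frequency(p, w_a, w_b):
    """Frequency of type A in the next generation, given relative fitnesses.

    Assumes non-overlapping generations and a perfectly mixed, effectively
    infinite population, so there is no drift and no spatial structure.
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    return p * w_a / mean_fitness


def trajectory(p0, w_a, w_b, generations):
    """Frequencies of type A over the given number of generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs


if __name__ == "__main__":
    # Assumed example: type A starts at 1% with a 10% fitness advantage.
    freqs = trajectory(0.01, 1.1, 1.0, 200)
    print(freqs[0], freqs[50], freqs[200])
```

The recursion is precise and easy to analyse, but "fitness" enters only as two given numbers. Everything that makes real fitness hard to determine is hidden in the choice of those inputs.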

2DanielFilan6y

Meta/summary: I think we're talking past each other, and hope that this comment clarifies things. I was thinking of the difference between the theory of electromagnetism vs the idea that there's a reproductive fitness function, but that it's very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with 'fake' models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I'm unsure which theory rationality will end up closer to. Separately, I feel weird having people ask me about why things are 'cruxy' when I didn't initially say that they were and without the context of an underlying disagreement that we're hashing out. Like, either there's some misunderstanding going on, or you're asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do. I confess to being quite troubled by AIXI's language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than "polynomial in some input", which should be some input to a good theory of bounded rationality. I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

[-]abramdemski6y110

I was thinking of the difference between the theory of electromagnetism vs the idea that there's a reproductive fitness function, but that it's very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with 'fake' models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I'm unsure which theory rationality will end up closer to.

[Spoiler-boxing the following response not because it's a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn't want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]

I'm having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very

...

4DanielFilan6y

For what it's worth, from my perspective, two months ago I said I fell into a certain pattern of thinking, then raemon put me in the position of saying what that was a crux for, then I was asked to elaborate about why a specific facet of the distinction was cruxy, and also the pattern of thinking morphed into something more analogous to a proposition. So I'm happy to elaborate on consequences of 'rationality realism' in my mind (such as they are - the term seems vague enough that I'm a 'rationality realism' anti-realist and so don't want to lean too heavily on the concept) in order to further a discussion, but in the context of an exchange that was initially framed as a debate I'd like to be clear about what commitments I am and am not making. Anyway, glad to clarify that we have a big disagreement about how 'real' a theory of rationality should be, which probably resolves to a medium-sized disagreement about how 'real' rationality and/or its best theory actually is.

2Ben Pace6y

This is such an interesting use of spoiler tags. I might try it myself sometime.

4DanielFilan6y

To answer the easy part of this question/remark, I don't work at MIRI and don't research agent foundations, so I think I shouldn't count as a "MIRI person", despite having good friends at MIRI and having interned there. (On a related note, it seems to me that the terminology "MIRI person"/"MIRI cluster" obscures intellectual positions and highlights social connections, which makes me wish that it was less prominent.)

2Raemon6y

I guess the main thing I want is an actual tally on "how many people definitively found this post to represent their crux", vs "how many people think that this represented other people's cruxes"

4Rohin Shah6y

If I believed realism about rationality, I'd be closer to buying what I see as the MIRI story for impact. It's hard to say whether I'd actually change my mind without knowing the details of what exactly I'm updating to.

4Vanessa Kosoy6y

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me. Specifically, a concept doesn't have to be crisply well-defined in order to use it in mathematical models. Even momentum, which is truly one of the "crisper" concepts in science, is no longer well-defined when spacetime is not asymptotically flat (which it isn't). Much less so are concepts such as "atom", "fitness" or "demand". Nevertheless, physicists, biologists and economists continue to successfully construct and apply mathematical models grounded in such fuzzy concepts. Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

5abramdemski6y

How so? Well, it's not entirely clear. First there is the "realism" claim, which might even be taken in contrast to mathematical abstraction; EG, "is IQ real, or is it just a mathematical abstraction"? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where "accurate" means, at least in part, helpfulness in making real predictions). So the idea seems to be that there's a spectrum with physics at one extreme end. I'm not quite sure what goes at the other extreme end. Here's one possibility: * Physics * Chemistry * Biology * Psychology * Social Sciences * Humanities A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, "realness" vs "mathematical modelability". Well, it's not clear exactly what that second axis should be. Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect "reproductive fitness" levels rather than "momentum" levels. Hmm, actually, I guess there's a tricky interpretational issue here, which is what it means to model agency exactly. * On the one hand, I fully believe in Eliezer's idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality. * But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter. I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type.

2Richard_Ngo6y

Yeah, I should have been much more careful before throwing around words like "real". See the long comment I just posted for more clarification, and in particular this paragraph:

0Vanessa Kosoy6y

It seems almost tautologically true that you can't accurately predict what an agent will do without actually running the agent. Because, any algorithm that accurately predicts an agent can itself be regarded as an instance of the same agent. What I expect the abstract theory of intelligence to do is something like producing a categorization of agents in terms of qualitative properties. Whether that's closer to "momentum" or "fitness", I'm not sure the question is even meaningful. I think the closest analogy is: abstract theory of intelligence is to AI engineering as complexity theory is to algorithmic design. Knowing the complexity class of a problem doesn't tell you the best practical way to solve it, but it does give you important hints. (For example, if the problem is of exponential time complexity then you can only expect to solve it either for small inputs or in some special cases, and average-case complexity tells you just whether these cases need to be very special or not. If the problem is in NC then you know that it's possible to gain a lot from parallelization. If the problem is in NP then at least you can test solutions, et cetera.) And also, abstract theory of alignment should be to AI safety as complexity theory is to cryptography. Once again, many practical considerations are not covered by the abstract theory, but the abstract theory does tell you what kind of guarantees you can expect and when. (For example, in cryptography we can (sort of) know that a certain protocol has theoretical guarantees, but there is engineering work finding a practical implementation and ensuring that the assumptions of the theory hold in the real system.)
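The remark that for a problem in NP "at least you can test solutions" can be sketched with subset sum, a standard NP-complete problem. This is only an illustration under assumed names (`verify`, `brute_force`): checking a proposed answer is cheap, while the naive search tries exponentially many subsets.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Check a proposed solution in time linear in its size.

    `certificate` is a list of distinct indices into `numbers`.
    """
    if len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def brute_force(numbers, target):
    """Search all subsets (exponential time); return indices or None."""
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return list(indices)
    return None


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    solution = brute_force(numbers, 9)
    print(solution, verify(numbers, 9, solution))
```

Knowing which side of this asymmetry a problem falls on does not give you a practical solver, but it does tell you what kind of guarantee is worth asking for.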

6Shmi6y

That seems manifestly false. You can figure out whether an algorithm halts or not without being accidentally stuck in an infinite loop. You can look at the recursive Fibonacci algorithm and figure out what it would do without ever running it. So there is a clear distinction between analyzing an algorithm and executing it. If anything, one would know more about the agent by using the techniques from analysis of algorithms than the agent would ever know about themselves.
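The Fibonacci point can be sketched as follows. This is an illustrative example with assumed function names: analysis of the recursive definition gives the number of calls it makes, C(n) = 2·F(n+1) − 1, and that prediction can be computed cheaply and then compared against an instrumented run.

```python
def fib_recursive(n, counter=None):
    """Naive recursive Fibonacci; optionally counts calls in counter[0]."""
    if counter is not None:
        counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)


def fib_iterative(n):
    """Same value in linear time, obtained by reasoning about the recurrence."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def predicted_calls(n):
    """Calls made by fib_recursive(n), derived on paper.

    C(0) = C(1) = 1 and C(n) = 1 + C(n-1) + C(n-2), so C(n) = 2*F(n+1) - 1.
    """
    return 2 * fib_iterative(n + 1) - 1


if __name__ == "__main__":
    n = 20
    counter = [0]
    value = fib_recursive(n, counter)
    print(value, fib_iterative(n), counter[0], predicted_calls(n))
```

This only shows that analysis can stand in for execution in one well-understood special case; it does not bear on whether the same is possible for arbitrary programs.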

2TAG6y

In special cases, not in the general case.

2Vanessa Kosoy6y

Of course you can predict some properties of what an agent will do. In particular, I hope that we will eventually have AGI algorithms that satisfy provable safety guarantees. But, you can't make exact predictions. In fact, there probably is a mathematical law that limits how accurate your predictions can be. An optimization algorithm is, by definition, something that transforms computational resources into utility. So, if your prediction is so close to the real output that it has similar utility, then it means the way you produced this prediction involved the same product of "optimization power per unit of resources" by "amount of resources invested" (roughly speaking, I don't claim to already know the correct formalism for this). So you would need to either (i) run a similar algorithm with similar resources or (ii) run a dumber algorithm but with more resources or (iii) use fewer resources but an even smarter algorithm. So, if you want to accurately predict the output of a powerful optimization algorithm, your prediction algorithm would usually have to be either a powerful optimization algorithm in itself (cases i and iii) or prohibitively costly to run (case ii). The exception is cases when the optimization problem is easy, so a dumb algorithm can solve it without many resources (or a human can figure out the answer by emself).

4John_Maxwell6y

Eliezer is a fan of law thinking, right? Doesn't the law thinker position imply that intelligence can be characterized in a "lawful" way like momentum? As a non-MIRI cluster person, I think deconfusion is valuable (insofar as we're confused), but I'm skeptical of MIRI because they seem more confused than average to me.

2dxu6y

It depends on what you mean by "lawful". Right now, the word "lawful" in that sentence is ill-defined, in much the same way as the purported distinction between momentum and fitness. Moreover, most interpretations of the word I can think of describe concepts like reproductive fitness about as well as they do concepts like momentum, so it's not clear to me why "law thinking" is relevant in the first place--it seems as though it simply muddies the discussion by introducing additional concepts.

2John_Maxwell6y

In my experience, if there are several concepts that seem similar, understanding how they relate to one another usually helps with clarity rather than hurting.

2dxu6y

That depends on how strict your criteria are for evaluating “similarity”. Often concepts that intuitively evoke a similar “feel” can differ in important ways, or even fail to be talking about the same type of thing, much less the same thing. In any case, how do you feel law thinking (as characterized by Eliezer) relates to the momentum-fitness distinction (as characterized by ricraz)? It may turn out that those two concepts are in fact linked, but in such a case it would nonetheless be helpful to make the linking explicit.

2TAG3y

"Fundamentalism" would be a better term for the cluster of problems -- dogmatism, literalism and epistemic over-confidence.

[-]Vanessa Kosoy6yΩ16490Review for 2018 Review

In this essay, ricraz argues that we shouldn't expect a clean mathematical theory of rationality and intelligence to exist. I have debated em about this, and I continue to endorse more or less everything I said in that debate. Here I want to restate some of my (critical) position by building it from the ground up, instead of responding to ricraz point by point.

When should we expect a domain to be "clean" or "messy"? Let's look at everything we know about science. The "cleanest" domains are mathematics and fundamental physics. There, we have crisply defined concepts and elegant, parsimonious theories. We can then "move up the ladder" from fundamental to emergent phenomena, going through high energy physics, molecular physics, condensed matter physics, biology, geophysics / astrophysics, psychology, sociology, economics... On each level more "mess" appears. Why? Occam's razor tells us that we should prioritize simple theories over complex theories. But, we shouldn't expect a theory to be more simple than the specification of the domain. The general theory of planets should be simpler than a detailed description of planet Earth, the general theory of atomic matter should be simpler th

...

[-]Viliam7y480

I have an intuition that the "realism about rationality" approach will lead to success, even if it will have to be dramatically revised on the way.

To explain, imagine that centuries ago there were two groups trying to find out how the planets move. Group A says: "Obviously, planets must move according to some simple mathematical rule. The simplest mathematical shape is a circle, therefore planets move in circles. All we have to do is find out the exact diameter of each circle." Group B says: "No, you guys underestimate the complexity of the real world. The planets, just like everything in nature, can only be approximated by a rule, but there are always exceptions and unpredictability. You will never find a simple mathematical model to describe the movement of the planets."

The people who finally find out how the planets move will be spiritual descendants of the group A. Even if on the way they will have to add epicycles, and then discard the idea of circles, which seems like total failure of the original group. The problem with the group B is that it has no energy to move forward.

The right moment to discard a simple model is when you have enough data to build a more complex model.

[-]Richard_Ngo7y500

The people who finally find out how the planets move will be spiritual descendants of the group A. ... The problem with the group B is that it has no energy to move forward.

In this particular example, it's true that group A was more correct. This is because planetary physics can be formalised relatively easily, and also because it's a field where you can only observe and not experiment. But imagine the same conversation between sociologists who are trying to find out what makes people happy, or between venture capitalists trying to find out what makes startups succeed. In those cases, Group B can move forward using the sort of "energy" that biologists and inventors and entrepreneurs have, driven by an experimental and empirical mindset. Whereas Group A might spend a long time writing increasingly elegant equations which rely on unjustified simplifications.

Instinctively reasoning about intelligence using analogies from physics instead of the other domains I mentioned above is a very good example of rationality realism.

6jamii7y

Uncontrolled argues along similar lines - that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories. A prototypical argument is the paradox-of-choice jam experiment, which has since become solidified in pop psychology. But actual supermarkets run many 1000s of in-situ experiments and find that it actually depends on the product, the nature of the choices, the location of the supermarket, the time of year etc.

[-]Rob Bensinger7y200

Uncontrolled argues along similar lines - that the physics/chemistry model of science, where we get to generalize a compact universal theory from a number of small experiments, is simply not applicable to biology/psychology/sociology/economics and that policy-makers should instead rely more on widespread, continuous experiments in real environments to generate many localized partial theories.

I'll note that (non-extreme) versions of this position are consistent with ideas like "it's possible to build non-opaque AGI systems." The full answer to "how do birds work?" is incredibly complex, hard to formalize, and dependent on surprisingly detailed local conditions that need to be discovered empirically. But you don't need to understand much of that complexity at all to build flying machines with superavian speed or carrying capacity, or to come up with useful theory and metrics for evaluating "goodness of flying" for various practical purposes; and the resultant machines can be a lot simpler and more reliable than a bird, rather than being "different from birds but equally opaque in their own alien way".

This isn't meant to be a...

4binary_doge7y

"This is because planetary physics can be formalized relatively easily" - they can now, and could when they were, but not before. One can argue that we thought many "complex" and very "human" abilities could not be algorithmically emulated in the past, and recent advances in AI (with neural nets and all that) have proven otherwise. If a program can do/predict something, there is a set of mechanical rules that explain it. The set might not be as elegant as Newton's laws of motion, but it is still a set of equations nonetheless. The idea behind Viliam's comment (I think) is that in the future someone might say, the same way you just did, that "We can formalize how happy people generally are in a given society because that's relatively easy, but what about something truly complex like what an individual might imagine if we read him a specific story?". In other words, I don't see the essential differentiation between biology and sociology questions and physics questions, that you try to point to. In the post itself you also talk about moral preference, and I tend to agree with you that some people just have very individually strongly valued axioms that might contradict themselves or others, but it doesn't in itself mean that questions about rationality differ from questions about, say, molecular biology, in the sense that they can be hypothetically answered to a satisfactory level of accuracy.

-2DragonGod7y

Group A was most successful in the field of computation, so I have high confidence that their approach would be successful in intelligence as well (especially in intelligence of artificial agents).

6drossbucket7y

This is the most compelling argument I've been able to think of too when I've tried before. Feynman has a nice analogue of it within physics in The Character of Physical Law: I don't think it goes through well in this case, for the reasons ricraz outlines in their reply. Group B already has plenty of energy to move forward, from taking our current qualitative understanding and trying to build more compelling explanatory models and find new experimental tests. It's Group A that seems rather mired in equations that don't easily connect. Edit: I see I wrote about something similar before, in a rather rambling way.

1TAG4y

That isn't analogous to rationalism versus the mainstream. The mainstream has already developed more complex models...it's rationalism that's saying, "no, just use Bayes for everything" (etc).

[-]abramdemski7y410

Rationality realism seems like a good thing to point out which might be a crux for a lot of people, but it doesn't seem to be a crux for me.

I don't think there's a true rationality out there in the world, or a true decision theory out there in the world, or even a true notion of intelligence out there in the world. I work on agent foundations because there's still something I'm confused about even after that, and furthermore, AI safety work seems fairly hopeless while still so radically confused about the-phenomena-which-we-use-intelligence-and-rationality-and-agency-and-decision-theory-to-describe. And, as you say, "from a historical point of view I’m quite optimistic about using maths to describe things in general".

[-]romeostevensit7y340

I really like the compression "There's no canonical way to scale me up."

I think it captures a lot of the important intuitions here.

1Ben Pace7y

[-]Ben Pace7y240

I think I want to split up ricraz's examples in the post into two subclasses, defined by two questions.

The first asks, given that there are many different AGI architectures one could scale up into, are some better than others? (My intuition is both that there are better ones than others, and also that there are many that are on the Pareto frontier.) And are there any simple ways to determine why one is better than another? This leads to the following examples from the OP:

There is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general; there is an “ideal” decision theory; the idea that AGI will very likely be an “agent”; the idea that Turing machines and Kolmogorov complexity are foundational for epistemology; the idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.

The second asks - suppose that some architectures are better than others, and suppose there are some simple explanations about why some are better than others. How practical is it to talk of me in this way today? Here are some concrete examples of things I might do:

Given cert

...

[-]Benquo7y230

When one person says "I guess we'll have to agree to disagree" and the second person says "Actually according to Aumann's Agreement Theorem, we can't" is the second person making a type error?

Making a type error is not easy to distinguish from attempting to shift frame. (If it were, the frame control wouldn't be very effective.) In the example Eliezer gave from the sequences, he was shifting frame from one that implicitly acknowledges interpretive labor as a cost, to one that demands unlimited amounts of interpretive labor by assuming that we're all perfect Bayesians (and therefore have unlimited computational ability, memory, etc).

This is a big part of the dynamic underlying mistake vs conflict theory.

[-]Benquo7y160

Eliezer's behavior in the story you're alluding to only seems "rational" insofar as we think the other side ends up with a better opinion - I can easily imagine a structurally identical interaction where the protagonist manipulates someone into giving up on a genuine but hard to articulate objection, or proceeding down a conversational path they're ill-equipped to navigate, thus "closing the sale."

[-]gjm7y120

It's not at all clear that improving the other person's opinion was really one of Eliezer's goals on this occasion, as opposed to showing up the other person's intellectual inferiority. He called the post "Bayesian Judo", and highlighted how his showing-off impressed someone of the opposite sex.

He does also suggest that in the end he and the other person came to some sort of agreement -- but it seems pretty clear that the thing they agreed on had little to do with the claim the other guy had originally been making, and that the other guy's opinion on that didn't actually change. So I think an accurate, though arguably unkind, summary of "Bayesian Judo" goes like this: "I was at a party, I got into an argument with a religious guy who didn't believe AI was possible, I overwhelmed him with my superior knowledge and intelligence, he submitted to my manifest superiority, and the whole performance impressed a woman". On this occasion, helping the other party to have better opinions doesn't seem to have been a high priority.

[-]Said Achmiz7y220

When one person says “I guess we’ll have to agree to disagree” and the second person says “Actually according to Aumann’s Agreement Theorem, we can’t” is the second person making a type error?

Note: I confess to being a bit surprised that you picked this example. I’m not quite sure whether you picked a bad example for your point (possible) or whether I’m misunderstanding your point (also possible), but I do think that this question is interesting all on its own, so I’m going to try and answer it.

Here’s a joke that you’ve surely heard before—or have you?

Three mathematicians walk into a bar. The bartender asks them, “Do you all want a beer?”
The first mathematician says, “I don’t know.”
The second mathematician says, “I don’t know.”
The third mathematician says, “I don’t know.”

The lesson of this joke applies to the “according to Aumann’s Agreement Theorem …” case.

When someone says “I guess we’ll have to agree to disagree” and their interlocutor responds with “Actually according to Aumann’s Agreement Theorem, we can’t”, I don’t know if I’d call this a “type error”, precisely (maybe it is; I’d have to think about it more carefully); but the second person is certainly being ridiculou

...

[-]Ben Pace7y170

Firstly, I hadn't heard the joke before, and it made me chuckle to myself.

Secondly, I loved this comment, for very accurately conveying the perspective I felt like ricraz was trying to defend wrt realism about rationality.

Let me say two (more) things in response:

Firstly, I was taking the example directly from Eliezer.

I said, "So if I make an Artificial Intelligence that, without being deliberately preprogrammed with any sort of script, starts talking about an emotional life that sounds like ours, that means your religion is wrong."

He said, "Well, um, I guess we may have to agree to disagree on this."

I said: "No, we can't, actually. There's a theorem of rationality called Aumann's Agreement Theorem which shows that no two rationalists can agree to disagree. If two people disagree with each other, at least one of them must be doing something wrong."

(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)

Secondly, and where I think the crux ... (read more)

[-]Said Achmiz7y150

Well, I guess you probably won’t be surprised to hear that I’m very familiar with that particular post of Eliezer’s, and instantly thought of it when I read your example. So, consider my commentary with that in mind!

(Sidenote: I have not yet become sufficiently un-confused about AAT to have a definite opinion about whether EY was using it correctly there. I do expect after further reflection to object to most rationalist uses of the AAT but not this particular one.)

Well, whether Eliezer was using the AAT correctly rather depends on what he meant by “rationalist”. Was he using it as a synonym for “perfect Bayesian reasoner”? (Not an implausible reading, given his insistence elsewhere on the term “aspiring rationalist” for mere mortals like us, and, indeed, like himself.) If so, then certainly what he said about the Theorem was true… but then, of course, it would be wholly inappropriate to apply it in the actual case at hand (especially since his interlocutor was, I surmise, some sort of religious person, and plausibly not even an aspiring rationalist).

If, instead, Eliezer was using “rationalist” to refer to mere actual humans of today, such as himself and the fellow he was conve

... (read more)

0Anon User4y

Actually, there is a logical error in your mathematicians joke - at least compared to how this joke normally goes. When it's their turn, the 3rd mathematician knows that the first two wanted a beer (otherwise they would have said "yes"), and so can say Yes/No. https://www.beingamathematician.org/Jokes/445-three-logicians-walk-into-a-bar.png
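
The reasoning in the standard form of the joke can be sketched in a few lines of Python. This is only an illustration of the standard version described above, not of the modified version in the parent comment; the function name `answers` and the list-of-booleans input are assumptions made for the example.

```python
def answers(wants):
    """Replies of perfectly logical patrons to "Do you all want a beer?".

    `wants` is a list of booleans, one per patron, in speaking order.
    """
    replies = []
    for i, want in enumerate(wants):
        someone_declined = "No" in replies
        if someone_declined or not want:
            # Not everyone wants a beer, and this patron knows it.
            replies.append("No")
        elif i == len(wants) - 1:
            # Last to speak: every earlier "I don't know" reveals a "yes".
            replies.append("Yes")
        else:
            # Wants one, but cannot yet speak for those who come later.
            replies.append("I don't know")
    return replies


print(answers([True, True, True]))
```

Each "I don't know" carries information: a patron who did not want a beer could have answered "No" outright, so by the third turn the speaker knows the earlier two both wanted one.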

3Said Achmiz4y

You have entirely missed the point I was making in that comment. Of course I am aware of the standard form of the joke. I presented my modified form of the joke in the linked comment, as a deliberate contrast with the standard form, to illustrate the point I was making.

2c0rw1n7y

* Aumann's agreement theorem says that two people acting rationally (in a certain precise sense) and with common knowledge of each other's beliefs cannot agree to disagree. More specifically, if two people are genuine Bayesian rationalists with common priors, and if they each have common knowledge of their individual posterior probabilities, then their posteriors must be equal. With common priors. This is what does all the work there! If the disagreeers have non-equal priors on one of the points, then of course they'll have different posteriors. Of course applying Bayes' Theorem with the same inputs is going to give the same outputs, that's not even a theorem, that's an equals sign. If the disagreeers find a different set of parameters to be relevant, and/or the parameters they both find relevant do not have the same values, the outputs will differ, and they will continue to disagree.
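
A minimal sketch of the narrow point about priors, assuming a single binary hypothesis and made-up likelihood values (the function name `posterior` and all numbers are illustrative, not from the comment). Note that the theorem itself concerns agents who may hold different private evidence; this sketch only shows what happens when the inputs to Bayes' rule are or are not the same.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(H | E) by Bayes' rule for a binary hypothesis H."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Same evidence, same prior: identical posteriors.
alice = posterior(0.5, 0.8, 0.2)
bob = posterior(0.5, 0.8, 0.2)

# Same evidence, different prior: the posteriors differ.
carol = posterior(0.1, 0.8, 0.2)

print(alice, bob, carol)
```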

[-]Benquo7y220

Relevant: Why Common Priors

[-]Wei Dai7y330

Just want to note that I've been pushing for (what I think is) a proper amount of uncertainty about "realism about rationality" for a long time. Here's a collection of quotes from just my top-level posts, arguing against various items in your list:

Is this realistic for human rationalist wannabes? It seems wildly implausible to me that two humans can communicate all of the information they have that is relevant to the truth of some statement just by repeatedly exchanging degrees of belief about it, except in very simple situations. You need to know the other agent’s information partition exactly in order to narrow down which element of the information partition he is in from his probability declaration, and he needs to know that you know so that he can deduce what inference you’re making, in order to continue to the next step, and so on. One error in this process and the whole thing falls apart. It seems much easier to just tell each other what information the two of you have directly.
Finally, I now see that until the exchange of information completes and common knowledge/agreement is actually achieved, it’s rational for even honest truth-seekers who share common priors to disagre

... (read more)

[-]Vanessa Kosoy7yΩ9320

Although I don't necessarily subscribe to the precise set of claims characterized as "realism about rationality", I do think this broad mindset is mostly correct, and the objections outlined in this essay are mostly wrong.

There’s a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn’t just because biologists haven’t figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated “function” which basically requires you to describe that organism’s entire phenotype, genotype and environment.

This seems entirely wrong to me. Evolution definitely should be studied using mathematical models, and although I am not an expert in that, AFAIK this approach is fairly standard. "Fitness" just refers to the expected behavior of the number of descendants of a given organism or gene. Therefore, it is perfectly defin

... (read more)

[-]cousin_it7yΩ4110

It is true that human and animal intelligence is “messy” in the sense that brains are complex and many of the fine details of their behavior are artifacts of either fine details in limitations of biological computational hardware, or fine details in the natural environment, or plain evolutionary accidents. However, this does not mean that it is impossible to speak of a relatively simple abstract theory of intelligence.

I used to think the same way, but the OP made me have a crisis of faith, and now I think the opposite way.

Sure, an animal brain solving an animal problem is messy. But a general purpose computer solving a simple mathematical problem can be just as messy. The algorithm for multiplying matrices in O(n^2.8) is more complex than the algorithm for doing it in O(n^3), and the algorithm with O(n^2.4) is way more complex than that. As I said in the other comment, "algorithms don't get simpler as they get better".
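
To make the comparison concrete, here is a rough sketch of the schoolbook O(n^3) method next to Strassen's algorithm, which is presumably the O(n^2.8) algorithm meant above (its exponent is log2(7) ≈ 2.81). The helper names are assumptions, and the sketch only handles square matrices whose size is a power of two.

```python
def naive_multiply(a, b):
    """Schoolbook O(n^3) product of two n x n matrices (lists of lists)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split a matrix into its four quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def strassen(a, b):
    """Strassen's algorithm: 7 recursive products instead of 8."""
    if len(a) == 1:
        return [[a[0][0] * b[0][0]]]
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    m1 = strassen(add(a11, a22), add(b11, b22))
    m2 = strassen(add(a21, a22), b11)
    m3 = strassen(a11, sub(b12, b22))
    m4 = strassen(a22, sub(b21, b11))
    m5 = strassen(add(a11, a12), b22)
    m6 = strassen(sub(a21, a11), add(b11, b12))
    m7 = strassen(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)   # m1 + m4 - m5 + m7
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(sub(add(m1, m3), m2), m6)   # m1 - m2 + m3 + m6
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 2, 2, 0]]
print(strassen(a, b) == naive_multiply(a, b))
```

The naive version fits in three lines; the faster one needs quadrant splitting and seven carefully chosen products, which is the sense in which the better algorithm is the more complex one.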

[-]Vanessa Kosoy7yΩ5180

I don't know a lot about the study of matrix multiplication complexity, but I think that one of the following two possibilities is likely to be true:

There is some $ω \in R$ and an algorithm for matrix multiplication of complexity $O (n^{ω + ϵ})$ for any $ϵ > 0$ s.t. no algorithm of complexity $O (n^{ω - ϵ})$ exists (AFAIK, the prevailing conjecture is $ω = 2$ ). This algorithm is simple enough for human mathematicians to find it, understand it and analyze its computational complexity. Moreover, there is a mathematical proof of its optimality that is simple enough for human mathematicians to find and understand.
There is a progression of algorithms for lower and lower exponents that increases in description complexity without bound as the exponent approaches $ω$ from above, and the problem of computing a program with given exponent is computationally intractable or even uncomputable. This fact has a mathematical proof that is simple enough for human mathematicians to find and understand.

Moreover, if we only care about having a polynomial time algorithm with some exponent then the solution is simple (and doesn't require any astronomical coefficients like Levin search; incidentally, the $O (n^{3})$ algorithm is a

... (read more)

-1cousin_it7y

A good algorithm can be easy to find, but not simple in the other senses of the word. Machine learning can output an algorithm that seems to perform well, but has a long description and is hard to prove stuff about. The same is true for human intelligence. So we might not be able to find an algorithm that's as strong as human intelligence but easier to prove stuff about.

[-]Vanessa Kosoy7yΩ5160

Machine learning uses data samples about an unknown phenomenon to extrapolate and predict the phenomenon in new instances. Such algorithms can have provable guarantees regarding the quality of the generalization: this is exactly what computational learning theory is about. Deep learning is currently poorly understood, but this seems more like a result of how young the field is, rather than some inherent mysteriousness of neural networks. And even so, there is already some progress. People were making buildings and cannons before Newtonian mechanics, engines before thermodynamics, and using chemical reactions before quantum mechanics or modern atomic theory. The fact that you can do something using trial and error doesn't mean trial and error is the only way to do it.
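
As one textbook illustration of such a guarantee (a standard result, not something stated in the comment itself), consider a finite hypothesis class $H$ in the realizable setting, i.e. assuming some hypothesis in $H$ has zero true error. If the learner receives $m$ i.i.d. samples with

$$m \ge \frac{1}{\epsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right),$$

then with probability at least $1 - \delta$ every hypothesis in $H$ that is consistent with all $m$ samples has true error at most $\epsilon$. Here $\epsilon$ and $\delta$ are the accuracy and confidence parameters chosen by the analyst.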

1cousin_it7y

I think "inherent mysteriousness" is also possible. Some complex things are intractable to prove stuff about.

2DragonGod7y

I don't see why better algorithms being more complex is a problem?

3DragonGod7y

I disagree that intelligence and rationality are more fundamental than physics; the territory itself is physics, and that is all that is really there. Everything else (including the body of our physics knowledge) is a model for navigating that territory. Turing formalised computation and established the limits of computation given certain assumptions. However, those limits only apply as long as the assumptions are true. Turing did not prove that no mechanical system is superior to a Universal Turing Machine, and weird physics may enable super-Turing computation. The point I was making is that our models are only as good as their correlation with the territory. The abstract models we have aren't part of the territory itself.

[-]Vanessa Kosoy7yΩ4110

Physics is not the territory, physics is (quite explicitly) the models we have of the territory. Rationality consists of the rules for formulating these models, and in this sense it is prior to physics and more fundamental. (This might be a disagreement over use of words. If by "physics" you, by definition, refer to the territory, then it seems to miss my point about Occam's razor. Occam's razor says that the map should be parsimonious, not the territory: the latter would be a type error.) In fact, we can adopt the view that Solomonoff induction (which is a model of rationality) is the ultimate physical law: it is a mathematical rule of making predictions that generates all the other rules we can come up with. Such a point of view, although in some sense justified, at present would be impractical: this is because we know how to compute using actual physical models (including running computer simulations), but not so much using models of rationality. But this is just another way of saying we haven't constructed AGI yet.
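
For readers who want the rule written out, the standard definition (given here as background, not taken from the comment) assigns to each finite string $x$ the weight

$$M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|},$$

where $U$ is a fixed universal monotone Turing machine, the sum ranges over the minimal programs $p$ whose output begins with $x$, and $|p|$ is the length of $p$ in bits. Predictions are then made by conditioning:

$$M(x_{n+1} \mid x_{1:n}) = \frac{M(x_{1:n}\, x_{n+1})}{M(x_{1:n})}.$$

Shorter programs get exponentially more weight, which is the sense in which the rule builds in Occam's razor.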

I don't think it's meaningful to say that "weird physics may enable super Turing computation." Hypercomputation is just a... (read more)

6cousin_it7y

If your mind was computable but the external world had lots of seeming hypercomputation (e.g. boxes for solving the halting problem were sold on every corner and were apparently infallible), would you prefer to build an AI that used a prior over hypercomputable worlds, or an AI that used Solomonoff induction because it's the ultimate physical law?

9Vanessa Kosoy7y

What does it mean to have a box for solving the halting problem? How do you know it really solves the halting problem? There are some computable tests we can think of, but they would be incomplete, and you would only verify that the box satisfies those computable tests, not that it is "really" a hypercomputer. There would be a lot of possible boxes that don't solve the halting problem that pass the same computable tests. If there is some powerful computational hardware available, I would want the AI to use that hardware. If you imagine the hardware as being hypercomputers, then you can think of such an AI as having a "prior over hypercomputable worlds". But you can alternatively think of it as reasoning using computable hypotheses about the correspondence between the output of this hardware and the output of its sensors. The latter point of view is better, I think, because you can never know the hardware is really a hypercomputer.

5cousin_it7y

Hmm, that approach might be ruling out not only hypercomputers, but also sufficiently powerful conventional computers (anything stronger than PSPACE maybe?) because your mind isn't large enough to verify their strength. Is that right?

5Vanessa Kosoy7y

In some sense, yes, although for conventional computers you might settle on very slow verification. Unless you mean that your mind has only finite memory/lifespan and therefore you cannot verify an arbitrary conventional computer within any given credence, which is also true. Under favorable conditions, you can quickly verify something in PSPACE (using interactive proof protocols), and given extra assumptions you might be able to do better (if you have two provers that cannot communicate you can do NEXP, or if you have a computer whose memory you can reliably delete you can do an EXP-complete language). However, it is not clear whether you can be justifiably highly certain of such extra assumptions. See also my reply to lbThingrb.

2lbThingrb7y

This can’t be right ... Turing machines are assumed to be able to operate for unbounded time, using unbounded memory, without breaking down or making errors. Even finite automata can have any number of states and operate on inputs of unbounded size. By your logic, human minds shouldn’t be modeling physical systems using such automata, since they exceed the capabilities of our brains. It’s not that hard to imagine hypothetical experimental evidence that would make it reasonable to believe that hypercomputers could exist. For example, suppose someone demonstrated a physical system that somehow emulated a universal Turing machine with infinite tape, using only finite matter and energy, and that this system could somehow run the emulation at an accelerating rate, such that it computed n steps in $\sum_{k=1}^{n} \frac{1}{2^k}$ seconds. (Let’s just say that it resets to its initial state in a poof of pixie dust if the TM doesn’t halt after one second.) You could try to reproduce this experiment and test it on various programs whose long-term behavior is predictable, but you could only test it on a finite (to say nothing of computable) set of such inputs. Still, if no one could come up with a test that stumped it, it would be reasonable to conclude that it worked as advertised. (Of course, some alternative explanation would be more plausible at first, given that the device as described would contradict well established physical principles, but eventually the weight of evidence would compel one to rewrite physics instead.) One could hypothesize that the device only behaved as advertised on inputs for which human brains have the resources to verify the correctness of its answers, but did something else on other inputs, but you could just as well say that about a normal computer. There’d be no reason to believe such an alternative model, unless it was somehow more parsimonious. I don’t know any reason to think that theories that don’t posit uncomputable behavior can always be found which are at

8Vanessa Kosoy7y

It is true that a human brain is more precisely described as a finite automaton than a Turing machine. And if we take finite lifespan into account, then it's not even a finite automaton. However, these abstractions are useful models since they become accurate in certain asymptotic limits that are sufficiently useful to describe reality. On the other hand, I doubt that there is a useful approximation in which the brain is a hypercomputer (except maybe some weak forms of hypercomputation like non-uniform computation / circuit complexity). Moreover, one should distinguish between different senses in which we can be "modeling" something. The first sense is the core, unconscious ability of the brain to generate models, and in particular that which we experience as intuition. This ability can (IMO) be thought of as some kind of machine learning algorithm, and, I doubt that hypercomputation is relevant there in any way. The second sense is the "modeling" we do by manipulating linguistic (symbolic) constructs in our conscious mind. These constructs might be formulas in some mathematical theory, including formulas that represent claims about uncomputable objects. However, these symbolic manipulations are just another computable process, and it is only the results of these manipulations that we use to generate predictions and/or test models, since this is the only access we have to those uncomputable objects. Regarding your hypothetical device, I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC? (In particular, the latter could tell you that some Turing machine halts when it "really" doesn't, because in the model it halts after some non-standard number of computing steps.) More generally, given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. thi

3lbThingrb7y

I didn't mean to suggest that the possibility of hypercomputers should be taken seriously as a physical hypothesis, or at least, any more seriously than time machines, perpetual motion machines, faster-than-light, etc. And I think it's similarly irrelevant to the study of intelligence, machine or human. But in my thought experiment, the way I imagined it working was that, whenever the device's universal-Turing-machine emulator halted, you could then examine its internal state as thoroughly as you liked, to make sure everything was consistent with the hypothesis that it worked as specified (and the non-halting case could be ascertained by the presence of pixie dust 🙂). But since its memory contents upon halting could be arbitrarily large, in practice you wouldn't be able to examine it fully even for individual computations of sufficient complexity. Still, if you did enough consistency checks on enough different kinds of computations, and the cleverest scientists couldn't come up with a test that the machine didn't pass, I think believing that the machine was a true halting-problem oracle would be empirically justified. It's true that a black box oracle could output a nonstandard "counterfeit" halting function which claimed that some actually non-halting TMs do halt, only for TMs that can't be proved to halt within ZFC or any other plausible axiomatic foundation humans ever come up with, in which case we would never know that it was lying to us. It would be trickier for the device I described to pull off such a deception, because it would have to actually halt and show us its output in such cases. For example, if it claimed that some actually non-halting TM M halted, we could feed it a program that emulated M and output the number of steps M took to halt. That program would also have to halt, and output some specific number n. In principle, we could then try emulating M for n steps on a regular computer, observe that M hadn't reached a halting state, and conclude th

3Vanessa Kosoy7y

Nearly everything you said here was already addressed in my previous comment. Perhaps I didn't explain myself clearly? I wrote before that "I wonder how would you tell whether it is the hypercomputer you imagine it to be, versus the realization of the same hypercomputer in some non-standard model of ZFC?" So, the realization of a particular hypercomputer in a non-standard model of ZFC would pass all of your tests. You could examine its internal state or its output any way you like (i.e. ask any question that can be formulated in the language of ZFC) and everything you see would be consistent with ZFC. The number of steps for a machine that shouldn't halt would be a non-standard number, so it would not fit on any finite storage. You could examine some finite subset of its digits (either from the end or from the beginning), for example, but that would not tell you the number is non-standard. For any question of the form "is n larger than some known number n0?" the answer would always be "yes". Once again, there is a difference of principle. I wrote before that: "...given an uncomputable function h and a system under test f, there is no sequence of computable tests that will allow you to form some credence about the hypothesis f=h s.t. this credence will converge to 1 when the hypothesis is true and 0 when the hypothesis is false. (This can be made an actual theorem.) This is different from the situation with normal computers (i.e. computable h) when you can devise such a sequence of tests." So, with normal computers you can become increasingly certain your hypothesis regarding the computer is true (even if you never become literally 100% certain, except in the limit), whereas with a hypercomputer you cannot. Yes, I already wrote that: "Although you can in principle have a class of uncomputable hypotheses s.t. you can asymptotically verify f is in the class, for example the class of all functions h s.t. it is consistent with ZFC that h is the halting function. But

1TAG6y

People tend to use the word physics in both the map and the territory sense. That would follow if testing a theory consisted solely of running a simulation in your head, but that is not how physics, the science, works. If the universe was hypercomputational, that would manifest as failures of computable physics. Note that you only need to run computable physics to generate predictions that are then falsified. If true, that is a form of neo-Kantian idealism. Is that what you really wanted to say?

2Vanessa Kosoy6y

Well, it would manifest as a failure to create a complete and deterministic theory of computable physics. If your physics doesn't describe absolutely everything, hypercomputation can hide in places it doesn't describe. If your physics is stochastic (like quantum mechanics for example) then the random bits can secretly follow a hypercomputable pattern. Sort of "hypercomputer of the gaps". Like I wrote before, there actually can be situations in which we gradually become confident that something is a hypercomputer (although certainty would grow very slowly), but we will never know precisely what kind of hypercomputer it is. Unfortunately I am not sufficiently versed in philosophy to say. I do not make any strong claims to novelty or originality.

1Chris Hibbert7y

This reminds me of my rephrasing of the description of epistemology. The standard description started out as "the science of knowledge" or colloquially, "how do we know what we know". I've maintained, since reading Bartley ("The Retreat to Commitment"), that the right description is "How do we decide what to believe?" So your final sentence seems right to me, but that's different from the rest of your argument, which presumes that there's a "right" answer and our job is finding it. Our job is finding a decision procedure, and studying what differentiates "right" answers from "wrong" answers is useful fodder for that, but it's not the actual goal.

1Richard_Ngo7y

Similarly, you can define intelligence as expected performance on a broad suite of tasks. However, what I was trying to get at with "define its fitness in terms of more basic traits" is being able to build a model of how it can or should actually work, not just specify measurement criteria. I do consider computational learning theory to be evidence for rationality realism. However, I think it's an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents - to my knowledge it hasn't played an important role in the success of deep learning, for instance. It may be analogous to mathematical models of evolution, which are certainly true but don't help you build better birds. This feels more like a restatement of our disagreement than an argument. I do feel some of the force of this intuition, but I can also picture a world in which it's not the case. Note that most of the reasoning humans do is not math-like, but rather a sort of intuitive inference where we draw links between different vague concepts and recognise useful patterns - something we're nowhere near able to formalise. I plan to write a follow-up post which describes my reasons for being skeptical about rationality realism in more detail. I agree, but it's plausible that they are much less well-defined than they seem. The more we learn about neuroscience, the more the illusion of a unified self with coherent desires breaks down. There may be questions which we all agree are very morally important, but where most of us have ill-defined preferences such that our responses depend on the framing of the problem (e.g. the repugnant conclusion).

[-]Vanessa Kosoy7yΩ6170

...what I was trying to get at with “define its fitness in terms of more basic traits” is being able to build a model of how it can or should actually work, not just specify measurement criteria.

Once again, it seems perfectly possible to build an abstract theory of evolution (for example, evolutionary game theory would be one component of that theory). Of course, the specific organisms we have on Earth with their specific quirks are not something we can describe by simple equations: unsurprisingly, since they are a rather arbitrary point in the space of all possible organisms!

I do consider computational learning theory to be evidence for rationality realism. However, I think it’s an open question whether CLT will turn out to be particularly useful as we build smarter and smarter agents—to my knowledge it hasn’t played an important role in the success of deep learning, for instance.

It plays a minor role in deep learning, in the sense that some "deep" algorithms are adaptations of algorithms that have theoretical guarantees. For example, deep Q-learning is an adaptation of ordinary Q-learning. Obviously I cannot prove that it is possible to create an abstract theory of intellig

... (read more)

-1Richard_Ngo7y

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like "species will evolve faster when there are predators in their environment" and "species which use sexual reproduction will be able to adapt faster to novel environments". The analogous abstract theory of intelligence can tell us things like "agents will be less able to achieve their goals when they are opposed by other agents" and "agents with more compute will perform better in novel environments". These sorts of conclusions are not very useful for safety. Sorry, my response was a little lazy, but at the same time I'm finding it very difficult to figure out how to phrase a counterargument beyond simply saying that although intelligence does allow us to understand physics, it doesn't seem to me that this implies it's simple or fundamental. Maybe one relevant analogy: maths allows us to analyse tic-tac-toe, but maths is much more complex than tic-tac-toe. I understand that this is probably an unsatisfactory intuition from your perspective, but unfortunately don't have time to think too much more about this now; will cover it in a follow-up. Agreed. But the fact that the main component of human reasoning is something which we have no idea how to formalise is some evidence against the possibility of formalisation - evidence which might be underweighted if people think of maths proofs as a representative example of reasoning. I'm going to cop out of answering this as well, on the grounds that I have yet another post in the works which deals with it more directly. One relevant claim, though: that extreme optimisation is fundamentally alien to the human psyche, and I'm not sure there's

[-]Vanessa Kosoy7yΩ7210

It seems like we might actually agree on this point: an abstract theory of evolution is not very useful for either building organisms or analysing how they work, and so too may an abstract theory of intelligence not be very useful for building intelligent agents or analysing how they work. But what we want is to build better birds! The abstract theory of evolution can tell us things like “species will evolve faster when there are predators in their environment” and “species which use sexual reproduction will be able to adapt faster to novel environments”. The analogous abstract theory of intelligence can tell us things like “agents will be less able to achieve their goals when they are opposed by other agents” and “agents with more compute will perform better in novel environments”. These sorts of conclusions are not very useful for safety.

As a matter of fact, I emphatically do not agree. "Birds" is a confusing example, because it speaks of modifying an existing (messy, complicated, poorly designed) system rather than making something from scratch. If we wanted to make something vaguely bird-like from scratch, we might have needed something like a "theory of self-sustaining, se

... (read more)

[-]Said Achmiz7y270

Excellent post!

I find myself agreeing with much of what you say, but there are a couple of things which strike me as… not quite fitting (at least, into the way I have thought about these issues), and also I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works. (My position on this topic is rather tentative, I should note; all that’s clear to me is that there’s much here that’s confusing—which is, however, itself a point of agreement with the OP, and disagreement with “rationality realists”, who seem much more certain of their view than the facts warrant.)

Some specific points:

… suppose that you just were your system 1, and that your system 2 was mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths)

This seems to me to be a fundamentally confused proposition. Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic. Under this view, it is non

... (read more)

[-]Richard_Ngo7y160

Thanks for the helpful comment! I'm glad other people have a sense of the thing I'm describing. Some responses:

I am somewhat skeptical about whether your attempt at conceptually unifying these concerns—i.e., the concept of “rationality realism”—quite works.

I agree that it's a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as "natural" do so because of intuitions that are similar both across ideas and across people. So even if I can't conceptually unify those intuitions, I can still identify a clustering.

Regardless of whether Hanson is right about how our minds work (and I suspect he is right to a large degree, if not quite entirely right), the question of who we are seems to be a matter of choosing which aspect(s) of our minds’ functioning to endorse as ego-syntonic. Under this view, it is nonsensical to speak of a scenario where it “turns out” that I “am just my system 1”.

I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If w... (read more)

[-]Said Achmiz7y150

I agree that it’s a bit of a messy concept. I do suspect, though, that people who see each of the ideas listed above as “natural” do so because of intuitions that are similar both across ideas and across people. So even if I can’t conceptually unify those intuitions, I can still identify a clustering.

For the record, and in case I didn’t get this across—I very much agree that identifying this clustering is quite valuable.

As for the challenge of conceptual unification, we ought, I think, to treat it as a separate and additional challenge (and, indeed, we must be open to the possibility that a straightforward unification is not, after all, appropriate).

I was a bit lazy in expressing it, but I think that the underlying idea makes sense (and have edited to clarify a little). There are certain properties we consider key to our identities, like consistency and introspective access. If we find out that system 2 has much less of those than we thought, then that should make us shift towards identifying more with our system 1s. Also, the idea of choosing which aspects to endorse presupposes some sort of identification with the part of your mind that makes the choice. But I could imagine

... (read more)

[-]cousin_it7y180

The idea that there is an “ideal” decision theory.

There are many classes of decision problems that allow optimal solutions, but none of them can cover all of reality, because in reality an AI can be punished for having any given decision theory. That said, the design space of decision theories has sweet spots. For example, future AIs will likely face an environment where copying and simulation is commonplace, and we've found simple decision theories that allow for copies and simulations. Looking for more such sweet spots is fun and fruitful.

1Richard_Ngo7y

Imo we haven't found a simple decision theory that allows for copies and simulations. We've found a simple rule that works in limiting cases, but is only well-defined for identical copies (modulo stochasticity). My expectation that FDT will be rigorously extended from this setting is low, for much the same reason that I don't expect a rigorous definition of CDT. You understand FDT much better than I do, though - would you say that's a fair summary?

[-]cousin_it7y120

If all agents involved in a situation share the same utility function over outcomes, we should be able to make them coordinate despite having different source code. I think that's where one possible boundary will settle, and I expect the resulting theory to be simple. Whereas in case of different utility functions we enter the land of game theory, where I'm pretty sure there can be no theory of unilateral decision making.

1Richard_Ngo7y

I'm not convinced by the distinction you draw. Suppose you simulate me at slightly less than perfect fidelity. The simulation is an agent with a (slightly) different utility function to me. Yet this seems like a case where FDT should be able to say relevant things. In Abram's words, I expect that logical causality will be just as difficult to formalise as normal causality, and in fact that no "correct" formalisation exists for either.

1Alexander Gietelink Oldenziel4y

What. This seems obviously incorrect? The Pearl-Rubin-Spirtes-Glymour (and others) theory of causality is a very powerful framework that satisfies pretty much what one intuitively understands as 'causality'. It is moreover powerful enough to make definite computations and even the much-craved-for 'real applications'. It is 'a very correct' formalisation of 'normal' causality. I say 'very correct' instead of 'correct' because there are still areas for improvement - but this is more like GR improving on Newtonian gravity than Newtonian gravity being incorrect.

2Richard_Ngo4y

Got a link to the best overview/defense of that claim? I'm open to this argument but have some cached thoughts about Pearl's framework being unsatisfactory - would be useful to do some more reading and see if I still believe them.

2Alexander Gietelink Oldenziel4y

There are some cases where Pearl and others' causality framework can be improved - supposedly Factored Sets will, although I personally don't understand it. I was recently informed that certain abductive counterfactual phrases due to David Lewis are not well-captured by Pearl's system. I believe there are also other ways - all of this is actively being researched. What do you find unsatisfactory about Pearl? All of this is beside the point, which is that there is a powerful, well-developed, highly elegant theory of causality with an enormous range of applications. Rubin's framework (which I am told is equivalent to Pearl's) is used throughout econometrics - indeed econometrics is best understood as the Science of Causality. I am not an expert - I am trying to learn much of this theory right now. I am probably not the best person to ask about the theory of causality. That said: I am not sure to what degree you are already familiar with Pearl's theory of causality, but I recommend https://michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/ for an excellent introduction. There is EY's https://www.lesswrong.com/posts/hzuSDMx7pd2uxFc5w/causal-diagrams-and-causal-models which you may or may not find convincing. For a much more leisurely argument for Pearl's viewpoint, I recommend his "Book of Why". In a pinch you could take a look at the book review on the causality bloglist on LW. https://www.lesswrong.com/tag/causality

2Noosphere891y

To be fair, I expect a lot of cases of identical copies modulo stochasticity to exist in the future, and indeed you could argue this has already happened for AI. I expect it to become more and more relevant by default, so FDT working in the identical-copies case is still a really valuable niche.

[-]cousin_it7y140

Great post, thank you for writing this! Your list of natural-seeming ideas is very thought provoking.

The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general.

I used to think that way, but now I agree with your position more. Something like Bayesian rationality is a small piece that many problems have in common, but any given problem will have lots of other structure to be exploited as well. In many AI problems, like recognizing handwriting or playing board games, that lets you progress faster than if you'd tried to start with the Bayesian angle.

We could still hope that the best algorithm for any given problem will turn out to be simple. But that seems unlikely, judging from both AI tasks like MNIST, where neural nets beat anything hand-coded, and non-AI tasks like matrix multiplication, where asymptotically best algorithms have been getting more and more complex. As a rule, algorithms don't get simpler as they get better.

[-]Vladimir_Nesov7y160

I'm not sure what you changed your mind about. Some of the examples you give are unconvincing, as they do have simple meta-algorithms that both discover the more complicated better solutions and analyse their behavior. My guess is that the point is that for example looking into nuance of things like decision theory is an endless pursuit, with more and more complicated solutions accounting for more and more unusual aspects of situations (that can no longer be judged as clearly superior), and no simple meta-algorithm that could've found these more complicated solutions, because it wouldn't know what to look for. But that's content of values, the thing you look for in human behavior, and we need at least a poor solution to the problem of making use of that. Perhaps you mean that even this poor solution is too complicated for humans to discover?

1Richard_Ngo7y

There's a difference between discovering something and being able to formalise it. We use the simple meta-algorithm of gradient descent to train neural networks, but that doesn't allow us to understand their behaviour. Also, meta-algorithms which seem simple to us may not in fact be simple, if our own minds are complicated to describe.
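To illustrate how short the meta-algorithm itself is, here is a minimal sketch of gradient descent on a one-dimensional toy objective. The function name `gradient_descent`, the learning rate and the quadratic objective are all assumptions chosen for illustration; nothing here comes from a real training setup. The procedure fits in a few lines, and the code contains no explanation of why the point it finds behaves as it does.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# Toy objective (an assumption for illustration): f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3) and whose minimum is at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

In this toy case the answer is easy to interpret, but that is a property of the objective and not of the search procedure, which is the gap between discovering and formalising.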

[-]TurnTrout7y110

My impression is that an overarching algorithm would allow the agent to develop solutions for the specialized tasks, not that it would directly constitute a perfect solution. I don’t quite understand your position here – would you mind elaborating?

4cousin_it7y

My position goes something like this. There are many problems to be solved. Each problem may or may not have regularities to be exploited. Some regularities are shared among many problems, like Bayes structure, but others are unique. Solving a problem in reasonable time might require exploiting multiple regularities in it, so Bayes structure alone isn't enough. There's no algorithm for exploiting all regularities in all problems in reasonable time (this is similar to P≠NP). You can combine algorithms for exploiting a bunch of regularities, ending up with a longer algorithm that can't be compressed very much and doesn't have any simple core. Human intelligence could be like that: a laundry list of algorithms that exploit specific regularities in our environment.

9romeostevensit7y

> algorithms don't get simpler as they get better. Or, as you minimize cost along one dimension, costs get pushed into other dimensions. Aether variables apply at the level of representation too.

[-]Rohin Shah6y120Nomination for 2018 Review

It's a common crux between me and MIRI / rationalist types in AI safety, and it's way easier to say "Realism about rationality" than to engage in an endless debate about whether everything is approximating AIXI or whatever that never seems to update me.

[-]DanielFilan6yΩ590Review for 2018 Review

I think it was important to have something like this post exist. However, I now think it's not fit for purpose. In this discussion thread, rohinmshah, abramdemski and I end up spilling a lot of ink about a disagreement that ended up being at least partially because we took 'realism about rationality' to mean different things. rohinmshah thought that irrealism would mean that the theory of rationality was about as real as the theory of liberalism, abramdemski thought that irrealism would mean that the theory of rationality would be about as real as the theo

This is one of the unfortunately few times there was *substantive* philosophical discussion on the forum. This is a central example of what I think is good about LW.

[-]Wei Dai7y80

It’s a mindset which makes the following ideas seem natural

I think within "realism about rationality" there are at least 5 plausible positions one could take on other metaethical issues, some of which do not agree with all the items on your list, so it's not really a single mindset. See this post, where I listed those 5 positions along with the denial of "realism about rationality" as the number 6 position (which I called normative anti-realism), and expressed my uncertainty as to which is the right one.

[-]Kaj_Sotala7y80

Curated this post for:

Having a very clear explanation of what feels like a central disagreement in many discussions, which has been implicit in many previous conversations but not explicitly laid out.
Having lots of examples of what kinds of ideas this mindset makes seem natural.
Generally being the kind of a post which I expect to be frequently referred back to as the canonical explanation of the thing.

[-]DanielFilan6y70Nomination for 2018 Review

This post gave a short name for a way of thinking that I naturally fall into, and implicitly pointed to the possibility of that way of thinking being mistaken. This makes a variety of discussions in the AI alignment space more tractable. I do wish that the post were more precise at characterising the position of 'realism about rationality' and its converse, or (even better) that it gave arguments for or against 'realism about rationality' (even a priors-based one as in this closely related Robin Hanson post), but pointing to a type of proposition and giving it a name seems very valuable.

4DanielFilan6y

Note that the linked technical report by Salamon, Rayhawk, and Kramar does a good job at looking at evidence for and against 'rationality realism', or as they call it, 'the intelligibility of intelligence'.

4DanielFilan6y

I do think that it was an interesting choice for the post to be about 'realism about rationality' rather than its converse, which the author seems to subscribe to. This probably can be chalked up to it being easier to clearly see a thinking pattern that you don't frequently use, I guess?

5Richard_Ngo6y

I think in general, if there's a belief system B that some people have, then it's much easier and more useful to describe B than ~B. It's pretty clear if, say, B = Christianity, or B = Newtonian physics. I think of rationality anti-realism less as a specific hypothesis about intelligence, and more as a default skepticism: why should intelligence be formalisable? Most things aren't! (I agree that if you think most things are formalisable, so that realism about rationality should be our default hypothesis, then phrasing it this way around might seem a little weird. But the version of realism about rationality that people buy into around here also depends on some of the formalisms that we've actually come up with being useful, which is a much more specific hypothesis, making skepticism again the default position.)

2DanielFilan6y

I think that rationality realism is to Bayesianism is to rationality anti-realism as theism is to Christianity is to atheism. Just like it's feasible and natural to write a post advocating and mainly talking about atheism, despite that position being based on default skepticism and in some sense defined by theism, I think it would be feasible and natural to write a post titled 'rationality anti-realism' that focussed on that proposition and described why it was true.

[-]Raemon7y70

Although not exactly the central point, seemed like a good time to link back to "Do you identify as the elephant or the rider?"

[-]linkhyrule57y60

I was kind of iffy about this post until the last point, which immediately stood out to me as something I vehemently disagree with. Whether or not humans naturally have values or are consistent is irrelevant -- that which is not required will happen only at random and thus tend not to happen at all, and so if you aren't very very careful to actually make sure you're working in a particular coherent direction, you're probably not working nearly as efficiently as you could be and may in fact be running in circles without noticing.

[-]drossbucket7y60

Thanks for writing this, it's a very concise summary of the parts of LW I've never been able to make sense of, and I'd love to have a better understanding of what makes the ideas in your bullet-pointed list appealing to those who tend towards 'rationality realism'. (It's sort of a background assumption in most LW stuff, so it's hard to find places where it's explicitly justified.)

Also:

What CFAR calls “purple”.

Is there any online reference explaining this?

6Richard_Ngo7y

I had a quick look for an online reference to link to before posting this, and couldn't find anything. It's not a particularly complicated theory, though: "purple" ideas are vague, intuitive, pre-theoretic; "orange" ones are explicable, describable and model-able. A lot of AI safety ideas are purple, hence why CFAR tells people not just to ignore them like they would in many technical contexts. I'll publish a follow-up post with arguments for and against realism about rationality.

1TAG6y

Or you could say vague and precise.

1drossbucket7y

Thanks for the explanation!

2Vaniver6y

This was my attempt to explain the underlying ideas.

[-]TruePath7y50

First, let me say I 100% agree with the idea that there is a problem in the rationality community of viewing rationality as something like momentum or gold (I named my blog rejectingrationality after this phenomenon and tried to deal with it in my first post).

However, I'm not totally sure everything you say falls under that concept. In particular, I'd say that rationality realism is something like the belief that there is a fact of the matter about how best to form beliefs or take actions in response to a particular set of experiences and that ... (read more)

[-]Kaj_Sotala7y50

I like this post and the concept in general, but would prefer slightly different terminology. To me, a mindset being called "realism about rationality" implies that this is the realistic, or correct mindset to have; a more neutral name would feel appropriate. Maybe something like "'rationality is math' mindset" or "'intelligence is intelligible' mindset"?

5Richard_Ngo7y

Thanks for the link, I hadn't seen that paper before and it's very interesting. I chose "rationality realism" as a parallel to "moral realism", which I don't think carries the connotations you mentioned. I do like "intelligence is intelligible" as an alternative alliteration, and I guess Anna et al. have prior naming rights. I think it would be a bit confusing to retitle my post now, but happy to use either going forward.

4Kaj_Sotala7y

Glad you liked it! I guess you could infer that "just as moral realism implies that objective morality is real, rationality realism implies that objective rationality is real", but that interpretation didn't even occur to me before reading this comment. And also importantly, "rationality realism" wasn't the term that you used in the post; you used "realism about rationality". "Realism about morality" would also have a different connotation than "moral realism" does.

3Benquo7y

I realized a few paragraphs in that this was meant to be parallel to "moral realism," and I agree that a title of "rationality realism" would have been clearer.

[-]avturchin7y40

Some other ideas for the list of the "rationality realism":

Probability actually exists, and there is a correct theory of it.
Humans have values.
Rationality could be presented as a short set of simple rules.
Occam's razor implies that the simplest explanation is the correct one.
Intelligence could be measured by a single scalar - IQ.

2TAG6y

* correspondence theory is the correct theory of truth.
* correspondence-truth is established by a combination of predictive accuracy and simplicity.
* every AI has a utility function..
* .. even if its utility function is in the eye of the beholder
* modal realism is false..
* .. but many worlds is true.
* .. You shouldn't care about things that make no observable predictions..
* .. unless it's many worlds.
* You are a piece of machinery with no free will...
* ...but it's vitally important to exert yourself to steer the world to a future without AI apocalypse.

1Richard_Ngo7y

These ideas are definitely pointing in the direction of rationality realism. I think most of them are related to items on my list, although I've tried to phrase them in less ambiguous ways.

[-]Jonathan Stray6y30

Very interesting post. I think exploring the limits of our standard models of rationality is very worthwhile. IMO the models used in AI tend to be far too abstract, and don't engage enough with situatedness, unclear ontologies, and the fundamental weirdness of the open world.

One strand of critique of rationality that I really appreciate is David Chapman's "meta-rationality," which he defines as "evaluating, choosing, combining, modifying, discovering, and creating [rational] systems"

https://meaningness.com/metablog/meta-rationality-curriculum

[-]DragonGod7y20

I consider myself a rational realist, but I don't believe some of the things you attribute to rational realism (particularly concerning morality and consciousness). I don't think there's a true decision theory or true morality, but I do think that you could find systems of reasoning that are provably optimal within certain formal models.

There is no sense in which our formal models are true, but as long as they have high predictive power the models would be useful, and that I think is all that matters.

[-]Sammy Martin6y*10

"Implicit in this metaphor is the localization of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalization engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your CEV feel less well-defined?"

I find this very interesting because locating personal identity in system 1 feels conceptually impossible or... (read more)

7Sammy Martin6y

If this 'realism about rationality' really is rather like "realism about epistemic reasons/'epistemic facts'", then you have the 'normative web argument' to contend with - if you are a moral antirealist. Convergence and 'Dutch book' type arguments often appear in more recent metaethics, and the similarity has been noted, leading to arguments such as these:

These considerations seem to clearly indicate 'realism about epistemic facts' in the metaethical sense:

* The idea that there is an “ideal” decision theory.
* The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.
* The idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding dutch books and money pumps and so on).

This seems to directly concede or imply the 'normative web' argument, or to imply some form of normative (if not exactly moral) realism:

* The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
* The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.

If 'realism about rationality' is really just normative realism in general, or realism about epistemic facts, then there is already an extensive literature on whether it is right or not. The links above are just the obvious starting points that came to my mind.



Rohin Shah6y*Ω550

I think we disagree primarily on 2 (and also how doomy the default case is, but let's set that aside).

In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

I think that's a crux between you and me. I'm no longer sure if it's a crux between you and Richard. (ETA: I shouldn't call this a crux, I wouldn't change my mind on whether MIRI work is on-the-margin more valuable if I changed my mind on this, but it would be a pretty significant update.)

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

Yeah, I was ignoring that sort of stuff. I do think this post would be better without the evolutionary fitness example because of this confusion. I was imagining the "unreal rationality" world to be similar to what Daniel mentions below:

I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

Yeah, I'm going to try to give a different explanation that doesn't involve "realness".

When groups of humans try to build complicated stuff, they tend to do so using abstraction. The most complicated stuff is built on a tower of many abstractions, each sitting on top of lower-level abstractions. This is most evident (to me) in software development, where the abstraction hierarchy is staggeringly large, but it applies elsewhere, too: the low-level abstractions of mechanical engineering are "levers", "gears", "nails", etc.

A pretty key requirement for abstractions to work is that they need to be as non-leaky as possible, so that you do not have to think about them as much. When I code in Python and I write "x + y", I can assume that the result will be the sum of the two values, and this is basically always right. Notably, I don't have to think about the machine code that deals with the fact that overflow might happen. When I write in C, I do have to think about overflow, but I don't have to think about how to implement addition at the bitwise level. This becomes even more important at the group level, because communication is expensive, slow, and low-bandwidth relative to thought, and so you need non-leaky abstractions so that you don't need to communicate all the caveats and intuitions that would accompany a leaky abstraction.
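A small sketch of the "x + y" point, written in Python. The helper `add_int32_wrapping` is a hypothetical name, and the wraparound it models is an assumption: C leaves signed overflow undefined, and two's-complement wrapping is only one common outcome on real hardware. Python integers are arbitrary-precision, so the abstraction "the result is the sum" holds without caveats, whereas a fixed-width addition leaks.

```python
INT32_MAX = 2**31 - 1


def add_int32_wrapping(x, y):
    """Model 32-bit two's-complement addition with wraparound (an assumption)."""
    total = (x + y) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the top bit as the sign bit.
    return total - 0x100000000 if total >= 0x80000000 else total


# Python: the abstraction holds, the result really is the sum.
python_sum = INT32_MAX + 1

# Fixed-width model: the same inputs wrap to a negative number.
wrapped_sum = add_int32_wrapping(INT32_MAX, 1)
```

Anyone building on the fixed-width version has to carry the caveat "unless the result exceeds the range" through every layer above it, which is the cost of a leaky abstraction.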

One way to operationalize this is that to be built on, an abstraction must give extremely precise (and accurate) predictions.

It's fine if there's some complicated input to the abstraction, as long as that input can be estimated well in practice. This is what I imagine is going on with evolution and reproductive fitness -- if you can estimate reproductive fitness, then you can get very precise and accurate predictions, as with e.g. the Price equation that Daniel mentioned. (And you can estimate fitness, either by using things like the Price equation + real data, or by controlling the environment where you set up the conditions that make something reproductively fit.)
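As a rough illustration of what "precise and accurate predictions, given an estimate of fitness" can look like, here is a sketch that checks the Price equation numerically on a made-up population. The equation says that the change in mean trait is Cov(w, z) / w̄ + E(w Δz) / w̄, where z is each parent's trait, w its number of offspring, and Δz the parent-to-offspring change in trait. The function names and the data are assumptions for illustration only.

```python
from statistics import fmean


def price_delta(z, w, dz):
    """Change in mean trait predicted by the Price equation."""
    w_bar = fmean(w)
    z_bar = fmean(z)
    # Population covariance between fitness and trait (selection term).
    cov_wz = fmean([wi * zi for wi, zi in zip(w, z)]) - w_bar * z_bar
    # Fitness-weighted expected transmission change.
    transmission = fmean([wi * di for wi, di in zip(w, dz)])
    return cov_wz / w_bar + transmission / w_bar


def direct_delta(z, w, dz):
    """Change in mean trait computed directly from the offspring generation."""
    offspring_mean = sum(wi * (zi + di) for wi, zi, di in zip(w, z, dz)) / sum(w)
    return offspring_mean - fmean(z)


# Made-up population: trait values, offspring counts, transmission changes.
z = [1.0, 2.0, 3.0, 4.0]
w = [1, 1, 2, 4]
dz = [0.1, 0.0, -0.1, 0.0]

predicted = price_delta(z, w, dz)
observed = direct_delta(z, w, dz)
```

The two numbers agree because the Price equation is an identity; the complicated part in practice is estimating w, which is the "complicated input" mentioned above.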

If a thing cannot provide extremely precise and accurate predictions, then I claim that humans mostly can't build on top of it. We can use it to make intuitive arguments about things very directly related to it, but can't generalize it to something more far-off. Some examples from these comment threads of what "inferences about directly related things" looks like:

current theories about why England had an industrial revolution when it did

[biology] has far more practical consequences (thinking of medicine)

understanding why overuse of antibiotics might weaken the effect of antibiotics [based on knowledge of evolution]

Note that in all of these examples, you can more or less explain the conclusion in terms of the thing it depends on. E.g., you can say "overuse of antibiotics might weaken the effect of antibiotics because the bacteria will evolve / be selected to be resistant to the antibiotic".

In contrast, for abstractions like "logic gates", "assembly language", "levers", etc, we have built things like rockets and search engines that certainly could not have been built without those abstractions, but nonetheless you'd be hard pressed to explain e.g. how a search engine works if you were only allowed to talk with abstractions at the level of logic gates. This is because the precision afforded by those abstractions allows us to build huge hierarchies of better abstractions.

So now I'd go back and state our crux as:

Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?

I would guess not. It sounds like you would guess yes.

I think this is upstream of 2. When I say I somewhat agree with 2, I mean that you can probably get a theory of rationality that makes imprecise predictions, which allows you to say things about "directly relevant things", which will probably let you say some interesting things about AI systems, just not very much. I'd expect that, to really affect ML systems, given how far away from regular ML research MIRI research is, you would need a theory that's precise enough to build hierarchies with.

(I think I'd also expect that you need to directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI.)

(You might wonder why I'm optimistic about conceptual ML safety work, which is also not precise enough to build hierarchies of abstraction. The basic reason is that ML safety is "directly relevant" to existing ML systems, and so you don't need to build hierarchies of abstraction -- just the first imprecise layer is plausibly enough. You can see this in the fact that there are already imprecise concepts that are directly talking about safety.)

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.g., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.

Your few assumptions need to talk about the system you actually build. On the model I'm outlining, it's hard to state the assumptions for the system you actually build, and near-impossible to be very confident in those assumptions, because they are (at least) one level of hierarchy higher than the (assumed imprecise) theory of rationality.
