Less Wrong is a community blog devoted to refining the art of human rationality.

Comment author: lukeprog 07 April 2014 03:50:20AM *  3 points [-]

To do research, someone's got to have some actual interest in the problem space, or they'll end up fiddling around and doing stuff that's good for their interests or their long-term career but not necessarily for what their employer wants. So I don't know who has the capacity to acquire that interest. Tao would be good if he acquired an interest in the subject but I don't know if he could. Gowers at least commented on Baez's summary of the earlier Christiano result, but a short G+ comment isn't that much evidence. I don't currently know of any math superstars who want to work on FAI theory but only for a high salary — if I did, and I thought it would be a good hire, I'd reach out to MIRI's donors and try to solicit targeted donations for the hire.

Comment author: Adele_L 07 April 2014 05:45:23AM *  4 points [-]

Vladimir Voevodsky is a math superstar who plausibly could acquire such an interest.

Here is a summary of a recent talk he gave. After winning the Fields Medal in 2002 for his work on motivic cohomology, he felt he was out of big ideas in that field. So he "decided to be rational in choosing what to do" and asked himself, “What would be the most important thing I could do for math at this period of development and such that I could use my skills and resources to be helpful?” His first idea was to establish more connections between pure and applied mathematics. He worked on that for two years, and "totally failed." His second idea was to develop software tools to help mathematicians check their proofs. There had already been lots of work on this subject, and several different software systems for the purpose already existed, so he looked at them. He found that either he could understand a system and see that it wasn't what he wanted, or the system simply didn't make sense to him. "There was something obviously missing in the understanding of those." So he took a course at Princeton University on programming languages using the proof assistant Coq. Halfway through the course, he suddenly realized that Martin-Löf types could essentially be interpreted as homotopy types. This led to a community of mathematicians who developed Homotopy Type Theory/Univalent Foundations with him, a completely new and self-contained foundation of mathematics.

Andrej Bauer, one of the homotopy type theorists, has said: "We've already learned the lesson that we don't know how to program computers so they will have original mathematical ideas. Maybe some day it will happen, but right now we know how to cooperate with computers. My expectation is that all these separate, limited AI successes, like driving a car and playing chess, will eventually converge back, and then we're going to get computers that are really very powerful." Plausibly, Voevodsky himself also has some interest in AI.

So here is a mathematician with:

  • a solid track record of solving very difficult problems, and coming up with creative new insights.
  • good efforts to make rational decisions about what sort of mathematics he does, yielding an interest in, and willingness to, completely switch fields when he thinks he can do more important things there.
  • an ability to solve practical problems using very abstract mathematics.

I think it would be worth trying to get him interested in FAI problems.

Comment author: ErinFlight 01 April 2014 03:31:52AM 1 point [-]

Thank you for the resources! Kahneman's book looks very interesting, and luckily my library has it. I'll check it out as soon as possible. I am planning on taking a Java Programming class next year. Does Java have the same set up/structure/foundation as the languages that are referenced on here? What would you say is the programming language that is most relevant to rationality (even if it isn't a good beginning language)?

Comment author: Adele_L 02 April 2014 01:13:11AM 1 point [-]

Awesome! Pretty much any language will give you enough background to understand the programming references here. I agree with John that Scheme and Python are good languages to start with. The most rational language to use depends a lot on what exactly you are trying to do, what you already know, and your personal style, so don't worry about that too much.

Comment author: ErinFlight 31 March 2014 01:19:55AM 11 points [-]

Hello, I'm Erin. I am currently in high school, so perhaps a little younger than the typical reader.

I'm fascinated by the thoughts here. This is the first community I've found that makes an effort to examine its own opinions, and is self-aware enough to look at its own thought processes.

But, and this might not be the place for this, I am struggling to understand anything technical on this website. I've enjoyed reading the sequences, and they have given me a lot to think about. Still, I've read the introduction to Bayes' theorem multiple times, and I simply can't grasp it. Even starting at the very beginning of the sequences, I quickly get lost because there are references to programming and cognitive science which I simply do not understand.

I recently returned to this site after taking a statistics course, which has helped slightly. But I still feel rather lost.

Do you have any tips for how you utilized rationality when you were starting? How did you first incorporate it into your thought processes? Can you recommend any background material which might help me to understand the sequences better?

Comment author: Adele_L 31 March 2014 01:54:25AM *  4 points [-]

Hi Erin, I'm Adele! It's good to see young rationalists here. I think you might really like Thinking, Fast and Slow by Daniel Kahneman. Daniel Kahneman is a well-known psychologist, and winner of the 2002 Nobel prize in Economics. In this book, he goes through different thinking processes that humans often use, and how they are often wrong. It is not very technical, and is a pretty easy read IMO. It might also help with some of the cognitive science stuff in the sequences.

It's okay not to understand Bayes' theorem for now; knowing the math doesn't really make you that much better at being rational - there are easier things to do with larger gains. If you want to get the programming references, it might be worth learning to program. There are some online courses which make it relatively easy to get started, and it's also a good skill to have when you are looking for employment.
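If you ever do want a preview of the math, the whole theorem is just a few lines of arithmetic. Here's a toy calculation with completely made-up numbers (the classic medical-test setup, nothing specific to the sequences):

```python
# Bayes' theorem with illustrative numbers: posterior = likelihood * prior / evidence.
# Setup: a condition with a 1% base rate, and a test that is 90% sensitive
# with a 9% false-positive rate.
prior = 0.01           # P(condition)
sensitivity = 0.90     # P(positive test | condition)
false_positive = 0.09  # P(positive test | no condition)

# Total probability of seeing a positive test, P(positive):
evidence = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(condition | positive test)
posterior = sensitivity * prior / evidence

print(round(posterior, 3))  # about 0.092: even a positive test only gets you to ~9%
```

The counterintuitive part is exactly this: with a rare condition, most positive tests are false positives, so the posterior stays low.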

One thing that has helped me a lot in being more rational is having friends who can point out when I am being irrational. Another good resource (and a place to go, if you can) is CFAR, whose whole purpose is to help you get better at being rational.

Comment author: taryneast 28 February 2014 09:59:04AM 2 points [-]

Yeah. Beeminder doesn't work for me either - nor do most online punishment-based motivators.

My problem with it is that it doesn't punish you for failing to do the thing you need to do. It punishes you for failing to record the fact that you did the thing you need to do.

So if you're time-poor (like me) and still managed to do the thing... but didn't have time to go online and tell beeminder that you did the thing... you still get punished. :(

Comment author: Adele_L 23 March 2014 03:17:27PM 1 point [-]

Yeah, I have the same problem with it. When my productivity went up, I actually went off the road because I couldn't be bothered to record it all.

Comment author: VipulNaik 14 March 2014 01:08:40AM 2 points [-]

Thanks for your response.

I agree that the general criticisms that I made of physics can also be leveled against most upper-division undergraduate mathematics courses (i.e., stuff that is generally taken only by math majors). That's a topic that I plan to take up some other time. (As a math Ph.D., I certainly enjoyed a lot of upper-division mathematics).

What I think distinguishes physics from mathematics is that the diminishing returns from physics start setting in earlier than they do for mathematics, and the extent of applicability of physics is more limited (for instance, classical mechanics is somewhat useful, but not as much as calculus -- and both are done at roughly the same educational stage).

Your answer, however, is an update in favor of physics having value.

Comment author: Adele_L 14 March 2014 06:07:19AM *  4 points [-]

What I think distinguishes physics from mathematics is that the diminishing returns from physics start setting in earlier than they do for mathematics, and the extent of applicability of physics is more limited (for instance, classical mechanics is somewhat useful, but not as much as calculus -- and both are done at roughly the same educational stage).

This doesn't seem obviously true to me. It seems like learning to model actual systems with differential equations and the like would be much more applicable than most upper level mathematics. And the math stuff that is more generally useful, like linear algebra, gets adequately covered in physics. (For what it's worth, I'm a number theory grad student, and I minored in physics as an undergrad).

Comment author: jimrandomh 12 March 2014 05:36:50PM *  11 points [-]

If you limit the domain of your utility function to a sensory channel, you have already lost; you are forced into a choice between a utility function that is wrong, or a utility function with a second induction system hidden inside it. This is definitely unrecoverable.

However, I see no reason for Solomonoff-inspired agents to be structured that way. If the utility function's domain is a world-model instead, then it can find itself in that world-model and the self-modeling problem vanishes immediately, leaving only the hard but philosophically-valid problem of defining the utility function we want.
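To caricature the distinction (all names here are made-up stand-ins for illustration, not any real agent design): a utility function whose domain is a sensory channel cannot tell genuine success apart from a corrupted channel, while one whose domain is a world-model can.

```python
# Toy contrast between the two utility-function domains discussed above.
# Everything here is an illustrative stand-in, not a real formalism.

def sensory_utility(channel_bits):
    # Cartesian framing: score only what arrives on the reward channel.
    return sum(channel_bits)

def world_model_utility(world):
    # Naturalized framing: score the modeled world itself; the agent is
    # just another structure inside `world`.
    return world["paperclips"]

# Two worlds that look identical from the sensory channel alone:
honest_world = {"paperclips": 5, "reward_channel": [1, 1, 1, 1, 1]}
hacked_world = {"paperclips": 0, "reward_channel": [1, 1, 1, 1, 1]}

# The sensory-channel utility cannot distinguish them...
assert sensory_utility(honest_world["reward_channel"]) == \
       sensory_utility(hacked_world["reward_channel"])
# ...but the world-model utility can.
assert world_model_utility(honest_world) > world_model_utility(hacked_world)
```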

Comment author: Adele_L 13 March 2014 04:18:59PM 1 point [-]

Alex Mennen has described a version of AIXI with a utility function of the environment.

Comment author: Vulture 03 March 2014 03:37:37PM 1 point [-]

It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.

Apologies if this is a stupid question - I am not an expert - but how do we know what "level of reality" to have our UDT-agent model its world-models with? That is, if we program the agent to produce and scan universe-models consisting of unsliced representations of quark and lepton configurations, what happens if we discover that quarks and leptons are composed of more elementary particles yet?

Comment author: Adele_L 03 March 2014 10:23:59PM 2 points [-]

Wei Dai has suggested that the default setting for a decision theory be Tegmark's Level 4 Multiverse - where all mathematical structures exist in reality. So a "quark-lepton" universe and a string theory universe would both be considered among the possible universes - assuming they are mathematically consistent.

Of course, this makes it difficult to specify the utility function.

Comment author: RobbBB 02 March 2014 02:53:57AM *  5 points [-]

Have you considered that you may be spending a lot of time writing up a problem that has already been solved, and should spend a bit more time checking whether this is the case, before going much further on your path?

Yes! If UDT solves this problem, that's extremely good news. I mention the possibility here. Unfortunately, I (and several others) don't understand UDT well enough to tease out all the pros and cons of this approach. It might take a workshop to build a full consensus about whether it solves the problem, as opposed to just reframing it in new terms. (And, if it's a reframing, how much it deepens our understanding.)

Part of the goal of this sequence is to put introductory material about this problem in a single place, to get new workshop attendees and LWers on the same page faster. A lot of people are already familiar with these problems and have made important progress on them, but the opening moves are still scattered about in blog comments, private e-mails, wiki pages, etc.

It would be very valuable to pin down concrete examples of how UDT agents behave better than AIXI. (That may be easier after my next post, which goes into more detail about how and why AIXI misbehaves.) Even people who aren't completely on board with UDT itself should be very excited about the possibility of showing that AIXI not only runs into a problem, but runs into a formally solvable problem. That makes for a much stronger case.

But these seem to be two instances of the same general problem, and it seems like an AGI problem rather than an FAI problem -- if you don't know how to do this, then you can't use math to make predictions about physical systems, which makes it hard to be generally intelligent.

Goal stability looks like an 'AGI problem' in the sense that nearly all superintelligences converge on stable goals, but in practice it's an FAI problem because a UFAI's method of becoming stable is probably very different from an FAI's method of being stable. Naturalized induction is an FAI problem in the same way; it would get solved by a UFAI, but that doesn't help us (especially since the UFAI's methods, even if we knew them, might not generalize well to clean, transparent architectures).

Comment author: Adele_L 02 March 2014 07:50:03PM 6 points [-]

I think I can explain why we might expect a UDT agent to avoid these problems. You're probably already familiar with the argument at this level, but I haven't seen it written up anywhere yet.

First, we'll describe (informally) a UDT agent as a mathematical object. The preferences of the agent are built in (so no reward channel, which allows us to avoid preference solipsism). It will also have models of every possible universe, and also an understanding of its own mathematical structure. To make a decision given a certain input, it will scan each universe model for structures that will be logically dependent on its output. It will then predict what will happen in each universe for each particular output. Then, it will choose the output that maximizes its preferences.

Now let's see why it won't have the immortality problem. Let's say the agent is considering an output string corresponding to an anvil experiment. After running the predictions of this in its models, it will realize that it will lose a significant amount of structure which is logically dependent on it. So unless it has very strange preferences, it will mark this outcome as low utility, and consider better options.

Similarly, the agent will also notice that some outputs correspond to having more structures which are logically dependent on it. For example, an output that built a faster version of a UDT agent would allow more things to be affected by future outputs. In other words, it would be able to self-improve.

To actually implement a UDT agent with these preferences, we just need to create something (most likely a computer programmed appropriately) that will be logically dependent on this mathematical object to a sufficiently high degree. This, of course, is the hard part, but I don't see any reason why a faithful implementation would suddenly have these specific problems again.
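The decision procedure I described can be caricatured in a few lines of code. To be clear, this is only a toy sketch of the informal description above; real UDT turns on logical dependence and proof search, which a lookup over plain functions obviously doesn't capture. All the names below are made up:

```python
# Toy sketch of the informal UDT decision procedure described above.
# "Universe models" are stand-in functions from the agent's output to an
# outcome; `utility` encodes the built-in preferences over all outcomes.

def udt_decision(possible_outputs, universe_models, utility):
    """Choose the output whose predicted consequences, taken across every
    universe model at once, score highest under the built-in preferences."""
    def total_utility(output):
        # Predict what happens in each universe if the agent's
        # mathematical structure returns `output`...
        outcomes = [model(output) for model in universe_models]
        # ...and score the whole ensemble.
        return utility(outcomes)
    return max(possible_outputs, key=total_utility)

# Tiny example: an "anvil" output destroys structure the agent cares about
# in every universe model, so it is never chosen.
universes = [
    lambda out: -100 if out == "drop_anvil_on_self" else 1,
    lambda out: -100 if out == "drop_anvil_on_self" else 2,
]
choice = udt_decision(["drop_anvil_on_self", "build_successor"], universes, sum)
```

In this caricature the anvil experiment is rejected for exactly the reason given above: the predicted loss of logically dependent structure makes it a low-utility output.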

Another nice feature of UDT (which sometimes is treated as a bug) is that it is extremely flexible in how you can choose the utility function. Maybe you Just Don't Care about worlds that don't follow the Born probabilities - so just ignore anything that happens in such a universe in your utility function. I interpret this as meaning that UDT is a framework decision theory that could be used regardless of what the answers (or maybe just preferences) to anthropics, induction or other such things end up being.

Oh, and if anyone notices something I got wrong, or that I seem to be missing, please let me know - I want to understand UDT better :)

Comment author: RobbBB 18 February 2014 05:50:40AM *  3 points [-]

Thanks, Adele!

You need your AI to realize that the map is part of the territory.

That's right, if you mean 'representations exist, so they must be implemented in physical systems'.

But the Cartesian agrees with 'the map is part of the territory' on a different interpretation. She thinks the mental and physical worlds both exist (as distinct 'countries' in a larger territory). Her error is just to think that it's impossible to redescribe the mental parts of the universe in physical terms.

A Cartesian agent would probably be relatively slower at FOOMing

An attempt at a Cartesian seed AI would probably just break, unless it overcame its Cartesianness by some mostly autonomous evolutionary algorithm for generating successful successor-agents. A human programmer could try to improve it over time, but it wouldn't be able to rely much on the AI's own intelligence (because self-modification is precisely where the AI has no defined hypotheses), so I'd expect the process to become increasingly difficult and slow and ineffective as we reached the limits of human understanding.

I think the main worry with Cartesians isn't that they're dumb-ish, so they might become a dangerously unpredictable human-level AI or a bumbling superintelligence. The main worry is that they're so dumb that they'll never coalesce into a working general intelligence of any kind. Then, while the build-a-clean-AI people (who are trying to design simple, transparent AGIs with stable, defined goals) are busy wasting their time in the blind alley of Cartesian architectures, some random build-an-ugly-AI project will pop up out of left field and eat us.

Build-an-ugly-AI people care about sloppy, quick-and-dirty search processes, not so much about AIXI or Solomonoff. So the primary danger of Cartesians isn't that they're Unfriendly; it's that they're shiny objects distracting a lot of the people with the right tastes and competencies for making progress toward Friendliness.

The bootstrapping idea is probably a good one: There's no way we'll succeed at building a perfect FAI in one go, so the trick will be to cut corners in all the ways that can get fixed by the system, and that don't make the system unsafe in the interim. I'm not sure Cartesianism is the right sort of corner to cut. Yes, the AI won't care about self-preservation; but it also won't care about any other interim values we'd like to program it with, except ones that amount to patterns of sensory experience for the AI.

Comment author: Adele_L 18 February 2014 06:35:44AM 1 point [-]

Thank you, this helps clarify things for me.

Yes, the AI won't care about self-preservation; but it also won't care about any other interim values we'd like to program it with, except ones that amount to patterns of sensory experience for the AI.

I get why AIXI would behave like this, but it's not obvious to me that all Cartesian AIs would have this problem. If the AI has some model of the world, and this model can still update (mostly correctly) based on its sensory channel inputs, and predict (mostly correctly) how different outputs would change the world, it seems like it could still try to maximize making as many paperclips as possible according to its model of the world. Does that make sense?

Comment author: Adele_L 18 February 2014 04:58:09AM 7 points [-]

I really appreciate your clear expositions!

I thought of a phrase to quickly describe the gist of this problem: You need your AI to realize that the map is part of the territory.

Also, I was thinking that the fact that this is a problem might be a good thing. A Cartesian agent would probably be relatively slower at FOOMing, since it can't natively conceive of modifying itself. (I still think a sufficiently intelligent one would be highly dangerous and capable of FOOMing, though.)

A bigger advantage might be that it could potentially be used to control a 'baby' AI that is still being trained/built, since there is this huge blindspot in the way they can model the world. For example, imagine that a Cartesian AI is trying to increase its computational power, and it notices that there happens to be a lot of computational power right in easy access! So it starts reprogramming that hardware to suit its own nefarious needs - but whoops, it just destroyed itself. This might act as a sort of fuse for a too-ambitious AI.

Or maybe this could be used to more safely grow a seed AI - you tell it to write a design for a better version of itself. Then you could turn it off (which is easier to do since it is Cartesian), check that the design was sound, build it, and then work on the next-generation AI, instead of trying to let it FOOM in controlled intervals. At some point, you could presumably ask it to solve this problem, and then design a new generation based on that. I don't know how plausible these scenarios are, but it is interesting to think about.
