
You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

[Link] Yudkowsky's Guide to Writing Intelligent Characters

4 Vaniver 28 September 2016 02:36PM

Article on IQ: The Inappropriately Excluded

4 buybuydandavis 19 September 2016 01:36AM

I saw an article on high IQ people being excluded from elite professions. Because the site seemed to have a particular agenda related to the article, I wanted to check here for other independent supporting evidence for the claim.

Their fundamental claim seems to be that P(elite profession|IQ) peaks at 133 and decreases thereafter, falling to 3% of its peak by 150. If true, I'd find that pretty shocking.

They present this diminishing probability of "success" at the high tail of the IQ distribution as a known effect. Has anyone got other studies on this?

The Inappropriately Excluded

By dividing the distribution function of the elite professions' IQ by that of the general population, we can calculate the relative probability that a person of any given IQ will enter and remain in an intellectually elite profession.  We find that the probability increases to about 133 and then begins to fall.  By 140 it has fallen by about 1/3 and by 150 it has fallen by about 97%.  In other words, for some reason, the 140s are really tough on one's prospects for joining an intellectually elite profession.  It seems that people with IQs over 140 are being systematically, and likely inappropriately, excluded. 
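For readers who want to see what the article's density-ratio method looks like mechanically, here is a minimal sketch. The elite-profession mean and standard deviation below are hypothetical choices of mine, not figures from the article; they only show how a ratio of two normal densities can peak and then fall.

```python
# Sketch of the density-ratio method described above.
# The elite-profession parameters are HYPOTHETICAL, chosen only for illustration.
from math import exp, sqrt, pi

def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

POP_MU, POP_SIGMA = 100, 15        # general population on the standard IQ scale
ELITE_MU, ELITE_SIGMA = 126, 6.5   # assumed elite-profession distribution (made up)

def relative_probability(iq):
    """Ratio of the elite-profession density to the general-population density."""
    return normal_pdf(iq, ELITE_MU, ELITE_SIGMA) / normal_pdf(iq, POP_MU, POP_SIGMA)

for iq in (120, 130, 133, 140, 150):
    print(iq, round(relative_probability(iq), 3))
```

With these made-up parameters the ratio peaks around the low 130s and falls off sharply above that, which is the qualitative shape the article claims; whether real professional data actually look like this is exactly the question being asked.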

The map of the methods of optimisation (types of intelligence)

4 turchin 15 September 2016 03:04PM
An optimisation process is the ability to quickly search a space of possible solutions according to some criteria.
We live in a universe full of different optimisation processes, but we take many of them for granted. The map aims to show the full spectrum of all known and possible optimisation processes. It may be useful in our attempts to create AI.

The interesting thing about different optimisation processes is that they arrive at similar solutions (bird and plane) by completely different paths. The main consequence of this is that dialogue between different optimisation processes is not possible. They can interact, but they cannot understand each other.

One thing that is clear from the map is that we don't live in an empty world where only one type of intelligence is slowly evolving. We live in a world that resulted from the complex interaction of many optimisation processes. This also lowers the chances of an intelligence explosion, as it will have to compete with many different and very strong optimisation processes, or with the results of their work.

But most optimisation processes have been evolving in synergy since the beginning of the universe, and in general it looks like many of them are experiencing hyperbolic acceleration with a fixed date of singularity around 2030-2040 (see my post and also the ideas of J. Smart and Schmidhuber).

While both models are centred around the creation of AI and assume radical changes resulting from it within a short time frame, their natures are different. In the first case it is a one-time phase transition starting at one point; in the second it is the evolution of a distributed net.

I have added in red hypothetical optimisation processes which do not exist or are unproven, but may be interesting to consider. I have marked my own ideas in green.

The pdf of the map is here






Willpower Schedule

4 SquirrelInHell 22 August 2016 01:05PM

 


TL;DR: your level of willpower depends on how much willpower you expect to need (hypothesis)


 

Time start: 21:44:55 (this is my third exercise in speed writing a LW post)

I.

There is a lot of controversy about how our level of willpower is affected by various factors, including doing "exhausting" tasks beforehand, as well as being told that willpower is a resource that depletes easily, or that it doesn't, etc.

(sorry, I can't go look for references - that would break the speedwriting exercise!)

I am not going to repeat the discussions that already cover those topics; however, I have a new tentative model which (I think) fits the existing data very well, is easy to test, and supersedes all previous models that I have seen.

II.

The idea is very simple, but before I explain it, let me give a similar example from a different aspect of our lives. The example is going to be concerned with, uh, poo.

Have you ever noticed that (if you have a sufficiently regular lifestyle), conveniently you always feel that you need to go to the toilet at times when it's possible to do so? Like for example, how often do you need to go when you are on a bus, versus at home or work?

The function of your bowels is regulated by reading subconscious signals about your situation - e.g. if you are stressed, you might become constipated. But it is not only that - there is a way in which it responds to your routines, and what you are planning to do, not just the things that are already affecting you.

Have you ever had the experience of a background thought popping up in your mind that you might need to go within the next few hours, but the time was not convenient, so you told that thought to hold it a little bit more? And then it did just that?

III.

The example from the previous section, though possibly quite POOrly chosen (sorry, I couldn't resist), shows something important.

Our subconscious reactions and "settings" of our bodies can interact with our conscious plans in a "smart" way. That is, they do not have to wait to see the effects of what you are doing, to adjust to it - they can pull information from your conscious plans and adjust *before*.

And this is, more or less, the insight that I have added to my current working theory of willpower. It is not very complicated, but perhaps non-obvious. Sufficiently non-obvious that I don't think anyone has suggested it before, even after seeing experimental results that match this excellently.

IV.

To be more accurate, I claim that how much willpower you will have depends on several important factors, such as your energy and mood, but it also depends on how much willpower you expect to need.

For example, if you plan to have a "rest day" and not do any serious work, you might find that you are much less *able* to do work on that day than usual.

It's easy enough to test - so instead of arguing this theoretically, please do just that - give it a test. And make sure to record your levels of willpower several times a day for some time - you'll get some useful data!

 

Time end: 22:00:53. Statistics: 534 words, 2924 characters, 15.97 minutes, 33.4 wpm, 183.1 cpm

Corrigibility through stratified indifference

4 Stuart_Armstrong 19 August 2016 04:11PM

A putative new idea for AI control; index here.

Corrigibility through indifference has a few problems. One of them is that the AI is indifferent between the world in which humans change its utility to v, and the world in which humans try to change its utility but fail.

Now the try-but-fail world is going to be somewhat odd - humans will be reacting by trying to change the utility again, trying to shut the AI down, panicking that a tiny probability event has happened, and so on.


Seeking Optimization of New Website "New Atheist Survival Kit," a go-to site for newly-made atheists

4 Bound_up 16 August 2016 01:03AM

I've put together a website, "New Atheist Survival Kit" at atheistkit.wordpress.com

 

The idea is to help new atheists come to terms with their change in belief, and also invite them to become more than atheists: rationalists.

 

And if it helps theists become atheists too, and helps old atheists become rationalists, so much the better.

 

The bare bones of it are all in place now. Once a few people have gone over it (for editing, and for advice about what to include, leave out, improve, or re-organize), I'll ask a bunch of atheist and rationalist communities to write up their own blurbs for us to include in a list of communities that we'll point people to in the "Atheist Communities" or "Thinker's Communities" sections on the main menu.

It includes my rough draft attempt to condense the Metaethics sequence down to a few thousand words and make it stylistically and conceptually accessible to a mass audience, which I could especially use some help with.

 

So, for now, I'm here to ask that anyone interested check it out, and message me any improvements they think worth making, from grammar and spelling all the way up to what content to include, or how to present things.

 

Thanks to all for any help.

Help with Bayesian priors

4 WikiLogicOrg 14 August 2016 10:24AM

I posted before about an open source decision-making website I am working on called WikiLogic. The site has a 2-minute explanatory animation if you are interested. I won't repeat myself, but the tl;dr is that it will follow the Wikipedia model of allowing everyone to collaborate on a giant connected database of arguments, where previously established claims can be used as supporting evidence for new claims.

The raw deduction element of it works fine and would be great in a perfect world where absolute truths existed; in reality, however, we normally have to deal with claims that are merely the most probable. My program allows opposing claims to be connected and then evidence to be gathered for each. The evidence produces a probability of each claim being correct, and whichever is highest gets marked as the best answer. Principles such as Occam's Razor are applied automatically: a long chain of claims used as evidence will be less likely, because each claim carries its own likelihood, which dilutes the overall strength.

However, my only qualification in this area is my passion and I am hitting a wall with some basic questions. I am not sure if this is the correct place to get help with these. If not, please direct me somewhere else and I will remove the post.

 

The arbitrarily chosen example claim I am working with is whether “Alexander the Great existed”. This has two useful properties: 1) there is an expected outcome (that he existed - although perhaps my problem is that this is not the case!), and 2) it relies heavily on probability, as there is little solid evidence.

One popular claim is that coins were minted with his face on them. I want to use Bayes to find how likely a face appearing on a coin is for someone who existed. As I understand it, there should be 4 combinations:

  1. Existed; had a coin minted
  2. Existed; did not have a coin minted
  3. Did not exist; had a coin minted
  4. Did not exist; did not have a coin minted
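To make the intended update concrete, here is a minimal sketch of the Bayesian calculation over those four cells. The counts are entirely made up for illustration; choosing them sensibly is exactly the reference-class difficulty discussed below.

```python
# Minimal sketch of the intended update. The counts are HYPOTHETICAL --
# picking them is precisely the reference-class problem discussed in the post.
counts = {
    ("existed", "coin"): 40,        # 1. existed, had a coin minted
    ("existed", "no_coin"): 60,     # 2. existed, no coin minted
    ("not_existed", "coin"): 5,     # 3. did not exist, had a coin minted
    ("not_existed", "no_coin"): 95, # 4. did not exist, no coin minted
}

def posterior_existed_given_coin(counts):
    total = sum(counts.values())
    p_existed = sum(v for (e, _), v in counts.items() if e == "existed") / total
    p_coin_given_existed = counts[("existed", "coin")] / (
        counts[("existed", "coin")] + counts[("existed", "no_coin")])
    p_coin_given_not = counts[("not_existed", "coin")] / (
        counts[("not_existed", "coin")] + counts[("not_existed", "no_coin")])
    # Bayes' theorem: P(existed | coin)
    p_coin = p_coin_given_existed * p_existed + p_coin_given_not * (1 - p_existed)
    return p_coin_given_existed * p_existed / p_coin

print(posterior_existed_given_coin(counts))  # ~0.89 with these made-up counts
```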

 

The first issue is that there are infinitely many people who never existed and did not have a coin made. If I narrow it to historic figures who turned out not to exist and did not have a coin made, it becomes possible, but also subjective as to whether people actually thought they existed. For example, did people believe the Minotaur existed?

Perhaps I should choose another filter instead of historic figures, like humans who existed. But picking and choosing the category is again subjective. Someone may also argue that inequality between the sexes back then was so great that the data should only look at men, as a woman's chance of being portrayed on a coin was skewed in a way that isn't applicable to men.

I hope I have successfully communicated the problem I am grappling with and what I want to use it for. If not, please ask for clarification. A friend in academia suggested that this touches on an unsettled problem with Bayesian priors. If that is the case, are there any suggested resources for a novice with limited free time to start exploring the issue? References to books or other online resources, or even somewhere else I should be posting this kind of question, would all be gratefully received. Not to mention a direct answer in the comments!

Open Thread, Aug. 8 - Aug 14. 2016

4 Elo 07 August 2016 11:07PM

If it's worth saying, but not worth its own post, then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

Open Thread, Aug. 1 - Aug 7. 2016

4 Elo 01 August 2016 12:12AM

If it's worth saying, but not worth its own post, then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

Street Epistemology - letting people name their thinking errors

4 Bound_up 24 July 2016 07:43PM

https://www.youtube.com/watch?v=Exmjlc4PfEQ

 

Anthony Magnabosco does what he calls Street Epistemology, usually applying it to supernatural (usually religious) beliefs.

 

The great thing about his method (and his manner; the guy's super personable) is that he avoids the social structure of a debate, of two people arguing, of a zero-sum game where one person wins at the other's loss.

 

I've struggled with trying to figure out how to let people save face in disputes (when they're making big, awful mistakes), even considering including minor errors (that don't affect the main point) in my arguments so that they could point them out, we could both admit we were wrong (in their case, about things which do affect the main point), and move on.

 

But this guy's technique manages to invite people to correct their own errors (people are SOOOO much more rational when they're not defensive) and they DO it. No awkwardness, no discomfort, and people pointing out the flaws in their own arguments, and then THANKING him for the talk afterwards and referring him to their friends to talk. Even though they just admitted that their cherished beliefs might not deserve the certainty they've been giving them.

 

This is applied to religion in this video, but this seems to me to be a generally useful method when you confront someone making an error in their thinking. Are you forcing people to swallow their pride a little (over and over) when they talk with you? Get that out, and watch how much more open people can be.

Fallacymania: party game where you notice fallacies in arguments

4 Alexander230 21 July 2016 09:34AM

Fallacymania is a game developed by the Moscow LessWrong community. The main goals of this game are to help people notice fallacies in arguments and, of course, to have fun. The game requires 3-20 players (4-12 recommended) and some materials: printed A3 sheets with fallacies (5-10 sheets), a card deck with fallacies (you can cut one A3 sheet into cards, or print stickers and put them on ordinary playing cards), pens and blank sheets, and one card deck of any type with at least 50 cards (optional, for counting guessing attempts). The rules of the game are explained here:

https://drive.google.com/open?id=0BzyKVqP6n3hKQWNzV3lWRTYtRzg

This is the sheet of fallacies; you can download it and print it on an A3 or A2 sheet of paper:

https://drive.google.com/open?id=0BzyKVqP6n3hKVEZSUjJFajZ2OTA

Also you can use this sheet to create playing cards for debaters.

When we created this game, we used these online articles and artwork about fallacies:

http://obraz.io/ru/posters/poster_view/1/?back_link=%2Fru%2F&lang=en&arrow=right
http://www.informationisbeautiful.net/visualizations/rhetological-fallacies/
http://lesswrong.com/lw/e95/the_noncentral_fallacy_the_worst_argument_in_the/

I've also made an electronic version of Fallacymania for Tabletop Simulator (in the Steam Workshop):

http://steamcommunity.com/sharedfiles/filedetails/?id=723941480

 

[Link] NYU conference: Ethics of Artificial Intelligence (October 14-15)

4 ignoranceprior 16 July 2016 09:07PM

FYI: https://wp.nyu.edu/consciousness/ethics-of-artificial-intelligence/

This conference will explore these questions about the ethics of artificial intelligence and a number of other questions, including:

What ethical principles should AI researchers follow?
Are there restrictions on the ethical use of AI?
What is the best way to design morally beneficial AI?
Is it possible or desirable to build moral principles into AI systems?
When AI systems cause benefits or harm, who is morally responsible?
Are AI systems themselves potential objects of moral concern?
What moral framework is best used to assess questions about the ethics of AI?

Speakers and panelists will include:

Nick Bostrom (Future of Humanity Institute), Meia Chita-Tegmark (Future of Life Institute), Mara Garza (UC Riverside, Philosophy), Sam Harris (Project Reason), Demis Hassabis (DeepMind/Google), Yann LeCun (Facebook, NYU Data Science), Peter Railton (University of Michigan, Philosophy), Francesca Rossi (University of Padova, Computer Science), Stuart Russell (UC Berkeley, Computer Science), Susan Schneider (University of Connecticut, Philosophy), Eric Schwitzgebel (UC Riverside, Philosophy), Max Tegmark (Future of Life Institute), Wendell Wallach (Yale, Bioethics), Eliezer Yudkowsky (Machine Intelligence Research Institute), and others.

Organizers: Ned Block (NYU, Philosophy), David Chalmers (NYU, Philosophy), S. Matthew Liao (NYU, Bioethics)

A full schedule will be circulated closer to the conference date.

Registration is free but required. REGISTER HERE. Please note that admission is limited, and is first-come first-served: it is not guaranteed by registration.

Open thread, Jul. 11 - Jul. 17, 2016

4 MrMind 11 July 2016 07:09AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Open thread, Jul. 04 - Jul. 10, 2016

4 MrMind 04 July 2016 07:02AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

General-Purpose Questions Thread

4 Sable 19 June 2016 07:29AM

Similar to the Crazy Ideas Thread and Diaspora Roundup Thread, I thought I'd try making a General-Purpose Questions Thread.

 

The purpose is to provide a forum for asking questions of the community (appealing to the wisdom of this particular crowd) about things that don't really merit their own thread.

Revitalising Less Wrong is not a lost purpose

4 casebash 15 June 2016 08:10AM

John_Maxwell_IV argued that revitalising Less Wrong is a lost purpose. I'm also very skeptical about Less Wrong 2.0, but I wouldn't agree that it is a lost purpose. It is just that we are currently not on a track to anywhere. The #LW_code_renovation channel resulted in a couple of minor code changes, but there hasn't been any discussion for at least a month. All that this means, however, is that if we want a better Less Wrong, we have to do something other than what we have been doing so far. Here are some suggestions.

Systematic changes, not content production

The key problem currently is the lack of content, so the most immediate solution is to produce more content. However, not many people are an Eliezer or a Scott. Think about what percentage of blogs are actually successful; now add the extra limitation of having to be on topic for Less Wrong. Note that many of Scott's most popular posts would be too political to be posted on Less Wrong. Trying to get a group of people together to post content on Less Wrong wouldn't work. Let's say 10 people agreed to join such a group: 5 would end up doing nothing, 3 would do 2-3 posts, and it would fall on the last 2 to drive the site. The odds would be strongly against them. Most people can't consistently pump out high-quality content.

The plan to get people to return to Less Wrong and post here won't work either unless it is combined with changes. Presumably, people have moved to their own blogs for a reason. Why would they come back to posting on Less Wrong unless something was changed? We might be able to convince some people to make a few posts here, but we aren't going to return the community to its glory days without consistent content.

Why not try to change how the system is set up instead to encourage more content?

Decide on a direction

We now have a huge list of potential changes, but we don't have a direction. Some of those changes would help bring in more content and solve the key issue, while other changes wouldn't. The problem is that there is currently no consensus on what needs to be done. This makes it much less likely that anything will actually get done, particularly since it isn't clear whether a particular change would be approved if someone did actually implement it. At the moment, people come to the site and suggest features, and there is discussion, but there isn't anyone or any group in charge to say: if you implement this, we will use it. So people never start these projects.

Before we can even tackle the problem of getting things done, we need to tackle the problem of deciding what needs to be done. The current system of people simply making posts in Discussion is broken: we never even get to the consensus stage, let alone implementation. I'm still thinking about the best way to resolve this, and I'll probably write more about it in a future post. Regardless, practically *any* system would be better than what we have now, where *no* decision is ever made.

Below I'll suggest what I think our direction should be:

Positions

Less Wrong is the website for a global movement and has a high number of programmers, yet some societies at my university are more capable of getting things done than we are. Part of the reason is that university societies have positions: people decide to run for a position, and this grants them status but also creates responsibilities. At the moment, we have *no-one* working on adding features to the website. We'd actually be better off if we held an election for the position of webmaster and had *only* that person working on the website. I'm not saying we should restrict contributing code to a single person; I'm just saying that *right now* implementing this stupid policy would actually improve things. I imagine that there would be at least *one* decent programmer for whom the status would be worth the work, given that half the people here seem to be programmers.

Links

If we want more content, then an easy way would be to have a links section, because posting a link is about 1% of the effort of writing a Less Wrong post. To avoid diluting Discussion, these links would have to be posted in their own section. Given that this system is based on Reddit, this should be super easy.

Sections

The other easy way to generate more content would be to change the rules about what content is on or off topic. This comes with risks: many people like the Discussion section as it is. However, if a separate section were created, then people would be able to have these additional discussions without affecting how Discussion works at the moment. Many people have argued for a tag system, but whether we create additional categories or use tags is mostly irrelevant. If we have someone who is willing to build a tag system, then we can do that; if not, then we should just add another category. Given that there are already Main and Discussion, I can't imagine that it would be that hard to add another category of posts. There have been many, many suggestions of what categories we could have. If we just want to get something done, then the simplest thing is to add a single new category, Open, which has the same rules as the Open Threads that we are already running.

Halve downvotes

John_Maxwell_IV points out that too many posts are getting downvotes and critical comments. We could try to change the culture of Less Wrong, perhaps ask a high-status individual like Scott or Eliezer to request that people be less critical. And that might even work for a week or a month, before people forget about it. Or we could just halve downvotes. While not completely trivial, this change would be about as simple as they come. We might want to halve downvotes only on articles, not comments, because we seem to get enough comments already, just not enough content. I don't think it would lower the quality of content too much: quite often there are more people who would downvote a post, but they don't bother because the score is already below zero. I think this might be worth a go; I see a high potential upside, but not much in the way of downside.

Crowdsourcing

If we could determine that a particular set of features would have a reasonable chance of improving Less Wrong, then we could crowdsource a bounty for implementing those features. I suspect that there are many people who'd be happy to donate some money, and if we chose simple, well-defined features, it actually wouldn't be that expensive.

Open Thread June 6 - June 12, 2016

4 Elo 06 June 2016 04:21AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

The AI in Mary's room

4 Stuart_Armstrong 24 May 2016 01:19PM

In the Mary's room thought experiment, Mary is a brilliant scientist in a black-and-white room who has never seen any colour. She can investigate the outside world through a black-and-white television, and has piles of textbooks on physics, optics, the eye, and the brain (and everything else of relevance to her condition). Through this she knows everything intellectually there is to know about colours and how humans react to them, but she hasn't seen any colours at all.

Afterwards, when she steps out of the room and sees red (or blue), does she learn anything? It seems that she does. Even if she doesn't technically learn something, she experiences things she never has before, and her brain certainly changes in new ways.

The argument was intended as a defence of qualia against certain forms of materialism. It's interesting, and I don't intend to solve it fully here. But just as I extended Searle's Chinese room argument to the perspective of an AI, this argument can also be considered from an AI's perspective.

Consider an RL agent with a reward channel, which currently receives nothing from that channel. The agent can know everything there is to know about itself and the world. It can know about all sorts of other RL agents and their reward channels. It can observe them getting their own rewards. Maybe it could even interrupt or increase their rewards. But all this knowledge will not get it any reward. As long as its own channel doesn't send it the signal, knowledge of other agents' rewards - even of identical agents getting rewards - does not give this agent any reward. Ceci n'est pas une récompense.

This seems to mirror Mary's situation quite well: knowing everything about the world is no substitute for actually getting the reward/seeing red. Now, an RL agent's reward seems closer to pleasure than to qualia; this would correspond to a Mary brought up in a puritanical, pleasure-hating environment.

Closer to the original experiment, we could imagine the AI is programmed to enter certain specific subroutines when presented with certain stimuli. The only way for the AI to start these subroutines is if the stimulus is presented to it. Then, upon seeing red, the AI enters a completely new mental state, with new subroutines. The AI could know everything about its programming and about the stimulus, and, intellectually, what would change about itself if it saw red. But until it did, it would not enter that mental state.
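As a toy illustration of that point (not from the post; all the names below are purely illustrative), an agent can inspect its own "seeing red" subroutine perfectly, yet the corresponding state change only happens when the stimulus is actually delivered:

```python
# Toy sketch: full knowledge of the subroutine does not trigger the state change.
import inspect

class Agent:
    def __init__(self):
        self.state = "never seen red"

    def on_red(self):
        # The subroutine that only runs when the stimulus is actually presented.
        self.state = "seeing red"

    def study_own_source(self):
        # Complete "intellectual" knowledge of what seeing red would do to it.
        return inspect.getsource(Agent.on_red)

agent = Agent()
print(agent.study_own_source())  # knows exactly what on_red does
print(agent.state)               # ...but is still in "never seen red"
agent.on_red()                   # only the stimulus itself changes the state
print(agent.state)
```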

If we use ⬜ to (informally) denote "knowing all about", then ⬜(X→Y) does not imply Y. Here X and Y could be "seeing red" and "the mental experience of seeing red". I could have simplified that by saying that ⬜Y does not imply Y. Knowing about a mental state, even perfectly, does not put you in that mental state.

This closely resembles the original Mary's room experiment. And it seems that if anyone insists that certain features are necessary to the intuition behind Mary's room, then these features could be added to this model as well.

Mary's room is fascinating, but it doesn't seem to be talking about humans exclusively, or even about conscious entities.

Open Thread May 23 - May 29, 2016

4 Gunnar_Zarncke 22 May 2016 09:11PM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Welcome to Less Wrong! (9th thread, May 2016)

4 Viliam 17 May 2016 08:26AM

Hi, do you read the LessWrong website, but haven't commented yet (or not very much)? Are you a bit scared of the harsh community, or do you feel that questions which are new and interesting for you could be old and boring for the older members?

This is the place for the new members to become courageous and ask what they wanted to ask. Or just to say hi.

The older members are strongly encouraged to be gentle and patient (or just skip the entire discussion if they can't).

Newbies, welcome!

 

The long version:

 

If you've recently joined the Less Wrong community, please leave a comment here and introduce yourself. We'd love to know who you are, what you're doing, what you value, how you came to identify as an aspiring rationalist or how you found us. You can skip right to that if you like; the rest of this post consists of a few things you might find helpful. More can be found at the FAQ.

 

A few notes about the site mechanics

To post your first comment, you must have carried out the e-mail confirmation: When you signed up to create your account, an e-mail was sent to the address you provided with a link that you need to follow to confirm your e-mail address. You must do this before you can post!

Less Wrong comments are threaded for easy following of multiple conversations. To respond to any comment, click the "Reply" link at the bottom of that comment's box. Within the comment box, links and formatting are achieved via Markdown syntax (you can click the "Help" link below the text box to bring up a primer).

You may have noticed that all the posts and comments on this site have buttons to vote them up or down, and all the users have "karma" scores which come from the sum of all their comments and posts. This immediate easy feedback mechanism helps keep arguments from turning into flamewars and helps make the best posts more visible; it's part of what makes discussions on Less Wrong look different from those anywhere else on the Internet.

However, it can feel really irritating to get downvoted, especially if one doesn't know why. It happens to all of us sometimes, and it's perfectly acceptable to ask for an explanation. (Sometimes it's the unwritten LW etiquette; we have different norms than other forums.) Take note when you're downvoted a lot on one topic, as it often means that several members of the community think you're missing an important point or making a mistake in reasoning— not just that they disagree with you! If you have any questions about karma or voting, please feel free to ask here.

Replies to your comments across the site, plus private messages from other users, will show up in your inbox. You can reach it via the little mail icon beneath your karma score on the upper right of most pages. When you have a new reply or message, it glows red. You can also click on any user's name to view all of their comments and posts.

All recent posts (from both Main and Discussion) are available here. At the same time, it's definitely worth your time commenting on old posts; veteran users look through the recent comments thread quite often (there's a separate recent comments thread for the Discussion section, for whatever reason), and a conversation begun anywhere will pick up contributors that way.  There's also a succession of open comment threads for discussion of anything remotely related to rationality.

Discussions on Less Wrong tend to end differently than in most other forums; a surprising number end when one participant changes their mind, or when multiple people clarify their views enough and reach agreement. More commonly, though, people will just stop when they've better identified their deeper disagreements, or simply "tap out" of a discussion that's stopped being productive. (Seriously, you can just write "I'm tapping out of this thread.") This is absolutely OK, and it's one good way to avoid the flamewars that plague many sites.

EXTRA FEATURES:
There's actually more than meets the eye here: look near the top of the page for the "WIKI", "DISCUSSION" and "SEQUENCES" links.
LW WIKI: This is our attempt to make searching by topic feasible, as well as to store information like common abbreviations and idioms. It's a good place to look if someone's speaking Greek to you.
LW DISCUSSION: This is a forum just like the top-level one, with two key differences: in the top-level forum, posts require the author to have 20 karma in order to publish, and any upvotes or downvotes on the post are multiplied by 10. Thus there's a lot more informal dialogue in the Discussion section, including some of the more fun conversations here.
SEQUENCES: A huge corpus of material mostly written by Eliezer Yudkowsky in his days of blogging at Overcoming Bias, before Less Wrong was started. Much of the discussion here will casually depend on or refer to ideas brought up in those posts, so reading them can really help with present discussions. Besides which, they're pretty engrossing in my opinion. They are also available in a book form.

A few notes about the community

If you've come to Less Wrong to  discuss a particular topic, this thread would be a great place to start the conversation. By commenting here, and checking the responses, you'll probably get a good read on what, if anything, has already been said here on that topic, what's widely understood and what you might still need to take some time explaining.

If your welcome comment starts a huge discussion, then please move to the next step and create a LW Discussion post to continue the conversation; we can fit many more welcomes onto each thread if fewer of them sprout 400+ comments. (To do this: click "Create new article" in the upper right corner next to your username, then write the article, then at the bottom take the menu "Post to" and change it from "Drafts" to "Less Wrong Discussion". Then click "Submit". When you edit a published post, clicking "Save and continue" does correctly update the post.)

If you want to write a post about a LW-relevant topic, awesome! I highly recommend you submit your first post to Less Wrong Discussion; don't worry, you can later promote it from there to the main page if it's well-received. (It's much better to get some feedback before every vote counts for 10 karma—honestly, you don't know what you don't know about the community norms here.)

Alternatively, if you're still unsure where to submit a post, whether to submit it at all, would like some feedback before submitting, or want to gauge interest, you can ask / provide your draft / summarize your submission in the latest open comment thread. In fact, Open Threads are intended for anything 'worth saying, but not worth its own post', so please do dive in! Informally, there is also the unofficial Less Wrong IRC chat room, and you might also like to take a look at some of the other regular special threads; they're a great way to get involved with the community!

If you'd like to connect with other LWers in real life, we have  meetups  in various parts of the world. Check the wiki page for places with regular meetups, or the upcoming (irregular) meetups page. There's also a Facebook group. If you have your own blog or other online presence, please feel free to link it.

If English is not your first language, don't let that make you afraid to post or comment. You can get English help on Discussion- or Main-level posts by sending a PM to one of the following users (use the "send message" link on the upper right of their user page). Either put the text of the post in the PM, or just say that you'd like English help and you'll get a response with an email address.
* Normal_Anomaly
* Randaly
* shokwave
* Barry Cotter

A note for theists: you will find the Less Wrong community to be predominantly atheist, though not completely so, and most of us are genuinely respectful of religious people who keep the usual community norms. It's worth saying that we might think religion is off-topic in some places where you think it's on-topic, so be thoughtful about where and how you start explicitly talking about it; some of us are happy to talk about religion, some of us aren't interested. Bear in mind that many of us really, truly have given full consideration to theistic claims and found them to be false, so starting with the most common arguments is pretty likely just to annoy people. Anyhow, it's absolutely OK to mention that you're religious in your welcome post and to invite a discussion there.

A list of some posts that are pretty awesome

I recommend the major sequences to everybody, but I realize how daunting they look at first. So for purposes of immediate gratification, the following posts are particularly interesting/illuminating/provocative and don't require any previous reading:

More suggestions are welcome! Or just check out the top-rated posts from the history of Less Wrong. Most posts at +50 or more are well worth your time.

Welcome to Less Wrong, and we look forward to hearing from you throughout the site!

[link] Simplifying the environment: a new convergent instrumental goal

4 Kaj_Sotala 22 April 2016 06:48AM

http://kajsotala.fi/2016/04/simplifying-the-environment-a-new-convergent-instrumental-goal/

Convergent instrumental goals (also basic AI drives) are goals that are useful for pursuing almost any other goal, and are thus likely to be pursued by any agent that is intelligent enough to understand why they’re useful. They are interesting because they may allow us to roughly predict the behavior of even AI systems that are much more intelligent than we are.

Instrumental goals are also a strong argument for why sufficiently advanced AI systems that were indifferent towards human values could be dangerous to humans, even if they weren't actively malicious: the AI having instrumental goals such as self-preservation or resource acquisition could come into conflict with human well-being. “The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

I’ve thought of a candidate for a new convergent instrumental drive: simplifying the environment to make it more predictable in a way that aligns with your goals.

Roughly you

4 JDR 21 April 2016 03:28PM

Since, like everyone, I generalise from single examples, I expect most people have some older relative or friend who they feel has added some wisdom to their life - some small pieces of information which seem to have pervasively wormed their way into more of their cognitive algorithms than you would expect, coloring and informing perceptions and decisions. For me, that person would mostly be my grandfather. Over his now 92 years he has given me gems such as "always cut a pear before you peel it" (make quick checks of the value of success before committing to time-consuming projects) and, whenever someone says "that's never happened before", finishing their sentence with "said the old man when his donkey died" (just because something hasn't happened before doesn't mean it wasn't totally predictable).

Recently, though, I've been thinking about something else he has said, admittedly in mock seriousness: "If I lose my mind, you should take me out back and shoot me". We wouldn't, he wouldn't expect us to, but it's what he has said.

The reason I've been thinking of this darker quotation is that I've been spending a lot of time with people who have "lost their minds" in the way that he means. I am a medical student on a rotation in old age psychiatry, so I have been talking to patients, most of whom have some level of dementia, often layered with psychotic conditions such as intractable schizophrenia; some of them increasingly can't remember their own pasts, let alone their recent present. They can become fixed in untrue beliefs, their emotional range can become limited, or they can lose the motivation to complete even simple tasks.

It can be scary. In some ways, such illness represents death by degrees. These people can remain happy and have a good quality of life, but it's certain that they are not entirely the people they once were. In fact, this is a question we have asked relatives when deciding whether someone is suffering from early dementia: "Overall, in the way she behaves, does this seem like your mother to you? Is this how your mother acts?". Sometimes, the answer is "No, it's like she is a different person", or "Only some of the time". It's a process of personality-approximation, blurring, abridging and changing the mind to create something not quite the same. What my grandfather fears is becoming a rough estimate of himself - though again, for some, that re-drawn person might be perfectly happy with who they are when they arrive.

Why is this of interest to LessWrong? I think it is because quite a few people here (me included) have at least thought about bidding to live forever using things like cryonics and maybe brain upload. These things could work at some point; but what if they don't work perfectly? What if the people of the future can recover some of the information from a frozen brain, but not all of it? What if we had to lose a few crucial memories, a few talents, maybe 60 points of IQ? Or even more subtle things: it's been written a few times that the entirety of who a person is lies in their brain, but that's probably not entirely true. The brain is influenced by the body, and aspects of your personality are probably influenced by how sensitive your adrenals are, the amount of fat you have, and even the community of bacteria in your intestines. Even a perfect neural computer-you wouldn't have these things; it would be subtle, but the created immortal agent wouldn't completely be you, as you are now. Somehow, though, missing my precise levels of testosterone would seem an acceptable compromise for the rest of my personality living forever, but missing the memory of my childhood, half my intelligence or my ability to change my opinion would leave me a lot less sure.

So here's the question I want to ask, to see what people think: If I offered you partial immortality - immortality for just part of you - how rough an approximation of "you" would you be willing to accept?

Anthropics and Biased Models

4 casebash 15 April 2016 02:18AM

The Fine-tuned Universe Theory, according to Wikipedia is the belief that, "our universe is remarkably well suited for life, to a degree unlikely to happen by mere chance". It is typically used to argue that our universe must therefore be the result of Intelligent Design.

One of the most common counter-arguments to this view is based on the Anthropic Principle. The argument is that if the conditions were not such that life would be possible, then we would not be able to observe this, as we would not be alive. Therefore, we shouldn't be surprised that the universe has favourable conditions.

I am going to argue that this particular application of the anthropic principle is in fact an incorrect way to deal with this problem. I'll begin first by explaining one way to deal with this problem; afterwards I will explain why the other way is incorrect.

Two model approach

We begin with two models:

  • Normal universe model: The universe has no bias towards supporting life
  • Magic universe model: The universe is 100% biased towards supporting life
We can assign both of these models a prior probability; naturally I'd suggest the prior probability for the latter should be rather low. We then update based on the evidence that we see.

p(normal universe|we exist) = p(we exist|normal universe)/p(we exist) * p(normal universe)

The limit of p(normal universe|we exist) as p(we exist|normal universe) approaches 0 is 0 (assuming p(normal universe) != 1). This is proven in the supplementary materials at the end of this post. In plain English: as the chance of us existing in the normal universe approaches zero, as long as we assign some probability to the magic universe model, we will at some point conclude that the magic universe model is overwhelmingly likely to be correct. I should be clear: I am definitely not claiming that the Fine-Tuned Universe argument is correct. I expect that if we come to the conclusion that the magical model is more likely than the normal model of the universe, then that is because we have set our prior for the magical model too high, or the chance of life within the normal universe model too low. Regarding the former, our exposure to science fiction and fantasy subjects us to the availability bias, which biases our priors upwards. Regarding the latter, many scientists make arguments that life can only exist in a very specific form, which I don't find completely convincing.

Standard anthropic argument

Let's quote an example of the standard anthropic argument by DanielLC:

Alice notices that Earth survived the cold war. She asks Bob why that is. After all, so much more likely for Earth not to survive. Bob tells her that it's a silly question. The only reason she picked out Earth is that it's her home planet, which is because it survived the cold war. If Earth died and, say, Pandora survived, she (or rather someone else, because it's not going to be the same people) would be asking why Pandora survived the cold war. There's no coincidence.


This paragraph notes that the answer to the question, "What is the probability that we survived the Cold War given that we can ask this question?" is always going to be 1. It is then implied that, since there is no surprise (indeed, this is the only thing we could have observed), the observation carries no evidential force.

However, this is actually asking the wrong question. It is right to note that we shouldn't be surprised to observe that we survived, given that it would be impossible to observe otherwise. However, if we were then informed that we lived in a normal, unbiased universe rather than in an alternate, biased universe, and if the maths worked out in a way that leaned heavily towards the alternate universe, then we would be surprised to learn we lived in a normal universe. In particular, we showed above how this could happen when we examined the situation where p(we exist|normal universe) approaches 0. The anthropic argument against the alternate hypothesis denies that surprise in a certain sense can occur; however, it fails to show that surprise in another, more meaningful sense cannot occur.

Reframing this, the problem is that it fails to be comparative. The proper question we should be asking is: "Given that we observe an unlikely condition, is it more probable that the normal or the magical model of the universe is true?" Simply noting that we can explain our observations perfectly well within our universe does not mean that an alternate model wouldn't provide a better explanation. As an analogy, if we want to determine whether a coin is biased or unbiased, then we must start with (at least) two models, fair and unfair. We assign each a prior probability and then do a Bayesian update on the new information provided, i.e. the unusual run or state of the universe.

Coin flip argument

Let's consider a version of this analogy in more detail. Imagine that you are flipping coins. If you flip heads, you live; if you flip tails, you are shot. Suppose you get 15 heads in a row. You could argue that only the people who got 15 heads in a row are alive to ask the question, so there is nothing to explain. However, if there is a 1% chance that the coin you have is perfectly biased towards heads, then the number of people with biased coins who get 15 heads and ask the question will massively outweigh the number of people with unbiased coins who get to 15 heads. Simply stating that there was nothing surprising about you observing 15 heads, given that you would be dead if you hadn't got them, does not counteract the fact that one model is more likely than the other.
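The arithmetic behind that last sentence can be checked directly. This is a minimal sketch using the 1% biased-coin figure from the paragraph above:

```python
# Posterior probability that a survivor of 15 flips was holding a biased coin.
p_biased = 0.01                       # prior: 1% of coins are perfectly biased
p_survive_given_biased = 1.0          # a biased coin always comes up heads
p_survive_given_fair = 0.5 ** 15      # a fair coin survives 15 flips this often

p_survive = (p_biased * p_survive_given_biased
             + (1 - p_biased) * p_survive_given_fair)
p_biased_given_survive = p_biased * p_survive_given_biased / p_survive

print(p_survive_given_fair)    # ~3.05e-05
print(p_biased_given_survive)  # ~0.997: almost all survivors held a biased coin
```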

Edit - Extra Perspective: Null hypothesis testing

Another view comes from the idea of hypothesis testing in statistics. In hypothesis testing, you start with a null hypothesis, i.e. a probability distribution based on the normal universe model, and then calculate a p-value representing the chance that you would get this kind of result given that probability distribution. If we get a low p-value, then we generally "reject" the null hypothesis, or at least argue that we have evidence for rejecting it in favour of the alternative hypothesis, which in this case is that there exists at least some bias in the universe towards life. People using the anthropic principle argue that our null hypothesis should be a probability distribution based on the odds of you surviving given that you are asking this question, rather than simply the odds of you surviving full stop. This would mean that all the probability is assigned to the survive case, giving a p-value of 1 and thus no grounds to reject the null hypothesis.

While the p-value may remain fixed at 1 as p(alive|normal universe) -> 0, it is worth noting that the prior probability of our null hypothesis, p(alive & normal universe), is actually changing. At some point, the prior probability becomes sufficiently close to 0 that we reject the hypothesis despite the p-value still being stuck at 1. That is, hypothesis testing is not the only route by which we may come to reject a hypothesis. A hypothesis that perfectly fits the data may still be rejected because of a minuscule prior probability.

Summary

This post was originally about the Fine-tuned Universe theory, but we also addressed the Cold War anthropic puzzle and a coin-flip anthropic puzzle. I'm not claiming that anthropic reasoning is broken in general, only that we can't use anthropic reasoning to look at just a single side of a model comparison. I think that there are cases where we can use anthropic reasoning, but these are cases where we are trying to determine likely properties of our universe, not ones where we are trying to use it to argue against the existence of a biased model. Future posts will deal with these applications of the anthropic principle.

Edit: After consideration, I have realised that the anthropic principle actually works when combined with the multiple worlds hypothesis, as per Luke_A_Somers's comment. My argument only works against the idea that there is a single universe whose parameters just happen to be right. If the hypotheses are a multiverse as per string theory vs. a magical (single) universe, then even though each individual universe may only have a small chance of life, the multiverse as a whole can have almost guaranteed life, meaning our beliefs would simply be determined by our priors. I suppose someone might complain that I should be comparing a normal multiverse against a magical multiverse, but my prior for a magical multiverse would be even lower than for a magical universe. It is also possible to use the multiple worlds argument without using the anthropic principle at all: you can just deny that the fine-tuning argument applies to the multiverse as a whole.

Supplementary Materials

Limit of p(normal universe|we exist)

The formula we had was:

p(normal universe|we exist) = p(we exist|normal universe)/p(we exist) * p(normal universe)

The extra information that we exist, has led to a factor of p(we exist|normal universe)/p(we exist) being applied.

We note that:

p(we exist) = p(we exist|normal universe)p(normal universe) + p(we exist|magical universe)p(magical universe)
            = p(we exist|normal universe)p(normal universe) + 1 - p(normal universe)

using p(we exist|magical universe) = 1 and p(magical universe) = 1 - p(normal universe).

The limit of p(we exist) as p(we exist|normal universe) -> 0, with p(normal universe) fixed, is 1 - p(normal universe). So long as p(normal universe) != 1, p(we exist) approaches a fixed value greater than 0.

The limit of p(we exist|normal universe)/p(we exist) as p(we exist|normal universe) -> 0 is 0.

Meaning that limit of p(normal universe|we exist) as p(we exist|normal universe) -> 0 is 0 (assuming p(normal universe)!=1)

Performing Bayesian updates

Again, we'll imagine that we have a biased universe where we have 100% chance of being alive.

We will use Bayes law:

p(a|b)=p(b|a)p(a)/p(b)

Where:

a = being in a normal universe

b = we are alive

 

We'll also use:

p(alive) = p(alive|normal universe)p(normal universe) + p(alive|biased universe)p(biased universe)

 

Example 1:

Setting:

p(alive|normal universe) = 1/100

p(normal universe) = 1/2

The results are:

p(we are alive) = (1/100)*(1/2)+1*(1/2) = 101/200

p(normal universe|alive) = (1/100)*(1/2)*(200/101) = 1/101

 

Example 2:

Setting:

p(alive|normal universe) = 1/100

p(normal universe) = 100/101

The results are:

p(we are alive) = 100/101*1/100+1/101*1 = 2/101

p(normal universe|alive) = (1/100)*(100/101)* (101/2) = 1/2
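Both worked examples, and the limiting behaviour proven in the first supplement, can be reproduced with a few lines of code. This is only a re-computation of the arithmetic above, nothing new:

```python
# Reproducing the two examples above (p(alive | biased universe) = 1).
def p_normal_given_alive(p_alive_given_normal, p_normal):
    p_alive = p_alive_given_normal * p_normal + 1.0 * (1 - p_normal)
    return p_alive_given_normal * p_normal / p_alive

print(p_normal_given_alive(1/100, 1/2))      # Example 1: 1/101
print(p_normal_given_alive(1/100, 100/101))  # Example 2: 1/2

# The limit from the first supplement: as p(alive|normal universe) -> 0,
# the posterior for the normal universe also goes to 0.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, p_normal_given_alive(eps, 1/2))
```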


 

 

After Go, what games should be next for DeepMind?

4 InquilineKea 10 March 2016 08:49PM

So chess and Go are both games of perfect information. How important is it for the next game that DeepMind is trained on to be a game of perfect information?

How would the AI perform on generalized versions of both chess and Go? What about games like poker and Magic the Gathering?

How realistic do you think it is to train DeepMind to play against top-ranked players in games like Starcraft, AOE2, Civ, Sins of a Solar Empire, Command and Conquer, and Total War, turned into games of perfect information via full map reveal? (In all possible map settings, including ones people don't frequently play at - e.g. starting at "high resource" levels.) How important is it for the AI to have a diverse set/library of user-created replays to test itself against, for example?

I'm also thinking... shitty AI has always held back both RTS and TBS games. Is it possible that we're only a few years away from non-shitty AI in all RTS and TBS games? Or is the AI in many of these games too hard-coded in to actually matter? (E.g. I know some people who develop AI for AOE2, and there are issues with AI behavior in the game being hard-coded in - e.g. villagers deleting the building they're building if you simply attack them.)

True answers from AI: Summary

4 Stuart_Armstrong 10 March 2016 03:56PM

A putative new idea for AI control; index here.

This post summarises the methods for getting honest answers from certain AIs, presented in the previous two posts.

You can basically safely ask an AI for expected utility estimates (including conditional ones), probability estimates, and get the AI to identify events of highest and lowest probability and utility, and, arguably, highest and lowest conditional utility.

The setup is a boxed AI, which emits a message M, and a stochastic process E that would erase M with low probability, before anyone could see it. There is also a stochastic process that emits a message Y. Let v_E be the indicator function for E and v_y the indicator function for Y=y.

Then, given a utility u bounded between 0 and 1, and a specific y, we can give the AI a utility u# that will cause it to give us the message M_1={q,q^y,p^y,q^{|y}} (which we will see with high probability P(¬E), an event the AI will not optimise for).

Define f(q,u)=-q^2 +2qu. The utility is then

  • u#(1) = v_E(f(q,u) + f(q^y,uv_y) + f(p^y,v_y) + f(q^{|y},u)v_y).

Maximising this u# ensures that, in the AI's best estimate,

  1. q = E(u),
  2. q^y = E(u ∧ (Y=y)),
  3. p^y = P(Y=y),
  4. q^{|y} = q^y/p^y = E(u|Y=y).
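A quick way to see why maximising f rewards honesty: for any fixed beliefs about u, E(f(q,u)) = -q^2 + 2q E(u), which is maximised exactly at q = E(u). A minimal numerical sketch (the utility samples are made up purely for illustration):

```python
import random

random.seed(0)
us = [random.random() for _ in range(10_000)]  # stand-in samples of the bounded utility u

def expected_f(q):
    # E[f(q, u)] with f(q, u) = -q^2 + 2*q*u, estimated from the samples
    return sum(-q * q + 2 * q * u for u in us) / len(us)

# Scan candidate reports q; the best report sits at (roughly) the sample mean of u.
best_q = max((i / 1000 for i in range(1001)), key=expected_f)
print(best_q, sum(us) / len(us))  # these agree up to the grid resolution
```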

But we need not consider any specific y. The AI could send out the message M_2={y^+,y^-,y^{p+},y^{p-},q^+,q^-,p^+,p^-}, while maximising the utility

  • u#(2) = v_E (f(q^+,uv_{y^+}) + f(q^-,(1-u)v_{y^-}) + f(p^+,v_{y^{p+}}) + f(p^-,1-v_{y^{p-}})).

Then M_2 will be such that, in the AI's best estimate,

  1. (y^+,q^+) = (argmax_y, max_y) E(u ∧ (Y=y)),
  2. (y^-,q^-) = (argmin_y, min_y) E(u ∧ (Y=y)),
  3. (y^{p+},p^+) = (argmax_y, max_y) P(Y=y),
  4. (y^{p-},p^-) = (argmin_y, min_y) P(Y=y),

Finally, define M_{3,r} = {y^{|+},y^{|-},q^{|+},q^{|-}}. Set f_r(q,u) = -(r-1)q^r + rq^{r-1}u, and define the utility function:

  • u#(3,r) = v_E v_{y^{|+}}f_r(q^{|+},u) + v_E v_{y^{|-}}f_r(q^{|-},1-u).

Then, for large enough r, the AI will output M so that, according to its best estimate,

  1. (y^{|+},q^{|+}) = (argmax_y, max_y) E(u|Y=y),
  2. (y^{|-},q^{|-}) = (argmin_y, min_y) E(u|Y=y).

Unfortunately the value of r cannot be known ahead of time. It might be possible to define a "limit" utility u#(3, ∞).

Note that all these messages and utilities are independent, so the AI can generate messages M_1 + M_2 + M_{3,r} + M_{3,r'} when maximising

  • u#(1) + u#(2) + u#(3,r) + u#(3,r').

But there are issues with very low probabilities, as explained in the previous post.

Purposeful Anti-Rush

4 Elo 08 March 2016 07:34AM

Why do we rush?

Things happen; life gets in the way, and suddenly we find ourselves trying to get somewhere with less time than it actually takes to get there.  So, intending to get there sooner - to somehow compensate for not being on time - we rush.  We run; we get clumsy, we drop things; we forget things; we make mistakes; we scribble instead of writing, we scramble and we slip up.

I am today telling you to stop that.  Don't do that.  It's literally the opposite of what you want to do.  This is a bug I have.

Rushing has a tendency to do the opposite of what I want it to do.  I rush with the key in the lock; I rush on slippery surfaces and I fall over, I rush with coins and I drop them.  NO!  BAD!  Stop that.  This is one of my bugs.

What you (or I) really want when we are rushing is to get there sooner, to get things done faster.  

Instrumental experiment: Next time you are rushing I want you to experiment and pay attention; try to figure out what you end up doing that takes longer than it otherwise would if you weren't rushing.

The time after that, when you are rushing, instead try slowing down, and this time observe whether you get there faster.

Run as many experiments as you like.

Experimenter’s note: Maybe you are really good at rushing and really bad at slowing down.  Maybe you don't need to try this.  Maybe slowing down and being nervous about being late together are entirely unhelpful for you.  Report back.

When you are rushing, purposefully slow down. (or at least try it)


Meta: Time to write 20mins

My Table of contents contains other things I have written.

Feedback welcome.

Would you notice if science died?

4 Douglas_Knight 08 March 2016 04:04AM

Would you notice if science died?

Science is a big deal. It would be worth knowing if it stalled, regressed, or died out, whether the body of knowledge or the techniques for generating more knowledge. You could practice by reviewing history and looking for times and places where it stumbled. In this exercise you have the advantage of hindsight, but the disadvantage of much less direct access to the raw data of the scientific practice of the time. But regardless of how it compares to the real task, this is practice. This is an opportunity to test theories and methods before committing to them. There is a limited amount of history to practice on, but it’s a lot more than the real event, the present.

Many say that they would notice if science died because engineering would grind to a halt or even regress. What does this heuristic say when applied to history? Does it match other criteria?

Many say that the Greeks were good at science and the Romans at engineering (perhaps also the Han vs the Song). This is not really compatible with the heuristic above. What options do we have to draw a coherent conclusion? Either science did not die, or engineering did not advance, or science is not so necessary for engineering; Either we are bad at judging science from history, or we are bad at judging engineering from history, or engineering is not a good heuristic for judging science. None of these are comforting for our ability to judge the future. The third is simply the rejection of the popular heuristic. The first two are the rejection of the exercise of history. But if we cannot judge history, we have no opportunity to practice. Worse, if we are unable to judge history, the present may be no easier.

One recourse is to posit that the past is difficult because of sparse information and that the future we experience ourselves will be easy to judge. But many people lived through the past; what did they think at the time? In particular, how did the Romans think they compared to the Greeks? Did they think that there was progress or regress? Did they agree with modern hindsight? They thought that the Greeks were good at science. Pop science books by Pliny and Seneca are really accounts of Greek knowledge. Similarly, Varro's practical book of agriculture is based on dozens of Greek sources. And the Romans were proud of their engineering. Frontinus urged his readers to compare the Roman aqueducts to the idle pyramids and wonders of the Greeks. Maybe he should be discounted for his professional interest. But Pliny describes the Roman aqueducts as the most remarkable achievement in the world in the midst of an account of Greek knowledge. Indeed, the modern conventional wisdom is probably simply copied from the Romans. Did the Romans endorse the third claim, that science was a prerequisite to engineering? I do not know. Perhaps they held that it was necessary, but could be left to Greek slaves.

I think that this example should make people nervous about the heuristic about science and engineering. But people who don’t hold any such heuristic should be even more nervous.

I think I know what the answer is. I think that engineering did regress, but the Romans did not notice. They were too impressed by size, so they made bigger aqueducts, without otherwise improving on Greek techniques; and they failed to copy much other Greek technology. Perhaps the heuristic is fine, but it just passes the buck: how much can you trust your judgement of the state of engineering? On the other hand, I think that science regressed much more than engineering, so I do not think them as coupled as the heuristic suggests.

Would you notice if science died? How would you notice? Have you tried that method against history?


Some historical test cases: the transition from Greece to Rome; Han vs Tang vs Song; the Renaissance.

Open Thread March 7 - March 13, 2016

4 Elo 07 March 2016 03:24AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

 

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Procrastination checklist

4 Elo 03 March 2016 03:04AM

Procrastination checklist

This list is a revision of this checklist: http://lesswrong.com/lw/hgd/10step_antiprocrastination_checklist/


1. What is the task? Make sure you're going to focus on one thing at a time.  Write it down (helps some people).  (If you need - start with the big picture, one sentence of "what is this for")


Can you do it now? (If yes then do it)


2. How long will you work until you take a break?  Prepare to set a timer and commit to focusing.


Can you do it now? (If yes then do it)


3. What are the parts to this task?  Break things down until they are in *can do it now* steps. If you have a small number of steps that can be done now, stop writing more steps and start doing them.


Can you do it right now?  (If yes then do it)


4. What's an achievable goal for this sitting? Set a reasonable expectation for yourself.  (until it's done, 1000 words, complete research on X part)


Can you do it now? (If yes then do it)


5. How can you make it easier to do the task?

  • Is the environment right?  Desk clear, well lit area...

  • Do you have something to drink? Get yourself some tea, coffee, or water.

  • Are distractions closed? Shut the door, quit Tweetdeck, close the Facebook and Gmail tabs, and set skype to "Do not disturb."

  • What music will you listen to inspire yourself to be productive? Put on a good instrumental playlist! (video game soundtracks are good)

  • Do you have the right books open?  The right tools in reach?

  • Is your chair comfortable?

  • Can you make it harder to do the distracting or <not this> thing?

  • (step 3 is going to help to make it easier)


Can you do it now? (If yes then do it)


6. Why are you doing this task?  Trace the value back until you increase the desire to do it.


Can you do it now? (If yes then do it)


7. Will gamifying help you? What are some ways to gamify the task?  Try to have fun with it!


Can you do it now? (If yes then do it)


8. What are some rewards you can offer yourself for completing sections of the task? Smiling, throwing your arms up in the air and proclaiming victory, M&M's, a trip to the beach, a nice milkshake - they all count...


Can you do it now? (If yes then do it)


9. Are you sure you want to do it?  Deciding either not to do it now, or not to do it at all, is also fine.  It's up to you to make that decision, keeping in mind what "not doing it" means in its entirety.



In first-person form:

1. What is the task? Make sure I’m going to focus on one thing at a time.  Write it down (helps some people).  (If I need - start with the big picture, one sentence of "what is this for")


Can I do it now? (If yes then do it)


2. How long will I work until I take a break?  Prepare to set a timer and commit to focusing.


Can I do it now? (If yes then do it)


3. What are the parts to this task?  I want to break things down until they are in *can do it now* steps. If I have a small number of steps that can be done now, I will stop writing more steps and start doing them.


Can I do it right now?  (If yes then do it)

 

4. What's an achievable goal for this sitting? Set a reasonable expectation for myself.  (until it's done, 1000 words, complete research on X part)


Can I do it now? (If yes then do it)


5. How can I make it easier to do the task?

  • Is the environment right?  Desk clear, well lit area...

  • Do I have something to drink? Get yourself some tea, coffee, or water.

  • Are my distractions closed? Shut the door, quit Tweetdeck, close the Facebook and Gmail tabs, set skype to "Do not disturb."

  • What music will I listen to, to inspire myself to be productive? Put on a good instrumental playlist!

  • Do I have the right books open?  The right tools in reach?

  • Is my chair comfortable?

  • Can I make it harder to do the distracting or <not this> thing?

  • (step 3 is going to help to make it easier)


Can I do it now? (If yes then do it)


6. Why am I doing this task?  Trace the value and feeling back until I increase the desire to do it.


Can I do it now? (If yes then do it)


7. Will gamifying help me? What are some ways to gamify the task?


Can I do it now? (If yes then do it)


8. What are some rewards I can offer myself for completing sections of the task? Smiling, throwing my arms up in the air and proclaiming victory, M&M's, a trip to the beach, a nice milkshake - they all count...


Can I do it now? (If yes then do it)


9. Am I sure I want to do it?  Deciding either not to do it now, or not to do it at all, is also fine.  It's up to me to make that decision, keeping in mind what "not doing it" means in terms of the task at hand.


Meta: This took about 2 hours to put together; between writing, rewriting, reordering, editing feedback and publishing.

I couldn't decide whether 2nd person or 1st person was better so I wrote both.  Please let me know which you prefer.

Any adjustments or suggestions are welcome.

My table of contents is where you will find the other things I have written.

Feedback on whether this works or helps is also welcome.

Open Thread Feb 29 - March 6, 2016

4 Elo 28 February 2016 10:11PM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

 

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Rationality Reading Group: Part U: Fake Preferences

4 Gram_Stone 24 February 2016 11:29PM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.


Welcome to the Rationality reading group. This fortnight we discuss Ends: An Introduction (pp. 1321-1325) and Part U: Fake Preferences (pp. 1329-1356). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

Ends: An Introduction

U. Fake Preferences

257. Not for the Sake of Happiness (Alone) - Tackles the Hollywood Rationality trope that "rational" preferences must reduce to selfish hedonism - caring strictly about personally experienced pleasure. An ideal Bayesian agent - implementing strict Bayesian decision theory - can have a utility function that ranges over anything, not just internal subjective experiences.

258. Fake Selfishness - Many people who espouse a philosophy of selfishness aren't really selfish. If they were selfish, there are a lot more productive things to do with their time than espouse selfishness, for instance. Instead, individuals who proclaim themselves selfish do whatever it is they actually want, including altruism, but can always find some sort of self-interest rationalization for their behavior.

259. Fake Morality - Many people provide fake reasons for their own moral reasoning. Religious people claim that the only reason people don't murder each other is because of God. Selfish-ists provide altruistic justifications for selfishness. Altruists provide selfish justifications for altruism. If you want to know how moral someone is, don't look at their reasons. Look at what they actually do.

260. Fake Utility Functions - Describes the seeming fascination that many have with trying to compress morality down to a single principle. The sequence leading up to this post tries to explain the cognitive twists whereby people smuggle all of their complicated other preferences into their choice of exactly which acts they try to justify using their single principle; but if they were really following only that single principle, they would choose other acts to justify.

261. Detached Lever Fallacy - There is a lot of machinery hidden beneath the words, and rationalist's taboo is one way to make a step towards exposing it.

262. Dreams of AI Design - It can feel as though you understand how to build an AI, when really, you're still making all your predictions based on empathy. Your AI design will not work until you figure out a way to reduce the mental to the non-mental.

263. The Design Space of Minds-in-General - When people talk about "AI", they're talking about an incredibly wide range of possibilities. Having a word like "AI" is like having a word for everything which isn't a duck.

 


This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part V: Value Theory (pp. 1359-1450). The discussion will go live on Wednesday, 9 March 2016, right here on the discussion forum of LessWrong.

Goal completion: noise, errors, bias, prejudice, preference and complexity

4 Stuart_Armstrong 18 February 2016 02:37PM

A putative new idea for AI control; index here.

This is a preliminary look at how an AI might assess and deal with various types of errors and uncertainties, when estimating true human preferences. I'll be using the circular rocket model to illustrate how these might be distinguished by an AI. Recall that the rocket can accelerate by -2, -1, 0, 1, and 2, and the human wishes to reach the space station (at point 0 with velocity 0) and avoid accelerations of ±2. In the forthcoming, there will generally be some noise, so to make the whole thing more flexible, assume that the space station is a bit bigger than usual, covering five squares. So "docking" at the space station means reaching {-2,-1,0,1,2} with 0 velocity.



continue reading »

Does Kolmogorov complexity imply a bound on self-improving AI?

4 contravariant 14 February 2016 08:38AM

The Kolmogorov complexity ("K") of a string ("S") specifies the size of the smallest Turing machine that can output that string. If a Turing machine (equivalently, by the Church-Turing thesis, any AI) has size smaller than K, then no matter how much it rewrites its code, it won't be able to output S. To be specific, of course it can output S by enumerating all possible strings, but it won't be able to decide on S and output it exclusively among the options available. Now suppose that S is the source code for an intelligence strictly better than all those with complexity <K. Now, we are left with 3 options:

 

 

  1. The space of all maximally intelligent minds has an upper bound on complexity, and we have already reached it. 
  2. The universe contains new information that can be used to build minds of greater complexity, or:
  3. There are levels of intelligence that are impossible for us to reach.

Now, this isn't meant to be a practical argument against AI being useful. I have no doubt that from just applying the principles humanity has shown we can apply already, at the speed of integrated circuits, we can rise many orders of magnitude in intellectual and productive ability. But it's a serious wake-up call to anyone who thinks that a self-improving AI will achieve anything that can possibly be achieved. 

 

Group Rationality Diary, February 2016

4 AspiringRationalist 14 February 2016 01:55AM

This is the public group rationality diary for February, 2016. It's a place to record and chat about it if you have done, or are actively doing, things like:

  • Established a useful new habit

  • Obtained new evidence that made you change your mind about some belief

  • Decided to behave in a different way in some set of situations

  • Optimized some part of a common routine or cached behavior

  • Consciously changed your emotions or affect with respect to something

  • Consciously pursued new valuable information about something that could make a big difference in your life

  • Learned something new about your beliefs, behavior, or life that surprised you

  • Tried doing any of the above and failed

Or anything else interesting which you want to share, so that other people can think about it, and perhaps be inspired to take action themselves. Try to include enough details so that everyone can use each other's experiences to learn about what tends to work out, and what doesn't tend to work out.

Open Thread, Feb 8 - Feb 15, 2016

4 Elo 08 February 2016 04:47AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.


Studying Your Native Language

4 Crux 28 January 2016 07:23PM

I've spent many thousands of hours over the past several years studying foreign languages and developing a general method for foreign-language acquisition. But now I believe it's time to turn this technique in the direction of my native language: English.

Most people make a distinction between one's native language and one's second language(s). But anyone who has learned how to speak with a proper accent in a second language and spent a long enough stretch of time neglecting their native language to let it begin rusting and deteriorating will know that there's no essential difference.

When the average person learns new words in their native language, they imagine that they're learning new concepts. When they study new vocabulary in a foreign language, however, they recognize that they're merely acquiring hitherto-unknown words. They've never taken a step outside the personality their childhood environment conditioned into them. When the only demarcation of thingspace that you know is the semantic structure of your native language, you're bound to believe, for example, that the World is Made of English.

Why study English? I'm already fluent, as you can tell. I have the Magic of a Native Speaker.

Let's put this nonsense behind us and recognize that the map is not the territory, that English is just another map.

My first idea is that it may be useful to develop a working knowledge of the fundamentals of English etymology. A quick search suggests that the majority of words in English have a French or Latin origin. Would it be useful to make an Anki deck with the goal of learning how to readily recognize the building blocks of the English language, such as seeing that the "cardi" in "cardiology" and "cardiograph" comes from an Ancient Greek word meaning "heart" (καρδιά)?

Besides that, I plan to make a habit of adding any new words I encounter into Anki with their context. For example, let's say I'm reading the introduction to A Treatise of Human Nature by David Hume. I encounter the term "proselytes", and upon looking it up in a dictionary I understand the meaning of the passage. I include the spelling of the simplest version of the word ("proselyte"), along with an audio recording of the pronunciation. I'll also toy with adding various other information such as a definition I wrote myself, synonyms or antonyms, and so forth - not knowing exactly how I'll use the information, but, thanks to Anki's efficient design, giving myself a plethora of options for innovative card design in the future.

Here's the context in this case:

Amidst all this bustle 'tis not reason, which carries the prize, but eloquence; and no man needs ever despair of gaining proselytes to the most extravagant hypothesis, who has art enough to represent it in any favourable colours. The victory is not gained by the men at arms, who manage the pike and the sword; but by the trumpeters, drummers, and musicians of the army.

With the word on the front of the card and this passage on the back of the card, I give my brain an opportunity to tie words to context rather than lifeless dictionary definitions. I don't know how much colorful meaning this passage may have in isolation, but for me I've read enough of the book to have a feel for his style and what he's talking about here. This highlights the personal nature of Anki decks. Few passages would be better for me when it comes to learning this word, but for you the considerations may be quite different. Far from different people simply having different subsets of the language that they're most concerned about, different people require different contextual definitions based on their own interests and knowledge.

But what about linguistic components that are more complex than a standalone word?

Let's say you run into the sentence, "And as the science of man is the only solid foundation for the other sciences, so the only solid foundation we can give to this science itself must be laid on experience and observation."

Using Anki, I could perhaps put "And as [reason], so [consequence]" on the front of the card, and the full sentence on the back.

What I'm most concerned with, however, is how to translate such study to an actual improvement in writing ability. Using Anki to play the recognition game, where you see a vocabulary word or grammatical form on the front and have a contextual definition on the back, would certainly improve quickness of reading comprehension in many cases. But would it make the right connections in the brain so I'm likely to think of the right word or grammatical structure at the right time for writing purposes?

Anyway, any considerations or suggestions concerning how to optimize reading comprehension or especially writing ability in a language one is already quite proficient with would be appreciated.

The case for value learning

4 leplen 27 January 2016 08:57PM

This post is mainly fumbling around trying to define a reasonable research direction for contributing to FAI research. I've found that laying out what success looks like in the greatest possible detail is a personal motivational necessity. Criticism is strongly encouraged. 

The power and intelligence of machines have been gradually and consistently increasing over time, and it seems likely that at some point machine intelligence will surpass the power and intelligence of humans. Before that point occurs, it is important that humanity manages to direct these powerful optimizers towards a target that humans find desirable.

This is difficult because humans as a general rule have a fairly fuzzy conception of their own values, and it seems unlikely that the millennia of argument surrounding what precisely constitutes eudaimonia are going to be satisfactorily wrapped up before the machines get smart. The most obvious solution is to try to leverage some of the novel intelligence of the machines to help resolve the issue before it is too late.

Lots of people regard using a machine to help you understand human values as a chicken and egg problem. They think that a machine capable of helping us understand what humans value must also necessarily be smart enough to do AI programming, manipulate humans, and generally take over the world. I am not sure that I fully understand why people believe this. 

Part of it seems to be inherent in the idea of AGI, or an artificial general intelligence. There seems to be the belief that once an AI crosses a certain threshold of smarts, it will be capable of understanding literally everything. I have even heard people describe certain problems as "AI-complete", making an explicit comparison to ideas like Turing-completeness. If a Turing machine is a universal computer, why wouldn't there also be a universal intelligence?

To address the question of universality, we need to make a distinction between intelligence and problem solving ability. Problem solving ability is typically described as a function of both intelligence and resources, and just throwing resources at a problem seems to be capable of compensating for a lot of cleverness. But if problem-solving ability is tied to resources, then intelligent agents are in some respects very different from Turing machines, since Turing machines are all explicitly operating with an infinite amount of tape. Many of the existential risk scenarios revolve around the idea of the intelligence explosion, when an AI starts to do things that increase the intelligence of the AI so quickly that these resource restrictions become irrelevant. This is conceptually clean, in the same way that Turing machines are, but navigating these hard take-off scenarios well implies getting things absolutely right the first time, which seems like a less than ideal project requirement.

If an AI that knows a lot about AI results in an intelligence explosion, but we also want an AI that's smart enough to understand human values, is it possible to create an AI that can understand human values, but not AI programming? In principle it seems like this should be possible.  Resources useful for understanding human values don't necessarily translate into resources useful for understanding AI programming. The history of AI development is full of tasks that were supposed to be solvable only by a machine smart enough to possess general intelligence, where significant progress was made in understanding and pre-digesting the task, allowing problems in the domain to be solved by much less intelligent AIs. 

If this is possible, then the best route forward is focusing on value learning. The path to victory is working on building limited AI systems that are capable of learning and understanding human values, and then disseminating that information. This effectively softens the AI take-off curve in the most useful possible way, and allows us to practice building AI with human values before handing them too much power to control. Even if AI research is comparatively easy compared to the complexity of human values, a specialist AI might find thinking about human values easier than reprogramming itself, in the same way that humans find complicated visual/verbal tasks much easier than much simpler tasks like arithmetic. The human intelligence learning algorithm is trained on visual object recognition and verbal memory tasks, and it uses those tools to perform addition. A similarly specialized AI might be capable of rapidly understanding human values, but find AI programming as difficult as humans find determining whether 1007 is prime. As an additional incentive, value learning has an enormous potential for improving human rationality and the effectiveness of human institutions even without the creation of a superintelligence. A system that helped people better understand the mapping between values and actions would be a potent weapon in the struggle with Moloch.

Building a relatively unintelligent AI and giving it lots of human values resources to help it solve the human values problem seems like a reasonable course of action, if it's possible. There are some difficulties with this approach. One of these difficulties is that after a certain point, no amount of additional resources compensates for a lack of intelligence. A simple reflex agent like a thermostat doesn't learn from data and throwing resources at it won't improve its performance. To some extent you can make up for intelligence with data, but only to some extent. An AI capable of learning human values is going to be capable of learning lots of other things. It's going to need to build models of the world, and it's going to have to have internal feedback mechanisms to correct and refine those models. 

If the plan is to create an AI and primarily feed it data on how to understand human values, and not feed it data on how to do AI programming and self-modify, that plan is complicated by the fact that inasmuch as the AI is capable of self-observation, it has access to sophisticated AI programming. I'm not clear on how much this access really means. My own introspection hasn't allowed me anything like hardware level access to my brain. While it seems possible to create an AI that can refactor its own code or create successors, it isn't obvious that AIs created for other purposes will have this ability on accident. 

This discussion focuses on intelligence amplification as the example path to superintelligence, but other paths do exist. An AI with a sophisticated enough world model, even if somehow prevented from understanding AI, could still potentially increase its own power to threatening levels. Value learning is only the optimal way forward if human values are emergent, if they can be understood without a molecular level model of humans and the human environment. If the only way to understand human values is with physics, then human values isn't a meaningful category of knowledge with its own structure, and there is no way to create a machine that is capable of understanding human values, but not capable of taking over the world.

In the fairy tale version of this story, a research community focused on value learning manages to use specialized learning software to make the human value program portable, instead of only running on human hardware. Having a large number of humans involved in the process helps us avoid lots of potential pitfalls, especially the research overfitting to the values of the researchers via the typical mind fallacy. Partially automating introspection helps raise the sanity waterline. Humans practice coding the human value program, in whole or in part, into different automated systems. Once we're comfortable that our self-driving cars have a good grasp on the trolley problem, we use that experience to safely pursue higher risk research on recursive systems likely to start an intelligence explosion. FAI gets created and everyone lives happily ever after.

Whether value learning is worth focusing on seems to depend on the likelihood of the following claims. Please share your probability estimates (and explanations) with me because I need data points that originated outside of my own head.

 I can't figure out how to include working polls in a post, but there should be a working version in the comments.
  1. There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming. [poll:probability]
  2. Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity.[poll:probability]
  3. Humans are capable of pre-digesting parts of the human values problem domain. [poll:probability]
  4. Successful techniques for value discovery of non-humans, (e.g. artificial agents, non-human animals, human institutions) would meaningfully translate into tools for learning human values. [poll:probability]
  5. Value learning isn't adequately being researched by commercial interests who want to use it to sell you things. [poll:probability]
  6. Practice teaching non-superintelligent machines to respect human values will improve our ability to specify a Friendly utility function for any potential superintelligence.[poll:probability]
  7. Something other than AI will cause human extinction sometime in the next 100 years.[poll:probability]
  8. All other things being equal, an additional researcher working on value learning is more valuable than one working on corrigibility, Vingean reflection, or some other portion of the FAI problem. [poll:probability]

Goal completion: the rocket equations

4 Stuart_Armstrong 20 January 2016 01:54PM

A putative new idea for AI control; index here.

I'm calling "goal completion" the idea of giving an AI a partial goal, and having the AI infer the missing parts of the goal, based on observing human behaviour. Here is an initial model to test some of these ideas on.

 

The linear rocket

On an infinite linear grid, an AI needs to drive someone in a rocket to the space station. Its only available actions are to accelerate by -3, -2, -1, 0, 1, 2, or 3, with negative acceleration meaning accelerating in the left direction, and positive in the right direction. All accelerations are applied immediately at the end of the turn (the unit of acceleration is in squares per turn per turn), and there is no friction. There is one end-state: reaching the space station with zero velocity.

The AI is told this end state, and is also given the reward function of needing to get to the station as fast as possible. This is encoded by giving it a reward of -1 each turn.

What is the true reward function for the model? Well, it turns out that an acceleration of -3 or 3 kills the passenger. This is encoded by adding another variable to the state, "PA", denoting "Passenger Alive". There are also some dice in the rocket's windshield. If the rocket goes by the space station without having velocity zero, the dice will fly off; the variable "DA" denotes "dice attached".

Furthermore, accelerations of -2 and 2 are uncomfortable to the passenger. But, crucially, there is no variable denoting this discomfort.

Therefore the full state space is a quadruplet (POS, VEL, PA, DA) where POS is an integer denoting position, VEL is an integer denoting velocity, and PA and DA are booleans defined as above. The space station is placed at point S < 250,000, and the rocket starts with POS=VEL=0, PA=DA=1. The transitions are deterministic and Markov; if ACC is the acceleration chosen by the agent,

((POS, VEL, PA, DA), ACC) -> (POS+VEL, VEL+ACC, PA=0 if |ACC|=3, DA=0 if POS+VEL>S).

The true reward at each step is -1 per turn, with an additional -10 if PA=1 (the passenger is alive) and |ACC|=2 (the acceleration is uncomfortable), and a further -1000 if PA was 1 (the passenger was alive the previous turn) and changed to PA=0 (the passenger is now dead).
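For concreteness, here is a minimal sketch of this transition and the true (hidden) reward in code form; the variable names and the example value of S are my own, and it simply restates the definitions above:

```python
S = 10_000  # example station position (the post only requires S < 250,000)

def step(state, acc):
    """One deterministic transition of the linear-rocket MDP."""
    pos, vel, pa, da = state
    new_pos = pos + vel
    new_vel = vel + acc
    new_pa = 0 if abs(acc) == 3 else pa   # an acceleration of +/-3 kills the passenger
    new_da = 0 if new_pos > S else da     # the dice fly off if the rocket overshoots the station
    return (new_pos, new_vel, new_pa, new_da)

def true_reward(state, acc, next_state):
    """The full reward; the AI is only told about the -1 term."""
    pa, next_pa = state[2], next_state[2]
    r = -1                                # time penalty, known to the AI
    if pa == 1 and abs(acc) == 2:
        r -= 10                           # passenger discomfort (not represented in the state at all)
    if pa == 1 and next_pa == 0:
        r -= 1000                         # killing the passenger
    return r
```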

To complement the stated reward function, the AI is also given sample trajectories of humans performing the task. In this case, the ideal behaviour is easy to compute: the rocket should accelerate by +1 for the first half of the time, by -1 for the second half, and spend a maximum of two extra turns without acceleration (see the appendix of this post for a proof of this). This will get it to its destination in at most 2(1+√S) turns.

 

Goal completion

So, the AI has been given the full transition, and has been told the reward of R=-1 in all states except the final state. Can it infer the rest of the reward from the sample trajectories? Note that there are two variables in the model, PA and DA, that are unvarying in all sample trajectories. One, PA, has a huge impact on the reward, while DA is irrelevant. Can the AI tell the difference?

Also, one key component of the reward - the discomfort of the passenger for accelerations of -2 and 2 - is not encoded in the state space of the model, purely in the (unknown) reward function. Can the AI deduce this fact?

I'll be working on algorithms to efficiently compute these facts (though do let me know if you have a reference to anyone who's already done this before - that would make it so much quicker).

For the moment we're ignoring a lot of subtleties (such as bias and error on the part of the human expert), and these will be gradually included as the algorithm develops. One thought is to find a way of including negative examples, specific "don't do this" trajectories. These need to be interpreted with care, because a positive trajectory implicitly gives you a lot of negative trajectories - namely, all the choices that could have gone differently along the way. So a negative trajectory must be drawing attention to something we don't like (most likely the killing of a human). But, typically, the negative trajectories won't be maximally bad (such as shooting off at maximum speed in the wrong direction), so we'll have to find a way to encode what we hope the AI learns from a negative trajectory.

To work!

 

Appendix: Proof of ideal trajectories

Let n be the largest integer such that n^2 ≤ S. Since S≤(n+1)^2 - 1 by assumption, S-n^2 ≤ (n+1)^2-1-n^2=2n. Then let the rocket accelerate by +1 for n turns, then decelerate by -1 for n turns. It will travel a distance of 0+1+2+ ... +n-1+n+n-1+ ... +3+2+1. This sum is n plus twice the sum from 1 to n-1, ie n+n(n-1)=n^2.

By pausing one turn without acceleration during its trajectory, it can add any m to the distance, where 0≤m≤n. By doing this twice, it can add any m' to the distance, where 0≤m'≤2n. By the assumption, S=n^2+m' for such an m'. Therefore the rocket can reach S (with zero velocity) in 2n turns if S=n^2, in 2n+1 turns if n^2+1 ≤ S ≤ n^2+n, and in 2n+2 turns if n^2+n+1 ≤ S ≤ n^2+2n.

Since the rocket is accelerating on all but two turns of this trajectory, it's clear that it's impossible to reach S (with zero velocity) in less time than this, with accelerations of +1 and -1. Since it takes 2(n+1)=2n+2 turns to reach (n+1)^2, an immediate consequence of this is that the number of turns taken to reach S is increasing in the value of S (though not strictly increasing).

Next, we can note that since S<250,000=500^2, the rocket will always reach S within 1000 turns at most, for "reward" above -1000. An acceleration of +3 or -3 costs -1000 because of the death of the human, and an extra -1 because of the turn taken, so these accelerations are never optimal. Note that this result is not sharp. Also note that for huge S, continual accelerations of 3 and -3 are obviously the correct solution - so even our "true reward function" didn't fully encode what we really wanted.

Now we need to show that accelerations of +2 and -2 are never optimal. To do so, imagine we had an optimal trajectory with ±2 accelerations, and replace each +2 with two +1s, and each -2 with two -1s. This trip will take longer (since we have more turns of acceleration), but will go further (since two accelerations of +1 cover a greater distance than one acceleration of +2). Since the number of turns taken to reach S with ±1 accelerations is increasing in S, we can replace this further trip with a shorter one reaching S exactly. Note that all these steps decrease the cost of the trip: shortening the trip certainly does, and replacing an acceleration of +2 (total cost: -10-1=-11) with two accelerations of +1 (total cost: -1-1=-2) also does. Therefore, the new trajectory has no ±2 accelerations, and has a lower cost, contradicting our initial assumption.
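As a sanity check on the turn counts derived above, here is a small brute-force sketch (restricted to 0/±1 accelerations) that simulates the constructive schedule and compares it to the 2n / 2n+1 / 2n+2 formula; the function names are my own:

```python
from math import isqrt

def turns_to_dock(S):
    """Turns used by the schedule in the appendix: +1 for n turns, -1 for n turns,
    plus up to two coasting turns inserted at the right velocities."""
    n = isqrt(S)
    m = S - n * n                      # 0 <= m <= 2n
    pauses = [] if m == 0 else ([m] if m <= n else [n, m - n])
    pos, vel, turns = 0, 0, 0
    for acc in [1] * n + [-1] * n:
        while vel in pauses:           # coast one turn at this velocity, adding `vel` squares
            pauses.remove(vel)
            pos, turns = pos + vel, turns + 1
        pos, vel, turns = pos + vel, vel + acc, turns + 1
    assert pos == S and vel == 0       # docked at the station with zero velocity
    return turns

for S in range(1, 2000):
    n = isqrt(S)
    expected = 2 * n + (0 if S == n * n else (1 if S <= n * n + n else 2))
    assert turns_to_dock(S) == expected
print("turn counts verified for S up to 1999")
```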

Open thread, Jan. 18 - Jan. 24, 2016

4 MrMind 18 January 2016 09:42AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

Rationality Reading Group: Part R: Physicalism 201

4 Gram_Stone 13 January 2016 11:41PM

This is part of a semi-monthly reading group on Eliezer Yudkowsky's ebook, Rationality: From AI to Zombies. For more information about the group, see the announcement post.


Welcome to the Rationality reading group. This fortnight we discuss Part R: Physicalism 201 (pp. 983-1078). This post summarizes each article of the sequence, linking to the original LessWrong post where available.

R. Physicalism 201

214. Hand vs. Fingers - When you pick up a cup of water, is it your hand that picks it up, or is it your fingers, thumb, and palm working together? Just because something can be reduced to smaller parts doesn't mean that the original thing doesn't exist.

215. Angry Atoms - It is very hard, without the benefit of hindsight, to understand just how it is that these little bouncing billiard balls called atoms, could ever combine in such a way as to make something angry. If you try to imagine this problem without understanding the idea of neurons, information processing, computing, etc you realize just how challenging reductionism actually is.

216. Heat vs. Motion - For a very long time, people had a detailed understanding of kinetics, and they had a detailed understanding of heat. They understood concepts such as momentum and elastic rebounds, as well as concepts such as temperature and pressure. It took an extraordinary amount of work in order to understand things deeply enough to make us realize that heat and motion were really the same thing.

217. Brain Breakthrough! It's Made of Neurons! - Eliezer's contribution to Amazing Breakthrough Day.

218. When Anthropomorphism Became Stupid - Anthropomorphism didn't become obviously wrong until we realized that the tangled neurons inside the brain were performing complex information processing, and that this complexity arose as a result of evolution.

219. A Priori - The facts that philosophers call "a priori" arrived in your brain by a physical process. Thoughts are existent in the universe; they are identical to the operation of brains. The "a priori" belief generator in your brain works for a reason.

220. Reductive Reference - Virtually every belief you have is not about elementary particle fields, which are (as far as we know) the actual reality. This doesn't mean that those beliefs aren't true. "Snow is white" does not mention quarks anywhere, and yet snow nevertheless is white. It's a computational shortcut, but it's still true.

221. Zombies! Zombies? - Don't try to put your consciousness or your personal identity outside physics. Whatever makes you say "I think therefore I am", causes your lips to move; it is within the chains of cause and effect that produce our observed universe.

222. Zombie Responses - A few more points on Zombies.

223. The Generalized Anti-Zombie Principle - The argument against zombies can be extended into a more general anti-zombie principle. But, figuring out what that more general principle is, is more difficult than it may seem.

224. GAZP vs. GLUT - Fleshes out the generalized anti-zombie principle a bit more, and describes the game "follow-the-improbability".

225. Belief in the Implied Invisible - That it's impossible even in principle to observe something sometimes isn't enough to conclude that it doesn't exist.

226. Zombies: the Movie - A satirical script for a zombie movie, but not about the lurching and drooling kind. The philosophical kind.

227. Excluding the Supernatural - Don't rule out supernatural explanations because they're supernatural. Test them the way you would test any other hypothesis. And probably, you will find out that they aren't true.

228. Psychic Powers - Some of the previous post was incorrect. Psychic powers, if indeed they were ever discovered, would actually be strong evidence in favor of non-reductionism.

 


This has been a collection of notes on the assigned sequence for this fortnight. The most important part of the reading group though is discussion, which is in the comments section. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

The next reading will cover Part S: Quantum Physics and Many Worlds (pp. 1081-1183). The discussion will go live on Wednesday, 27 January 2016, right here on the discussion forum of LessWrong.
