Counterfactual do-what-I-mean
A putative new idea for AI control; index here.
The counterfactual approach to value learning could be used to possibly allow natural language goals for AIs.
The basic idea is that when the AI is given a natural language goal like "increase human happiness" or "implement CEV", it is not to figure out what these goals mean, but to follow what a pure learning algorithm would establish these goals as meaning.
This would be safer than a simple figure-out-the-utility-you're-currently-maximising approach. But it still doesn't solve a few drawbacks. Firstly, the learning algorithm has to be effective itself (in particular, modifying human understanding of the words should be ruled out, and the learning process must avoid concluding the simpler interpretations are always better). And secondly, humans' don't yet know what these words mean, outside our usual comfort zone, so the "learning" task also involves the AI extrapolating beyond what we know.
Open thread, Oct. 24 - Oct. 30, 2016
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
What's the most annoying part of your life/job?
Hi, I'm an entrepreneur looking for a startup idea.
In my experience, the reason most startups fail is because they never actually solve anyone's problem. So I'm cheating and starting out by identifying a specific person with a specific problem.
So I'm asking you, what's the most annoying part of your life/job? Also, how much would you pay for a solution?
Internal Race Conditions
Time start: 14:40:36
I
You might be familiar with the concept of a 'bug', as introduced by CFAR. By using the computer programming analogy, it frames any problem you might have in your life as something fixable... even more - as something to be fixed, something such that fixing it or thinking about how to fix it is the first thing that comes to mind when you see such a problem, or 'bug'.
Let's try another analogy in the same style, with something called 'race conditions' in programming. A race condition as a particular type of bug, that is typically very hard to find and fix ('debug'). It occurs when two or more parts of the same program 'race' to access some data, resource, decision point etc., in a way that is not controlled by any organised principle.
For example, imagine that you have a document open in an editor program. You make some changes, you give a command to save the file. While this operation is in progress, you drag and drop the same file in a file manager, moving to another hard drive. In this case, depending on timing, on the details of the programs, and on the operating system that you are using, you might get different results. The old version of the file might be moved to the new location, while the new one is saved in the old location. Or the file might get saved first, and then moved. Or the saving operation will end in an error, or in a truncated or otherwise malformed file on the disk.
If you know enough details about the situation, you could in fact work out what exactly would happen. But the margin of error in your own handling of the software is so big, that you cannot in practice do this (e.g. you'd need to know the exact milisecond when you press buttons etc.). So in practice, the outcome is random, depending on how the events play out on a scale smaller that you can directly control (e.g. minute differences in timing, strength of reactions etc.).
II
What is the analogy in humans? One of the places in which when you look hard, you'll see this pattern a lot is the relation of emotions and conscious decision making.
E.g., a classic failure mode is a "commitment to emotions", which goes like this:
- I promise to love you forever
- however if I commit to this, I will have doubts and less freedom, which will generate negative emotions
- so I'll attempt to fall in love faster than my doubts grow
- let's do this anyway, why won't we?
The problem here is a typical emotional "race condition": there is a lot of variability in the outcome, depending on how events play out. There could be a "butterfly effect", in which e.g. a single weekend trip together could determine the fate of the relationship, by creating a swing up or down, which would give one side of emotions a head start in the race.
III
Another typical example is making a decision about continuing a relationship:
- when I spend time with you, I like you more
- when I like you more, I want to continue our relationship
- when we have a relationship, I spend more time with you
As you can see, there is a loop in decision process. This cannot possibly end well.
A wild emotional rollercoaster is probably around the least bad outcome of this setup.
IV
So how do you fix race conditions?
By creating structure.
By following principles which compute the result explicitly, without unwanted chaotic behaviour.
By removing loops from decision graphs.
First and foremost, by recognizing that leaving a decision to a race condition is strictly worse than any decision process that we consciously design, even if this process is flipping the coin (at least you know the odds!).
Example: deciding to continue the relationship.
Proposed solution (arrow represent influence):
(1) controlled, long-distance emotional evaluation -> (2) systemic decision -> (3) day-to-day emotions
The idea is to remove the loop by organising emotions into tho groups: those that are directly influenced by the decision or its consequences (3), and more distant "evaluation" emotions (1). A possibility to feel emotions as in (1) can be created by pre-deciding a time to have some time alone and judge the situation from more distance, e.g. "after 6 months of this relationship I will go for a 2 week vacation to by aunt in France, and think about it in a clear-headed way, making sure I consider emotions about the general picture, not day-to-day things like physical affection etc.".
V
There is much to write on this topic, so please excuse my brevity (esp. in the last part, giving some examples of systemic thinking about emotions) - there is easily enough content about this to fill a book (or two). But I hope I gave you some idea.
Time end: 15:15:42
Writing stats: 31 minutes, 23 wpm, 133 cpm
New LW Meetup: Zurich
New meetups (or meetups with a hiatus of more than a year) are happening in:
Irregularly scheduled Less Wrong meetups are taking place in:
- Munich Meetup in October: 29 October 2016 04:00PM
- Stockholm: Mental contrasting: 21 October 2016 04:00PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- Bay Area Winter Solstice 2016: 17 December 2016 07:00PM
- [Moscow] Games in Kocherga club: FallacyMania, Tower of Chaos, Scientific Discovery: 26 October 2016 07:40PM
- NY Solstice 2016 - The Story of Smallpox: 17 December 2016 06:00PM
- San Francisco Meetup: Stories: 24 October 2016 06:15PM
- Washington, D.C.: Technology of Communication: 23 October 2016 03:30PM
Locations with regularly scheduled meetups: Austin, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, Kraków, London, Madison WI, Melbourne, Moscow, New Hampshire, New York, Philadelphia, Research Triangle NC, San Francisco Bay Area, Seattle, St. Petersburg, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.
*How* people shut down thought because of high-status respectable halos
https://srconstantin.wordpress.com/2016/10/20/ra/
A detailed look at the belief that high status social structures can be so much better than anything one can think of that there's no point in even trying to think about the details of what to do, and how debilitating this is.
Discussion of the essay
A problem in anthropics with implications for the soundness of the simulation argument.
What are your intuitions about this? It has direct implications for whether the Simulation Argument is sound.
Imagine two rooms, A and B. Between times t1 and t2, 100 trillion people sojourn in room A while 100 billion sojourn in room B. At any given moment, though, exactly 1 person occupies room A while 1,000 people occupy room B. At t2, you find yourself in a room, but you don't know which one. If you have to place a bet on which room it is (at t2), what do you say? Do you consider the time-slice or the history of room occupants? How do you place your bet?
If you bet that you're in room B, then the Simulation Argument may be flawed: there could be a fourth disjunct that Bostrom misses, namely that we become a posthuman civilization that runs a huge number of simulations yet we don't have reason for believing that we're stimulants.
Thoughts?
View more: Next
= 783df68a0f980790206b9ea87794c5b6)
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)