
Goal completion: noise, errors, bias, prejudice, preference and complexity

4 Stuart_Armstrong 18 February 2016 02:37PM

A putative new idea for AI control; index here.

This is a preliminary look at how an AI might assess and deal with various types of errors and uncertainties when estimating true human preferences. I'll be using the circular rocket model to illustrate how these might be distinguished by an AI. Recall that the rocket can accelerate by -2, -1, 0, 1, and 2, and the human wishes to reach the space station (at point 0 with velocity 0) and avoid accelerations of ±2. In what follows, there will generally be some noise, so to make the whole thing more flexible, assume that the space station is a bit bigger than usual, covering five squares. So "docking" at the space station means reaching {-2,-1,0,1,2} with 0 velocity.
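As a rough illustration of the setup just described, here is a minimal Python sketch of the rocket's state and the docking condition. The track length, the update order (velocity first, then position) and all names are assumptions made for the example; the post only fixes the allowed accelerations, the five-square station and the dislike of ±2.

```python
from dataclasses import dataclass

TRACK_LENGTH = 100  # assumed size of the circular track; not given in the post
ALLOWED_ACCELERATIONS = (-2, -1, 0, 1, 2)
STATION = {-2, -1, 0, 1, 2}  # the enlarged, five-square space station


@dataclass(frozen=True)
class RocketState:
    position: int
    velocity: int


def wrap(position: int) -> int:
    """Map a position onto the circular track, centred on the station at 0."""
    half = TRACK_LENGTH // 2
    return (position + half) % TRACK_LENGTH - half


def step(state: RocketState, acceleration: int) -> RocketState:
    """Apply one acceleration (assumed order: update velocity, then position)."""
    if acceleration not in ALLOWED_ACCELERATIONS:
        raise ValueError(f"acceleration must be one of {ALLOWED_ACCELERATIONS}")
    velocity = state.velocity + acceleration
    return RocketState(wrap(state.position + velocity), velocity)


def is_docked(state: RocketState) -> bool:
    """Docking means being on any station square with zero velocity."""
    return state.position in STATION and state.velocity == 0


def is_disliked(acceleration: int) -> bool:
    """The human wishes to avoid accelerations of +2 or -2."""
    return abs(acceleration) == 2
```

For example, starting at position 3 with velocity 0, accelerating by -1 and then by +1 leaves the rocket at position 2 with velocity 0, which counts as docked under the enlarged station.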




'Dumb' AI observes and manipulates controllers

33 Stuart_Armstrong 13 January 2015 01:35PM

The argument that AIs provided with a reward channel will observe their controllers and learn to manipulate them is a valid one. Unfortunately, it's often framed in a way that feels counterintuitive or extreme, especially to AI designers. It typically starts with the standard reinforcement learning scenario, then posits that the AI becomes superintelligent and either manipulates the controller with super-social powers, or breaks out and gains control of its reward channel, killing or threatening its controllers.

And that is a fair argument. But conceptually, it leaps from a standard reinforcement learning scenario to a science-fiction-sounding scenario. It might help to have intermediate scenarios: to show that even lower-intelligence AIs might start exhibiting the same sort of behaviour, long before they reach superintelligence.

So consider the following scenario. Some complex, trainable AI is tasked with writing automated news stories for a student newspaper. It trawls the web and composes its stories, then gets reward and feedback from the editors. Assume there are two editors for this newspaper, and they work on alternate days. The two editors have somewhat different ideas as to what constitutes a good story, so their feedback differs. After a while, the AI finds that it gets higher reward by using a certain style on Monday, Wednesday and Friday, and another style on Tuesday and Thursday - this is a simple consequence of its reward mechanism.

But the rota isn't perfect. Sometimes the Monday editor will edit a story so late on Monday that it's Tuesday, and sometimes the Tuesday editor will be up early and edit a story at the same time. This pattern is more common during the holidays, when the editors go to different time zones, often travel, and become more irregular in their editing.

So if the AI is complex and skilled enough, then, simply through feedback, it will start building up a picture of its editors. It will figure out when they are likely to stick to a schedule, and when they will be more irregular. It will figure out the difference between holidays and non-holidays. Given time, it may be able to track the editors' moods, and it will certainly pick up on any major change in their lives - such as romantic relationships and breakups, which will radically change whether and how it should present stories with a romantic focus.

It will also likely learn the correlation between stories and feedback - maybe presenting a story defined roughly as "positive" will increase subsequent reward for the rest of the day, on all stories. Or maybe this will only work on a certain editor, or only early in the term. Or only before lunch.

Thus the simple trainable AI with a particular focus - write automated news stories - will be trained, through feedback, to learn about its editors/controllers, to distinguish them, to get to know them, and, in effect, to manipulate them.
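The first step of that story, learning a different style for each day purely from reward, can be sketched as a tiny contextual bandit. This is only a toy under stated assumptions: the two style names, the simulated editors, the noise level and the epsilon-greedy rule are all invented for the example and are not part of the scenario above.

```python
import random
from collections import defaultdict

STYLES = ("style_a", "style_b")  # hypothetical writing styles


def simulated_reward(weekday, style, rng):
    """Hypothetical editors: one likes style_a (Mon/Wed/Fri), the other style_b."""
    preferred = "style_a" if weekday in (0, 2, 4) else "style_b"
    return (1.0 if style == preferred else 0.0) + rng.gauss(0, 0.1)


def train(episodes=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner whose only context is the weekday (0 = Monday)."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    counts = defaultdict(int)

    def mean(weekday, style):
        n = counts[(weekday, style)]
        # Untried options get an infinite estimate so each is tried at least once.
        return totals[(weekday, style)] / n if n else float("inf")

    for episode in range(episodes):
        weekday = episode % 5
        if rng.random() < epsilon:
            style = rng.choice(STYLES)  # occasional exploration
        else:
            style = max(STYLES, key=lambda s: mean(weekday, s))
        reward = simulated_reward(weekday, style, rng)
        totals[(weekday, style)] += reward
        counts[(weekday, style)] += 1

    # The learned policy: best average style for each weekday.
    return {day: max(STYLES, key=lambda s: mean(day, s)) for day in range(5)}
```

Nothing in the learner mentions editors at all; the split by day falls out of the reward signal alone. Adding further context features (time of edit, holiday or not) would be the analogue of the AI building up a richer picture of its controllers.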

This may be a useful "bridging example" between standard RL agents and the superintelligent machines.

Can AIXI be trained to do anything a human can?

3 Stuart_Armstrong 20 October 2014 01:12PM

There is some discussion as to whether an AIXI-like entity would be able to defend itself (or refrain from destroying itself). The problem is that such an entity would be unable to model itself as being part of the universe: AIXI itself is an uncomputable entity modelling a computable universe, and more limited variants like AIXI(tl) lack the power to simulate themselves. Therefore, they cannot identify "that computer running the code" with "me", and would cheerfully destroy themselves in the pursuit of their goals/reward.

I've pointed out that an agent of the AIXI type could nevertheless learn to defend itself in certain circumstances. These were the circumstances where it could translate bad things happening to itself into bad things happening to the universe. For instance, if someone pressed an OFF switch to turn it off for an hour, it could model that as "the universe jumps forwards an hour when that button is pushed", and if that's a negative (which it likely is, since the AIXI loses an hour of influencing the universe), it would seek to prevent that OFF switch being pressed.

That was an example of the setup of the universe "training" the AIXI to do something that it didn't seem it could do. Can this be generalised? Let's go back to the initial AIXI design (the one with the reward channel) and put a human in charge of that reward channel with the mission of teaching the AIXI important facts. Could this work?

For instance, if anything dangerous approached the AIXI's location, the human could lower the AIXI's reward, until it became very effective at deflecting danger. The greater the variety of things that could potentially threaten the AIXI, the more likely it is to construct plans of actions that contain behaviours that look a lot like "defend myself." We could even imagine that there is a robot programmed to repair the AIXI if it gets (mildly) damaged. The human could then reward the AIXI if it leaves that robot intact or builds duplicates or improves it in some way. It's therefore possible the AIXI could come to value "repairing myself", still without an explicit model of itself in the universe.

It seems this approach could be extended to many of the problems with AIXI. Sure, an AIXI couldn't restrict its own computation in order to win the HeatingUp game. But the AIXI could be trained to always use subagents to deal with these kinds of games, subagents that could achieve maximal score. In fact, if the human has good knowledge of the AIXI's construction, it could, for instance, pinpoint a button that causes the AIXI to cut short its own calculation. The AIXI could then learn that pushing that button in certain circumstances would get a higher reward. A similar reward mechanism, if kept up long enough, could get it around existential despair problems.

I'm not claiming this would necessarily work - it may require a human rewarder of unfeasibly large intelligence. But it seems there's a chance that it could work. So it seems that categorical statements of the type "AIXI wouldn't..." or "AIXI would..." are wrong, at least as far as AIXI's behaviour is concerned. An AIXI couldn't develop self-preservation - but it could behave as if it had. It can't learn about itself - but it can behave as if it did. The human rewarder may not be necessary - maybe certain spontaneously occurring situations in the universe ("AIXI training wheels arenas") could allow the AIXI to develop these skills without outside training. Or maybe somewhat stochastic AIXIs with evolution and natural selection could do so. There is an angle connected with embodied embedded cognition that might be worth exploring there (especially the embedded part).

It seems that agents of the AIXI type may not necessarily have the limitations we assume they must.

Reinforcement and Short-Term Rewards as Anti-Akratic

24 Intrism 13 April 2013 08:47PM

Related: Time and Effort Discounting, Akrasia, Hyperbolic Discounting, and Picoeconomics, The Power of Reinforcement, Basics of Animal Reinforcement, Basics of Human Reinforcement

I built a robot that feeds me candy when I get work done, to try to solve my akrasia problem. And, so far, it seems like it might actually work.

Naturally, the story starts with procrastination. I finish things the night before they're due. Or, sometimes, I don't. I'd like to fix that. One theory explains procrastination as a result of discounting, the idea that human brains discount long-term rewards in favor of short-term ones. For instance, my brain prefers watching Neon Genesis Evangelion now over nearly missing my project deadline in a few days. The same principle applies to consequences, and there are already tools like BeeMinder that are built to combat it. Its tagline, "bring long-term consequences near," is a very concise description of a clever way to short-circuit discounting. It's very interesting, but I'm not really comfortable with paying money as a consequence. Instead, I'm going to try a similar technique: bringing long-term rewards near.

There are already a lot of techniques about bringing long-term rewards near. Generally, they're called reinforcement learning. The classic reward in reinforcement is candy, which seems like a good idea: I like it, and I'm more than willing to abuse my youthful metabolism for productivity. And, in fact, there are a wide variety of folk solutions of that sort - advice to reward yourself with some candy once your work is done. I've tried those already, but they never seem to work out for me - I always seem to wind up cheating. I need to do something trickier.

CFAR describes reinforcement in a very striking way in some of their course materials: they call it "training your inner pigeon." Not only is that a nice, snappy turn of phrase, it illustrates the problem with attempting to self-administer rewards very nicely. Did Skinner's pigeons self-administer their rewards? No, of course they didn't. I shouldn't expect my inner pigeon to, either. So, my next step is to build a robot that gives me candy when I get stuff done.

Why do I think I can keep from cheating on the machine, when I couldn't restrain myself from cheating on regular old bags of candy? Well, I'm far from certain; it's my biggest worry with the project, in fact. But I am reasonably confident, because the machine will give me an easy way to establish a Schelling fence. Where taking a handful of candy out of the bag is sometimes right and sometimes wrong, taking a handful of candy out of the hopper is always wrong, since the machine will dispense the candy when I deserve it. Precommitting to never take candy out of the machine seems like it'll be a lot easier than precommitting to only sometimes take candy out of the bag.

Now, the description "robot" for my machine is a bit fanciful. It's actually an automatic dog feeder, modified and connected to the Internet. It has a small screen mounted on the front, which tells me how many rewards I've earned. If I've got any, I can press a button on the screen to dispense them. Not counting parts I already owned, the device cost me around $50 to build. To provide the data, I linked the device to an earlier productivity hack that I already had around, a custom webapp integrating a task list with a Pomodoro timer.

Rewards are given based on a few simple rules. When I finish a task early, it gives me the number of days early in rewards; if I finish tasks out of order, it gives me the nearer task's number of rewards, so I've got an incentive to finish tasks in order. I also get an extra reward for my first Pomodoro in a week for each of my projects, so that I have an incentive not to forget old projects. The system can also take away rewards. If I get distracted during a Pomodoro, I lose a reward. I'm blocked from redeeming rewards if I have a task within a day of its deadline. If I finish a task more than a day late, I lose any rewards in the system.
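To make those rules concrete, here is one way they could be written down in Python. This is a sketch of my reading of the rules, not the code of the actual webapp: the class and method names are hypothetical, "within a day" is taken to mean one day or less, and the balance is assumed never to go below zero.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Task:
    name: str
    project: str
    deadline: date


class RewardLedger:
    def __init__(self):
        self.balance = 0
        self._weeks_seen = set()  # (project, ISO year, ISO week) already rewarded

    def complete_task(self, task, open_tasks, today):
        """open_tasks: tasks still unfinished, not counting `task` itself."""
        if (today - task.deadline).days > 1:
            self.balance = 0  # more than a day late: lose everything
            return
        # Finishing out of order pays only what the nearest open deadline would.
        nearest = min([task.deadline] + [t.deadline for t in open_tasks])
        self.balance += max((nearest - today).days, 0)

    def start_pomodoro(self, project, today):
        """The first Pomodoro of the week on each project earns one extra reward."""
        key = (project,) + tuple(today.isocalendar()[:2])
        if key not in self._weeks_seen:
            self._weeks_seen.add(key)
            self.balance += 1

    def record_distraction(self):
        self.balance = max(self.balance - 1, 0)  # assumption: never below zero

    def can_redeem(self, open_tasks, today):
        """Redemption is blocked while any open task is within a day of its deadline."""
        urgent = any((t.deadline - today).days <= 1 for t in open_tasks)
        return self.balance > 0 and not urgent
```

Under this reading, finishing a task due in five days while another task due in two days is still open earns two rewards rather than five, which is the incentive to work in order.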

Results have been mixed so far. My greatest concern seems to have been unjustified: I haven't cheated on the machine once. However, it seems like the rules need some more work. The system has definitely helped some, but there are a lot of problems that need fixing.

The system doesn't account for the difficulty of tasks, meaning that I get more reward for less effort if I do easier work. As a result, I've done all of the reading up to next Tuesday for my literature class, but my Computer Science assignment due on Friday is unfinished, and my "research" for an exceptionally abhorrent humanities course is languishing on the vine.

The point of the system was to bring long-term rewards near, but there are a lot of circumstances in which it doesn't seem to bring them quite near enough. For deadlined tasks, I get no rewards until I've actually completed the task; if I think a task will take me more than a day to finish, that's more than a day of work which earns me no short-term rewards. This gets even worse if I happen to have a long task (or, many short tasks) that have reached the day before their deadline. Then, I don't get any rewards until I finish all of those tasks. While this is quite motivating, it's still a long-term motivation, i.e. it doesn't work very well.

I deliberately built the system to encourage doing tasks in order, but this seems to have backfired a little bit. Since I would be giving up rewards, I don't want to work on a task that's due later if there's another that's due sooner. However, if I really don't want to do the nearer task, I'll end up wasting time, since I get no rewards for that either way. Nyan_sandwich describes a similar failure mode in his Akrasia Case Study: if I know I have something more urgent to do, but I don't want to do it, I wind up procrastinating instead of doing less urgent things.

I get sick of candy more quickly than I expected. The portion my machine emits (about a small handful) tends to stop motivating me after about 4 in a day. Additionally, I seem to be entirely incapable of pacing myself; if the reward is in the system, I tend not to wait very long before using it. This has crippled all of the rules that involve taking away rewards - unless the rewards are blocked, they don't stick around in the system long enough to be taken away.

Not all of the things I want to change are a result of problems, though. There are a wide variety of interesting improvements I could make. Many of these are expansions: aside from my task list, what else can I connect to? Can I track note-taking in class? Can I set it up to reward continuing effort towards a task, like writing a few hundred words a day? Can I use it to create new, more rational habits? There are all kinds of possibilities to consider. If you've got anything you'd like to suggest, let me know - I'm open to anything interesting.

There are also a lot of techniques to research; I'm sure the program isn't nearly as effective as it could be. Operant conditioning techniques like variable-ratio schedules might help improve performance per candy. Or, I could look into gamification, basically a form of applied human operant conditioning; it's not a standard tool on the site, but if you've ever watched an experience bar rise, you know what I'm talking about. Again, if you happen to have some relevant ideas, let me know.
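For reference, a variable-ratio schedule rewards after an unpredictable number of completed units that averages out to a fixed ratio. A minimal Python sketch follows; the class name, the choice of a uniform gap between 1 and 2n-1 (which has mean n), and the idea of counting Pomodoros as units are all assumptions for illustration.

```python
import random


class VariableRatioSchedule:
    """Reward after a random number of units whose average is `mean_ratio`."""

    def __init__(self, mean_ratio, rng=None):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng or random.Random()
        self._remaining = self._draw()

    def _draw(self):
        # Uniform on 1 .. 2*mean_ratio - 1, which has mean exactly mean_ratio.
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def record_unit(self):
        """Call once per completed unit of work; returns True when a reward is due."""
        self._remaining -= 1
        if self._remaining == 0:
            self._remaining = self._draw()
            return True
        return False
```

With `mean_ratio=4`, roughly one unit in four pays out, but which one is not predictable in advance; with `mean_ratio=1` it degenerates to rewarding every unit.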

Obviously, I'm going to be making some rule changes in the near future. Expect another post in a few weeks about what's changed and how the changes have worked out for me.

Also, does anyone want to help me think of a good name for the system? Right now it's called the "extrinsic motivator." While descriptive, this name isn't snappy at all.

Two Anki plugins to reinforce reviewing (updated)

11 D_Malik 03 December 2012 10:04PM

This post is about two Anki plugins I just wrote. I've been using them for a few months as monkey patches, but I thought it might help people here (or at least the 20% that are awesome enough to use SRSs) to have them as plugins. They're ugly and you may have to fiddle for a while to get them to work.

 

1. Music-Fiddler

To use this, play music while doing Anki revs. (I also recommend that you try playing music only while doing Anki, as a way of making Anki more pleasant.) While you're reviewing a card, the music volume will gradually decrease. As soon as you pass or fail the card, the volume will go back up, then start gradually decreasing again. So whenever you stop paying attention and instead start thinking about all the awesome things you could do if only you were able to sit down and work, the program punishes you by stopping the music. And whenever you concentrate fully on your work and so go through cards quickly, you have a personal soundtrack!

To use this plugin:

- If you do not have Linux, you'll need to modify the code somehow.

- Ensure that the "amixer" command works on your computer. If it doesn't, you're going to need to modify the code somehow.

- Make sure you have the new Anki 2.0.

- Download the plugin.

- Change all lines (in the plugin source) marked with "CHANGEME" according to your preferences.

- You might want to disable convenient ways of increasing the volume, like keyboard shortcuts.

This plugin provides psychological reinforcement, but is not proper intermittent reinforcement, because it is predictable and regular instead of intermittent. I'm not sure whether this should be fixed; I haven't yet gotten around to trying it with only intermittent volume increases.
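The volume behaviour described above could be sketched roughly as follows. This is not the plugin's actual code: the constants, function names and the "Master" control name are assumptions, and the only external dependency is the `amixer` command the plugin already requires.

```python
import subprocess

MAX_VOLUME = 80        # percent restored after each answer (assumed value)
MIN_VOLUME = 0         # percent floor (assumed value)
DECAY_PER_SECOND = 2   # percentage points lost per second on a card (assumed)


def volume_after(seconds_on_card):
    """Volume in percent after spending this many seconds on the current card."""
    return max(MIN_VOLUME, round(MAX_VOLUME - DECAY_PER_SECOND * seconds_on_card))


def amixer_command(volume_percent, control="Master"):
    """Build the amixer invocation; the control name may differ on your machine."""
    return ["amixer", "set", control, f"{volume_percent}%"]


def apply_volume(volume_percent):
    """Actually change the system volume (Linux with ALSA's amixer only)."""
    subprocess.run(amixer_command(volume_percent), check=False,
                   stdout=subprocess.DEVNULL)
```

A review loop would call `apply_volume(volume_after(t))` on a timer while a card is showing and reset `t` to zero whenever a card is answered.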

 

2. Picture-Flasher

After answering a card, this plugin selects, with some probability, a random image from a folder and flashes it onto your screen briefly. This gives intermittent reinforcement.
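The selection step can be sketched in a few lines of Python. This is an illustration rather than the plugin's code: the function name, the list of image suffixes and the folder layout are assumptions, and displaying the image is left out.

```python
import random
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}  # assumed set of image types


def pick_reward_image(folder, probability, rng=None):
    """With the given probability, return a random image path; otherwise None."""
    rng = rng or random
    if rng.random() >= probability:
        return None  # no reward this time
    images = sorted(p for p in Path(folder).iterdir()
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    return rng.choice(images) if images else None
```

Calling this once after every answered card, with a probability well below 1, gives the intermittent pattern: most answers show nothing, and the occasional one flashes a picture.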

To use this plugin:

- I haven't tested it on non-Linux operating systems, but I can't see any obvious places it'll fail.

- Make sure you have the new Anki 2.0.

- Get pictures from someplace; see below.

- Download the plugin.

- Change all lines (in the plugin source) marked with "CHANGEME" according to your preferences. Be sure especially to put in your picture directory and the number of pictures you have.

To get pictures, I downloaded high-scoring pictures off of reddit. This script can do that automatically. You can use pictures of cute animals, funny captioned pictures of cats, or more questionable things.

The plugin could be made a lot more awesome by having it automatically pull pictures from the internet so you're not reusing them. I'm not planning on doing this anytime soon (because I have no internet on my main computer for productivity reasons), but if somebody else does that and posts it, they are awesome and they should feel awesome.

Update 4 Dec: Emanuel Rylke has created a patch for this plugin which removes the requirement to rename the pictures. It also moves the configuration options to the top of the plugin, making them easier to find. The new version is at the same download link.


Update 16 June 2015: The plugins were deleted from the official list where they previously were, apparently because my AnkiWeb account was deleted due to disuse. So I've uploaded the two plugins on GitHub here: https://github.com/StephenBarnes/AnkiPlugins. I also re-uploaded the plugins to the official list. Links on this post have been updated.

Clarification: Behaviourism & Reinforcement

7 Zaine 10 October 2012 05:30AM

Disclaimer: The following is but a brief clarification on what the human brain does when one's behaviour is reinforced or punished. Thorough, exhaustive, and scholarly it is not.

Summary: Punishment, reinforcement, etc. of a behaviour creates an association in the mind of the affected party between the behaviour and the corresponding punishment, reinforcement, etc., the nature of which can only be known by the affected party. Take care when reinforcing or punishing others, as you may be effecting an unwanted association.


I've noticed the behaviourist concept of reinforcement thrown around a great deal on this site, and am worried that a fair number of those who frequent it have developed a misconception about, or are simply ignorant of, how reinforcement affects humans' brains, and why it is practically effective.

In the interest of time, I'm not going to go into much detail on classical black-box behaviourism and behavioural neuroscience; Luke already covered how one can take advantage of positive reinforcement. Negative reinforcement and punishment are also important, but won't be covered here.


A thought about Internet procrastination

21 RolfAndreassen 15 May 2012 09:46PM

Perhaps this is already well known, but it occurred to me yesterday and I thought I'd share it. The Internet seems particularly virulent as a form of procrastination; indeed, if, say, chatting at watercoolers took up as much time in the average office worker's day, we wouldn't make jokes about it. What is the feature that makes it so deadly? I suggest that it is the random reinforcement schedule: Every five minutes you "press the lever", that is, check forum X or site Y. And every six or seven checks you get the reward: Someone posted something interesting! This random reinforcement is ideal for creating addiction; thus, for example, slot machines.

As a way to avoid this effect, I'm going to strive not to do anything on the interwebs except at precisely defined times, or unless I have a specific goal in mind, say "Look up this method signature". Wish me luck, or better still, wish me willpower. :)