In partially observable environments, stochastic policies can be optimal

5 Stuart_Armstrong 19 July 2016 10:42AM

I always had the informal impression that optimal policies were deterministic (choosing the best option, rather than some mix of options). Of course, this is not the case when facing other agents, but I had the impression it would hold when facing the environment rather than other players.

But stochastic policies can also be needed if the environment is partially observable, at least if the policy is Markov (memoryless). Consider the following POMDP (partially observable Markov decision process):

There are two states, 1a and 1b, and the agent cannot tell which one it's in. Taking action A in state 1a, or action B in state 1b, gives a reward of -R and keeps the agent in the same state. Taking action B in state 1a, or action A in state 1b, gives a reward of R and moves the agent to the other state.

The returns for the two deterministic memoryless policies - always A and always B - are -R every turn except possibly the first, while the expected return for the stochastic policy 0.5A + 0.5B is 0 per turn.

Of course, if the agent can observe the reward, the environment is no longer partially observable (though we can imagine the reward is delayed until later). And the general policy of "alternate A and B" is more effective than the 0.5A + 0.5B policy. Still, that stochastic policy is the best of the memoryless policies available in this POMDP.
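The gap between these three policies is easy to check numerically. Here's a minimal sketch (taking R = 1, starting the agent in state 1a; the function and policy names are my own):

```python
import random

R = 1.0  # reward magnitude

def step(state, action):
    """One transition of the two-state POMDP described above."""
    if (state, action) in {("1a", "A"), ("1b", "B")}:
        return state, -R          # "wrong" action: penalty, stay put
    other = "1b" if state == "1a" else "1a"
    return other, R               # "right" action: reward, switch states

def average_reward(policy, steps=100_000, seed=0):
    rng = random.Random(seed)
    state, total = "1a", 0.0
    for t in range(steps):
        state, r = step(state, policy(t, rng))
        total += r
    return total / steps

always_a  = lambda t, rng: "A"                     # deterministic, memoryless
coin_flip = lambda t, rng: rng.choice(["A", "B"])  # stochastic, memoryless
alternate = lambda t, rng: "BA"[t % 2]             # needs memory (a step counter)

print(average_reward(always_a))   # -1.0: -R every turn
print(average_reward(coin_flip))  # close to 0
print(average_reward(alternate))  # 1.0: +R every turn
```

The coin-flip policy earns 0 in expectation from either state, which is why it beats every deterministic memoryless policy here, while the alternating policy shows how much memory buys you.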

Comment author: Larks 12 July 2016 12:18:07AM 0 points

Yup, I think I understand that, and agree you need to at least tend to one. I'm just wondering why you initially use the looser definition of theta (where it doesn't need to tend to one, and can instead be just 0).

Comment author: Stuart_Armstrong 12 July 2016 01:50:26PM 0 points

When defining safe interruptibility, we let theta tend to 1. We probably didn't specify that earlier, when we were just introducing the concept?

Comment author: Viliam 11 July 2016 02:31:59PM 1 point

perfectly feasible

Citation needed.

Comment author: Stuart_Armstrong 11 July 2016 05:55:18PM 1 point

In software, it's trivial: create a subroutine with only a very specific output channel, and run the entity inside it. Some precautions are then needed to prevent the entity from hacking out through hardware weaknesses, but that should be doable (using isolation in a Faraday cage if needed).

Comment author: Viliam 11 July 2016 02:33:00PM 3 points

I like how the examples of the robot failures are... uhm... not like from the Terminator movie. May make some people discuss them more seriously.

Comment author: Stuart_Armstrong 11 July 2016 05:52:59PM 1 point

Yep!

Comment author: Larks 10 July 2016 03:33:57AM 0 points

Very interesting paper, congratulations on the collaboration.

I have a question about theta. When you initially introduce it, theta lies in [0,1]. But it seems that if you choose theta = (0_n)_n, just a sequence of 0s, all policies are interruptible. Is there much reason to initially allow such a wide-ranging theta - why not restrict them to converge to 1 from the very beginning? (Or have I just totally missed the point?)

Comment author: Stuart_Armstrong 10 July 2016 05:09:02AM 0 points

We're working on the theta problem at the moment. Basically we're currently defining interruptibility in terms of convergence to optimality. Hence we need the agent to explore sufficiently, hence we can't set theta=1. But we want to be able to interrupt the agent in practice, so we want theta to tend to one.
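To make that tension concrete, here is one way to picture theta's role (the schedule below is invented for illustration, not taken from the paper): the interruption only takes effect with probability theta_t, so an agent with theta_t < 1 still sometimes continues and explores, but as theta_t tends to 1 it is almost surely interrupted.

```python
import random

def theta(t):
    # A made-up schedule with theta_t in [0, 1) tending to 1:
    # theta_t = 1 - 1/(t + 2).
    return 1.0 - 1.0 / (t + 2)

def maybe_interrupt(t, interruption_requested, rng):
    # The interruption is applied with probability theta_t.
    return interruption_requested and rng.random() < theta(t)

rng = random.Random(0)
early = sum(maybe_interrupt(0, True, rng) for _ in range(10_000)) / 10_000
late = sum(maybe_interrupt(10**6, True, rng) for _ in range(10_000)) / 10_000
print(early, late)  # early is roughly 0.5, late is roughly 1.0
```

Early on, interruptions fail about half the time (theta_0 = 0.5 here), so the agent still gets to see what happens when it isn't interrupted; in the limit they essentially always succeed, which is what you want in practice.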

Comment author: morganism 07 July 2016 08:27:10PM 0 points

These folks say that you won't be able to sandbox an AGI, due to the nature of computing itself.

Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

http://arxiv.org/abs/1607.00913v1

But perhaps we could fool it, by poisoning some crucial databases it uses in subtle ways.

DeepFool: a simple and accurate method to fool deep neural networks

http://arxiv.org/abs/1511.04599v3

Comment author: Stuart_Armstrong 09 July 2016 08:13:59AM 1 point

strict containment requires simulations of such a program, something theoretically (and practically) infeasible.

Sandboxing just requires that you be sure that the sandboxed entity can't send bits outside the system (except on some defined channel, maybe), which is perfectly feasible.

[LINK] Concrete problems in AI safety

15 Stuart_Armstrong 05 July 2016 09:33PM

From the Google Research blog:

We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward thinking, long-term research questions -- minor issues today, but important to address for future systems:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned for a factory workfloor may not be safe enough for an office.

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

Comment author: fubarobfusco 23 June 2016 08:07:44PM 1 point

With the internet of things physical goods can treat their owner differently than other people. A car can be programmed to only be driven by their owner.

Theoretically yes, but that doesn't seem to be how "smart" devices are actually being programmed.

Comment author: Stuart_Armstrong 23 June 2016 11:44:40PM 1 point

With the internet of things physical goods can treat their owner differently than other people. A car can be programmed to only be driven by their owner.

Which shifts the verification to the imperfect car code.

Are smart contracts AI-complete?

11 Stuart_Armstrong 22 June 2016 02:08PM

Many people are probably aware of the hack of the DAO, which used a bug in their smart contract system to steal millions of dollars' worth of the cryptocurrency Ethereum.

There are various arguments as to whether this theft was technically allowed or not, what should be done about it, and so on. Many people argue that the code is the contract, and that therefore no one should be allowed to interfere with it - the DAO just made a coding mistake, and is now being (deservedly?) punished for it.

That got me wondering whether it's ever possible to make a smart contract without a full AI of some sort. For instance, if the contract is triggered by the delivery of physical goods: how can you define what the goods are, what constitutes delivery, what constitutes possession of them, and so on? You could have a human confirm delivery - but that's precisely the kind of judgement call you want to avoid. You could have an automated delivery confirmation system - but what happens if someone hacks or triggers that? You could connect it automatically with scanning headlines of media reports, but again, this relies on aggregated human judgement, which could be hacked or influenced.

Digital goods seem more secure, as you can automate confirmation of delivery/services rendered, and so on. But, again, this leaves the confirmation process open to hacking. Which would be illegal, if you're going to profit from the hack. Hum...

This seems the most promising avenue for smart contracts that don't involve full AI: clear out the bugs in the code, then ground the confirmation procedure in such a way that it can only be hacked in a way that's already illegal. In effect, use the standard legal system as a backstop, fixing the basic assumptions, and then set up the smart contracts on top of them (which is not the same as using the standard legal system within the contract).
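To make the weak point concrete, here's a toy sketch of such a contract (all names are invented for illustration). The whole contract is only as trustworthy as its confirmation oracle, which is exactly the part the legal backstop would need to cover:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Escrow:
    """Toy escrow 'contract': pay the seller once delivery is confirmed."""
    amount: int
    confirm_delivery: Callable[[], bool]  # the oracle: the whole attack surface
    released: bool = False

    def settle(self) -> str:
        if self.released:
            return "already settled"
        if self.confirm_delivery():
            # The contract logic itself is trivial and auditable;
            # everything hard is hidden inside confirm_delivery().
            self.released = True
            return f"released {self.amount} to seller"
        return "held"

contract = Escrow(amount=100, confirm_delivery=lambda: True)
print(contract.settle())  # released 100 to seller
```

Whoever controls (or fools) `confirm_delivery` controls the money, no matter how bug-free the `settle` logic is - which is why grounding that oracle in something already protected by law seems like the necessary move.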

Comment author: Gurkenglas 13 June 2016 02:12:51PM 0 points

The approach assumes that the AI knows everything there is to know about off switches in general, or at least what its creators know about off switches.

If the AI can guess that its creators would install an off switch, it will attempt to work around as many classes of off switches as it can, and depending on how much of off-switch space it can outsmart simultaneously, whichever approach the creators chose might be useless.

Such an AI desperately needs more FAI mechanisms behind it; saying that it desperately needs an off switch assumes that off switches help in the first place.

Comment author: Stuart_Armstrong 14 June 2016 02:55:48AM 0 points

This class of off switch is designed for the AI not to work around.
