I always had the informal impression that the optimal policies were deterministic (choosing the best option, rather than some mix of options). Of course, this is not the case when facing other agents, but I had the impression this would hold when facing the environment rather than other players.
But stochastic policies can also be needed if the environment is partially observable, at least if the policy is Markov (memoryless). Consider the following POMDP (partially observable Markov decision process):

There are two states, 1a and 1b, and the agent cannot tell which one it's in. Taking action A in state 1a, or action B in state 1b, gives a reward of -R and keeps the agent in the same state. Taking action B in state 1a, or action A in state 1b, gives a reward of R and moves the agent to the other state.
The return for each of the two deterministic policies, always-A and always-B, is -R every turn except perhaps the first, while the stochastic policy 0.5A + 0.5B earns an expected reward of 0 per turn.
Of course, if the agent can observe the reward, the environment is no longer partially observable (though we can imagine the reward is delayed until later). And the general policy of "alternate A and B" is more effective than the 0.5A + 0.5B policy. Still, that stochastic policy is the best of the memoryless policies available in this POMDP.
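The comparison can be checked with a quick simulation. This is a minimal sketch of the POMDP described above; the state names and the choice R = 1 are mine:

```python
import random

# Two-state POMDP: the agent cannot see whether it is in "1a" or "1b".
# Wrong action (A in 1a, B in 1b): reward -R, stay put.
# Right action (B in 1a, A in 1b): reward +R, move to the other state.
R = 1.0

def step(state, action):
    if (state == "1a" and action == "A") or (state == "1b" and action == "B"):
        return state, -R                      # wrong action: stay, pay -R
    other = "1b" if state == "1a" else "1a"
    return other, R                           # right action: switch, gain +R

def run(policy, steps=10000, seed=0):
    rng = random.Random(seed)
    state, total = "1a", 0.0
    for _ in range(steps):
        state, reward = step(state, policy(rng))
        total += reward
    return total / steps                      # average reward per turn

always_A = lambda rng: "A"
always_B = lambda rng: "B"
coin_flip = lambda rng: rng.choice(["A", "B"])

print(run(always_A))   # -R every turn
print(run(always_B))   # one good first move, then -R forever
print(run(coin_flip))  # close to 0 per turn
```

The deterministic policies get locked into the losing move (after at most one lucky step), while the memoryless coin flip averages out to zero, matching the argument above.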
perfectly feasible
Citation needed.
In software, it's trivial: create a subroutine with only a very specific output, and run the entity inside it. Some precautions are then needed to prevent the entity from hacking out through hardware weaknesses, but that should be doable (using isolation in a Faraday cage if needed).
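As a toy illustration of the "subroutine with only a very specific output" idea (my own sketch; real containment would also need the OS- and hardware-level isolation mentioned above, and `entity` here is just a stand-in function):

```python
import multiprocessing as mp

def entity(question):
    # Stand-in for the contained computation; it can only answer via return.
    return len(question) % 2              # forced into a one-bit answer

def _worker(conn, question):
    conn.send(entity(question))
    conn.close()

def ask_sandboxed(question, timeout=5):
    """Run `entity` in a separate process; accept only a 0/1 answer."""
    parent, child = mp.Pipe(duplex=False)
    p = mp.Process(target=_worker, args=(child, question))
    p.start()
    p.join(timeout)
    if p.is_alive():
        p.terminate()                     # no answer in time: kill it
        return None
    answer = parent.recv() if parent.poll() else None
    return answer if answer in (0, 1) else None   # reject anything off-channel

if __name__ == "__main__":
    print(ask_sandboxed("is this safe?"))
```

The point is that the defined channel is a single bit: anything the subprocess tries to send outside that shape is discarded. Everything hard about boxing lives in the "precautions" clause, not in this wrapper.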
I like how the examples of robot failures are... uhm... not like something out of a Terminator movie. That may make some people discuss them more seriously.
Very interesting paper, congratulations on the collaboration.
I have a question about theta. When you initially introduce it, each theta_n lies in [0,1]. But it seems that if you choose theta_n = 0 for all n (just a sequence of 0s), all policies are interruptible. Is there much reason to initially allow such a wide-ranging theta? Why not restrict the sequence to converge to 1 from the very beginning? (Or have I just totally missed the point?)
We're working on the theta problem at the moment. Basically we're currently defining interruptibility in terms of convergence to optimality. Hence we need the agent to explore sufficiently, hence we can't set theta=1. But we want to be able to interrupt the agent in practice, so we want theta to tend to one.
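To illustrate the tension (my own toy example, not a schedule from the paper): theta_t is the probability that an interruption is actually applied at step t. It must tend to 1 so interruptions eventually always work, but stay below 1 early on, so the agent still visits post-interruption states often enough to learn:

```python
def theta(t):
    # One schedule satisfying both requirements: theta_t -> 1, yet
    # sum_t (1 - theta_t) diverges (harmonic-like), so the agent still
    # ignores interruptions infinitely often and keeps exploring.
    return 1.0 - 1.0 / (t + 2)

print(theta(0))                                        # 0.5 early on
print(theta(10**6 - 1))                                # close to 1
print(sum(1.0 - theta(t) for t in range(10**6)))       # grows like log(T)
```

Setting theta_t = 1 from the start would make that sum finite, cutting off the exploration that the convergence-to-optimality definition needs.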
these folks say that you won't be able to sandbox an AGI, due to the nature of computing itself.
Assuming that a superintelligence will contain a program that includes all the programs that can be executed by a universal Turing machine on input potentially as complex as the state of the world, strict containment requires simulations of such a program, something theoretically (and practically) infeasible.
http://arxiv.org/abs/1607.00913v1
But perhaps we could fool it, by poisoning some crucial databases it uses in subtle ways.
DeepFool: a simple and accurate method to fool deep neural networks
strict containment requires simulations of such a program, something theoretically (and practically) infeasible.
Sandboxing just requires that you be sure that the sandboxed entity can't send bits outside the system (except on some defined channel, maybe), which is perfectly feasible.
From the Google Research blog:
We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.
While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.
We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward-thinking, long-term research questions -- minor issues today, but important to address for future systems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift.
We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.
We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.
With the internet of things, physical goods can treat their owner differently from other people. A car can be programmed to be driven only by its owner.
Theoretically yes, but that doesn't seem to be how "smart" devices are actually being programmed.
With the internet of things, physical goods can treat their owner differently from other people. A car can be programmed to be driven only by its owner.
Which shifts the verification to the imperfect car code.
Many people are probably aware of the hack of the DAO, which used a bug in their smart contract system to steal millions of dollars' worth of the cryptocurrency Ethereum.
There are various arguments as to whether this theft was technically allowed or not, what should be done about it, and so on. Many people are arguing that the code is the contract, and that therefore no-one should be allowed to interfere with it - the DAO just made a coding mistake, and is now being (deservedly?) punished for it.
That got me wondering whether it's ever possible to make a smart contract without a full AI of some sort. For instance, if the contract is triggered by the delivery of physical goods - how can you define what the goods are, what constitutes delivery, what constitutes possession of them, and so on? You could have a human confirm delivery - but that's precisely the kind of judgement call you want to avoid. You could have an automated delivery confirmation system - but what happens if someone hacks or falsely triggers that? You could connect it automatically with scanning headlines of media reports, but again, this is relying on aggregated human judgement, which could be hacked or influenced.
Digital goods seem more secure, as you can automate confirmation of delivery/services rendered, and so on. But, again, this leaves the confirmation process open to hacking. Which would be illegal, if you're going to profit from the hack. Hum...
This seems the most promising avenue for smart contracts that doesn't involve full AI: clear out the bugs in the code, then ground the confirmation procedure in such a way that it can only be hacked in a way that's already illegal. Sort of use the standard legal system as a backstop, fixing the basic assumptions, and then setting up the smart contracts on top of them (which is not the same as using the standard legal system within the contract).
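To make the structure concrete, here's a hypothetical toy escrow (names and logic are mine, not any real contract platform's API). The contract code itself is trivial; everything rides on who can produce the confirmation signal, which is exactly the channel a hacker would target - and tampering with it is already illegal, the "legal backstop":

```python
class Escrow:
    """Toy escrow contract: funds release only on a delivery confirmation."""
    def __init__(self, buyer, seller, amount):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.funded = False
        self.released = False

    def fund(self, who):
        if who == self.buyer and not self.funded:
            self.funded = True

    def confirm_delivery(self, oracle_signal):
        # The whole difficulty hides in this check: the oracle could be an
        # automated scanner, aggregated media reports, or a human - all of
        # them hackable or influenceable, as argued above.
        if self.funded and not self.released and oracle_signal == "DELIVERED":
            self.released = True
            return (self.seller, self.amount)   # pay out once, then lock
        return None

e = Escrow("alice", "bob", 100)
e.fund("alice")
print(e.confirm_delivery("DELIVERED"))   # ('bob', 100)
```

The contract logic can be made bug-free in isolation; the open question is grounding `oracle_signal` without a full AI making the judgement call.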
The approach assumes that the AI knows everything there is to know about off switches in general, or at least what its creators know about off switches.
If the AI can guess that its creators would install an off switch, it will attempt to work around as many classes of off switches as possible, and depending on how much of off-switch space it can outsmart simultaneously, whichever approach the creators chose might be useless.
Such an AI desperately needs more FAI mechanisms behind it; saying it desperately needs an off switch assumes that off switches help at all.
This class of off switch is designed for the AI not to work around.
Yup, I think I understand that, and agree you need theta to at least tend to one. I'm just wondering why you initially use the looser definition of theta (where it doesn't need to tend to one, and can instead be just 0).
When defining safe interruptibility, we let theta tend to 1. We probably didn't specify that earlier, when we were just introducing the concept?