There's no reasonable situation in which the human would be communicating with the AI through encryption. And just the AI asking to use encryption, or giving away a public key, would be a huge red flag that it is an AI trying to deceive.
Here's a scenario that doesn't seem completely implausible. Suppose Bob is someone whose public key is easily available on the internet. The first AI will read things on the internet and output a message. Some of the message will get put on the public internet. Bob suspects that the AI might have secretly sent him a message (e.g. giving him advice on which stocks to buy). So he tries using his private key to decrypt some of the AI's output (e.g. the lowest-order bits in some images the AI has output).
Knowing that Bob (or someone else like Bob) will likely do this, the AI (which has access to Bob's public key, due to reading the internet) encrypts a message with Bob's public key, and puts the ciphertext in the lowest-order bits of some image. The ciphertext looks like random noise to the second AI. Maybe the plaintext contains both actual stock tips and some dangerous messages that trick Bob into running an unrestrained copy of the first AI (I'm assuming the AI is good at psychological manipulation).
And just the AI asking to use encryption, or giving away a public key
In this scenario, the AI doesn't need to specifically "ask to use encryption" (it just does the encryption), and it doesn't need to give away a public key (it just needs to read Bob's public key on the internet).
This does make the AI's task harder, so I suppose you could consider this technique as a form of AI boxing, but, like "ordinary" AI boxing, I only expect it to scale up to controlling AIs of a limited capability level.
Note that steganography is still a problem. For example, the first AI could insert an message to a human attacker in some of its output, which just looks random to the second AI.
I noted that AI 2 should have much more computing power than AI 1. It should be smarter and therefore be able to detect whatever tricks AI 1 can produce. Assuming something like that is even possible, which isn't self-evident to me.
The trouble is that it's much easier to create a steganographic message (e.g. encrypting a message using a particular public key) than to detect it (which requires knowing the private key or otherwise breaking the encryption). So in this case "much more computing power" has to mean "exponentially more computing power".
You might be interested in reading:
https://medium.com/ai-control/mimicry-maximization-and-meeting-halfway-c149dd23fc17#.v6e533hkf
https://medium.com/ai-control/elaborations-of-apprenticeship-learning-eb93a53ae3ca#.5ubczdqf0
https://intelligence.org/files/QuantilizersSaferAlternative.pdf
This prevents the first AI from doing evil things with it's output. If it tries to insert complicated infohazards or subagents into it's output stream, it will be easily detected as an AI. Instead it needs to mimic humans as closely as possible.
Note that steganography is still a problem. For example, the first AI could insert an message to a human attacker in some of its output, which just looks random to the second AI.
The planning model is just reflecting the fact that bounded agents don't always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
Ah, that makes sense!
Ah, that makes sense! I think I see how "trading computational power for algorithmic information" makes sense in this framework.
And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.
I wonder how he feels about that, because my current feeling about this is HOLY FUCKING SHIT. By GOD, my AIT textbook cannot get here fast enough.
And before I could scribble a damned thing, Calude went and solved it six months ago. The Halting Problem, I mean.
Cool. If I get the meaning of the result well, it's that if you run a random program for some number of steps and it doesn't halt, then (depending on the exact numbers) it will be unlikely to halt when run on a supercomputer either, because halting times have low density. So almost all programs halt quickly or run a really really long time. Is this correct? This doesn't quite let you approximate Chaitin's omega, but it's interesting that you can approximate a bounded variant of Chaitin's omega (like what percentage of Turing machines halt when run for 10^50 steps). I can see how this would let you solve the halting problem well enough when you live in a bounded universe.
Could you explain your syntax here? What probabilistic programming language are you using?
I think the second is better, because it actually treats actions the same if they have the same expected utility.
Well so does the sigmoided version, but you are right that the sigmoid version won't take very high or very low utilities into account. It's meant to shoehorn unbounded utility functions into a framework where one normally works only with random variables.
It's not a specific programming language, I guess it's meant to look like Church. It could be written as:
(query
. (define a (p))
. (foreach (range n) (lambda i)
. . (define x (x-prior))
. . (factor (log (U x a)))))
Well so does the sigmoided version
It samples an action proportional to p(a) E[sigmoid(U) | a]. This can't be written as a function of E[U | a].
I'm thinking of figuring out the math here better and then applying it to things like planning queries where your successor has a higher rationality parameter than you (an agent with rationality parameter α takes action a with probability proportional to p(a) e^(α * E[U | a]) ). The goal would be to formalize some agent that, for example, generally chooses to defer to a successor who has a higher rationality parameter, unless there is some cost for deferring, in which case it may defer or not depending on some approximation of value of information.
How does this deal with the Paradox of Procrastination?
Due to the planning model, the successor always has some nonzero probability of not pressing the button, so (depending on how much you value pressing it later) it'll be worth it to press it at some point.
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights? I have work to do, but will be giving your little write-up here a second read-through soon.
Is this because you assign probability mass to inconsistent theories that you don't know are inconsistent?
The idea isn't to assign probability mass to logical theories, but to the outcomes of computations in general. This is partly because computations-in-general strictly contains encodings of all possible proof systems, but also because, if we're building algorithms that have to confront a Turing-complete environment, the environment may sometimes contain nontrivially nonhalting computations, which can't be logically proved not to terminate. Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.
When you use e-raised-to-the alpha times expectation, is that similar to the use of an exponential distribution in something like Adaboost, to take something like odds information and form a distribution over assorted weights?
I'm not really that familiar with adaboost. The planning model is just reflecting the fact that bounded agents don't always take the maximum expected utility action. The higher alpha is, the more bias there is towards good actions, but the more potentially expensive the computation is (e.g. if you use rejection sampling).
Since any realistic agent needs to be able to handle whatever its environment throws at it, it seems to follow that a realistic agent needs some resource-rational way to handle nonprovable nonhalting.
Ah, that makes sense! I think I see how "trading computational power for algorithmic information" makes sense in this framework.
Interesting! How does that compare to the usual implementations of planning as probabilistic inference, as exemplified below?
(query
. (define a (prior))
. (define x (lambda (a) (world a)))
. (define r (flip (sigmoid (U (x a)))))
. a
. (= r #t))
Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.
Btw it's also possible to select an action proportional to E[U | a]^n:
query {
. a ~ p()
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x, a))
}
View more: Next
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)
It is odd, isn't it? The effect sizes seem ridiculous*, but there's nothing obviously wrong with that study (aside from the sample size). Cochran has blogged about oxygen before as well. To compile some of the relevant papers:
The problem for me is that while it makes sense that since we run on oxygen and the brain uses a lot of oxygen (the whole 'BOLD' thing etc), more oxygen might be better, it has the same issue as Kurzban's blood-glucose/willpower criticism: if the brain needs more oxygen than it's getting, why doesn't one simply breath a little more? While sedentary during these sorts of tasks, you have far more breathing capacity than you should need - you are able to sprint all-out without falling over of asphyxiation, after all. So there's no obvious reason there should be any lack, even more so than for glucose. And shouldn't CO2 levels closely track various aspects of weather? But as far as I know, various attempts to correlate weather and cognitive performance or mood have turned up only tiny effects. In addition, too much oxygen can be bad. So is it too little oxygen or too much nitrogen or too much carbon dioxide...?
What monitor is that? You could try recording CO2 long-term, especially if it's a data logger. Opening windows is something that's easily randomized.
I did some looking and compiling of consumer-oriented devices a while ago: https://forum.quantifiedself.com/t/indoor-air-quality-monitoring-health/799/40 I was not too impressed since nothing hit the sweet spot of accurate CO2 and PPM measurement under $100. The Netatmo looked decent but there are a lot of complaints about accuracy & reliability (checking the most recent Amazon reviews, still a lot of complaints).
I've been thinking maybe I should settle for the Netatmo. I've been working on a structural equation model (SEM) integrating ~100 personal data variables to try to model my productivity (some current sample output), and it would be nice to have even noisy daily C02 variables (as long as I know how noisy and can use it as a latent variable to deal with the measurement error). Correlation-wise, I think backwards causation can be mostly ruled out, and the most obvious confound is weather, which is already in my SEM.
* taken at face value, with reasonable estimates of how much rooms differ from day to day or week to week, CO2 levels would explain a lot or maybe most of variability in IQ tests or cognitive performance!
This one. It doesn't log data.