ike comments on Summoning the Least Powerful Genie - Less Wrong Discussion
Its goal is to predict as accurately as it can. Clearly taking over the world and reassigning all computing power to calculate the prediction is the best move.
How does it know how long a plan will take to design until it actually designs it? (I'm assuming your "time" is time to design a plan). How do we know the fastest designed plan is the safest? Maybe this AI generates unsafe plans faster than safe ones.
TL;DR: not pessimistic enough.
Wouldn't taking over the world be a rather agentive thing for an AI that is not an agent to do?
"it is not an agent" is not a description of how to build an AI that is in fact, not an agent. It's barely better than "not an unsafe AI".
Besides, isn't "giving an answer to the prediction" a rather agenty thing for such an AI to do?
Non-agents aren't all that mysterious. We can already build non-agents. Google is a non-agent.
No, it's a response. Non-agency means not doing anything unless prompted.
Compare: safe (in the FAI sense) computer programs aren't that mysterious. We can already build safe computer programs. Android is a safe computer program.
Well, who cares if it doesn't do anything unless prompted, if it takes over the universe when prompted to answer a question? And if you can rigorously tell it not to do that, you've already solved FAI.
Do you have a valid argument that nonagentive programmes would be dangerous? Because saying "it would agentively do X" isn't a valid argument. Pointing out the hidden pitfalls of such programmes is something MIRI could usefully do. An unargued belief that everything is dangerous is not useful.
Oh, you went there.
Well: how likely is an AI designed to be nonagentive as a safety feature to have that particular failure mode?
You may have achieved safety, but it has nothing to do with "achieving FAI" in the MIRI sense of hardcoding the totality of human value. The whole point is that it is much easier, because you are just not building in agency.
A program designed to answer a question necessarily wants to answer that question. A superintelligent program trying to answer that particular question runs the risk of acting as a paperclip maximizer.
Suppose you build a superintelligent program that is designed to make precise predictions, by being more creative and better at predictions than any human would. Why are you confident that one of the creative things this program does to make itself better at predictions isn't turning the matter of the Earth into computronium as step 1?
I don't think my calculator wants anything.
Does an amoeba want anything? Does a fly? A dog? A human?
You're right, of course, that we have better models for a calculator than as an agent. But that's only because we understand calculators and they have a very limited range of behaviour. As a program gets more complex and creative it becomes more predictive to think of it as wanting things (or rather, the alternative models become less predictive).
Notice the difference (emphasis mine):
vs
What does that mean? That it's necessarily satisfying a utility function? It isn't, as Lumifer's calculator shows.
I can be confident that non-agents won't do agentive things.
Why are you so confident your program is a nonagent? Do you have some formula for nonagent-ness? Do you have a program that you can feed some source code to and it will output whether that source code forms an agent or not?
It's all standard software engineering.
I'm claiming that "non-agent" is not descriptive enough to actually build one. You replied that we already have non-agents, and I replied that we already have safe computer programs. Just as we can't extrapolate from our safe programs that any AI will be safe, we can't extrapolate from our safe non-agents that any non-agent will be safe.
I still have little idea what you mean by non-agent. It's a black box that may have some recognizable features from the outside, but that doesn't tell you how to build it.
I replied that we can already build non-agents.
It remains the case that if you think they could be dangerous, you need to explain how.
Again, we already know how to build them, in that we have them.
Worse than that. MIRI can't actually build anything they propose. It's just that some MIRI people have a reflex habit of complaining that anything outside of MIRI land is too vague.
Think of Solomonoff induction, or some approximation of it. It is not an agent. It just tries every possible hypothesis on the data and does a Bayesian update.
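To make "tries every possible hypothesis and does a Bayesian update" concrete, here is a minimal Python sketch over a tiny finite hypothesis class. (Real Solomonoff induction mixes over all computable programs; the three toy hypotheses and their probabilities are just my illustration.) Note that it only predicts; it has no actions available to it.

    # Toy Bayesian-mixture predictor over a handful of hypotheses.
    # Each hypothesis maps a bit history to P(next bit = 1).
    hypotheses = {
        "mostly_0":    lambda history: 0.1,
        "mostly_1":    lambda history: 0.9,
        "alternating": lambda history: 0.9 if not history or history[-1] == 0 else 0.1,
    }
    weights = {name: 1.0 / len(hypotheses) for name in hypotheses}  # uniform prior

    def predict(history):
        """Mixture probability that the next bit is 1."""
        return sum(weights[n] * h(history) for n, h in hypotheses.items())

    def update(history, bit):
        """Bayes' rule: reweight each hypothesis by how well it predicted `bit`."""
        for n, h in hypotheses.items():
            p1 = h(history)
            weights[n] *= p1 if bit == 1 else 1.0 - p1
        total = sum(weights.values())
        for n in weights:
            weights[n] /= total

    history = []
    for bit in [1, 0, 1, 0, 1]:
        print(round(predict(history), 3))  # watch "alternating" take over
        update(history, bit)
        history.append(bit)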
It doesn't know; it needs to predict. But humans generally have an idea of how solvable a problem is before they solve it. An engineer knows that building a certain kind of machine is possible long before he works out the exact specification. A programmer knows a problem is probably solvable with a certain approach before they work out the exact code.
This AI is highly incentivized to work fast. Searching down a totally wrong tree is heavily penalized. Trying simpler ideas before more complex ones is rewarded, as is quickly coming up with possible solutions that might work before reviewing them in more depth.
I don't know exactly what strategies it will use, but its utility function is literally to minimize computing power. If you trust the AI to fulfill its utility function, then you can trust it will do this to the best of its ability.
Safety is not guaranteed with this approach. I am fully upfront about this. What it does is minimize optimization. The plan you get will be the stupidest one the AI can come up with. This significantly decreases risk.
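To pin down the incentive structure, here is a rough sketch of how I read the proposal: output the first plan that clears the success threshold (the 90% criterion discussed elsewhere in this thread), while every unit of search effort costs utility. The function names and constants are hypothetical stand-ins, not part of any real system.

    # Hypothetical sketch of a compute-penalized planner. `generate_candidates`
    # is assumed to yield plans roughly simplest-first; `estimate_success`
    # stands in for the judge's approval probability.
    COMPUTE_COST = 0.001      # utility penalty per search step (assumed value)
    SUCCESS_THRESHOLD = 0.9   # the 90% criterion from the thread

    def search_for_plan(generate_candidates, estimate_success):
        steps = 0
        for plan in generate_candidates():   # simplest candidates first
            steps += 1
            if estimate_success(plan) >= SUCCESS_THRESHOLD:
                # Utility falls with every step spent searching, so the
                # maximizing move is the *first* (stupidest) passing plan.
                return plan, 1.0 - COMPUTE_COST * steps
        return None, -COMPUTE_COST * steps   # exhausted the search, no plan

Under this scoring, extra cleverness beyond the threshold is pure cost, which is the sense in which the output is the stupidest plan the AI can come up with.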
IMHO extreme pessimism leads to throwing out a huge number of ideas. Some of which might be practical, or lead to more practical approaches. I was extremely pessimistic about FAI for a long time until reading some of the recent proposals to actually attack the problem. None of the current ideas are sufficient, but they show it's at least approachable.
Which is uncomputable, and an approximation would presumably benefit from increased computing power.
Like that's simple? How exactly do you make an AI that isn't an agent? With no goals, why does it do anything?
But why are plans that take less computing power to come up with more likely to be safe? Besides, if it calculates that searching for simple solutions is unlikely to meet the 90% criterion, it can forgo that and jump straight to complicated ones.
Your idea is similar to http://lesswrong.com/lw/854/satisficers_want_to_become_maximisers/, have you seen that?
I gave the simplest possible counter example to your objection. I never proposed that we actually use pure Solomonoff induction.
EDIT: I realize you said something different. You implied that an approximation of Solomonoff induction would benefit from more computing power, and so would act as an agent to obtain it. This is totally incorrect. Solomonoff induction can be approximated in various ways by bounding the run time of the programs, or using simpler models instead of computer programs, etc. None of these create any agentness. They still just do prediction. I'm not sure you understand the distinction between agents and predictive non-agents, and this is very important for FAI work.
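For concreteness, here is one way the runtime-bounding trick could look: enumerate short programs, cap each at a fixed number of steps, keep the ones consistent with the data, and weight them by 2^-length. `run_program` is a hypothetical interpreter for some toy program encoding; everything else is standard.

    import itertools

    MAX_STEPS = 1000  # hard runtime cap per program (assumed)

    def bounded_mixture_prediction(run_program, max_len, history):
        """Approximate Solomonoff prediction of the next bit.
        run_program(bits, n_outputs, max_steps) is a hypothetical
        interpreter returning a list of output bits, or None on timeout."""
        numer, denom = 0.0, 0.0
        for length in range(1, max_len + 1):
            for bits in itertools.product([0, 1], repeat=length):
                out = run_program(bits, len(history) + 1, MAX_STEPS)
                if out is None or out[:len(history)] != list(history):
                    continue  # timed out or contradicts the observed data
                weight = 2.0 ** -length  # shorter programs get more prior mass
                numer += weight * out[len(history)]
                denom += weight
        return numer / denom if denom else 0.5  # still just a probability

Bounding the runtime keeps this computable, and nothing in it acts on the world; it only ever returns a number.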
The entire field of machine learning is about building practical approximations of Solomonoff induction: algorithms which can predict things and which are not agents. Agents are just a special case of prediction algorithms, ones that take the action with the highest predicted reward.
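That last sentence can be stated in one line of code. `predict_reward` here is an assumed black-box predictor; the point is that the "agent" part is just an argmax bolted onto pure prediction:

    def act(predict_reward, state, actions):
        """Turn a reward predictor into an agent: pick the action
        whose predicted reward is highest."""
        return max(actions, key=lambda a: predict_reward(state, a))

    # Toy usage with a stand-in predictor that favors actions near the state:
    best = act(lambda s, a: -abs(a - s), state=5, actions=[1, 2, 4, 8])  # -> 4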
Because they are plans that less powerful intelligences could have come up with. We don't worry about humans taking over the world, because they aren't intelligent enough. The danger of superintelligence is because it could be far more powerful than us. This is a limit on that power.
That's a feature. We don't know how much computing power is necessary. We just want it to minimize it.
I think several of your objections were addressed in http://lesswrong.com/lw/tj/dreams_of_friendliness/. That's pretty much where I'm coming from. Do you have good responses to the arguments there?
EY is talking about oracles which answer questions. I am just talking about prediction.
But yes, you do have a point that building a powerful predictive AI is not completely trivial. It's certainly possible, though: if you have infinite computing power, you can just run Solomonoff induction.
Realistically we will have to find good approximations, and this might require using agenty AI. If so, we will have to do work on controlling that AI. I believe this is possible, because it's a simple domain with a well-specified goal and no output channels except a single number.
Anyway, the other AI judge isn't an important or necessary part of my idea. I just wanted a simple outside judge of solutions. You could make the judge internal and have the AI use its own probability estimates to decide when to output a solution. It is essentially doing that already by trying to predict what the judge will say to its plan; the judge is redundant.