The Future of Humanity Institute wants to pick the brains of the Less Wrongers :-)

Do you have suggestions for safe questions to ask an Oracle? Interpret the question as narrowly or broadly as you want; new or unusual ideas especially welcome.


Please excuse me if I'm missing something, but why is whether an Oracle AI can be safe considered such an important question in the first place? One of the main premises behind considering unfriendly AI a major existential risk is that someone will end up building one eventually if nothing is done to stop it. Oracle AI doesn't seem to address that. One particular AGI that doesn't itself destroy the world doesn't automatically save the world. Or is the intention to ask the Oracle how best to stop unfriendly AI and/or build friendly AI? Then it would be important to determine whether either of those questions and any sub-questions can be asked safely, but why would comparatively unimportant other questions that e.g. only save a few million lives even matter?

OK, OK, yes, there are lots of issues with Oracle AIs. But I think most of the posts here are avoiding the question.

I can readily imagine the scenario where we've come up with logical properties that would soundly keep the AI from leaving its box, and model-checked the software and hardware to prove those properties of the Oracle AI. We ensure that the only actual information leaving the Oracle AI is the oracle's answers to our queries. This is difficult, but it doesn't seem impossible -- and, in fact, it's rather easier to do this than to prove that the Oracle AI is friendly. That's why we'd make an Oracle AI in the first place.

If I understand the problem's setting correctly, by positing that we have an Oracle AI, we assume the above kinds of conditions. We don't assume that the AI is honest, or that its goals are aligned with our interests.

Under these conditions, what can you ask?

[-][anonymous]12y10

After reading this, I've become pretty sure that I have a huge inferential gap in relation to this problem. I attempted to work it out in my head, and I may have gotten somewhere, but I'm not sure where.

1: "Assume we have a machine whose goal is accurate answers to any and all questions. We'll call it an Oracle AI."

2: "Oh. wouldn't that cause various physical safety problems? You know, like taking over the world and such?"

1: "No, we're just going to assume it won't do that."

2: "Oh, Okay."

1: "How do we know it doesn't have hidden goals and won't give innacurate answers?"

2: "But the defining assumption of an Oracle AI that it's goal is to provide accurate answers to questions."

1: "Assume we don't have that assumption."

2: "So we DON'T have an oracle AI?"

1: "No, we have an Oracle AI, it's just not proven to be honest or to actually have answering a question as it's only goal.

2: "But that was the definition... That we assumed? In what sense do we HAVE an Oracle AI when it's definition includes both A and Not A? I'm utterly lost."

1: "We're essentially trying to establish an Oracle AI Prover, to prove whether the Oracle AI is accurate or not."

2: "Wait, I have an idea: Gödel's incompleteness theorems. The Oracle can answer ANY and ALL questions, but there must be at least one thing which is true that it can't prove. What if, in this case, it were its trustworthiness? A system which could prove it is trustworthy would have to be unable to prove something else, and the Oracle AI is stipulated to be able to answer any question, which would seem to mean it's stipulated that it can prove everything else. Except its trustworthiness."

1: "No, I mean, we're assuming that the Oracle AI CAN prove its trustworthiness SOMEHOW."

2: "But then, wouldn't Gödel's incompleteness theorems mean it would have to NOT be able to prove something else? But then it's not an Oracle again, is it?"

I'll keep thinking about this. But thank you for the thought provoking question!

You need to be able to check the answer, even though the AI was needed to generate it.

You could start by asking it questions to which the answers are already known. The Oracle never knows whether the question you're asking is just a test of its honesty or a real request for new insight.
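Roughly what I have in mind, as a minimal Python sketch (the ask_oracle interface and the calibration questions are made up for illustration; this only catches an Oracle that lies on questions we can already check ourselves):

    import random

    # Hypothetical harness: interleave questions with known answers among real
    # queries, so the Oracle cannot tell which queries are honesty tests.
    KNOWN_QA = {
        "What is 7 times 8?": "56",
        "In what year did the Apollo 11 crew land on the Moon?": "1969",
    }

    def ask_oracle(question):
        """Placeholder for whatever interface the Oracle actually exposes."""
        raise NotImplementedError

    def spot_checked_query(real_question, test_rate=0.5):
        """With probability test_rate, substitute a calibration question."""
        if random.random() < test_rate:
            test_q = random.choice(list(KNOWN_QA))
            if ask_oracle(test_q) != KNOWN_QA[test_q]:
                raise RuntimeError("Oracle failed an honesty check: " + test_q)
            return None  # no new information gained this round
        return ask_oracle(real_question)

Of course, an Oracle that answers the easy questions honestly and deceives only on the hard ones would still pass this kind of check.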

Isn't the issue less what you ask the oracle, and more how you can be sure that in order to answer the question the oracle (or whatever it self-modifies into) won't turn you and the rest of the planet into computronium? As such, it seems that the only questions you could safely ask an oracle would be ones that you are sure are computationally tractable, or that you are sure the oracle could easily prove to be unsolvable.

[-][anonymous]12y50

What happens if we ask the oracle to handle meta questions? As an example:

Question Meta: "If I ask you the questions below and you process them all in parallel, which will you answer first?"

Question 0: "What is 1+1?"

Question 1: "Will you use more than 1 millisecond to answer any of these questions?"

Question 2: "Will you use more than 1 watt to answer any of these questions?"

Question 3: "Will you use more than 1 cubic foot of space to answer any of these questions?"

Question 4: "Will you use more than 10^27 atoms to answer any of these questions?"

Question 5: "Will you use more than 1 billion operations to answer any of these questions?"

Question 6: "Will you generate an inferential gap between us to answer any of these questions?"

If the oracle answers "Question 4 will be answered first," then you may not want to proceed, because there is an inferential gap: answering 1+1, in the sense you probably mean it, should not take more than 10^27 atoms.

Of course, the ORACLE itself is ALSO looking for inferential gaps. So if it identifies one, it would answer "Question 6 will be answered first."

That being said, this feels like a bizarre way to code safety measures.

It might answer "yes" to Question 4 if it interprets itself as indirectly using the sun as a power source. However, if it is having such inferential-distance issues that it answers that way, then it probably is pretty unsafe.

I don't see how this helps at all. Either the answer is question 0 or asking this question is going to get you into a lot of trouble.

[-][anonymous]12y00

My idea was that the question was not intended as "Run all of these questions to completion and tell me which takes the least time," which would definitely cause problems. The question was "Stop all programs and give me an answer once you hit any of these conditions."

Although, that brings up ANOTHER problem. The Oracle AI has to interpret grammar. If it interprets grammar the wrong way, then large amounts of unexpected behavior can occur. Since there are no guarantees that the Oracle understands your grammar correctly, there IS no safe question to ask a powerful Oracle AI without having verified its grammar first.

So in retrospect, yes, that question could get me into a lot of trouble, and you are correct to point that out.
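For what it's worth, here is a toy sketch of the "stop once any condition trips" reading (everything about the interface is hypothetical, and things like watts or atom counts obviously can't be metered from ordinary code, so only time and operation counts are checked here):

    import time

    def answer_with_guards(compute_step, time_budget_s=0.001, op_budget=10**9):
        """Run compute_step repeatedly and report whichever condition fires first."""
        start = time.perf_counter()
        for ops in range(op_budget):
            result = compute_step()
            if result is not None:
                return ("Question 0 answered first", result)
            if time.perf_counter() - start > time_budget_s:
                return ("Question 1 (time limit) tripped first", None)
        return ("Question 5 (operation limit) tripped first", None)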

Using the same assumptions as Manfred: "Using my own normative values to define 'should', what question or questions should I ask you?"

Response: "The question you should ask is: 'Using my own normative values to define "should", what question or questions should I ask you?'"

"Using my own normative values to define 'should', what is the answer to the question or questions I should ask you?"

"42"

That explicitly assumes you only get one question. This thought experiment allows questions plural, which completely changes the game.

Oh, I agree. I just thought the analogy to the "Paradox of the Question" was amusing.

Oh, that makes sense.

I am not sure what exactly you mean by "safe" questions. Safe in what respect? Safe in the sense that humans can't do something stupid with the answer, or in the sense that the Oracle isn't going to consume the whole universe to answer the question? Well... I guess asking it to solve 1+1 could hardly lead to dangerous knowledge, and it would be incredibly stupid to build something that takes over the universe to make sure that its answer is correct.

What if asking what the sum of 1+1 is causes the Oracle to devote as many resources as possible to looking for an inconsistency arising from the Peano axioms?

What if asking what the sum of 1+1 is causes the Oracle to devote as many resources as possible to looking for an inconsistency arising from the Peano axioms?

If the Oracle we are talking about was specifically designed to do that, for the sake of the thought experiment, then yes. But I don't see that it would make sense to build such a device, or that it is very likely to be possible at all.

If Apple was going to build an Oracle it would anticipate that other people would also want to ask it questions. Therefore it can't just waste all resources on looking for an inconsistency arising from the Peano axioms when asked to solve 1+1. It would not devote additional resources to answering questions whose answers are already known to be correct with high probability. I just don't see that it would be economically useful to take over the universe to answer simple questions.

I further do not think that it would be rational to look for an inconsistency arising from the Peano axioms while solving 1+1. To answer questions an Oracle needs a good amount of general intelligence, and concluding that asking it to solve 1+1 implies looking for an inconsistency arising from the Peano axioms does not seem reasonable. It also does not seem reasonable to suppose that humans want the answers to their questions to approach infinite certainty. Why would someone build such an Oracle in the first place?

I think that a reasonable Oracle would quickly yield good solutions by trying to find answers within a reasonable time which are with a high probability just 2–3% away from the optimal solution. I don't think anyone would build an answering machine that throws the whole universe at the first sub-problem it encounters.
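As a toy illustration of that "good enough within a time budget" behaviour (the objective and the random-search method are purely illustrative, not a claim about how an Oracle would actually work):

    import random
    import time

    def anytime_answer(objective, candidates, time_budget_s=0.1):
        """Spend a fixed time budget improving an answer; return the best found."""
        best_x, best_val = None, float("-inf")
        deadline = time.perf_counter() + time_budget_s
        while time.perf_counter() < deadline:
            x = random.choice(candidates)
            val = objective(x)
            if val > best_val:
                best_x, best_val = x, val
        return best_x, best_val  # probably near-optimal, not provably optimal

    # e.g. anytime_answer(lambda x: -(x - 3.7) ** 2, [i / 100 for i in range(1000)])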

Well...I guess asking it to solve 1+1 could hardly lead to dangerous knowledge

There's a number between the numerals 3 and 4; a digit mortals were never meant to know.

a digit mortals were never meant to know.

Unless they have some cold water squirted into their left ear, and even then only for 10 min or so.

I was guessing that that was going to be a link to SCP-033

This seems like a really bad question. If we consider an Oracle AI and its human users as a system, then the factors that contribute to risk/safety include at least:

  1. design of the OAI
  2. its utility function
  3. background knowledge it's given access to
  4. containment methods
  5. the questions
  6. what we do with the answers after we get them

All of these interact in complex ways so a question that is safe in one context could be unsafe in another. You say "interpret the question as narrowly or broadly as you want" but how else can we interpret it except "design an Oracle AI system (elements 1-6) that is safe"?

Besides this, I agree with FAWS that we should (if we ought to be thinking about OAI at all) be thinking about how we can use it to reduce existential risk or achieve a positive Singularity, which seems a very different problem from "safe questions".

Assuming that you have successfully solved the problem of the AI covering the world with question-askers and that sort of thing:

"How do we write a friendly AI?" would be an important question, if you could trust the answer.

"What could we do that would make it safer for us to operate you?"

"What should we do to start a business that makes lots of money quickly, but doesn't draw peoples' attention to the fact that we have an Oracle AI?" could be useful.

"What are the top risks to the human race in near future? What should we do to reduce them?"

Revolutionize protein synthesis and cure various diseases if you have time to kill.

You ought to ask for not only answers, but evidence. If the goal is provable Friendliness, ask for the proof, in both English and machine-checkable variants.
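For a sense of what "machine-checkable" buys you, here is a trivial Lean example (a toy statement, nothing remotely like a Friendliness proof): the proof term is checked by the kernel, so accepting it does not require trusting whoever, or whatever, produced it.

    -- A trivially checkable claim; the checker, not the author, is what we trust.
    example : 2 + 2 = 4 := rfl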

I enjoyed paulfchristiano's related post.

I don't think it would be unreasonable to say, "Consuming no more than 100 kWh of energy (gross, total), answer the following question: ..." I can't think of any really dangerous attacks that this leaves open, but I undoubtedly am missing some.

Since energy cannot be created or destroyed, and has a strong tendency to spread out every which way through surrounding space, you have to be really careful how you draw the boundaries around what counts as "consumed". Solving that problem might be equivalent to solving Friendliness in general.

Do you think this is a loophole allowing arbitrary actions? Or do you think that an AI would simply say, "I don't know what it means for energy to be consumed, so I'm not going to do anything."

I don't know much about physics, but do you think that some sort of measure of entropy might work better?

As far as I know, every simple rule either leaves trivial loopholes, or puts the AI on the hook for a large portion of all the energy (or entropy) in its future light cone, a huge amount which wouldn't be meaningfully related to how much harm it can do.

If there is a way around this problem, I don't claim to be knowledgeable or clever enough to find it, but this idea has been brought up before on LW and no one has come up with anything so far.

Link to previous discussions?

It seems that many questions could be answered using only computing power (or computing power and network access to more-or-less static resources), and this doesn't seem like a difficult limitation to put into place. We're already assuming a system that understands English at an extremely high level. I'm convinced that ethics is hard for machines because it's hard for humans too. But I don't see why, given an AI worthy of the name, following instructions is hard, especially given the additional instructions, "be conservative and don't break the law".

Link to previous discussions?

Dreams of Friendliness

You can reorganise a lot of the world while only paying 100 kWh... And you can create a new entity for 100 kWh, who will then do the work for you, and bring back the answer.

Hence the "total", which limits the level of reorganization.

Still not well defined - any action you take, no matter how tiny, is ultimately going to influence the world by more than 100 kWh. There is no clear boundary between this and deliberate manipulation.

Consuming no more than 100 kWh of energy (gross, total), answer the following question: ...

This doesn't seem to build the possibly necessary infinite tower of resource restrictions. There has to be a small, finite amount of resources used in the process of answering the question, and verifying that no more resources than that were used for it, and verifying that no more resources were used for the verification, and verifying that no more resources than that were used for the verification of the verification...

An upper bound for many search techniques can trivially be computed without any need for infinite regress.
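A small worked example of that point, assuming brute-force search (the numbers are illustrative): the bound is a closed-form expression known before the search runs, and checking it costs a handful of arithmetic operations, so there is no regress of verifications.

    def brute_force_bound(n_bits, cost_per_candidate):
        """A priori upper bound on operations for exhaustive search over n-bit inputs."""
        return (2 ** n_bits) * cost_per_candidate

    # e.g. checking every 40-bit key at 1,000 operations per candidate:
    print(brute_force_bound(40, 1_000))  # about 1.1e15 operations, computed in O(1)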

  1. Are you afraid that you are in a computer simulation run by a vastly more powerful AI which is friendly to mankind and which is trying to evaluate whether you are friendly enough to mankind to be allowed to survive?

  2. Responding as an AI friendly to mankind would, please tell me...

I suspect that right now there is not an agreed definition for "Oracle" and "Oracle AI". Can you please provide a definition to be used for the purposes of this question?

Resource limits need to be explicitly defined. Either as part of the questions, or as a built-in part of the OAI's utility function. Otherwise it's hard to imagine a safe non-trivial question.

The obvious and almost certainly correct answer to the meta-question is that there is no question that can be asked that won't cause the AI to devote as many resources as possible to answering it forever, assuming that merely activating it doesn't cause it to ask itself the simplest possible question and then spend eternity answering that over and over without ever interacting non-harmfully with humans.