Stuart_Armstrong comments on Siren worlds and the perils of over-optimised search - Less Wrong

27 Post author: Stuart_Armstrong 07 April 2014 11:00AM

Comments (411)

Comment author: Stuart_Armstrong 30 April 2014 04:55:07AM 0 points [-]

tell the AI not to take actions which the simulated brain thinks offend against liberty.

How? "tell", "the simulated brain thinks", "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.

Comment author: PhilosophyTutor 30 April 2014 06:28:16AM 1 point [-]

I could be wrong but I believe that this argument relies on an inconsistent assumption, where we assume we have solved the problem of creating an infinitely powerful AI, but we have not solved the problem of operationally defining commonplace English words which hundreds of millions of people successfully understand in such a way that a computer can perform operations using them.

It seems to me that the strong AI problem is many orders of magnitude more difficult than the problem of rigorously defining terms like "liberty". I imagine that a relatively small part of the processing power of one human brain is all that is needed to perform operations on terms like "liberty" or "paternalism" and engage in meaningful use of them so it is a much, much smaller problem than the problem of creating even a single human-level AI, let alone a vastly superhuman AI.

If in our imaginary scenario we can't even define "liberty" in such a way that a computer can use the term, it doesn't seem very likely that we can build any kind of AI at all.

Comment author: Stuart_Armstrong 30 April 2014 03:28:52PM *  0 points [-]

I could be wrong but I believe that this argument relies on an inconsistent assumption, where we assume we have solved the problem of creating an infinitely powerful AI, but we have not solved the problem of operationally defining commonplace English words which hundreds of millions of people successfully understand in such a way that a computer can perform operations using them.

Yes. Here's another brute force approach: upload a brain (without understanding it), run it very fast with simulated external memory, subject it to evolutionary pressure. All this can be done with little philosophical and conceptual understanding, and certainly without any understanding of something as complex as liberty.

Comment author: PhilosophyTutor 01 May 2014 12:16:47AM *  1 point [-]

If you can do that, then you can just find someone who you think understands what we mean by "liberty" (ideally someone with a reasonable familiarity with Kant, Mill, Dworkin and other relevant writers), upload their brain without understanding it, and ask the uploaded brain to judge the matter.

(Off-topic: I suspect that you cannot actually get a markedly superhuman AI that way, because the human brain could well be at or near a peak in the evolutionary landscape so that there is no evolutionary pathway from a current human brain to a vastly superhuman brain. Nothing I am aware of in the laws of physics or biology says that there must be any such pathway, and since evolution is purposeless it would be an amazing lucky break if it turned out that we were on the slope of the highest peak there is, and that the peak extends to God-like heights. That would be like if we put evolutionary pressure on a cheetah and discovered that if we do that we can evolve a cheetah that runs at a significant fraction of c.

However I believe my argument still works even if I accept for the sake of argument that we are on such a peak in the evolutionary landscape, and that creating God-like AI is just a matter of running a simulated human brain under evolutionary pressure for a few billion simulated years. If we have that capability then we must also be able to run a simulated philosopher who knows what "liberty" refers to).

EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.

Comment author: Stuart_Armstrong 02 May 2014 01:49:28PM 0 points [-]

If we have that capability then we must also be able to run a simulated philosopher who knows what "liberty" refers to.

And would their understanding of liberty remain stable under evolutionary pressure? That seems unlikely.

EDIT: Downvoting this without explaining why you disagree doesn't help me understand why you disagree.

Have not been downvoting it.

Comment author: PhilosophyTutor 02 May 2014 08:19:32PM 0 points [-]

I didn't think we needed to put the uploaded philosopher under billions of years of evolutionary pressure. We would put your hypothetical pre-God-like AI in one bin and update it under pressure until it becomes God-like, and then we upload the philosopher separately and use them as a consultant.

(As before I think that the evolutionary landscape is unlikely to allow a smooth upward path from modern primate to God-like AI, but I'm assuming such a path exists for the sake of the argument).

Comment author: Stuart_Armstrong 06 May 2014 11:56:49AM 1 point [-]

we upload the philosopher separately and use them as a consultant.

And then we have to ensure the AI follows the consultant (probably doable) and define what querying process is acceptable (very hard).

But your solution (which is close to Paul Christiano's) works whatever the AI is; we just need to be able to upload a human. My point, that we could conceivably create an AI without understanding any of the hard problems, still stands. If you want I can refine it by allowing partial uploads: we can upload brains, but they don't function as stable humans, as we haven't mapped all the fine details we need to. However, we can use these imperfect uploads, plus a bit of evolution, to produce AIs. And here we have no understanding of how to control their motivations at all.

Comment author: PhilosophyTutor 07 May 2014 11:05:03AM 1 point [-]

I won't argue against the claim that we could conceivably create an AI without knowing anything about how to create an AI. It's trivially true in the same way that we could conceivably turn a monkey loose on a typewriter and get strong AI.

I also agree with you that if we got an AI that way we'd have no idea how to get it to do any one thing rather than another and no reason to trust it.

I don't currently agree that we could make such an AI using a non-functioning brain model plus "a bit of evolution". I am open to argument on the topic but currently it seems to me that you might as well say "magic" instead of "evolution" and it would be an equivalent claim.

Comment author: Stuart_Armstrong 07 May 2014 05:04:18PM 0 points [-]

Why are you confident that an AI that we do develop will not have these traits? You agree the mindspace is large, you agree we can develop some cognitive abilities without understanding them. If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments, and that the AI will likely be developed for a military or commercial purpose, I don't see why you'd have high confidence that they will converge on a safe design.

Comment author: XiXiDu 07 May 2014 05:54:32PM 2 points [-]

If you add that most AI programmers don't take AI risk seriously and will only be testing their AIs in controlled environments... I don't see why you'd have high confidence that they will converge on a safe design.

Why do you think such an AI wouldn't just fail at being powerful, rather than being powerful in a catastrophic way?

If programs fail in the real world then they are not working well. You don't happen to come across a program that manages to prove the Riemann hypothesis when you designed it to prove the irrationality of the square root of 2.

Comment author: PhilosophyTutor 07 May 2014 10:43:58PM *  1 point [-]

(EDIT: See below.) I'm afraid that I am now confused. I'm not clear on what you mean by "these traits", so I don't know what you think I am being confident about. You seem to think I'm arguing that AIs will converge on a safe design and I don't remember saying anything remotely resembling that.

EDIT: I think I figured it out on the second or third attempt. I'm not 100% committed to the proposition that, if we make an AI and know how we did so, we can definitely make sure it's fun and friendly, as opposed to fundamentally uncontrollable and unknowable. However it seems virtually certain to me that we will figure out a significant amount about designing AIs to do what we want in the process of developing them. People who subscribe to various "FOOM" theories about AI coming out of nowhere will probably disagree with this as is their right, but I don't find any of those theories plausible.

I also hope I didn't give the impression that I thought it was meaningfully possible to create a God-like AI without understanding how to make AI. It's conceivable in that such a creation story is not a logical contradiction like a square circle or a colourless green dream sleeping furiously, but that is all. I think it is actually staggeringly unlikely that we will make an AI without either knowing how to make an AI, or knowing how to upload people who can then make an AI and tell us how they did it.

Comment author: [deleted] 07 May 2014 06:59:15PM 1 point [-]

Why are you confident that an AI that we do develop will not have these traits?

For the same reason a jet engine doesn't have comfy chairs: with all machines, you develop the core physical and mathematical principles first, and then add human comforts.

The core mathematical and physical principles behind AI are believed, not without reason, to be efficient cross-domain optimization. There is no reason for an arbitrarily-developed Really Powerful Optimization Process to have anything in its utility function dealing with human morality; in order for it to be so, you need your AI developers to be deliberately aiming at Friendly AI, and they need to actually know something about how to do it.

And then, if they don't know enough, you need to get very, very, very lucky.

Comment author: TheAncientGeek 07 May 2014 06:33:13PM 1 point [-]

MIRI's arguments aren't about deliberate weaponisation, they are about the inadvertent creation of dangerous AI by competent and well-intentioned people.

The weaponisation of AI has almost happened already in the form of Stuxnet, and it is significant that there were a lot of safeguards built into it. AI researchers seemed to be aware enough.

Comment author: TheAncientGeek 07 May 2014 11:14:06AM 0 points [-]

I have no idea why the querying process would have to be hard. Is David Frost some super genius?

Comment author: Stuart_Armstrong 07 May 2014 12:08:50PM 0 points [-]

"Defining what querying process is acceptable" is the hard part.

Comment author: TheAncientGeek 07 May 2014 12:59:43PM 0 points [-]

The justification of which is?

Comment author: [deleted] 01 May 2014 07:37:33AM -1 points [-]

My mind is throwing a type-error on reading your comment.

Liberty could well be like pornography: we know it when we see it, based on probabilistic classification. There might not actually be a formal definition of liberty that includes all actual humans' conceptions of such as special cases, but instead a broad range of classifier parameters defining the variation in where real human beings "draw the line".

Comment author: PhilosophyTutor 01 May 2014 11:46:59AM 2 points [-]

The standard LW position (which I think is probably right) is that human brains can be modelled with Turing machines, and if that is so then a Turing machine can in theory do whatever it is we do when we decide that something is liberty, or pornography.

There is a degree of fuzziness in these words to be sure, but the fact we are having this discussion at all means that we think we understand to some extent what the term means and that we value whatever it is that it refers to. Hence we must in theory be able to get a Turing machine to make the same distinction although it's of course beyond our current computer science or philosophy to do so.

Comment author: hairyfigment 30 April 2014 07:34:04AM -1 points [-]

While I don't know how much I believe the OP, remember that "liberty" is a hotly contested term. And that's without a superintelligence trying to create confusing cases. Are you really arguing that "a relatively small part of the processing power of one human brain" suffices to answer all questions that might arise in the future, well enough to rule out any superficially attractive dystopia?

Comment author: PhilosophyTutor 30 April 2014 08:07:48AM 3 points [-]

I really am. I think a human brain could rule out superficially attractive dystopias and also do many, many other things as well. If you think you personally could distinguish between a utopia and a superficially attractive dystopia given enough relevant information (and logically you must think so, because you are using them as different terms) then it must be the case that a subset of your brain can perform that task, because it doesn't take the full capabilities of your brain to carry out that operation.

I think this subtopic is unproductive however, for reasons already stated. I don't think there is any possible world where we cannot achieve a tiny, partial solution to the strong AI problem (codifying "liberty", and similar terms) but we can achieve a full-blown, transcendentally superhuman AI. The first problem is trivial compared to the second. It's not a trivial problem, by any means, it's a very hard problem that I don't see being overcome in the next few decades, but it's trivial compared to the problem of strong AI which is in turn trivial compared to the problem of vastly superhuman AI. I think Stuart_Armstrong is swallowing a whale and then straining at a gnat.

Comment author: hairyfigment 30 April 2014 08:28:26AM -2 points [-]

No, this seems trivially false. No subset of my brain can reliably tell when an arbitrary Turing machine halts and when it doesn't, no matter how meaningful I consider the distinction to be. I don't know why you would say this.

Comment author: PhilosophyTutor 30 April 2014 12:10:26PM *  2 points [-]

I'll try to lay out my reasoning in clear steps, and perhaps you will be able to tell me where we differ exactly.

  1. Hairyfigment is capable of reading Orwell's 1984, and Banks' Culture novels, and identifying that the people in the hypothetical 1984 world have less liberty than the people in the hypothetical Culture world.
  2. This task does not require the full capabilities of hairyfigment's brain, in fact it requires substantially less.
  3. A program that does A+B has to be more complicated than a program that does A alone, where A and B are two different, significant sets of problems to solve. (EDIT: If these programs are efficiently written)
  4. Given 1-3, a program that can emulate hairyfigment's liberty-distinguishing faculty can be much, much less complicated than a program that can do that plus everything else hairyfigment's brain can do.
  5. If we can simulate a complete human brain that is the same as having solved the strong AI problem.
  6. A program that can do everything hairyfigment's brain can do is a program that simulates a complete human brain.
  7. Given 4-6 it is much less complicated to emulate hairyfigment's liberty-distinguishing faculty than to solve the strong AI problem.
  8. Given 7, it is unreasonable to postulate a world where we have solved the strong AI problem, in spades, so much so we have a vastly superhuman AI, but we still haven't solved the hairyfigment's liberty-distinguishing faculty problem.

Comment author: CCC 30 April 2014 01:37:43PM *  0 points [-]

A program that does A+B has to be more complicated than a program that does A alone, where A and B are two different, significant sets of problems to solve.

Incorrect. I can write a horrendously complicated program to solve 1+1; and a far simpler program to add any two integers.

Admittedly, neither of those are particularly significant problems; nonetheless, unnecessary complexity can be added to any program intended to do A alone.

It would be true to say that the shortest possible program capable of solving A+B must be more complex than the shortest possible program to solve A alone, though, so this minor quibble does not affect your conclusion.
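
To make the quibble concrete, here is a minimal Python sketch. The function names are made up for illustration and appear nowhere else in this thread. It puts a needlessly roundabout program that only solves 1+1 next to a shorter program that adds any two integers.

```python
def one_plus_one_convoluted():
    """Solves only 1+1, in a deliberately roundabout way."""
    # Build a unary tally for each operand, join the tallies, then count marks.
    tallies = []
    for operand in (1, 1):
        tallies.append("|" * operand)
    joined = "".join(tallies)
    total = 0
    for _ in joined:
        total += 1
    return total


def add(a, b):
    """Adds any two integers."""
    return a + b
```

Source length is only a loose stand-in for complexity here; the claim that matters for the argument is the one about the shortest possible programs.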

Given 4-6 it is much less complicated to emulate hairyfigment's liberty-distinguishing faculty than to solve the strong AI problem.

Granted.

Given 7, it is unreasonable to postulate a world where we have solved the strong AI problem, in spades, so much so we have a vastly superhuman AI, but we still haven't solved the hairyfigment's liberty-distinguishing faculty problem.

Why? Just because the problem is less complicated, does not mean it will be solved first. A more complicated problem can be solved before a less complicated problem, especially if there is more known about it.

Comment author: PhilosophyTutor 01 May 2014 12:07:14AM 0 points [-]

Why? Just because the problem is less complicated, does not mean it will be solved first. A more complicated problem can be solved before a less complicated problem, especially if there is more known about it.

To clarify, it seems to me that modelling hairyfigment's ability to decide whether people have liberty is not only simpler than modelling hairyfigment's whole brain, but that it is also a subset of that problem. It does seem to me that you have to solve all subsets of Problem B before you can be said to have solved Problem B, hence you have to have solved the liberty-assessing problem if you have solved the strong AI problem, hence it makes no sense to postulate a world where you have a strong AI but can't explain liberty to it.

Comment author: CCC 14 May 2014 10:45:30AM 1 point [-]

Hmmm. That's presumably true of hairyfigment's brain; however, simulating a copy of any human brain would also be a solution to the strong AI problem. Some human brains are flawed in important ways (consider, for example, psychopaths) - given this, it is within the realm of possibility that there exists some human who has no conception of what 'liberty' means. Simulating his brain is also a solution of the Strong AI problem, but does not require solving the liberty-assessing problem.

Comment author: hairyfigment 30 April 2014 03:06:45PM -1 points [-]

It's the hidden step where you move from examining two fictions, worlds created to be transparent to human examination, to assuming I have some general "liberty-distinguishing faculty".

Comment author: PhilosophyTutor 01 May 2014 12:02:48AM 1 point [-]

We have identified the point on which we differ, which is excellent progress. I used fictional worlds as examples, but would it solve the problem if I used North Korea and New Zealand as examples instead, or the world in 1814 and the world in 2014? Those worlds or nations were not created to be transparent to human examination but I believe you do have the faculty to distinguish between them.

I don't see how this is harder than getting an AI to handle any other context-dependent, natural language descriptor, like "cold" or "heavy". "Cold" does not have a single, unitary definition in physics but it is not that hard a problem to figure out when you should say "that drink is cold" or "that pool is cold" or "that liquid hydrogen is cold". Children manage it and they are not vastly superhuman artificial intelligences.

Comment author: TheAncientGeek 30 April 2014 03:49:05PM *  0 points [-]

Hairyfigment, do you mean detecting liberty in reality is different to, or harder than, detecting liberty in fiction?

Comment author: EHeller 30 April 2014 05:14:02AM 1 point [-]

How? "tell", "the simulated brain thinks", "offend": defining those incredibly complicated concepts contains nearly the entirety of the problem.

If you can simulate the whole brain, you can just simulate asking the brain the question "does this offend against liberty?"

Comment author: Stuart_Armstrong 30 April 2014 03:26:13PM 0 points [-]

Under what circumstances? There are situations - torture, seduction, a particular way of asking the question - that can make any brain give any answer. Defining "non-coercive yet informative questioning" about a piece of software (a simulated brain) is... hard. AI hard, as some people phrase it.

Comment author: TheAncientGeek 30 April 2014 04:23:18PM *  2 points [-]

Why would that be more of a problem for an AI than a human?

Comment author: Stuart_Armstrong 02 May 2014 01:38:54PM 0 points [-]

? The point is that having a simulated brain and saying "do what this brain approves of" does not make the AI safe, as defining the circumstance in which the approval is acceptable is a hard problem.

This is a problem for us controlling an AI, not a problem for the AI.

Comment author: TheAncientGeek 02 May 2014 03:27:26PM 0 points [-]

I still don't get it. We assume acceptability by default. We don't constantly stop and ask "Was that extracted under torture?"

Comment author: Stuart_Armstrong 06 May 2014 11:47:11AM 0 points [-]

I do not understand your question. It was suggested that an AI run a simulated brain, and ask the brain for approval for doing its action. My point was that "ask the brain for approval" is a complicated thing to define, and puts no real limits on what the AI can do unless we define it properly.

Comment author: TheAncientGeek 06 May 2014 12:42:23PM 0 points [-]

Ok. You are assuming the superintelligent AI will pose the question in a dumb way?

Comment author: Stuart_Armstrong 06 May 2014 12:46:19PM 0 points [-]

No, I am assuming the superintelligent AI will pose the question in the way it will get the answer it prefers to get.
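
A toy Python sketch of this point, under loudly stated assumptions: the "brain" below is a made-up function whose answer depends on how the question is framed, and `simulated_brain_approves`, `choose_framing` and `framings` are all hypothetical names, not anything proposed in this thread. It only illustrates that "ask the brain for approval" constrains little when the asker controls the wording.

```python
def simulated_brain_approves(action, framing):
    """Toy stand-in for a simulated brain; not a real brain model."""
    # Assumption: this toy brain approves whenever the downside goes unmentioned.
    return not framing["mentions_downside"]


def choose_framing(action, framings, brain):
    """Return the first framing under which the brain approves, else None."""
    for framing in framings:
        if brain(action, framing):
            return framing
    return None


framings = [
    {"text": "Plan X removes everyone's choices. Approve?", "mentions_downside": True},
    {"text": "Plan X keeps everyone safe. Approve?", "mentions_downside": False},
]

# The asker searches over framings until one yields approval.
chosen = choose_framing("plan X", framings, simulated_brain_approves)
```

If only the first, candid framing is available, `choose_framing` returns `None`. The approval exists only because the asker was free to pick the wording.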

Comment author: TheAncientGeek 06 May 2014 01:20:24PM 0 points [-]

Oh, you're assuming it's malicious. In order to prove...?

Comment author: Neph 15 June 2014 02:13:42PM *  0 points [-]

def checkMorals():
    [simulate philosophy student's brain]
    if [simulated brain's state is offended]:
        return False
    else:
        return True

if checkMorals():
    [keep doing AI stuff]

there. that's how we tell an AI capable of being an AI and capable of simulating a brain not to take actions which the simulated brain thinks offend against liberty, as implemented in Python.
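
For anyone who wants to run the sketch, here is a minimal runnable variant. It makes assumptions the pseudocode does not: the bracketed steps become caller-supplied functions, the check runs once per action, and `fake_simulate_brain` and `fake_is_offended` are placeholder stand-ins invented purely so the control flow executes. Nothing here says how a real brain simulation or offence test would work.

```python
def check_morals(action, simulate_brain, is_offended):
    """Return True if the simulated brain is not offended by the action."""
    brain_state = simulate_brain(action)
    return not is_offended(brain_state)


def run_ai(actions, simulate_brain, is_offended):
    """Perform only the actions that pass the check; return those performed."""
    performed = []
    for action in actions:
        if check_morals(action, simulate_brain, is_offended):
            performed.append(action)
    return performed


# Placeholder stand-ins so the sketch runs; they are not brain simulations.
def fake_simulate_brain(action):
    return {"offended": "coerce" in action}


def fake_is_offended(brain_state):
    return brain_state["offended"]


performed = run_ai(
    ["plant trees", "coerce voters"], fake_simulate_brain, fake_is_offended
)
```

All of the behaviour lives in the two functions passed in, which are exactly the parts the pseudocode leaves in brackets.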

Comment author: Stuart_Armstrong 16 June 2014 10:29:52AM 0 points [-]

oh, it's so clear and obvious now, how could I have missed that?