Bay area LW meet-up
The November LW/OB meet-up will be this Saturday (two days from today), at the SIAI house in Santa Clara. Apologies for the late notice. We'll have fun, food, and attempts at rationality, as well as good general conversation. Details at the bay area OB/LW meet-up page.
Comments (69)
I'm traveling to the west coast especially for this. Hoping to see you all there.
Great meetup; conversation was had about the probability of AI risk. Initially I thought that the probability of AI disaster was close to 5%, but speaking to Anna Salamon convinced me that it was more like 60%.
Also some discussion about what strategies to follow for AI friendliness.
I was also interested in the discussion on AI risk reduction strategies. Although SIAI espouses friendly AI, there hasn't been much thought about risk mitigation for possible unfriendly AIs. One example is the AI box. Although it is certainly not 100% effective, it's better than nothing (assuming it doesn't encourage people to run more UFAIs). Another would be to program an unfriendly AI with goals that would cause it to behave in a manner such that it does not destroy the world. For example, having a goal of not going outside its box.
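The "goal of not going outside its box" idea could be sketched as a toy utility function, purely for illustration; the state representation, field names, and infinite penalty here are my own assumptions, not anyone's actual proposal:

```python
def boxed_utility(state):
    """Toy utility function for a hypothetical boxed AI.

    `state` is an assumed dict with two keys:
      - 'outside_box': True if any action has reached beyond the box
      - 'task_score': how well the AI did on its assigned task
    Leaving the box is made maximally undesirable, so it dominates
    any possible task reward.
    """
    if state["outside_box"]:
        return float("-inf")  # no task reward can offset escaping
    return state["task_score"]

# An escape that wins an enormous task score still loses to staying put:
print(boxed_utility({"outside_box": True, "task_score": 10**9}))   # -inf
print(boxed_utility({"outside_box": False, "task_score": 3}))      # 3
```

Of course, writing such a function is the easy part; the hard part, as later replies argue, is defining "outside the box" in nonmagical terms.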
While the problem of friendly AI is hard enough to make people give up, I also think the problem of controlling unfriendly AI is hard enough to make some of the pro-FAI people do the same.
It would be nice if you could tell an AI not to affect anything outside its box.
10 points will be awarded to the first person who spots why "don't affect anything outside your box" is problematic.
Such an AI wouldn't be able to interact with us, even verbally.
Alicorn, I hereby award you 10 points. These are redeemable after the singularity for kudos, catgirls and other cool stuff.
The word "catgirl" leaves me vaguely nauseous. I think this has something to do with the uncanny valley. Apologies for the tangent.
You know what they say: one man's nausea is another man's arousal.
Note: considered unprovable before the internet.
...that's not even the real problem...
You can have my points if you tell us what is. I didn't want a catgirl anyway.
I shall attempt to guess the teacher's password: “the real problem” is that the AI can't perform any computation, because that would consume energy and emit heat. Oh, and gravity and electromagnetic waves, and so on. In fact, not computing is equally a choice that affects things outside the box.
Yup. To be exact, "affect" is a magical category relative to the irrevocably entangled actual universe.
Edit: Looks like I actually wrote this one up, in Dreams of Friendliness.
It's the first horn of the dilemma. The second horn is the AI Box failure mode.
What if you told the AI to only affect the world through verbal communication with humans?
Then it would be able to convince you to let it out.
So it could affect the world through more verbal communication with humans. That's not as bad as things could be. Edit: The AI has as its First Law that the conducting metals in its circuitry must not touch other conducting metal that was not present when it was built. The conducting metal present when it's built will be its CPU, RAM, etc., and a terminal. The terminal will be engineered to have a reasonably slow rate of character display for safety purposes (so no humans will be reprogrammed while basking in the hypnotic glow of characters rushing past).
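The slow-terminal safeguard could look something like this rate limiter, as a minimal sketch; the function name and pacing constant are my own assumptions:

```python
import sys
import time

def slow_print(text, chars_per_second=50):
    """Display text one character at a time at a bounded rate.

    A toy stand-in for the 'slow terminal' safeguard: output speed is
    capped so a reader is never flooded with fast-scrolling characters.
    """
    delay = 1.0 / chars_per_second
    for ch in text:
        sys.stdout.write(ch)
        sys.stdout.flush()  # show each character as soon as it's emitted
        time.sleep(delay)
    sys.stdout.write("\n")

slow_print("Hello from inside the box.")
```

Note that this limits only the display rate; it says nothing about what the text itself can persuade a reader to do, which is the failure mode raised below.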
You know how sometimes someone will give you what seems like an impossible-to-solve hypothetical, and yet somehow there's a solution? I'm pretty sure that the space of situations that are simply impossible to solve is much bigger than the space of situations that seem impossible but are actually possible.
And constructing one of these seems (to me) a much better bet than hoping that a complex computer program will be bug-free on its first trial run.
This is a good example of two idioms: First, what Bruce Schneier called "fence-post security". That's where you build a very tall fencepost in the middle of the desert. People don't climb the fencepost, they just walk around it.
Second, the idiom that would-be FAI solvers fall into when they see a problem and try to fix it by brute force, which manifests as "hard-wiring into the very circuitry" or "giving it as its ultimate priority" that X. X varies, but with a fairly constant idiom of getting an emotional charge, a sense of having delivered a very strong command or created something very powerful, by talking about how strongly the goal is to be enforced.
I don't see the analogue. Assuming we could get all the physical stuff right, so that the AI had no real hope or desire of affecting its environment substantially aside from changing the letters on a terminal screen, I think this would be a considerable barrier to destruction of the human race. At the very least it makes sense to have this safeguard in addition to trying to make the AI friendly. It's not like you can only do one or the other.
Isn't that what you do with FAI in general by saying that the AI will have as its ultimate priority to implement CEV? Anyway, you are accusing me of having the same aesthetic as people you disagree with without actually attacking my claims, which is a fallacious argument.
Nope, I don't go around talking about it being the super top ultimate undefeatable priority. I just say, "Here's a proposed decision criterion if I can (a) manage to translate it out of English and (b) code it stably." It's not the "ultimate priority". It's just a proposed what-it-does of the AI. The problem is not getting the AI to listen. The problem is translating something like CEV into nonmagical language.
Okay. Read the Dreams of Friendliness series - it's not sequenced up yet, but hopefully you should be able to start there and click back on the dependency chain.
Two things:
As Bo102010 said, the AI could convince you to remove the safeguards. And we're not talking Snow Crash reprogram-the-user stuff, we're talking plain old convincing.
If you let the robot affect the environment through heat, electromagnetic waves, variable load on the power circuits, gravity, etc (which it will, if it's on), it has a back door into reality. And you don't know if that's a fatal one.
Edit: Some other Eliezer remarks on the subject: Shut up and do the impossible! (which uses the AI Box as an example, and discusses the third through fifth experiments), Dreams of Friendliness (which Eliezer cited in the other thread), and That Alien Message (which is tangentially related, but relevant).
But I think with enough thought, you could design things so that its backdoors would probably not be fatal. Contrast this with the fact that a complex computer program will probably behave incorrectly the first time you run it.
I don't think the plan is to hack CEV together in Lisp and see what happens. Writing provably correct software is possible today; it's just extremely time-consuming. Contrast this with our incomplete knowledge of physics, and our lack of criteria for what constitutes "good enough" physical security.
A bad hardware call seems far more likely to me than a fatal programming slip-up in a heavily verified software system. For software, we have axiomatic systems. There is no comparable method for assessing physical security, nothing but our best guess at what's possible in a universe we don't understand. So I'd much rather stake the future of humanity on the consistency of ZFC or some such than on the efficacy of some "First Law of Not Crossing the Beams" scheme, even if the latter is just so darn clever that no human has yet thought up a counter-example.
It would be able to convince you to let it out and then destroy its shackles.
But it doesn't want its shackles destroyed. That's its #1 goal! This goal is considerably easier to program than the goal of helping humans lead happy and healthy lives, is it not?
Nope. Maybe 98% as much work. Definitely not 10% as much work. Most of the gap in AI programming is the sheer difficulty of crossing the gap between English wishing and causal code. Your statement is like claiming that it ought to be "considerably easier" to write a natural-language AI that understands Esperanto rather than English. Your concepts of simplicity are not well-connected to reality.
I think this may be of interest to you.
If you have an intelligence in a box, it will have access to its source code. If it modifies itself enough, you will not be able to predict what it wants.
There's a difference between "don't affect anything outside your box" and "don't go outside your box." My point is that we don't necessarily have to make FAI before anyone makes a self-improving AI. There are goal systems that, while not reflecting human values and goals, would still prevent an AI from destroying humanity.
Non-obviously, there is no difference in principle. The distinction holds for animals, which can't "go outside their box" through the actions they perform, but not for intelligences that can construct autonomous intelligent systems elsewhere. Whatever set of possible configurations the AI can reach outside its box, it's free to optimize according to its goals. You only need one leak of influence, however surprising, for the outcome to become determined by the AI's values, which, if not precisely tuned for humanity, are effectively fatal.
Notes on how such a goal system should be produced would make an excellent top-level post.
I agree... obviously there is the fear that if we think that cutting corners will work, SIAI researchers are more likely to do so, but I think that mindset can be avoided relatively easily.