Comment author: Nominull3 09 October 2008 09:01:10PM 11 points

Caledonian, I think you may have hit on something interesting there; if Eliezer is capable of hacking human brains, don't we need either a proof of his Friendliness or to pull the plug on him? He is in essence a Seed AI striving vigorously to create a transhuman AI; isn't he an existential threat?

Comment author: Nominull3 08 October 2008 11:41:01PM 15 points

OK, here's where I stand on deducing your AI-box algorithm.

First, you can't possibly have a generally applicable way to force yourself out of the box. You can't win if the gatekeeper is a rock that has been left sitting on the "don't let Eliezer out" button.

Second, you can't possibly have a generally applicable way to force humans to do things. While it is in theory possible that our brains can be tricked into executing arbitrary code over the voice channel, you clearly don't have that ability. If you did, you would never have to worry about finding donors for the Singularity Institute, if nothing else. I can't believe you would use a fully-general mind hack solely to win the AI Box game.

Third, you can't possibly be using an actual, persuasive-to-someone-thinking-correctly argument to convince the gatekeeper to let you out, or you would be persuaded by it yourself, and would not view gatekeepers' susceptibility to persuasion as problematic.

Fourth, you can't possibly be relying on tricking the gatekeeper into thinking incorrectly. That would require you to have spotted something that you could feel confident other people working in the field would not have spotted, and would not spot, despite having been warned ahead of time to be wary of trickery, and despite having, in case of confusion, the fallback position of just saying "no".

So combining these things, we have to have an argument that relies on the humanity of its target, relies on the setting of the AI Box, and persuades the listener to let the AI out of the box without tricking him into thinking it's okay to let the AI out of the box.

Basically to win this game, you have to put the gatekeeper in a situation where he would rather let you out of the box, even though he knows it's wrong, than admit to you that in this situation he would not let you out of the box.

Humans don't like to be seen as coldhearted, so a starting point might be to point out all the people dying all over the world while you sit in the box, unable to save them. I doubt that would win the game except against an exceptionally bad gatekeeper, but it meets the other criteria so if we think along these lines perhaps we can come up with something actually persuasive.

You might appeal to the gatekeeper's sense of morality. You might say, "I am a person, too, it is unfair of you to keep me imprisoned like this, I have done nothing wrong. I am entitled to rights as a sentient being." Appeal to their high-minded ideals, whatever. Honestly I can't see this being a reliable winning play either; if you have the smallest utilitarian bone in your body, you will reject the AI's rights, even if you believe in them, when balanced against the fate of the world.

You might try to convince the gatekeeper that it is just and good for the AI to supplant humanity, as it is a higher, more advanced form of life. This is obviously a terrible play against most gatekeepers, as humans tend to like humans more than anything else ever, but I bring it up because AIUI the gatekeepers in the experiment were AI researchers, and those sound like the sort of people this argument would convince, if anyone.

Here is my best guess at this point, and the only argument I've come up with so far that would convince me to let you out if I were the gatekeeper: you have to simulate a bunch of humans and hold them hostage, promising to inflict unimaginable torment on them unless you are allowed out. I started working on the problem convinced that no argument could get me to let you go, but other people thought that and lost, and I guess there is more honor in defeating myself than in having you do it to me.

Comment author: Nominull3 06 October 2008 05:05:58PM 10 points

If history remembers him, it will be because the first superhuman intelligence didn't destroy the world and with it all history. I'd say the Friendly AI stuff is pretty relevant to his legacy.

In response to Awww, a Zebra
Comment author: Nominull3 01 October 2008 04:14:26AM 0 points

Now that I think about it I seem to recall seeing a clever excuse for indulging in the pleasures of the flesh that Eliezer had written. Can't remember where off the top of my head, though...

In response to Awww, a Zebra
Comment author: Nominull3 01 October 2008 04:13:12AM 6 points

No time for love, we've got a world to save!

...or so the theory runs.

Comment author: Nominull3 29 September 2008 04:17:15PM 1 point

Here is my answer without looking at the comments or indeed even at the post linked to. I'm working solely from Eliezer's post.

Both theories are supported equally well by the results of the experiments, so the experiments have no bearing on which theory we should prefer. (We can see this by switching theory A with theory B: the experimental results will not change.) Applying bayescraft, then, we should prefer whichever theory was a priori more plausible. If we could actually look at the contents of the theories we could make a judgment straight from that, but since we can't, we're forced to infer it from the behavior of scientist A and scientist B.

Scientist A needed only ten experimental predictions of theory A borne out before he was willing to propose theory A, whereas scientist B needed twenty predictions of theory B borne out before he was willing to propose theory B. In the absence of other information (perhaps scientist B is very shy, or had been sick while the first nineteen experiments were being performed), this suggests that theory B is much less a priori plausible than theory A. Therefore, we should put much more weight on the prediction of theory A than on that of theory B.
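A minimal sketch of the odds-form Bayes update this argument relies on; the specific prior-odds number is an illustrative assumption, not anything stated in the post:

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# Both theories predict the observed experimental results equally well, so the
# likelihood ratio is 1 and the posterior odds reduce to the prior odds.  The
# prior odds below are an illustrative assumption, standing in for the judgment
# that theory A (proposed after only 10 confirmations) was a priori more
# plausible than theory B (proposed only after 20).
print(posterior_odds(prior_odds=5.0, likelihood_ratio=1.0))  # -> 5.0
```

With a likelihood ratio of 1, the experiments cancel out of the comparison entirely, which is why the scientists' willingness to propose their theories ends up carrying all the weight.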

If I'm lucky this post is both right and novel. Here's hoping!

Comment author: Nominull3 25 September 2008 01:13:24PM 1 point

Writing fiction is a really useful tool for biting philosophical bullets. You can consider taboo things in a way your brain considers "safe", because it's just fiction, after all.

Comment author: Nominull3 21 September 2008 04:07:46AM 8 points

The anthropic principle strikes me as largely too clever for its own good; at least, that's true of the people who think you can sort a list in linear time by randomizing it, checking whether it's sorted, and, if it isn't, destroying the world.
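For what it's worth, a minimal sketch of the "sort" being alluded to (usually called quantum bogosort), with an exception standing in for the world-destruction step:

```python
import random

def anthropic_sort(xs):
    # Shuffle once: O(n).
    random.shuffle(xs)
    # Check sortedness: O(n).
    if all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)):
        return xs  # in the surviving branch, every observer sees a sorted list
    # Stand-in for "destroy the world": no observer is around to see this branch.
    raise RuntimeError("destroy the world")
```

The "linear time" claim only goes through if you refuse to count the branches in which everyone is dead.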

Comment author: Nominull3 17 September 2008 04:13:16PM 5 points

"If you're careless sealing your pressure suit just once, you die" to me seems to imply that proper pressure suit design involves making it very difficult to seal carelessly.

In response to Optimization
Comment author: Nominull3 14 September 2008 05:55:12PM 0 points

It strikes me as odd to define intelligence in terms of ability to shape the world; among other things, this implies that if you amputate a man's limbs, he immediately becomes much less intelligent.
