Comment author: faul_sname 04 June 2012 03:35:10PM -1 points

In such a case, the median outcome across all agents will be improved if every agent with the option takes that offer, even if each is assured it is a once-per-lifetime offer (because presumably there is variance of more than 5 utils between agents).

Comment author: CuSithBell 04 June 2012 03:56:29PM * 2 points

But the median outcome is losing 5 utils?

Edit: Oh, wait! You mean the median total utility after some other stuff happens (with a variance of more than 5 utils)?

Suppose we have 200 agents, 100 of which start with 10 utils, the rest with 0. After taking this offer, we have 51 with -5, 51 with 5, 49 with 10000, and 49 with 10010. The median outcome would be a loss of 5 utils for half the agents, a gain of 5 for half, but only the half that would lose could actually get that outcome...
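For concreteness, the arithmetic above can be checked directly (a sketch; it assumes, per the example, that the offer gives 51% of takers a loss of 5 utils and 49% a gain of 10,000):

```python
# 100 agents start at 0 utils, 100 start at 10.
# The offer: 51% of takers lose 5 utils, 49% gain 10,000.
outcomes = ([0 - 5] * 51 + [0 + 10000] * 49 +
            [10 - 5] * 51 + [10 + 10000] * 49)

outcomes.sort()
n = len(outcomes)  # 200 agents
median = (outcomes[n // 2 - 1] + outcomes[n // 2]) / 2
print(median)  # 5.0 -- the 100th and 101st agents both sit at 5 utils
```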

And what do you mean by "the possibility of getting tortured will manifest itself only very slightly at the 50th percentile"? I thought you were restricting yourself to median outcomes, not distributions? How do you determine the median distribution?

In response to comment by CuSithBell on Fake Causality
Comment author: royf 04 June 2012 06:13:17AM 2 points

I am saying pretty much exactly that. To clarify further, the words "deliberate", "conscious" and "wants" again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by "the agent did X because it wanted to"?).

Let's instead make an attempt to explain. Complete control of an agent's own code, in the strict sense, contradicts Gödel's incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I wonder whether anyone has ever done the math; I expect not, and I intend to look further into this). In information-theoretic terminology, the agent is limited to typical manipulations of its own code, which form a strict (and presumably very small) subset of all possible manipulations.

Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don't see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human's, only quantitatively so (although of course the engineering details can be much different; I'm referring to the algorithmic level here).

Generally GAIs are ascribed extreme powers around here

As you've probably figured out, I'm new here. I encountered this post while reading the sequences. Although I'm somewhat learned on the subject, I haven't yet reached the part (which I trust exists) where GAI is discussed here.

On my path there, I'm actively trying to avoid a certain degree of group thinking which I detect in some of the comments here. Please take no offense, but it's phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it's only the terminology which is agreed upon, in which case I will learn it in time. But please, let's make our terminology "pay rent".

In response to comment by royf on Fake Causality
Comment author: CuSithBell 04 June 2012 02:49:18PM 0 points

You are saying that a GAI being able to alter its own "code" at the actual code level does not imply that it can deliberately and consciously alter its "code" in the human sense you describe above?

I am saying pretty much exactly that. To clarify further, the words "deliberate", "conscious" and "wants" again belong to the level of emergent behavior: they can be used to describe the agent, not to explain it (what could not be explained by "the agent did X because it wanted to"?).

Sure, but we could imagine an AI deciding something like "I do not want to enjoy frozen yogurt", and then altering its code in such a way that it is no longer appropriate to describe it as enjoying frozen yogurt, yeah?

Let's instead make an attempt to explain. Complete control of an agent's own code, in the strict sense, contradicts Gödel's incompleteness theorem. Furthermore, information-theoretic considerations significantly limit the degree to which an agent can control its own code (I wonder whether anyone has ever done the math; I expect not, and I intend to look further into this). In information-theoretic terminology, the agent is limited to typical manipulations of its own code, which form a strict (and presumably very small) subset of all possible manipulations.

This seems trivially false - if an AI is instantiated as a bunch of zeros and ones in some substrate, how could Gödel or similar concerns stop it from altering any subset of those bits?

Can an agent be made more effective than humans in manipulating its own code? I have very little doubt that it can. Can it lead to agents qualitatively more intelligent than humans? Again, I believe so. But I don't see a reason to believe that the code-rewriting ability itself can be qualitatively different than a human's, only quantitatively so (although of course the engineering details can be much different; I'm referring to the algorithmic level here).

You see reasons to believe that any artificial intelligence is limited to altering its motivations and desires in a way that is qualitatively similar to humans? This seems like a pretty extreme claim - what are the salient features of human self-rewriting that you think must be preserved?

Generally GAIs are ascribed extreme powers around here

As you've probably figured out, I'm new here. I encountered this post while reading the sequences. Although I'm somewhat learned on the subject, I haven't yet reached the part (which I trust exists) where GAI is discussed here.

On my path there, I'm actively trying to avoid a certain degree of group thinking which I detect in some of the comments here. Please take no offense, but it's phrases like the above quote which worry me: is there really a consensus around here about such profound questions? Hopefully it's only the terminology which is agreed upon, in which case I will learn it in time. But please, let's make our terminology "pay rent".

I don't think it's a "consensus" so much as an assumed consensus for the sake of argument. Some do believe that any hypothetical AI's influence is practically unlimited, some agree to assume that because it's not ruled out and is a worst-case scenario or an interesting case (see wedrifid's comment on the grandparent (aside: not sure how unusual or nonobvious this is, but we often use familial relationships to describe the relative positions of comments, e.g. the comment I am responding to is the "parent" of this comment, the one you were responding to when you wrote it is the "grandparent". I think that's about as far as most users take the metaphor, though.)).

Comment author: halcyon 04 June 2012 12:18:39PM * 0 points

People predict the behavior of other people all the time.

And they're proved wrong all the time. So what you're saying is, the alien predicts my behavior using the same superficial heuristics that others use to guess at my reactions under ordinary circumstances, except he uses a more refined process? How well can that kind of thing handle indecision if my choice is a really close thing? If he's going with a best guess informed by everyday psychological traits, the inaccuracies of his method would probably be revealed before long, and I'd be at the numbers immediately.

"be the sort of person who picks one box, then pick both boxes"

I agree, I would pick both boxes if that were the case, hoping I'd lived enough of a one box picking life before.

but that the way to be the sort of person that picks one box is to pick one box, because your future decisions are entangled with your traits, which can leak information and thus become entangled with other peoples' decisions.

I beg to differ on this point. Whether or not I knew I would meet Dr. Superintelligence one day, an entire range of more or less likely behaviors is very much conceivable that violate this assertion, from "I had lived a one box picking life when comparatively little was at stake," to "I just felt like picking differently that day." You're taking your reification of selfhood WAY too far if you think Being a One Box Picker by picking one box when the judgement is already over makes sense. I'm not even sure I understand what you're saying here, so please clarify if I've misunderstood things. Unlike my (present) traits, my future decisions don't yet exist, and hence cannot leak anything or become entangled with anyone.

But what this disagreement boils down to is, I don't believe that either quality is necessarily manifest in every personality with anything resembling steadfastness. For instance, I neither see myself as the kind of person who would pick one box, nor as the kind who would pick both boxes. If the test were administered to me a hundred times, I wouldn't be surprised to see a 50-50 split. Surely I would be exaggerating if I said you claim that I already belong to one of these two types, and that I'm merely unaware of my true inner box-picking nature? If my traits haven't specialized into either category, (and I have no rational motive to hasten the process) does the alien place a million dollars or not? I pity the good doctor. His dilemma is incomparably more black and white than mine.

To summarize, even if I have mostly picked one box in similar situations in the past, how concrete is such a trait? This process comes nowhere near the alien's implied infallibility, it seems to me. Therefore, either this process or the method's imputed infallibility has got to go if his power is to be coherent.

Not only that, if that's all there is to the alien's ability, what does this thought experiment say, except that it's indeed possible for a rational agent to reward others for their past irrationality? (to grant the most meaningful conclusion I DO perceive) That doesn't look like a particularly interesting result to me. Such figures are seen in authoritarian governments, religions, etc.

Comment author: CuSithBell 04 June 2012 02:35:13PM 1 point

Unlike my (present) traits, my future decisions don't yet exist, and hence cannot leak anything or become entangled with anyone.

Your future decisions are entangled with your present traits, and thus can leak. If you picture a Bayesian network with the nodes "Current Brain", "Future Decision", and "Current Observation", with arrows from Current Brain to the two other nodes, then knowing the value of Current Observation gives you information about Future Decision.
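That three-node network can be checked by brute-force enumeration; the conditional probabilities below are made up for illustration, not taken from anything in the thread:

```python
from itertools import product

# Hypothetical CPTs: Current Brain is a fair coin; Current Observation and
# Future Decision each depend only on Current Brain (the common cause).
p_brain = {0: 0.5, 1: 0.5}
p_obs_given_brain = {0: 0.1, 1: 0.9}  # P(Observation=1 | Brain)
p_dec_given_brain = {0: 0.2, 1: 0.8}  # P(Decision=1 | Brain)

def joint(brain, obs, dec):
    """Joint probability of one full assignment under the network."""
    p = p_brain[brain]
    p *= p_obs_given_brain[brain] if obs else 1 - p_obs_given_brain[brain]
    p *= p_dec_given_brain[brain] if dec else 1 - p_dec_given_brain[brain]
    return p

p_dec = sum(joint(b, o, 1) for b, o in product((0, 1), repeat=2))
p_obs = sum(joint(b, 1, d) for b, d in product((0, 1), repeat=2))
p_dec_given_obs = sum(joint(b, 1, 1) for b in (0, 1)) / p_obs

print(p_dec)            # prior on the future decision, ~0.5
print(p_dec_given_obs)  # posterior after the observation, ~0.74
```

Seeing the observation shifts the probability of the future decision, even though neither directly causes the other: the information "leaks" through Current Brain.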

Obviously the alien is better than a human at running this game (though, note that a human would only have to be right a little more than 50% of the time to make one-boxing have the higher expected value - in fact, that could be an interesting test to run!). Perhaps it can observe your neurochemistry in detail and in real time. Perhaps it simulates you in this precise situation, and just sees whether you pick one or both boxes. Perhaps land-ape psychology turns out to be really simple if you're an omnipotent thought-experiment enthusiast.
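The "little more than 50%" figure can be made precise, assuming the usual Newcomb payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one (the thread itself doesn't fix the amounts):

```python
BIG, SMALL = 1_000_000, 1_000

def ev_one_box(p):
    # Predictor is right with probability p; the big box is full
    # iff it predicted one-boxing, i.e. iff it was right.
    return p * BIG

def ev_two_box(p):
    # You always get the small box; the big box is full
    # iff the predictor was wrong.
    return SMALL + (1 - p) * BIG

# The EVs cross where p * BIG = SMALL + (1 - p) * BIG:
threshold = (SMALL + BIG) / (2 * BIG)
print(threshold)  # 0.5005 -- just over 50%, as claimed above
```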

The reasoning wouldn't be "this person is a one-boxer" but rather "this person will pick one box in this particular situation". It's very difficult to be the sort of person who would pick one box in the situation you are in without actually picking one box in the situation you are in.

One use of the thought experiment, other than the "non-causal effects" thing, is getting at this notion that the "rational" thing to do (as you suggest two-boxing is) might not be the best thing. If it's worse, just do the other thing - isn't that more "rational"?

In response to comment by royf on Fake Causality
Comment author: wedrifid 04 June 2012 05:09:00AM 0 points

Having asserted that your claim is, in fact, new information

I wouldn't assert that. I thought I was stating the obvious.

can you please clarify and explain why you believe that?

See CuSithBell's reply.

In response to comment by wedrifid on Fake Causality
Comment author: CuSithBell 04 June 2012 05:20:49AM 0 points

Having asserted that your claim is, in fact, new information

I wouldn't assert that. I thought I was stating the obvious.

Yes, I think I misspoke earlier, sorry. It was only "new information" in the sense that it wasn't in that particular sentence of Eliezer's - to anyone familiar with discussions of GAI, your assertion certainly should be obvious.

In response to comment by CuSithBell on Fake Causality
Comment author: royf 04 June 2012 05:08:54AM 0 points

I believe that is a misconception. Perhaps I'm not being reasonable, but I would expect the level at which you could describe such a creature in terms of "desires" to be conceptually distinct from the level at which it can operate on its own code.

This is the same old question of "free will" again. Desires don't exist as a mechanism. They exist as an approximate model of describing the emergent behavior of intelligent agents.

In response to comment by royf on Fake Causality
Comment author: CuSithBell 04 June 2012 05:18:20AM 0 points

You are saying that a GAI being able to alter its own "code" at the actual code level does not imply that it can deliberately and consciously alter its "code" in the human sense you describe above?

Generally GAIs are ascribed extreme powers around here - if it has low-level access to its code, then it will be able to determine how its "desires" derive from this code, and will be able to produce whatever changes it wants. Similarly, it will be able to hack human brains with equal finesse.

In response to comment by wedrifid on Fake Causality
Comment author: royf 04 June 2012 04:51:09AM 0 points

Having asserted that your claim is, in fact, new information: can you please clarify and explain why you believe that?

In response to comment by royf on Fake Causality
Comment author: CuSithBell 04 June 2012 04:56:01AM 1 point

An advanced AI could reasonably be expected to be able to explicitly edit any part of its code however it desires. Humans are unable to do this.

In response to comment by CuSithBell on Fake Causality
Comment author: wedrifid 04 June 2012 04:14:33AM 0 points

To be even more fair I also explicitly structured my own claim such that it still technically applies to your reading. That allowed me to make the claim both technically correct to a pedantic reading and an expression of the straightforward point that the difference is qualitative. (The obvious alternative response was to outright declare the comment a mere equivocation.)

only then is the sentence interpreted as you describe.

Meaning that I didn't, in fact, describe.

In response to comment by wedrifid on Fake Causality
Comment author: CuSithBell 04 June 2012 04:26:22AM 0 points

Not meant as an attack. I'm saying, "to be fair it didn't actually say that in the original text, so this is new information, and the response is thus a reasonable one". Your comment could easily be read as implying that this is not new information (and that the response is therefore mistaken), so I wanted to add a clarification.

In response to comment by royf on Fake Causality
Comment author: wedrifid 04 June 2012 03:55:56AM 1 point

Sadly, we humans can't rewrite our own code, the way a properly designed AI could.

Sure we can!

Not the way a properly designed AI could. The difference is qualitative.

In response to comment by wedrifid on Fake Causality
Comment author: CuSithBell 04 June 2012 04:04:02AM 0 points

To be fair, when structured as

Sadly, we humans can't rewrite our own code, the way a properly designed AI could.

then the claim is in fact "we humans can't rewrite our own code (but a properly designed AI could)". If you remove a comma:

Sadly, we humans can't rewrite our own code the way a properly designed AI could.

only then is the sentence interpreted as you describe.

Comment author: shminux 03 June 2012 10:49:17PM * 0 points

To rationalize dust specks over torture, one can construct a utility function where utility of dust specks in n people is of the Zeno type, -(1-1/2^n), and the utility of torture is -2. Presumably, something else goes wrong when you do that. What is it?
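A minimal sketch of that construction, just to make the boundedness explicit (exact rational arithmetic avoids floating-point trouble at large n):

```python
from fractions import Fraction

def u_specks(n):
    """Zeno-type disutility of dust specks hitting n people: -(1 - 1/2^n)."""
    return -(1 - Fraction(1, 2 ** n))

U_TORTURE = -2

# The specks term converges to -1 but never reaches it, so under this
# utility function no number of specks ever outweighs the torture:
for n in (1, 10, 1000):
    assert U_TORTURE < -1 < u_specks(n) <= Fraction(-1, 2)

print(float(u_specks(10)))  # -0.9990234375
```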

Comment author: CuSithBell 04 June 2012 12:22:57AM 0 points

Many find that sort of discounting contrary to intuition and to desired results: it implies, e.g., that the suffering of some particular person is more or less significant depending on how many other people are suffering in a similar enough way.

Comment author: Swimmer963 01 June 2012 12:48:30AM 2 points

Yes yes yes! An awesome book!

Comment author: CuSithBell 03 June 2012 04:17:41PM 0 points

Well! I may have to take a more in-depth look at it sometime this summer.
