In response to The Meaning of Right
Comment author: Sebastian_Hagen2 29 July 2008 03:20:20PM 1 point [-]

Thank you for this post. "should" being a label for results of the human planning algorithm in backward-chaining mode, the same way that "could" is a label for results of the forward-chaining mode, explains a lot. It's obvious to me in retrospect (and unfortunately, only in retrospect) that the human brain would do both kinds of search in parallel; in big search spaces, the computational advantages are too big not to do it.

I found two minor syntax errors in the post: "Could make sense to ..." - did you mean "Could it make sense to ..."? "(something that has a charge of should-ness" - that parenthesis is never closed.

Unknown wrote:

As I've stated before, we are all morally obliged to prevent Eliezer from programming an AI.

Speak for yourself. I don't think EliezerYudkowsky::Right is quite the same function as SebastianHagen::Right, but I don't see a real chance of getting an AI that optimizes only for SebastianHagen::Right accepted as sysop. I'd rather settle for an acceptable compromise in what values our successor-civilization will be built on than see our civilization being stomped into dust by an entirely alien RPOP, or destroyed by another kind of existential catastrophe.

Comment author: Sebastian_Hagen2 16 July 2008 11:55:10AM 0 points [-]

It's harder to answer Subhan's challenge - to show directionality, rather than a random walk, on the meta-level.

Even if one is ignorant of what humans mean when they talk about morality, or of which aspects of the environment influence it, it should be possible to determine empirically whether moral development over time follows a random walk: a random walk would, on average, cause more repeated reversals of a given value judgement than a directional process.
To perform this test, one would take a number of moral judgements that have changed in the past and trace their development from a particular point in human history (the earlier, the better; unreversed recent changes may simply mean the random walk only became sufficiently extreme in the recent past) to now, counting how often each judgement flipped during historical development. I'm not quite sure about the conditional probabilities, but a true random walk should produce more such flips than a directional (even a noisily directional) process.
Does anyone have suggestions for moral values that changed early in human development?
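The flip-counting test can be sketched as a quick simulation. This is a toy model, not based on any historical data: a judgement is modeled as the sign of an underlying walk, with and without a small drift standing in for a directional process.

```python
import random

def count_flips(seq):
    """Count reversals in a sequence of +1/-1 value judgements."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def random_walk_judgement(steps, rng):
    """Judgement at each step is the sign of an unbiased random walk."""
    pos = 0
    signs = []
    for _ in range(steps):
        pos += rng.choice((-1, 1))
        signs.append(1 if pos >= 0 else -1)
    return signs

def directional_judgement(steps, drift, rng):
    """Same walk, but with a constant drift in one direction."""
    pos = 0.0
    signs = []
    for _ in range(steps):
        pos += rng.choice((-1, 1)) + drift
        signs.append(1 if pos >= 0 else -1)
    return signs

def mean_flips(maker, trials=2000, steps=200):
    """Average flip count over many simulated judgement histories."""
    rng = random.Random(0)
    return sum(count_flips(maker(steps, rng)) for _ in range(trials)) / trials

walk_flips = mean_flips(random_walk_judgement)
drift_flips = mean_flips(lambda s, r: directional_judgement(s, 0.2, r))
print(walk_flips, drift_flips)  # the pure random walk flips more often
```

The specific drift value and step counts are arbitrary; the qualitative point is just that the flip-count statistic separates the two hypotheses.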

Comment author: Sebastian_Hagen2 05 July 2008 01:14:40PM 0 points [-]

Regarding the first question,

Why do people seem to mean different things by "I want the pie" and "It is right that I should get the pie"?

I think the meaning of "it is (morally) right" may be easiest to explain through game theory. Humans in the EEA had plenty of chances for positive-sum interactions, but consistently helping other people runs the risk of being exploited by defection-prone agents. Accordingly, humans may have evolved a set of adaptations to exploit non-zero-sumness between cooperating agents, but also to avoid cooperating with defectors. Treating "X is (morally) right" as a warning of the form "If you don't do X, I will classify that as defection" explains a lot. Assume a person A has just (honestly) warned a person B that "X is the right thing to do":

- If B continues not to do X, A will likely be indignant; indignation means A will be less likely to help B in the future (which makes sense according to game theory), and A might also recommend the same to other members of the tribe.
- B might accept the claim about rightness; this will make it more likely for him to do the "right" thing. Since, in the EEA, being ostracized by the tribe would result in a significant hit to fitness, it's likely that there is an adaptation predisposing people to evaluate claims about rightness in this manner.
- B's short-term desires might override his sense of "moral rightness", leading him to do the (in his own conception) "wrong" thing. While B can choose to do the wrong thing, he cannot change which action is right by a simple individual decision, since the whole point of evaluating rightness at all is to evaluate it the same way as the other people you interact with.

According to this view, moral duties function as rules which help members of a society to identify defectors (by defectors violating them).
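As a toy illustration of this view (all payoff numbers here are made up for the example), a sketch of why a one-off rule violation loses against cooperation once violators are classified as defectors and cut off:

```python
# Hypothetical toy model: "X is right" functions as a shared rule, and
# violating it after a warning marks you as a defector, cutting off
# future positive-sum cooperation.

COOPERATION_PAYOFF = 3   # each side gains from a cooperative interaction
EXPLOIT_PAYOFF = 5       # one-off gain from violating the rule

class Agent:
    def __init__(self, name, follows_rules):
        self.name = name
        self.follows_rules = follows_rules

def total_payoff(agent, rounds=10):
    """Payoff over repeated interactions with one rule-following partner.

    A rule violation pays off once, but the partner then classifies the
    violator as a defector and refuses all further cooperation.
    """
    payoff = 0
    trusted = True
    for _ in range(rounds):
        if not trusted:
            break  # ostracized: no further positive-sum interactions
        if agent.follows_rules:
            payoff += COOPERATION_PAYOFF
        else:
            payoff += EXPLOIT_PAYOFF
            trusted = False  # partner saw the violation as defection
    return payoff

print(total_payoff(Agent("cooperator", True)))   # 30
print(total_payoff(Agent("defector", False)))    # 5
```

With any reasonable payoff numbers, repeated cooperation dominates a single exploitation as long as interactions are repeated and violations are observable.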

Comment author: Sebastian_Hagen2 03 July 2008 03:08:04PM 3 points [-]

This post reminds me a lot of DialogueOnFriendliness.

There's at least one more trivial mistake in this post:

Is their nothing more to the universe than their conflict?

s/their/there/

Constant wrote:

Arguably the difficulty the three have in coming to a conclusion is related to the fact that none of the three has anything close to a legitimate claim on the pie.

If you modify the scenario by postulating that the pie is accompanied by a note reading "I hereby leave this pie as a gift to whomever finds it. Enjoy. -- Flying Pie-Baking Monster", how does that make the problem any easier?

In response to The Moral Void
Comment author: Sebastian_Hagen2 01 July 2008 06:45:00PM 0 points [-]

Hal Finney:
Why doesn't the AI do it verself? Even if it's boxed (and why would it be, if I'm convinced it's an FAI?), at the intelligence it'd need to make the stated prediction with any degree of confidence, I'd expect it to be able to take over my mind quickly. If what it claims is correct, it shouldn't have any qualms about doing that (taking over one human's body for a few minutes is a small price to pay for the utility involved).
If this happened in practice I'd be confused as heck, and the alleged FAI being honest about its intentions would be pretty far down on my list of hypotheses about what's going on. I'd likely stare into space dumbfounded until I found some halfway-likely explanation, or the AI decided to take over my mind after all.

Comment author: Sebastian_Hagen2 29 June 2008 09:25:00PM 0 points [-]

Are there no vegetarians on OvBias?

I'm a vegetarian, though not because I particularly care about the suffering of meat animals.

Sebastian Hagen, people change. Of course you may refuse to accept it, but the current you will be dead in a second, and a different you born.

Of course people change; that's why I talked about "future selves" - the interesting aspect isn't that they exist in the future, it's that they're not exactly the same person as I am now. However, there's still a lot of similarity between my present self and my one-second-in-the-future self, and they have effectively the same optimization target. Moreover, these changes are largely non-random and non-degenerative: a lot of them are a part of my mind improving its model of the universe and getting more effective at interacting with it.
I don't think it is appropriate to term such small changes "death". If an anvil drops on my head, crushing my brain to goo, I immediately lose more optimization power than I do in a decade of living without fatal accidents. The naive view of personal identity isn't completely accurate, but the reason that it works pretty well in practice is that (in our current society) humans don't change particularly quickly, except for when they suffer heavy injuries.

The anvil-dropped-on-head scenario is what I envisioned in my last post: something annihilating or massively corrupting my mind, destroying the part that's responsible for evaluating the desirability of hypothetical states of the universe.

Comment author: Sebastian_Hagen2 29 June 2008 05:32:03PM 1 point [-]

Suppose you learned, suddenly and definitively, that nothing is moral and nothing is right; that everything is permissible and nothing is forbidden.

I'm a physical system optimizing my environment in certain ways. I prefer some hypothetical futures to others; that's a result of my physical structure. I don't really know the algorithm I use for assigning utility, but that's because my design is pretty messed up. Nevertheless, there is an algorithm, and it's what I talk about when I use the words "right" and "wrong".
Moral rightness is fundamentally a two-place function: it takes both an optimization process and a hypothetical future as arguments. In practice, people frequently use the curried form, with themselves as the implied first argument.
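A minimal sketch of that two-place structure, using Python's functools.partial for the currying. The utility functions and the dictionary keys are made-up stand-ins for whatever an optimization process actually values:

```python
from functools import partial

def rightness(utility_function, hypothetical_future):
    """Two-place 'rightness': an optimization process (here reduced to a
    utility function) applied to a hypothetical future."""
    return utility_function(hypothetical_future)

# Two agents with different utility functions over the same futures
# (purely illustrative values).
eliezer_utility = lambda future: future.get("pies_shared", 0)
sebastian_utility = lambda future: future.get("pies_kept", 0)

# In everyday speech, "right" is the curried form with the speaker
# as the implied first argument.
my_right = partial(rightness, sebastian_utility)

future = {"pies_shared": 2, "pies_kept": 1}
print(my_right(future))                    # 1
print(rightness(eliezer_utility, future))  # 2
```

The point of the currying is that two speakers can use the one-place word "right" consistently while actually evaluating different two-place applications.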

Suppose I proved that all utilities equaled zero.

That result is obviously false for my present self. If the proof pertains to that entity, it's either incorrect or the formal system it is phrased in is inappropriate for modeling this aspect of reality.
It's also false for all of my possible future selves. I refuse to recognize something which doesn't have preferences over hypothetical futures as a future-self of me; whatever it is, it's lost too many important functions for that.

Comment author: Sebastian_Hagen2 18 June 2008 03:59:23PM 0 points [-]

Here's my vision of this, as a short scene from a movie. Off my blog: The Future of AI

To me, the most obvious reading of that conversation is that a significant part of what the AI says is a deliberate lie, and Anna is about to be dumped into a fun-and-educational adventure game at the end. Did you intend that interpretation?

In response to Timeless Control
Comment author: Sebastian_Hagen2 08 June 2008 10:52:55AM 0 points [-]

Eliezer:

If you think as though the whole goal is to save on computing power, and that the brain is actually fairly good at this (it has to be), then you won't go far astray.

Ah, thanks! I hadn't considered why you would think about isolated subsystems in practice; knowing about the motivation helps a lot in filling in the implementation details.

In response to Timeless Control
Comment author: Sebastian_Hagen2 07 June 2008 10:32:05AM 1 point [-]

I'm trying to see exactly where your assertion that humans actually have choice comes in.

"choice" is a useful high-level abstraction of certain phenomena. It's a lossy abstraction, and if you had infinite amounts of memory and computing power, you would have no need for it, at least when reasoning about other entities. It exists, in exactly the same way in which books (the concept of a book is also a high-level abstraction) exist.
If that sounded wrong or like nonsense to you, please taboo "choice" and explain what exactly your question is.

I also have a question of my own, regarding the rock-hill-system:

If you isolate a subsystem of reality, like a rock rolling down hill, then you can mathematically define the future-in-isolation of that subsystem; you can take the subsystem in isolation, and compute what would happen to it if you did not act on it. In this case, what would happen is that the rock would reach the bottom of the hill.

How does this isolation work? Do you assume that the forces acting on the system from outside stay constant (in some undefined fashion), without explicitly modeling the outside? If I assume *no* further interactions with the outside, I don't expect to see the rock rolling down the hill, since there's no planet below to gravitationally attract it. Or was the planet supposed to be part of this system?
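To make the question concrete, here's a toy one-dimensional integration: with literally no outside interactions the rock doesn't move at all, and it only falls if the planet's pull is kept in the model as an external force. (The numbers are arbitrary.)

```python
# Toy illustration of what "isolating" the rock could mean. If we model
# *no* outside influences, the rock just sits there; if we fold the
# planet in as a constant external force, it falls.

def simulate(position, velocity, acceleration, steps=100, dt=0.1):
    """Simple Euler integration of a point mass in one dimension."""
    for _ in range(steps):
        velocity += acceleration * dt
        position += velocity * dt
    return position

start = 100.0  # metres above the bottom of the hill

fully_isolated = simulate(start, 0.0, 0.0)  # no outside forces at all
with_gravity = simulate(start, 0.0, -9.8)   # planet kept as external force

print(fully_isolated)        # 100.0 -- the rock never moves
print(with_gravity < start)  # True -- it falls
```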
