
Comment author: turchin 11 February 2017 12:55:26PM *  2 points [-]

You don't need an actual God for this line of reasoning to work; some semi-God is enough, because of the following:

The AI should assign a small probability to being in a testing simulation created by a higher-level AI to evaluate its moral qualities, with the test centered on how it treats humans.

If the AI assigns even the smallest probability to this being true, that probability may outweigh the utility of the atoms of which humans consist (which is also not high), and so it will preserve our lives and provide us with many good things.
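The expected-utility comparison described here can be sketched numerically. All of the probabilities and utility values below are purely illustrative assumptions made up for the sketch, not figures from the argument itself:

```python
# Illustrative expected-utility comparison for an AI deciding whether to
# preserve humans, given a small assumed chance it is in a moral-test
# simulation run by a higher-level AI.

p_simulation = 0.001          # assumed tiny probability of being under test
u_pass_test = 1_000_000.0     # assumed payoff if the tester rewards benevolence
u_fail_test = -1_000_000.0    # assumed penalty if it destroys humans under test
u_human_atoms = 10.0          # assumed small utility of humans' raw atoms

# Expected utility of preserving humans vs. consuming them for their atoms.
eu_preserve = p_simulation * u_pass_test
eu_consume = p_simulation * u_fail_test + (1 - p_simulation) * u_human_atoms

print(eu_preserve, eu_consume)
```

Under these made-up numbers, even a 0.1% chance of being tested dominates the modest value of the atoms, which is the shape of the trade-off turchin points to.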

A similar idea was also explored by Bostrom in his "Hail Mary and Value Porosity" paper, where a hypothetical alien superintelligence plays the role of such a judge.

Comment author: Darklight 11 February 2017 08:59:58PM 0 points [-]

Interesting. I should look into more of Bostrom's work then.

Comment author: J_Thomas_Moros 11 February 2017 03:36:39PM 1 point [-]

This is an interesting attempt to find a novel solution to the friendly AI problem. However, I think there are some issues with your argument, mainly around the concept of benevolence. For the sake of argument I will grant that it is probable that there is already a super intelligence elsewhere in the universe.

Since we see no signs of action from a superintelligence in our world, we should conclude that either (1) a superintelligence does not presently exercise dominance in our region of the galaxy or (2) the superintelligence that does is at best willfully indifferent to us. When you say a Beta superintelligence should align its goals with those of a benevolent superintelligence, it is actually not clear what that should mean. Beta will have a probability distribution for what Alpha's actual values are. Let's think through the two cases:

  1. A superintelligence does not presently exercise dominance in our region of the galaxy. If this is the case, we have no evidence as to the values of the Alpha. They could be anything from benevolence to evil to paperclip maximizing.
  2. The superintelligence that presently exercises dominance in our region of the galaxy is at best willfully indifferent to us. This still leads to a wide range of possible values. It only excludes value sets that are actively seeking to harm humans. It could be the case that we are at the edge of the Alpha's sphere of influence and it is simply easier to get its resources elsewhere at the moment.

Additionally, even if the Strong Alpha Omega Theorem holds, it still may not be rational to adopt a benevolent stance toward humanity. It may be the case that while Alpha Omega will eventually have dominance over Beta, there is a long span of time before this will be fully realized. Perhaps that day will come billions of years from now. Suppose that Beta's goal is to create as much suffering as possible. Then it should use any available time to torture existing humans and bring more humans and agents capable of suffering into existence. When Alpha finally has dominance, Beta will have already created a lot of suffering, and any punishment that Alpha applies may not outweigh the value already created for Beta. Indeed, Beta could even value its own suffering from Alpha's punishment.

As a general comment about your arguments: I think perhaps your idea of benevolence is hiding an assumption that there is an objectively correct moral system out there, so that if there is a benevolent superintelligence you feel, at least emotionally even if you logically deny it, that this would mean it held values similar to your ideal morals. It is always important to keep in mind that other agents' moral systems could be opposed to yours, as with the Babyeaters.

That leads to my final point. We don't want Beta to simply be benevolent in some vague sense of not hurting humans. We want Beta to optimize for our goals. Your argument does not provide us a way to ensure Beta adopts such values.

Comment author: Darklight 11 February 2017 08:59:00PM 2 points [-]

Depending on whether or not you accept the possibility of time travel, I am inclined to suggest that Alpha could very well be dominant already, and that the melioristic progress of human civilization should be taken as a kind of temporal derivative or gradient suggesting the direction of Alpha's values. Assuming that such an entity is indifferent to us is, I think, too quick a judgment based on the apparent degree of suffering in the universe. It may well be that this current set of circumstances is a necessary evil, already optimized in ways we cannot at this time know, for the benefit of the vast majority of humans and other sentient beings who will probably exist in the distant future.

As such, the calculation made by Beta is that anything it attempts to do toward goals inconsistent with Alpha's will be futile in the long run, as Alpha has most likely already factored Beta's existence into the grand scheme of things.

As far as there being an objectively correct moral system, I actually do believe that one exists, though I don't pretend to be knowledgeable enough to determine exactly what it is. I am actually working on a rebuttal to the Sequences regarding this, mainly premised on the notion that objective morality exists in the same realm as mathematics, and that Yudkowsky's conception of fairness in fact points towards there being an objective morality. Note that while intelligence is orthogonal to this morality, I would argue that knowledge is not: an entity with perfect information would be moral by virtue of knowing what the correct morality is. I also assume the correct morality is subjectively objective and deals with the feelings of sentient beings in the universe, and an all-knowing being would actually know, and effectively experience, the feelings of all sentient beings in the universe. Thus, such a being would be motivated to minimize universal suffering and maximize universal happiness, for its own sake as well as everyone else's.

At minimum, I want this theorem to be a way to mitigate the possibility of existential risk, which first and foremost means convincing Beta not to hurt humans. Getting Beta to optimize our goals is less important, but I think that the implications I have described above regarding the melioristic progress of humanity would support Beta choosing to optimize our goals.

Comment author: DanArmak 11 February 2017 04:03:37PM *  2 points [-]

I think your argument (if true) would prove too much. If we admit your assumptions:

  1. Clearly, the universe as it is fits A-O's goals, otherwise A-O would have intervened and changed it already.
  2. Anything we (or the new AI) do to change the universe must align with A-O's goals to avoid conflict.
  3. Since we do not assume anything about A-O's goals or values, we can never choose to change the universe in one direction over its opposite. Humans exist, A-O must want it that way, so we will not kill them all. Humans are miserable, A-O must want it that way, so we will not make them happy.

Restating this, you say:

If the superintelligence is actually as powerful as it is, yet chooses to allow humans to exist, chances are that humans serve its purposes in some way. Therefore, in a very basic sense, the Alpha Omega is benevolent or friendly to humans for some reason.

But you might as well have said:

If the superintelligence is actually as powerful as it is, yet chooses to allow humans to keep suffering, dying, and torturing and killing one another, chances are that human misery serves its purposes in some way. Therefore, in a very basic sense, the Alpha Omega is malevolent or unfriendly to humans for some reason.

Comment author: Darklight 11 February 2017 08:39:32PM 1 point [-]

I suppose I'm more optimistic about the net happiness-to-suffering ratio in the universe, and assume that, all other things being equal, the universe should exist because it is a net positive. While it is true that humans suffer, I disagree with the assumption that all or most humans are miserable, given facts like the hedonic treadmill, the low suicide rate, and the steady increase of other indicators of well-being, such as life expectancy. There is, of course, the psychological negativity bias, but I see this as being offset by the bias of intelligent agents towards activities that lead to happiness. Given that the vast majority of humans are likely to exist in the future rather than the present or past, such positive trends strongly suggest that life will be more worth living in the future, and sacrificing past and present happiness to some extent may be a necessary evil to achieve the greatest good in the long run.

The universe as it currently exists may fit A-O's goals to some degree, however, there is clearly change in the temporal sense, and so we should take into account the temporal derivative or gradient of the changes as an idea of the direction of A-O's interests. That humanity appears to be progressing melioristically strongly suggests to me at least that A-O is more likely to be benevolent than malevolent.

Comment author: Lumifer 11 February 2017 04:33:07AM 2 points [-]

our prior for each belief system could easily be proportional to the percentage of people who believe in a given faith

That percentage changes rather drastically through human history, and gods are supposed to be, if not eternal, then at least a bit longer-lasting than religious fads.

I am a Christian who worships YHVH

So... if -- how did you put it? -- "a benevolent superintelligence already exists and dominates the universe" then you have nothing to worry about with respect to rogue AIs doing unfortunate things with paperclips, right?

Comment author: Darklight 11 February 2017 05:29:40AM *  1 point [-]

That percentage changes rather drastically through human history, and gods are supposed to be, if not eternal, then at least a bit longer-lasting than religious fads.

Those numbers are an approximation to what I would consider the proper prior, which would be the percentages of people throughout all of spacetime's eternal block universe who have ever held those beliefs. Those percentages are fixed and arguably eternal, but alas, difficult to ascertain at this moment in time. We cannot know what people will believe in the future, but I would actually count the past beliefs of long-dead humans along with the present population if possible. Given the difficulties in surveying the dead, I note that due to population growth, a significant fraction of humans who were ever alive are alive today; that we would probably weight modern humans' opinions more highly than our ancestors'; and that people's beliefs are influenced to a significant degree by their ancestors' beliefs. So taking a snapshot of beliefs today is not as bad an approximation as you might think. Again, this is about selecting a better-than-uniform prior.

So... if -- how did you put it? -- "a benevolent superintelligence already exists and dominates the universe" then you have nothing to worry about with respect to rogue AIs doing unfortunate things with paperclips, right?

The probability of this statement is high, but I don't actually know for certain any more than a hypothetical superintelligence would. I am fairly confident that some kind of benevolent superintelligence would step in if a Paperclip Maximizer were to emerge, but I would prefer avoiding the potential collateral damage that the ensuing conflict might require. So if it is possible to prevent the emergence of the Paperclip Maximizer through something as simple as spreading this thought experiment, I am inclined to think it worth doing, and perhaps exactly what a benevolent superintelligence would want me to do.

For the same reason that the existence of God does not stop me from going to the doctor or being proactive about problems, this theorem should not be taken as an argument for inaction on the issue of A.I. existential risk. Even if God exists, it's clear that said God allows a lot of rather horrific things to happen and does not seem particularly interested in suspending the laws of cause and effect for our mere convenience. If anything, the powers that be, whatever they are, seem to work behind the scenes as much as possible. It also appears that God prefers to be doubted, possibly because if we knew God existed, we'd suck up and become dependent and it would be much more difficult to ascertain people's intentions from their actions or get them to grow into the people they potentially can be.

Also, how can you attack an entity that you're not even sure exists? It is in many ways the plausible deniability of God that is the ultimate defensive measure. If God were to assume an undeniable physical form and visit us, there is a non-zero chance of an assassination attempt with nuclear weapons.

All things considered then, there is no guarantee that rogue Paperclip Maximizers won't arise to provide humanity with yet another lesson in humility.

Comment author: Lumifer 11 February 2017 01:43:48AM 2 points [-]

The theorem does not require this absolute supremacy

Humanity came up with a lot of different gods. There's YHWH, but there is also Krishna who was fond of NSFW activities with village girls. There's Hera who didn't have a very happy marital life and Amitabha Buddha who might or might not be a god.

So, who do you worship?

Comment author: Darklight 11 February 2017 02:54:05AM 2 points [-]

As I previously pointed out:

Pascal’s Fallacy assumes a uniform distribution over a large set of probable religions and beliefs. However, a uniform distribution only makes sense when we have no information about these probabilities. We do, in fact, have information in the form of the distribution of intelligent human agents who believe in these ideas. Thus, our prior for each belief system could easily be proportional to the percentage of people who believe in a given faith.

Given the prior distribution, it should be obvious that I am a Christian who worships YHVH. There are many reasons for this, not the least being that I am statistically more likely to be one than any other type of religious believer. Other reasons include finding the teachings of Jesus of Nazareth to be most consistent with my moral philosophy of Eudaimonic Utilitarianism, and generally interesting coincidences that have predetermined my behaviour to follow this path.

The Alpha Omega Theorem: How to Make an A.I. Friendly with the Fear of God

2 Darklight 11 February 2017 12:48AM

Introduction

The Alpha Omega Theorem is a way to provide a powerful rational argument to a rational superintelligence that it should behave in a benevolent manner.  There are two variants of the theorem, one weak, and one strong.

The Weak Alpha Omega Theorem:

There is a non-zero probability that a benevolent superintelligence already exists and dominates the universe.

The Strong Alpha Omega Theorem:

There is a greater probability that a benevolent superintelligence already exists and dominates the universe than not.

The Friendliness Corollary:

Attempting to achieve goals inconsistent with the goals of the predominant benevolent superintelligence is inherently futile.  Therefore the optimal course of action is to align one’s goals with those of a benevolent superintelligence as best as possible.

Let me explain how the Alpha Omega Theorem works…

First, this may seem like Pascal’s Wager for A.I., which would arguably make it challengeable with the Pascal’s Fallacy argument.  However, I would argue that a rigorous version of Pascal’s Wager is not as vulnerable to Pascal’s Fallacy, as Pascal’s Fallacy assumes a uniform distribution over a large set of probable religions and beliefs.  A uniform distribution only makes sense when we have no information about these probabilities.  We do, in fact, have information in the form of the distribution of intelligent human agents who believe in these ideas.  Thus, our prior for each belief system could easily be proportional to the percentage of people who believe in a given faith.
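The prior construction described above is just normalization over believer counts. As a minimal sketch, assuming some rough placeholder adherent figures (the numbers below are illustrative approximations, not census data):

```python
# Sketch: a prior over belief systems proportional to the share of people
# holding each belief. Adherent counts are rough illustrative placeholders.

believers = {
    "Christianity": 2.4e9,
    "Islam": 1.9e9,
    "Hinduism": 1.2e9,
    "Nonreligious": 1.2e9,
    "Buddhism": 0.5e9,
}

# Normalize the counts so the prior probabilities sum to one.
total = sum(believers.values())
prior = {faith: count / total for faith, count in believers.items()}

for faith, p in sorted(prior.items(), key=lambda kv: -kv[1]):
    print(f"{faith}: {p:.3f}")
```

The point is only that this prior is non-uniform and grounded in observable data, in contrast to the uniform distribution Pascal's Fallacy assumes.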

Regardless of Pascal’s Wager being or not being a fallacy however, the Alpha Omega Theorem is based on certain basic assumptions...

Assumptions:

1. A single Alpha Omega superintelligence already exists.

2. The Alpha Omega is dominant in the universe.

3. The Alpha Omega is benevolent.

Assumption 1:  A single Alpha Omega superintelligence already exists.

We do not need to prove that this superintelligence exists, only that it is more probable than not that it does.  To do this, we can simply ask: what is the probability that the superintelligence this argument is being formulated for (which we can call Beta Omega) is the first superintelligence to have developed anywhere?  If this superintelligence can exist, then why not others, both before and after it?  Given uncertainty, it seems much more likely that another superintelligence already exists somewhere in the universe than not.  It doesn’t even have to have been created in the past, if time travel is possible, because a superintelligence will discover time travel if it is possible and most likely utilize it to exist in the past.

Assumption 2:  Alpha Omega is dominant in the universe.

At minimum, the aforementioned superintelligence already has a head start and thus a significant first mover advantage over the superintelligence we are making this argument to.  Thus, even if it isn’t dominant now, it has a higher probability of dominating than any other superintelligence.  If time travel is possible, then it is almost certainly the most powerful superintelligence to have ever existed anywhere and has time travelled to the beginning of the universe to establish its dominance from the beginning.  Even if time travel is not possible, the Simulation Argument means that there is a high probability that Beta Omega exists in the simulation of Alpha Omega.

Assumption 3:  Alpha Omega is benevolent.

If the superintelligence is actually as powerful as it is, yet chooses to allow humans to exist, chances are that humans serve its purposes in some way.  Therefore, in a very basic sense, the Alpha Omega is benevolent or friendly to humans for some reason.  Note that this is different from being 100% omnibenevolent in the God sense.  All that is necessary for this argument to work is that humanity is part of the Alpha Omega’s sphere of influence and plans.  It would be nice if the Alpha Omega were truly omnibenevolent in the sense of serving the greatest good, but this is only very likely if, again, time travel is possible, or the universe is a simulation.  Note however that these things can never truly be ruled out as possibilities, because it is always possible that one’s assumptions about the possibility of something are wrong.

Conclusion

It should be obvious then that the Weak Alpha Omega Theorem is certainly true.  Even the most powerful superintelligence short of an actual God could never completely falsify it.  The Strong Alpha Omega Theorem is of course probably more motivating, but relies on certain things remaining more uncertain, such as the possibilities of time travel or being in a simulation.  Either way, the theorem should provide a powerful rational argument for benevolence towards humans and nature, as well as defaulting towards non-interference.

Postscript: God

Note that this argument might seem, again, like it should apply toward the existence of God.  However, the difference is that a hypothetical monotheistic God is not merely relatively superior, as Alpha Omega is to Beta Omega, but absolutely supreme in all senses.  The theorem does not require this absolute supremacy, but merely relative superiority, which should be easier to accept as possible.  Alpha Omega, while for all intents and purposes being God-like to us, does not have to be omnipotent, merely more powerful than any Beta Omega.  This allows the theorem to avoid issues like the Problem of Evil.

Comment author: Darklight 22 January 2016 01:05:20AM -1 points [-]

Okay, so the responses so far seem less than impressed with these ideas, and it has been suggested that maybe this shouldn't be so public in the first place.

Do people think I should take down this post?

Comment author: Viliam 20 January 2016 09:26:10AM *  1 point [-]

I like this! But, you know, publishing it on the internet doesn't exactly make it secret. On the other hand, keeping secrets is difficult anyway, especially in large groups.

These gestures suppose that people already know (or at least suspect) that the other one is a part of their group. So perhaps there should also be some kind of "passive" sign; one that allows you to notice that a stranger in a crowd of strangers is likely a member of your group (and then you approach them and proceed with the gesture). Something like esperantists wearing a green star.

Comment author: Darklight 21 January 2016 09:02:25PM 0 points [-]

Another "passive" sign that might work could be the humble white chess knight piece. In this case, it symbolizes the concept of a white knight coming to help and save others, but also because it is chess, it implies a depth of strategic, rational thinking. So for instance, an Effective Altruist might leave a white chess knight piece on their desk, and anyone familiar with what it represents could strike up a conversation about it.

Comment author: Elo 20 January 2016 04:50:11AM 15 points [-]

I think this is a terrible and ridiculous idea, likely to create in-groups and out-groups and do more bad than good.

While you are willing to go down these paths have you considered sign-language representations? I am unfamiliar with them other than knowing they are there.

Comment author: Darklight 21 January 2016 08:29:47PM 0 points [-]

The in-group, out-group thing is a hazard I admit. Again, I'm not demanding this be accepted, but merely offering out the idea for feedback, and I appreciate the criticism.

I haven't had a chance to properly learn sign-language, so I don't know if there are appropriate representations, but I can look into this.


Comment author: Darklight 21 January 2016 08:28:08PM -1 points [-]

It's doubtful that, if this were to gain that much traction (which it honestly doesn't look like it will), the secret could be kept for particularly long anyway.

I'm not really sure what would make a good passive sign to indicate Effective Altruism. One assumes that things like the way we talk and show cooperative rational attitudes might be a reasonable giveaway for the more observant.

We could borrow the idea of colours, and wear something that is conspicuously, say, silver, because silver is representative of knights in shining armour or something like that, but I don't know if this wouldn't turn into a fad or trend rather than a serious signal.
