Hi Carl,
Thank you for the thoughtful comment. I am not used to writing didactically, so please forgive my excessive conciseness.
You understood my argument well in the five points, with the detail that I define value as good and bad feelings rather than as pleasure, happiness, suffering, and pain. The former definition allows for subjective variation and universality, while the latter, utilitarian definition is too narrow and anthropocentric, and could be contested on those grounds.
What kind of value do you mean here? Impersonal ethical value? Impact on behavior? Different sorts of pleasurable and painful experience affect motivation and behavior differently, and motivation does not respond to pleasure or pain as such, but to some discounted transformation thereof. E.g. people will accept a pain 1 hour hence in exchange for a reward immediately when they would not take the reverse deal.
I mean ethical value, but not necessarily impact on behavior or motivation. Indeed, people do accept trades between good and bad feelings, and they can be biased in terms of motivation.
Does this apply to other directly felt moral intuitions, like anger or fairness? Later you say that our best theories show that personal identity is an illusion, despite our perception of continued existence over time, and so we would discard it. What distinguishes the two?
It does not apply in the same way to other moral intuitions, like anger or fairness. The latter are directly felt in some way, and in this sense they are real, but they also have a context relating to the world that is indirectly felt and could be false. Anger, for instance, can be directly felt as a bad feeling, but its causation and subsequent behavioral motivation relate to the outside world, and are at another, lesser level of certainty. Likewise, it could be said that whatever caused good or bad feelings (such as kissing a woman) is not universal and not as certain as the good feeling itself, which was caused by it in a person and directly verified by them. This person doesn't know whether he is inside a Matrix-like virtual world and whether the woman was really a woman or just computer data, but he does know that the kiss led to directly felt good feelings. The distinction is that one relates to the outside world, while the other relates to itself.
How are good and bad feelings physical occurrences in a way that knowledge or health or equality or the existence of other outcomes that people desire are not?
Good question. The goodness and badness of feelings is directly felt as such, and is a datum of the highest certainty about the world, while the goodness or badness of those other physical occurrences (which are indirectly felt) is not a datum but an inference, which, though generally trustworthy, eventually needs to be justified by being connected to intrinsic values.
Earlier you privileged pleasure as a value because it is directly experienced. But an organism directly experiences, and is conditioned or reinforced by its own pain or pleasure.
Indeed. However, in acting on the world, an organism has to assume a model of the world which it is going to trust as true, in order to act ethically. In this model of the world, in the world as it appears to us, the organism would consider the nature of personal identity and not privilege its own viewpoint. However, you have a point that, strictly, one's own experiences are more certain than those of others. The difference in certainty could be thought of as the difference between direct conscious feelings and physical theories. Say that the former are ascribed a certainty of 100%, while the latter get 95%. The organism might then assign 5% more value to its own experiences, not fundamentally, but based on the solipsistic hypothesis that other people are zombies, or that they don't really exist.
Error in what sense? If desires are mostly learned through reward and anticipations of reward, one can note when the resulting desires do not maximize some metric of personal pleasure or pain (e.g. the desire to be remembered after one dies, or for equality). But why identify with the usual tendency of reinforcement learning rather than the actual attitudes and desires one has?
In that case I meant intrinsic values. What you mean, for instance with equality, can be thought of as instrumental values. Instrumental values are taken as heuristics, or in decision theory as patterns of behavior, that usually lead to intrinsic values. Indeed, in order to achieve direct or intrinsic value, the best way tends to be to follow instrumental values, such as working, learning, or increasing longevity. I argue that the validity of these can be examined by the extent to which they lead to direct value, namely good and bad feelings, in a non-personal way.
OK, that is the interpretation I found less convincing. The bare axiomatic normative claim that all the desires and moral intuitions not concerned with pleasure as such are errors with respect to maximization of pleasure isn't an argument for adopting that standard.
And given the admission that biological creatures can and do want things other than pleasure, have other moral intuitions and motivations, and the knowledge that we can and do make computer programs with preferences defined over some model of their environment that do not route through an equiva...
The orthogonality thesis, formulated by Nick Bostrom in his article "The Superintelligent Will" (2012), states basically that an artificial intelligence can have any combination of intelligence level and goal. This article will focus on this simple question, dealing only at the end with the practical implementation issues that, according to Stuart Armstrong, would need to be part of its full refutation.
Meta-ethics
The orthogonality thesis is based on a variation of ethical values among different beings. This is either because the beings in question have some objective difference in their constitution that associates them with different values, or because they can choose what values they have.
That assumption of variation is arguably based on an analysis of humans. The problem with choosing values is obvious: making errors. Human beings are biologically and constitutionally very similar, and given this, if they objectively and rightfully differ in correct values, it is only in aesthetic preferences, due to an existing biological difference. If they differ in other values, then, given that they are constitutionally similar, the differing values could not all be correct at the same time; they would differ due to error in choice.
Aesthetic preferences do vary for us, but they all ultimately connect to their satisfaction: a specific aesthetic preference may satisfy only some people and not others. What is important is the satisfaction, or good feelings, that they produce, in the present or future (which might entail life preservation), and this is basically the same thing for everyone. A given stimulus or occurrence is interpreted by the senses and can produce good feelings, bad feelings, or neither, depending on the organism that receives it. This variation is beside the point; it is just an idiosyncrasy that could go either way: theoretically, any input (aesthetic preference) could be associated with a certain output (good and bad feelings), or even no input at all, as in spontaneous satisfaction or wireheading. In terms of output, good feelings and bad feelings always get positive and negative value, by definition.
Masochism is not a counter-example: masochists like pain only in very specific environments, associated with certain roleplaying fantasies, due to good feelings associated with it, or due to a relief of mental suffering that comes with the pain. Outside of these environments and fantasies, they are just as averse to pain as other people. They don't regularly put their hands into boiling water to feel the pain; nobody does.
Good and bad feelings are directly felt as positive and desirable, or negative and aversive, and this direct verification gives them the highest epistemological value. What is indirectly felt, such as the world around us, science, or physical theories, depends on the senses and could therefore be an illusion, such as part of a virtual world. We could, theoretically, be living inside virtual worlds in an underlying alien universe with different physical laws and scientific facts, yet we can nonetheless be sure of the reality of our conscious experiences in themselves, which are directly felt.
There is a difference between valid and invalid human values, which is the ground of justification for moral realism: valid values have an epistemological justification, while invalid ones are based on arbitrary choice or intuition. The epistemological justification of valid values occurs through that part of our experiences which has direct, as opposed to indirect, certainty: conscious experiences in themselves. Likewise, only conscious beings can be said to be ethically relevant in themselves, while what goes on in the hot magma at the core of the Earth, or in a random rock on Pluto, is not. Consciousness creates a subject of experience, which is required for direct ethical value. It is straightforward to conclude, therefore, that good conscious experiences constitute what is good, and bad conscious experiences constitute what is bad. Good and bad are what ethical value is about.
Good and bad feelings (or conscious experiences) are physical occurrences, and therefore objectively good and bad occurrences, and objective value. Other fictional values without epistemological (or logical) justification are therefore in another category, and simply constitute the error which comes from allowing free choice of one's values for beings with a similar biological constitution.
Personal Identity
The existence of personal identities is purely an illusion that cannot be justified by argument, and it clearly disintegrates upon deeper analysis (for why that is, see, e.g., the essay Universal Identity, or, for an introduction to the problem, the Less Wrong article The Anthropic Trilemma).
Different instances in time of a physical organism relate to it in the same way that any other physical organism in the universe does. There is no logical basis for privileging a physical organism's own viewpoint, nor the satisfaction of its own values over that of other physical organisms, nor for assuming the preponderance of its own reasoning over that of other physical organisms of contextually comparable reasoning capacity.
Therefore, the argument of variation or orthogonality could, at best, assume that a superintelligent physical organism with a complete understanding of these cognitively trivial philosophical matters would have to consider all viewpoints and valid preferences in its utility function, much as in coherent extrapolated volition (CEV): extrapolating the values for intelligence and removing errors, but taking account of the values of all sentient physical organisms, not only humans but also animals, and possibly sentient machines and aliens. The only values that are validly generalizable among such widely differing sentient creatures are good and bad feelings (in the present or future).
Furthermore, a superintelligent physical organism with such understanding would have to give equal weight to the reasoning of other physical organisms of contextually comparable reasoning capacity (depending on the cognitive demands of the context or problem, even some humans can reason perfectly well), if such exist. In case of convergence, this would be a non-issue. In case of divergence, it would force an evaluation of reasons or argumentation, seeking convergence or preponderance of argument.
Conclusions
If the orthogonality thesis is taken to be merely the assumption of divergence in the ethical values of superintelligent agents, and not a statement about the issues of practical implementation or of tampering by non-superintelligent humans, then there are two fatal arguments against it: one on the side of meta-ethics (moral realism), and one on the side of personal identity (open/empty individualism, or universal identity).
Beings with general superintelligence should find these fundamental philosophical matters (meta-ethics and personal identity) trivial, and understand them completely. They should take a non-privileged and objective viewpoint, accounting for all perspectives of physical subjects, and giving (a priori) similar consideration to the reasoning of all physical organisms of contextually comparable reasoning capacity.
Furthermore, they would understand that the free variation of values, even in comparable causal chains of biologically similar organisms, comes from error, and that their extrapolation for intelligence would result in moral realism, with good and bad feelings as the epistemologically justified and only valid direct values, from which all other indirectly or instrumentally valuable actions derive their indirect value. Survival, for instance, can have positive value in a paradise, coming from good feelings in the present and future, and negative value in a hell, coming from bad feelings in the present and future.
Perhaps certain architectures or contexts involving beings with superintelligence, created by beings without superintelligence and with erratic behavior, could be forced to produce unethical results. This seems to be the gravest existential risk that we face, and it would come not from beings with superintelligence themselves, but from human error. The orthogonality thesis is fundamentally mistaken in relation to beings with general superintelligence (surpassing all human cognitive capacities), but it might be practically realized by non-superintelligent human agents.