Stuart has been working on further developing the orthogonality thesis, which gave rise to a paper; a non-final version can be seen here: http://lesswrong.com/lw/cej/general_purpose_intelligence_arguing_the/

This post won't make sense if you haven't been through that. 

 

Today we spent some time going over it, and he accepted my suggestion of a minor amendment, which best fits here.

 

Besides all the other awkward things that a moral convergentist would have to argue for, namely (quoting from the paper):

This argument generalises to other ways of producing the AI. Thus to deny the Orthogonality thesis is to assert that there is a goal system G, such that, among other things:

 

  1. There cannot exist any efficient real-world algorithm with goal G.
  2. If a being with arbitrarily high resources, intelligence, time and goal G were to try to design an efficient real-world algorithm with the same goal, it must fail.
  3. If a human society were highly motivated to design an efficient real-world algorithm with goal G, and were given a million years to do so along with huge amounts of resources, training and knowledge about AI, it must fail.
  4. If a high-resource human society were highly motivated to achieve the goals of G, then it could not do so (here the human society is seen as the algorithm).
  5. Same as above, for any hypothetical alien societies.
  6. There cannot exist any pattern of reinforcement learning that would train a highly efficient real-world intelligence to follow the goal G.
  7. There cannot exist any evolutionary or environmental pressures that would evolve highly efficient real-world intelligences to follow goal G.

We can add:
8. If there were a threshold of intelligence above which any agent would converge towards the morality/goals asserted by the anti-orthogonalist, there cannot exist any system, composed of a multitude of below-threshold intelligences, that as a whole pursues a different goal (G) than the convergent one (C), without any individual agent reaching the threshold.

Notice that in this case each individual might still desire the goal (G). We can strengthen the condition by ruling out this case altogether.

9. There cannot be any superorganism-like groups of agents, each with sub-threshold intelligence, whose goals differ from G, who, if acting towards their own goals, could achieve G.

This would be valuable in the case in which the threshold for convergence is i units of intelligence, or i-s units of intelligence plus knowledge that goal C exists in goal space (C being the goal towards which agents allegedly converge), and fully grasping G requires understanding C. A toy sketch of such an arrangement follows below.
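As a toy illustration of (9), here is a deliberately small sketch (everything in it is an invented example, not anything from Stuart's paper: the congestion game, the number of agents and the potential function are arbitrary choices). Each agent only tries to reduce its own congestion, yet the collective best-response dynamics drive down a global quantity G that no individual agent represents or pursues.

```python
# Toy illustration (an invented example): a group of simple agents, each
# pursuing only its own local goal, whose collective behaviour minimises a
# global quantity G that no individual agent represents.
# Here G is the potential function of a congestion game; each agent just
# greedily picks whichever of two resources is less crowded for it.

import random

NUM_AGENTS = 20
choices = [random.randint(0, 1) for _ in range(NUM_AGENTS)]  # resource 0 or 1


def load(resource, choices):
    """Number of agents currently using `resource`."""
    return sum(1 for c in choices if c == resource)


def potential(choices):
    """Global quantity G: sum over resources of 1 + 2 + ... + load(r).
    No agent knows or cares about this number."""
    return sum(load(r, choices) * (load(r, choices) + 1) // 2 for r in (0, 1))


# Best-response dynamics: a randomly chosen agent switches resources whenever
# that lowers *its own* congestion, ignoring everyone else's welfare.
for _ in range(200):
    i = random.randrange(NUM_AGENTS)
    current, other = choices[i], 1 - choices[i]
    if load(other, choices) + 1 < load(current, choices):
        choices[i] = other

print("final loads:", load(0, choices), load(1, choices))
print("global potential G:", potential(choices))
```

The loads end up balanced, which is exactly the configuration that minimises G, even though each agent only ever consulted its own cost. Points (8) and (9) assert that no arrangement of this general kind can exist when the quantity being collectively pursued is the goal G in question.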

---------    

A separately interesting issue that came up is that there seem to be two distinct conceptions of why convergent goals would converge, and other people might be as unaware of that distinction as it seemed we were.

Case 1: Goals would converge because there is the right/correct/inescapable/imperative set of goals, and anything smart enough will notice that those are the right ones, and start acting towards them.    

(This could be moral realism, but needn't be, in particular because moral realism doesn't mean much in most cases.)

Case 2: It is simply a fact that any agent, upon achieving some particular amount of intelligence, will start to converge in its moral judgements and assessments, and regardless of whether those judgements are true/right/correct etc., agents will converge on them. So whichever those happen to be, a) moral convergence is the case, and b) we should call those the Moral Convergent Values or some other fancy name.

The distinction between them is akin to that between believing 'of' and believing 'that' (the de re / de dicto distinction). So group one believes, of the convergent moral values, that agents will converge to them. The other group believes that the convergent values, whichever they turn out to be, should be given distinct conceptual importance and a name.

Stuart and I were inclined to think that Case 2 is more defensible/believable, though both fail to survive the argument for the orthogonality thesis.

 

---------

22 comments

Out of curiosity, is rejection of the Orthogonality thesis a common position in philosophy? (If you can make a guess at what percentage of philosophers reject it, that'd be cool.)

I seem to remember always finding it intuitively obvious, so it's difficult for me to understand why someone would disagree with it (except for being a theist, maybe).

Someone with a hardcore 'rationalist' position (someone who thought all moral statements could be derived from first principles e.g. a Kantian) would probably reject it, but they're basically extinct in the wild.

In what sense is this a 'rationalist' position?

In the sense of moral rationalism. The fact that 'rationalist' can be used to refer to rationality or to rationalism is unfortunate, but IIRC (too busy to search for it) we've had a few debates about terminology and decided that we are currently using the least bad options.

Indeed. It's a problem of language evolution.

To summarise a few centuries of philosophy very briefly: a long time ago there were Rationalists, who thought everything could be proven by pure reason, and Empiricists, who depended on observation of the external world. Because Reason was often used in contrast to emotion (and because of the association with logic and mathematics), "Rational" evolved into a general word for reasonable or well argued. The modern rationalist movement is about thinking clearly and coming to correct conclusions, which can't really be done by relying exclusively on pure reason. (Hence why moral rationalists in the original sense don't really exist anymore.)

Moral motivation: internalism or externalism?

Other 329 / 931 (35.3%)

Accept or lean toward: internalism 325 / 931 (34.9%)

Accept or lean toward: externalism 277 / 931 (29.8%)

source

Internalism is the belief that it is a necessary truth that, if A believes X to be wrong/right, A is at least partly motivated to avoid/promote/honour X. Externalism is usually considered to be the denial of internalism, so I don't know what the 35.3% who answered "other" are talking about. My guess is they meant "don't know".

it's difficult for me to understand why someone would disagree with it

Typical mind fallacy, but with respect to the entirety of mindspace?

The Orthogonality Thesis:

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.

It seems true - but pretty irrelevant. We mostly care about real world agents - not what could "in principle" be constructed. It's a kind of weasel wording - no doubt intended to provoke concern about evil geniuses.

Try chapter 7 of "Good and Real" by Gary Drescher.

David Pearce appears to be, judging by his posts here.

We should call those the Moral Convergent Values or some other fancy name.

These are the same as Universal Instrumental Values? Or is there a reason to think that something different would be valued?

Incidentally, value convergence could involve multiple attractors. There might be moral symmetry breaking. Value systems as natural attractors doesn't imply universal convergence on one set of values. This point seems to get lost in this post.

Tim, thanks for that commentary; it will put reading your book at the top of my leisure to-do list.

Yes, it could involve multiple attractors. I'm not sure which kind of symmetry you refer to, though. Do you mean some sort of radial symmetry, coming from everything else towards a unique set of values? Even in that case it would not be symmetric, because the acceleration (force) would be different from different regions, due to, for instance, the stuff in (2008 Boyd Richerson and someone else).

About your main question: no, those are not the same as Universal Instrumental Values. Those who hold that claim would probably prefer to say something like: there are two sets of convergent values, the instrumental ones, about which we don't care much more than Omohundro does, and the Convergent ones, which are so named because they converge despite not converging for instrumental reasons.

Yes, it could involve multiple attractors. I'm not sure which kind of symmetry you refer to, though. Do you mean some sort of radial symmetry, coming from everything else towards a unique set of values? Even in that case it would not be symmetric, because the acceleration (force) would be different from different regions, due to, for instance, the stuff in (2008 Boyd Richerson and someone else).

Imagine the Lefties, who value driving on the left - and the Righties, who value driving on the right. Nature doesn't care much about this (metaphorically speaking, of course), but the Lefties and the Righties do. I would say that was an example of moral symmetry breaking. It may not be the greatest example (it is more likely that they actually care about not being killed) - but I think it illustrates the general idea.
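A minimal sketch of that kind of symmetry breaking, purely for illustration (the conformity rule, population size and random seeds are all arbitrary choices, not anything from the discussion above): agents who merely copy the current majority settle on one of two conventions, and which one wins depends only on the initial split, not on either convention being the 'right' one.

```python
# Toy sketch (an invented example): two attractors under a simple conformity
# dynamic. Each update, one random agent adopts the current majority
# convention; the population drifts to (nearly) all-left or all-right
# depending only on the random starting split.

import random


def run_population(n=101, steps=2000, seed=None):
    rng = random.Random(seed)
    conventions = [rng.choice(("left", "right")) for _ in range(n)]
    for _ in range(steps):
        i = rng.randrange(n)
        # The chosen agent conforms to whichever convention is currently more common.
        conventions[i] = max(("left", "right"), key=conventions.count)
    return conventions


# Different seeds land in different attractors; neither outcome is privileged.
for seed in range(5):
    final = run_population(seed=seed)
    print("seed", seed, "->", final.count("left"), "left,", final.count("right"), "right")
```

Which attractor the population lands in is settled by history rather than by anything intrinsic to either option, which is all that 'symmetry breaking' needs to mean here.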

About your main question: no, those are not the same as Universal Instrumental Values. Those who hold that claim would probably prefer to say something like: there are two sets of convergent values, the instrumental ones, about which we don't care much more than Omohundro does, and the Convergent ones, which are so named because they converge despite not converging for instrumental reasons.

I suspect they are practically the same. Intelligent organisms probably won't deviate far from Universal Instrumental Values - for fear of meeting agents whose values more closely approximate them - thus losing control of their entire future.

Lefties and righties is just a convention case; if humans had three arms, two on the right, there might have been a fact of the matter as to which arm preference makes things go better.

I think this fear of other agents taking over the world is some form of bias reminiscent of in-group/out-group bias. To begin with, in the limit, if you value A, B and C intrinsically but you have to do D1, D2 and D3 instrumentally, you may initially think of doing D1, D2 and D3. But what use would it be to fill up your future with that instrumental stuff if you nearly never get A, B and C? You'd become just one more stupid replicator fighting for resources. You'd be better off doing nothing and wishing that, by luck, A, B and C were being instantiated by someone less instrumental than yourself.

Lefties and righties is just a convention case; if humans had three arms, two on the right, there might have been a fact of the matter as to which arm preference makes things go better.

Sure, but there are cases where rivals are evenly matched. Lions and tigers, for instance, have different - often conflicting - aims. However, it isn't a walk-over for one team. Of course, you could say whether the lion or tiger genes win is "just a convention" - but to the lions and tigers, it really matters.

To begin with, in the limit, if you value A, B and C intrinsically but you have to do D1, D2 and D3 instrumentally, you may initially think of doing D1, D2 and D3. But what use would it be to fill up your future with that instrumental stuff if you nearly never get A, B and C?

No use. However, our values are not that far from Universal Instrumental Values - because we were built by a process involving a lot of natural selection.

Our choice is more like: do we give up a few of the things we value now - or run the risk of losing many more of them in the future. That leads to the question of how big the risk is - and that turns out to be a tricky issue.

Agreed. That tricky issue, I suspect, might have enormous consequences if reason ends up being hijacked by in-group/out-group biases, and the surviving memes end up being those that make us more instrumental, for fear of someone else doing the same.

I expect that the force that will eventually promote natural values most strongly will be the prospect of encountering unknown aliens. As you say, the stakes are high. If we choose incorrectly, much of our distinctiveness could be permanently obliterated.

Sorry, in your terminology should I have said "reproducer"? I forgot your substitute for "replicator"...

Replicator, reproducer, I can cope either way. It seems to be mostly critics who get into a muddle over this issue - though of course, we should try not to confuse people with misleading terminology.


I disbelieve the orthogonality thesis, but I'm not sure that my position is covered by either of your two cases. My position is best described by a statement of Yudkowsky's:

"for every X except x0, it is mysteriously impossible to build any computational system which generates a range of actions, predicts the consequences of those actions relative to some ontology and world-model, and then selects among probable consequences using criterion X"

I certainly don't think AIs become friendly automatically. I agree they have to have the correct goal system X (x0) built in from the start. My guess is that AIs without the correct X built in are not true general intelligences. That is to say, I think they would simply stop functioning correctly (or, equivalently, there is an intelligence ceiling past which they cannot go).
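For concreteness, the schema in the quoted passage can be written as a tiny decision loop (a sketch only: the world model, the actions and the two goal functions below are invented for illustration). In this toy version the criterion X enters purely as a swappable parameter; the disagreement here is over whether that picture survives once the system is a genuine general intelligence, or whether, as suggested above, something breaks down past some ceiling for every X except x0.

```python
# Toy sketch of the quoted schema (an invented example): a system that
# generates a range of actions, predicts their consequences with a world
# model, and selects among the predicted outcomes using a criterion X that is
# supplied as a parameter. Nothing in the loop depends on which X is plugged in.

from typing import Callable, Iterable

State = dict  # a made-up world state, e.g. {"paperclips": 3, "happiness": 7}


def choose_action(state: State,
                  actions: Iterable[str],
                  world_model: Callable[[State, str], State],
                  criterion_x: Callable[[State], float]) -> str:
    """Pick the action whose predicted consequence scores highest under X."""
    return max(actions, key=lambda a: criterion_x(world_model(state, a)))


# Two different final goals, same machinery.
def maximise_paperclips(s: State) -> float:
    return s.get("paperclips", 0)


def maximise_happiness(s: State) -> float:
    return s.get("happiness", 0)


# A made-up world model, purely for illustration.
def toy_world_model(state: State, action: str) -> State:
    s = dict(state)
    if action == "build_factory":
        s["paperclips"] = s.get("paperclips", 0) + 10
    elif action == "throw_party":
        s["happiness"] = s.get("happiness", 0) + 10
    return s


start = {"paperclips": 0, "happiness": 0}
actions = ["build_factory", "throw_party", "do_nothing"]
print(choose_action(start, actions, toy_world_model, maximise_paperclips))  # build_factory
print(choose_action(start, actions, toy_world_model, maximise_happiness))   # throw_party
```

The same loop happily takes maximise_paperclips or maximise_happiness; nothing in this toy version forces one final goal over another.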

Why do you think this? And, on a related note, why do you think AIs without X will stop functioning / hit a ceiling (in the sense of: what is the causal mechanism)?


Taking a wild guess I’d say…

Starting from my assumption that concept-free general intelligence is impossible, the implication is that there would be some minimal initial set of concepts required to be built-in for all AGIs.

This minimal set of concepts would imply some necessary cognitive biases/heuristics (because the very definition of a 'concept' implies a particular grouping or clustering of data, an initial 'bias'), which in turn is equivalent to some necessary starting values (a 'bias' is, in a sense, a type of value judgement).

The same set of heuristics/biases (values) involved in taking actions in the world would also be involved in managing (reorganizing) the internal representational system of the AIs. If the reorganization is not performed in a self-consistent fashion, the AIs stop functioning. Remember: we are talking about a closed loop here: the heuristics/biases used to reorganize the representational system have to themselves be fully represented in that system.

Therefore, the causal mechanism that stops the uAIs would be the eventual breakdown in their representational systems as the need for ever more new concepts arises, stemming from the inconsistent and/or incomplete initial heuristics/biases being used to manage those representational systems (i.e., failing to maintain a closed loop).

Advanced hard math for all this to follow….