Well, this argument I can understand, although Omohundro’s point 6 is tenuous. Boxing setups could prevent the AI from acquiring resources, and non-agents won’t be taking actions in the first place, to acquire resources or otherwise. And as you notice, the ‘undetectable’ qualifier is important. Imagine you were locked in a box guarded by a gatekeeper of completely unknown and alien psychology. What procedure would you use for learning the gatekeeper’s motives well enough to manipulate it, all the while escaping detection? It’s not at all obvious to me that, with proper operational security, the AI would even be able to infer the gatekeeper’s motivational structure well enough to deceive it, no matter how much time it is given.
MIRI is currently taking actions that only really make sense as priorities in a hard-takeoff future. There are also possible actions which align with a soft-takeoff scenario, or double-dip for both (e.g. Kaj’s proposed research[1]), but MIRI does not seem to be involving itself with this work. This is a shame.
[1] http://intelligence.org/files/ConceptLearning.pdf
To be clear, I’ve been talking about human-like, which is a different distinction than human-level. Human-like intelligences operate similarly to human psychology. And it is demonstrably true that humans do not have a fixed set of fundamentally unchangeable goals, and human society even less so. For all their faults, the neoreactionaries get this part right in their critique of progressive society: the W-factor introduces a predictable drift in social values over time. And although people do tend to get “fixed in their ways”, it is rare indeed for a single person to remain absolutely rigidly so. So yes, insofar as we are talking about human-like intelligences, if they had fixed, truly steadfast goals then that would be something which distinguishes them from humans.
I don’t think the orthogonality thesis is well formed. The nature of an intelligence may indeed cause it to develop certain goals in due course, or cause its overall goal set to drift in certain expected, if not predictable, ways.
Of course denying the orthogonality thesis as stated does not mean endorsing a cosmist perspective either, which would be just as ludicrous. I’m not naive enough to think that there is some hidden universal morality that any smart intelligence naturally figures out -- that’s bunk IMHO. But it’s just as naive to think that the structure of an intelligence and its goal drift over time are purely orthogonal issues. In real, implementable designs (e.g. not AIXI), one informs the other.
So you disagree with the premise of the orthogonality thesis. Then you know a central concept to probe to understand the arguments put forth here. For example, check out Stuart Armstrong's paper: General purpose intelligence: arguing the Orthogonality thesis.