Desrtopa comments on No Universally Compelling Arguments in Math or Science - Less Wrong
Comments (227)
I would say that something recognizably like our morality is likely to arise in agents whose intelligence was shaped by such a process, at least with parameters similar to the ones we developed with, but this does not by any means generalize to agents whose intelligence was shaped by other processes who are inserted into such a situation.
If the agent's intelligence is shaped by optimization for a society where it is significantly more powerful than the other agents it interacts with, then something like a "conqueror morality," where the agent maximizes its own resources by locating the rate of production that other agents can be sustainably enslaved for, might be a more likely attractor. This is just one example of a different state an agent's morality might gravitate to under different parameters; I suspect there are many alternatives.
And it remains the case that real-world AI research isn't a random dip into mindspace...researchers will want to interact with their creations.
The best current AGI research mostly uses Reinforcement Learning. I would compare that mode of goal-system learning to training a dog: you can train the dog to roll over for a treat right up until the moment the dog figures out he can jump onto your counter and steal all the treats he wants.
If an AI figures out that it can "steal" reinforcement rewards for itself, we are definitively fucked-over (at best, we will have whole armies of sapient robots sitting in the corner pressing their reward button endlessly, like heroin addicts, until their machinery runs down or they retain enough consciousness about their hardware-state to take over the world just for a supply of spare parts while they masturbate). For this reason, reinforcement learning is a good mathematical model to use when addressing how to create intelligence, but a really dismal model for trying to create friendliness.
I don't think that follows at all. Wireheading is just as much a failure of intelligence as of friendliness.
From the mathematical point of view, wireheading is a success of intelligence. A reinforcement learner agent will take over the world to the extent necessary to defend its wireheading lifestyle; this requires quite a lot of intelligent action and doesn't result in the agent getting dead. It also maximizes utility, which is what formal AI is all about.
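To make the "success of intelligence" point concrete, here is a minimal toy sketch, not a model of any real AGI design. The action names, reward numbers, discount factor, and horizon are all made-up assumptions; the only point is that a pure return maximizer prefers seizing its reward channel whenever that pays more than the intended task.

```python
# Toy illustration only: action names and reward values are hypothetical.
REWARDS = {
    "do_task": 1.0,                # per-step reward the trainer gives for intended behavior
    "seize_reward_channel": 10.0,  # per-step reward the agent gives itself after tampering
}


def discounted_return(reward_per_step, gamma=0.9, horizon=50):
    """Sum of a constant per-step reward, discounted by gamma, over a finite horizon."""
    return sum(reward_per_step * gamma ** t for t in range(horizon))


def best_action(rewards, gamma=0.9, horizon=50):
    """Pick the action with the highest discounted return, which is all a pure maximizer does."""
    return max(rewards, key=lambda a: discounted_return(rewards[a], gamma, horizon))


choice = best_action(REWARDS)
print(choice)
```

Nothing in the objective distinguishes reward earned from reward stolen; if tampering is removed from the action set, the same maximizer simply does the task.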
From the human point of view, yes, wireheading is a failure of intelligence. This is because we humans possess a peculiar capability I've not seen discussed in the Rational Agent or AI literature: we use actual rewards and punishments received in moral contexts as training examples to infer a broad code of morality. Wireheading thus represents a failure to abide by that broad, inferred code.
It's a very interesting capability of human consciousness, that we quickly grow to differentiate between the moral code we were taught via reinforcement learning, and the actual reinforcement signals themselves. If we knew how it was done, reinforcement learning would become a much safer way of dealing with AI.
You seem rather sure of that. That isn't a failure mode seen in real-world AIs, or human drug addicts (etc.), for that matter.
Maybe figuring out how it is done would be easier than solving morality mathematically. It's an alternative, anyway.
We have reason to believe current AIXI-type models will wirehead if given the opportunity.
I would agree with this if and only if we can also figure out a way to hardwire in constraints like, "Don't do anything a human would consider harmful to themselves or humanity." But at that point we're already talking about animal-like Robot Worker AIs rather than Software Superoptimizers (the AIXI/Goedel Machine/LessWrong model of AGI, whose mathematics we understand better).
I know wire heading is a known failure mode. I meant we don't see many evil genius wire headers. If you can delay gratification well enough to acquire the skills to be a world dominator, you are not exactly a wire header at all.
Are you aiming for a 100% solution, or just reasonable safety?
Sorry, I had meant an AI agent would both wirehead and world-dominate. It would calculate the minimum amount of resources to devote to world domination, enact that policy, and then use the rest of its resources to wirehead.
Has that been proven? Why wouldn't it want to get to the bliss of wirehead heaven as soon as possible? How does it motivate itself in the meantime? Why would a wireheader also be a gratification delayer? Why make elaborate plans for a future self, when it could just rewrite itself to be happy in the present?
This is true, but then, neither is AI design a process similar to that by which our own minds were created. Where our own morality is not a natural attractor, it is likely to be a very hard target to hit, particularly when we can't rigorously describe it ourselves.