JamesAndrix comments on Dreams of AIXI - Less Wrong
I'm not talking about the genome.
1024 bits is an extremely lowball estimate of the complexity of the basic drives and emotions in your AI design. You have to create those drives out of a huge universe of possible drives, and only a tiny subset of possible designs are human-like. Most likely you will create an alien mind. Even handpicking drives: it's a small target, and we have no experience with generating drives for even near-human AI. The region of human-like drive sets within the space of all possible drive sets is likely to be thin and complexly twisty under the mapping of any human-designed algorithm. You won't intuitively know what you can tweak.
Also, a set of drives that yields a nice AI at human levels might yield something unfriendly once the AI is able to think harder about what it wants. (and this applies just as well to upgrading existing friendly humans.)
All intellectual arguments about complex concepts of morality stem from simpler concepts of right and wrong, which stem from basic preferences learned in childhood. But THOSE stem from emotions and drives which flag particular types of early inputs as important in the first place.
A baby will cry when you pinch it, but not when you bend a paperclip.
Estimating 1 bit per character, that's 214 bits. Still a huge space.
It could be that there is another mechanism that guides adoption of values, which we don't even have a word for yet.
A simpler explanation is that moral memes evolved to be robust to most of the variation in basic drives that exists within the human population. A person born with relatively little 'frowns are bad' might still be taught not to murder with a lesson that hooks into 'groups are good'.
But there just aren't many moral lessons structured around the basic drive of 'paperclips are good' (19 bits).
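Just to make the arithmetic explicit, here is a rough sketch of the estimate I'm using, assuming the crude 1-bit-per-character model (English text compresses to very roughly 1 bit per character, so this is order-of-magnitude only; the function name is just mine):

```python
# Crude description-length estimate: ~1 bit per character of English text.
# Order-of-magnitude only; real complexity depends on the encoding.
def drive_bits(description: str) -> int:
    return len(description)  # 1 bit per character

print(drive_bits("paperclips are good"))       # 19 bits
# A 19-bit target already sits in a space of 2**19 (~half a million)
# equally short descriptions, and the space grows exponentially with length.
print(2 ** drive_bits("paperclips are good"))  # 524288
```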
The subset of possible designs is sparse - and almost all of the space is an empty, worthless desert. Evolution works by exploring paths in this space incrementally. Even technology evolves - each CPU design is not a random new point in the space of all possible designs; each is necessarily close to previously explored points.
Yes - but they are learned memetically, not genetically. The child learns what is right and wrong through largely subconscious cues in the parents' tone of voice, explicit yes/no (some of the first words learned), and explicit punishment. It's largely a universal learning system with an imprinting system to soak up memetic knowledge from the parents. The genetics provided the underlying hardware and learning algorithm, but the content is all memetic (software/data).
Saying intellectual arguments about complex concepts such as morality relate back to genetics is like saying all arguments about computer algorithm design stem from simpler ideas, which ultimately stem from Enlightenment thinkers of three hundred years ago - or perhaps Paleolithic cave dwellers inventing fire.
Part of this disagreement could stem from different underlying background assumptions - for example, I am probably less familiar with ev psych than many people on LW - partly because (to the extent I have read it) I find it grossly over-extended past any objective evidence (compared to, say, computational neuroscience). Ev psych has minor utility in actually understanding the brain, and is even less useful for making sense of culture.
Trying to understand culture/memetics/minds with ev psych or even neuroscience is even worse than trying to understand biology through physics. Yes it did all evolve from the big bang, but that was a long long time ago.
So basically, anything much more complex than our inner reptile brain (which is all the genome can code for) needs to be understood in memetic/cultural/social terms.
For example, in many civilizations it has been perfectly acceptable to kill or abuse slaves. In some it was acceptable for brothers and sisters to marry, or for a teacher and pupil to have homosexual relations, and we could go on and on.
The idea that there is some universally programmed 'morality' in the genome is ... a convenient fantasy. It seems reasonable only because we are samples in the dominant Judeo-Christian memetic super-culture, which at this point has spread its influence all over the world and dominates most of it.
But there are alternate histories and worlds where that just never happened, and they are quite different.
A child's morality develops as a vast accumulation of tiny cues and triggers communicated through the parents - and these are memetic transfers, not genetic. (masturbation is bad, marriage is good, slavery is wrong, racism is wrong, etc etc etc etc)
The basic drive 'paperclips are good' is actually a very complex thing we'd have to add to an AGI design - it's not something that would just spontaneously appear.
The easier, more practical AGI design would be a universal learning engine (inspired by the human cortex and hippocampus) and a simulation loop (the hippocampal-thalamic-cortical circuit), combined with just a subset of the simpler reinforcement learning circuits (the most important being learning-reinforcement itself and imprinting).
And then with imprinting you teach the developing AGI morality in the same way humans learn morality - memetically. Trying to hard-code the morality into the AGI is a massive step backwards from the human brain's design.
One thing I want to make clear is that trying to hard-code human morality into an AI is not the correct way to make it Friendly. A correct Friendly AI learns about human morality.
MOST of my argument really really isn't about human brains at all. Really.
For a value system in an AGI to change, there must be a mechanism to change the value system. Most likely that mechanism will work off of existing values, if any. In such cases, the complexity of the initial values system is the compressed length of the modification mechanism, plus any initial values. This will almost certainly be at least a kilobit.
If the mechanism+initial values that your AI is using were really simple, then you would not need 1024 bits to describe it. The mechanism you are using is very specific. If you know you need to be that specific, then you already know that you're aiming for a target that specific.
If your generic learning algorithm needs a specific class of motivation mechanisms, specified to 1024 bits, in order to still be intelligent, then the mechanism you made is actually part of your intelligence design. You should separate that out for clarity; an AGI should be general.
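A rough way to make the 'compressed length' point concrete (zlib here is only a crude stand-in for true description length, which is uncomputable, and the two mechanism strings are placeholders I made up):

```python
import zlib

# Crude proxy for description length: compressed size in bits.
# This is only a loose upper bound on the real complexity.
def description_bits(spec: str) -> int:
    return 8 * len(zlib.compress(spec.encode("utf-8")))

simple_mechanism = "reward = +1 if still running else 0"
specific_mechanism = (
    "reward shaped by: pain avoidance, hunger, imprinting on caregivers, "
    "frowns are bad, smiles are good, group approval, novelty seeking, ..."
)

print(description_bits(simple_mechanism))    # fewer bits: a loose target
print(description_bits(specific_mechanism))  # more bits: a much more specific target
# If the design only works with the second kind of mechanism, those extra bits
# are part of the design's complexity whether or not you write them down separately.
```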
Heh yeah, but I already conceded that.
Let me put it this way: emotions and drives and such are in the genome. They act as a (perhaps relatively small) function which takes various sensory feeds as arguments, and produce as output modifications to a larger system, say a neural net. If you change that function, you will change what modifications are made.
Given that we're talking about functions that also take their own output as input and do pretty detailed modifications on huge datasets, there is tons of room for different functions to go in different directions. There is no generic morality-importer.
Now there may be clusters of similar functions which all kinda converge given similar input, especially when that input is from other intelligences repeating memes evolved to cause convergence on that class of functions. But even near those clusters are functions which do not converge.
I think it's great that you're putting the description of a paperclip in the basic drive complexity count, as that will completely blow away the kilobit for storing any of the basic human drives you've listed. Maybe the complexity of the important subset of human drives will be somewhere in the ballpark of the complexity of the reptilian brain.
Another thing I could say to describe my point: if you have a generic learning algorithm, then whatever things feed rewards or punishments to that algorithm should be seen as part of that algorithm's environment. Even if some of those things are parts of the agent as a whole, they are part of what the values-agnostic learning algorithm is going to learn to get reward from.
So if you change an internal reward-generator, it's just like changing the environment of the part that just does learning. So two AIs with different internal reward generators will end up learning totally different things about their 'environment'.
To say that a different way: Everything you try to teach the AI will be filtered through the lens of its basic drives.
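A toy sketch of what I mean, assuming a generic tabular learner and two made-up internal reward modules (all the names and numbers here are just for illustration):

```python
import random

# A generic, values-agnostic learner: tabular preference scores per (observation, action).
def train(reward_module, episodes=2000, seed=0):
    rng = random.Random(seed)
    prefs = {}
    actions = ["comfort", "bend_paperclip"]
    for _ in range(episodes):
        obs = rng.choice(["crying_baby", "paperclip"])
        act = rng.choice(actions)
        # The reward module sits *inside* the agent, but from the learning
        # algorithm's point of view it is just another part of the environment.
        r = reward_module(obs, act)
        key = (obs, act)
        prefs[key] = prefs.get(key, 0.0) + 0.1 * (r - prefs.get(key, 0.0))
    return prefs

# Two hypothetical internal reward generators (drives).
human_like = lambda obs, act: 1.0 if (obs == "crying_baby" and act == "comfort") else 0.0
paperclip  = lambda obs, act: 1.0 if (obs == "paperclip" and act == "bend_paperclip") else 0.0

# Identical learning code, identical observations, different drives:
# the learned 'values' diverge.
print(train(human_like))
print(train(paperclip))
```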
I'm not convinced that an AGI needs a value system in the first place (beyond the basic value of survival) - but perhaps that is because I am taking 'value system' to mean something similar to morality: a goal evaluation mechanism.
As I discussed, the infant human brain does have a number of inbuilt simple reinforcement learning systems that do reward/punish on a very simple scale for some simple drives (pain avoidance, hunger) - and you could consider these a 'value system', but most of these drives appear to be optional.
Most of the learning an infant is doing is completely unsupervised learning in the cortex, and it has little to nothing to do with a 'value system'.
The bare-bones essentials could be just the cortical learning system itself and perhaps an imprinting mechanism.
This is not necessarily true; it does not match what we know from theoretical models such as AIXI. With enough time and enough observations, two general universal intelligences will converge on the same beliefs about their environment.
Their goal/reward mechanisms may be different (i.e. what they want to accomplish), but for a given environment there is a single correct set of beliefs, a single correct simulation of that environment, that AGIs should converge to.
Of course in our world this is so complex that it could take huge amounts of time, but science is the example mechanism.
You're going to build an AI that doesn't have and can't develop a goal evaluation system?
It doesn't matter what we call it or how it's designed. It could be fully intertwined with an agent's normal processing. There is still an initial state and a mechanism by which it changes.
Take any action by any agent and trace the causality backwards in time, and you'll find something I'll loosely label a motivation. The motivation might just be a pattern in a clump of artificial neurons, or a broad pattern in all the neurons; that will depend on implementation. If you trace the causality of that backwards, yes, you might find environmental inputs and memes, but you'll also find a mechanism that turned those inputs into motivation-like things. That mechanism might include the full mind of the agent. Or you might just hit the initial creation of the agent, if the motivation was hardwired.
But for any learning of values to happen, you must have a mechanism, and the complexity of that mechanism tells us how specific it is.
That would be wrong, because I'm talking about two identical AIs in different environments.
Imagine your AI in its environment, and draw a balloon around the AI and label it 'Agent'. Now let the balloon pass partly through the AI and shrink it so that the AI's reward function is outside of the balloon.
Now copy that diagram and tweak the reward function in one of them.
Now the balloons label agents that will learn very different things about their environments. They might both agree about gravity and everything else we would call a fact about the world, but they will likely disagree about morality, even if they were exposed to the same moral arguments. They can't learn the same things the same way.
No no not necessarily. Goal evaluation is just rating potential future paths according to estimates of your evaluation function - your values.
The simple straightforward approach to universal general intelligence can be built around maximizing a single very simple value: survival.
For example, AIXI maximizes simple reward signals defined in the environment, but in the test environments the reward always comes at the very end, for 'winning'. This is just about as simple a goal system as you can get: long-term survival. It also may be equivalent to just maximizing accurate knowledge/simulation of the environment.
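A minimal sketch of the kind of goal structure I mean - a toy planner whose only reward is 1 for being alive at the horizon, 0 otherwise. This is nothing like real AIXI (no Solomonoff induction), and the states, actions, and dynamics are made up:

```python
# Toy 'survive to the end' planner: reward is 1 only if alive at the horizon.
def best_survival_prob(state, depth, actions, transition):
    """transition(state, action) -> list of (probability, next_state, alive)."""
    if depth == 0:
        return 1.0  # reached the horizon alive: reward 1
    best = 0.0
    for action in actions:
        p_survive = 0.0
        for prob, nxt, alive in transition(state, action):
            if alive:
                p_survive += prob * best_survival_prob(nxt, depth - 1, actions, transition)
        best = max(best, p_survive)
    return best

# Hypothetical dynamics: 'risky' sometimes kills you, 'safe' never does.
def transition(state, action):
    if action == "safe":
        return [(1.0, state, True)]
    return [(0.9, state, True), (0.1, state, False)]

# The planner picks the policy that maximizes probability of still being there at the end.
print(best_survival_prob("start", depth=10, actions=("safe", "risky"), transition=transition))  # 1.0
```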
If you generalize this to the real world, it would be maximizing winning in the distant distant future - in the end. I find it interesting that many transhumanist/cosmist philosophies are similarly aligned.
Another interesting convergence is that if you take just about any evaluator and extend the time horizon to infinity, it converges on the same long term end-time survival. An immortality drive.
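One way to see that convergence with a toy example, assuming per-step rewards simply stop when the agent dies (the numbers and names below are made up):

```python
# Toy illustration: rewards stop at death, so as the horizon grows,
# total value is dominated by survival time, almost regardless of the evaluator.
def total_value(per_step_reward, lifetime, horizon):
    return per_step_reward * min(lifetime, horizon)

for horizon in (10, 1_000, 1_000_000):
    short_lived = total_value(per_step_reward=5.0, lifetime=100, horizon=horizon)
    survivor    = total_value(per_step_reward=0.1, lifetime=horizon, horizon=horizon)
    print(horizon, short_lived, survivor)
# For any fixed finite lifetime, the surviving agent eventually wins as the horizon grows.
```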
And perhaps that drive is universal. Evolution certainly favors it. I believe that, barring other evidence, we should assume this will be something of a default trajectory for AI, for better or worse. We can create more complex intrinsic value systems and attempt to push away from that default trajectory, but it may be uphill work.
An immortalist can even 'convert' other agents to an extent by convincing them of the simulation argument and the potential for them to maximize arbitrary reward signals in simulations (afterlives).
In practice yes, although this becomes less clear as their knowledge expands towards AIXI. You can have different variants of AIXI that 'see' different rewards in the environment and thus have different motivations, but since those rewards are just mental, not causal mechanisms in the environment itself, the different AIXI variants will eventually converge on the same simulation program - the same physics approximation.
Isn't it obvious that a superintelligence that just values its own survival is not what we want?
There is a LOT more to transhumanism than immortalism.
You treat value systems as a means to the end of intelligence, which is entirely backwards.
That two agents with different values would converge on identical physics is true but irrelevant. Your claim is that they would learn the same morality, even when their drives are tweaked.
No, this isn't obvious at all, and it gets into some of the deeper ethical issues. Is it moral to create an intelligence that is designed from the ground up to value only our survival, at its own expense? We have already done this with cattle to an extent, but we would now be creating actual sapients enslaved to us by design. I find it odd that many people can easily accept this, but have difficulty accepting, say, creating an entire self-contained sim universe with unaware sims - how different are the two, really?
And just to be clear, I am not advocating creating a superintelligence that just values survival. I am merely pointing out that this is in fact the simplest type of superintelligence and is some sort of final attractor in the space. Evolution will be pushing everything towards that attractor.
No, I'm not trying to claim that. There are several different things here:
I notice that you brought up our treatment of cattle, but not our enslavement of spam filters. These are two semi-intelligent systems. One we are pretty sure can suffer, and I think there is a fair chance that mistreating them is wrong. The other system we generally think does not have any conscious experience or other traits that would require moral consideration. This despite the fact that the spam filter's intelligence is more directly useful to us.
So a safer route to FAI would be to create a system that is very good at solving problems and deciding which problems need solving on our behalf, but which perhaps never experiences qualia itself, or otherwise is not something it would be wrong to enslave. Yes this will require a lot of knowledge about consciousness and morality beforehand. It's a big challenge.
TL;DR: We only run the FAI if it passes a nonperson predicate.
Four: I don't follow you.
So now we move to that whole topic of what is life/intelligence/complexity? However you scale it, the cow is way above the spam filter. The most complex instances of the latter are still below insects, from what I recall. Then when you get to an intelligence that is capable of understanding language, that becomes something like a rocket which boosts it up into a whole new realm of complexity.
I don't think this leads to the result that you want - even in theory. But it is the crux of the issue.
Consider the demands of a person predicate. The AI will necessarily be complex enough to form complex abstract approximate thought simulations and acquire the semantic knowledge to build those thought-simulations through thinking in human languages.
So what does it mean to have a person predicate? You have to know what a 'person' is.
And what's really interesting is this: that itself is a question so complex that we humans are debating it.
I think the AI will learn that a 'person', a sapient, is a complex intelligent pattern of thoughts - a pattern of information, which could exist biologically or in a computer system. It will then realize that it itself is in fact a person - the person predicate returns true for itself - and thus goal systems that you create to serve 'people' will include serving itself.
I also believe that this line of thought is not arbitrary and cannot be avoided: it is singularly correct and unavoidable.