As you mention, so far every attempt by humans to build a self-consistent value system (the process also known as decompartmentalization) has resulted in less-than-desirable outcomes. What if the end goal of having a thriving, long-lasting (super-)human(-like) society is self-contradictory, and there is no such thing as both "nice" and "self-referentially stable"? Maybe some effort should be put into figuring out how to live, and thrive, while managing the unstable self-reference, and possibly into avoiding convergence altogether.
A thought I've been mulling over lately, derived from a reinforcement learning view of values and also somewhat inspired by Nate's recent post on resting in motion: value convergence seems to suggest a static endpoint, some set of "ultimate values" we'll eventually reach and then hold ever after. But no society has ever reached such a point, and if our values are an adaptation to our environment (including the society and culture we live in), that suggests that as long as we keep evolving and developing, our values will keep changing and evolving with us, without there being any meaningful endpoint.
There will always (given our current understanding of physics) be only a finite amount of resources available, and unless we either all merge into one enormous hivemind or get turned into paperclips, there will likely be various agents with differing preferences about what exactly to do with those resources. As the population keeps changing and evolving, the various agents will keep acquiring new kinds of values, and society will keep rearranging itself into a new compromise between all those different values. (See: the whole history of the human species so far.)
Impressive.
Couldn't another class of solutions be that resolutions of inconsistencies cannot reduce the complexity of the agent's morality? I.e. morality has to be (or tend to become) not only (more) consistent, but also (more) complex, sort of like an evolving body of law rather than like the Ten Commandments?
What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
Wait... what? No.
You don't solve the value-alignment problem by trying to write down your confusions about the foundations of moral philosophy...
But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup.
Or once you lose your meta-moral urge to reach a self-consistent morality. This may not be the wrong (heh) answer for a path that originally started out toward reaching a self-consistent morality.
...Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it, that you overwrite any objections you had. It seems very easy for humans to fall into these traps...
It seems that your research is coming around to some concepts that are at the basis of mine. Namely, that noise in an optimization process is a constraint on the process, and that the resulting constrained optimization process avoids the nasty properties you describe.
Feel free to contact me if you'd like to discuss this further.
The common thread I am noticing is the assumption of singletonhood.
Technologically, if you have a process that could go wrong, you run several in parallel.
In human society, an ethical innovator can run an idea past the majority to see if it sounds like an improved version of what they already believe.
It's looking, again, like group rationality is better.
I don't think anyone has proposed any self-referential criteria as being the point of Friendly AI? It's just that such self-referential criteria as reflective equilibrium are a necessary condition which lots of goal setups don't even meet. (And note that just because you're trying to find a fixpoint, doesn't necessarily mean you have to try to find it by iteration, if that process has problems!)
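To make that parenthetical concrete: a fixed point is a property of the end state (f(x) = x), and it can be searched for in ways other than repeatedly applying f. Here is a toy numeric sketch (my own illustration, not from the comment), using cos as a stand-in "revision" map:

```python
# A fixed point of f is any x with f(x) == x.  It can be found by iterating f,
# but also directly, e.g. by root-finding on g(x) = f(x) - x.  The fixed point
# as a *criterion* is separate from iteration as a *search procedure*.
import math

f = math.cos  # toy stand-in for a "revision" map; its fixed point solves f(x) == x

# Method 1: repeated application of f.
x = 1.0
for _ in range(100):
    x = f(x)

# Method 2: bisection on g(x) = f(x) - x, never iterating f on its own output.
lo, hi = 0.0, 1.0  # g(0) = 1 > 0 and g(1) is about -0.46 < 0, so a root lies between
for _ in range(60):
    mid = (lo + hi) / 2
    if f(mid) - mid > 0:
        lo = mid
    else:
        hi = mid

print(x, (lo + hi) / 2)  # both roughly 0.739085: the same fixed point, found two ways
```

The analogy is loose, of course; the point is only that "the end state is a fixpoint" and "we reach it by iterating" are separable claims.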
-- Be happy that people have died and sad that they remain alive (same qualifiers as before: person is not suffering so much that even nothingness is preferable, etc.) and the reverse for people who they don't like
-- Want to kill people to benefit them (certainly, we could relieve a lot of third-world suffering by nuking places, if the people there have a bad life but a good afterlife. Note that the objection "their culture would die out" would not be true if there is an afterlife.)
-- In the case of people who oppose abortions because fetuses are people (which I expect overlaps highly with belief in life after death), be in favor of abortions if the fetus gets a good afterlife
-- Be less willing to kill their enemies the worse the enemy is
-- Do extensive scientific research trying to figure out what life after death is like.
-- Genuinely think that having their child die is no worse than having their child move away to a place where the child cannot contact them
-- Drastically reduce how bad they think death is when making public policy decisions; there would still be some effect, because death is separation and things that cause death also cause suffering, but we currently act as though causing death makes a policy uniquely bad and preventing death uniquely good
-- Not oppose suicide
Edit: Support the death penalty as more humane than life imprisonment.
(Some of these might not apply if they believe in life after death but also in Hell, but that has its own bizarre consequences.)
Be happy that people have died and sad that they remain alive (same qualifiers as before: person is not suffering so much that even nothingness is preferable, etc.) and the reverse for people who they don't like
Hmmm.
What is known is that people who go to the afterlife don't generally come back (or, at least, don't generally come back with their memories intact). Historical evidence strongly suggests that anyone who remains alive will eventually die... so remaining alive means you have more time to enjoy what is nice here before moving on.
So, I don't imagine...
A putative new idea for AI control; index here.
After working for some time on the Friendly AI problem, it's occurred to me that a lot of the issues seem closely related.
Speaking very broadly, there are two features all of them share:
What do I mean by that? Well, imagine you're trying to reach reflective equilibrium in your morality. You do this by using good meta-ethical rules, zooming up and down at various moral levels, making decisions on how to resolve inconsistencies, etc... But how do you know when to stop? Well, you stop when your morality is perfectly self-consistent, when you no longer have any urge to change your moral or meta-moral setup. In other words, the stopping point (and the convergence to the stopping point) is entirely self-referentially defined: the morality judges itself. It does not include any other moral considerations. You input your initial moral intuitions and values, and you hope this will cause the end result to be "nice", but the definition of the end result does not include your initial moral intuitions (note that some moral realists could see this independence from the starting intuitions as a positive - except for the fact that these processes have many convergent states, not just one or a small grouping).
So when the process goes nasty, you're pretty sure to have achieved something self-referentially stable, but not nice. Similarly, a nasty CEV will be coherent and have no desire to further extrapolate... but that's all we know about it.
The second feature is that any process has errors - computing errors, conceptual errors, errors due to the weakness of human brains, etc... If you visualise this as noise, you can see that noise in a convergent process is more likely to cause premature convergence, because if the process ever reaches a stable self-referential state, it will stay there (and if the process is a long one, then early noise will cause great divergence at the end). For instance, imagine you have to reconcile your belief in preserving human cultures with your beliefs in human individual freedom. A complex balancing act. But if, at any point along the way, you simply jettison one of the two values completely, things become much easier - and once jettisoned, the missing value is unlikely to ever come back.
Or, more simply, the system could get hacked. When exploring a potential future world, you could become so enamoured of it, that you overwrite any objections you had. It seems very easy for humans to fall into these traps - and again, once you lose something of value in your system, you don't tend to get it back.
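Before turning to solutions, here is a minimal toy simulation of the premature-convergence dynamic (my own sketch, not part of the original argument): an agent repeatedly revises two value weights toward a mutual compromise, stops only when it no longer "wants" to change itself, and is subject to noise that can jettison a value outright. All names and numbers are illustrative assumptions.

```python
# Toy model of premature convergence: the stopping rule is purely
# self-referential (stop when the state no longer changes under revision),
# and noise can zero out ("jettison") a value, after which the degenerate
# state is perfectly stable and the lost value never returns.
import random

def reconcile(values, noise=0.0, max_steps=1000, seed=0):
    rng = random.Random(seed)
    v = dict(values)
    for step in range(max_steps):
        # Revision step: surviving values are pulled toward a common compromise.
        active = {k: x for k, x in v.items() if x > 0}
        target = sum(active.values()) / len(active) if active else 0.0
        proposal = {k: (x + 0.5 * (target - x)) if x > 0 else 0.0
                    for k, x in v.items()}
        # Noise: occasionally a value is dropped entirely and exerts no further pull.
        if rng.random() < noise:
            proposal[rng.choice(list(proposal))] = 0.0
        # Self-referential stopping rule: halt when the system no longer wants
        # to revise itself.  It never compares against the *initial* values.
        if max(abs(proposal[k] - v[k]) for k in v) < 1e-6:
            return v, step
        v = proposal
    return v, max_steps

print(reconcile({"culture": 1.0, "freedom": 0.2}))             # balanced compromise
print(reconcile({"culture": 1.0, "freedom": 0.2}, noise=0.1))  # may lock in with a value lost
```

Nothing about this toy is meant to capture actual moral reasoning; it just makes the two structural features visible: a purely self-referential halt condition, and noise that makes degenerate-but-stable endpoints absorbing.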
Solutions
And again, very broadly speaking, there are several classes of solutions to deal with these problems: