The AI can be adapted for other, less restricted, domains
That the ideas from a safe AI can be used to build an unsafe AI is a general argument against working on (or even talking about) any kind of AI whatsoever.
The AI adds code that will evolve into another AI into its output
The output is to contain only proofs of theorems. Specifically, a proof (or refutation) of the theorem in the input. The state of the system is to be reset after each run so as not to accumulate information.
The AI could self-modify incorrectly and result in unfriendly AI
An...
Well, I liked the paper, but I'm not knowledgeable enough to judge its true merits. It deals heavily with Bayesian-related questions, somewhat in Jaynes's style, so I thought it could be relevant to this forum.
At least one of the authors is a well-known theoretical physicist with an awe-inspiring Hirsch index, so presumably the paper would not be trivially worthless. I think it merits a more careful read.
Regarding the "he's here... he is the end of the world" prophecy, in view of the recent events, it seems like it can become literally true without it being a bad thing. After all, it does not specify a time frame. So Harry may become immortal and then tear apart the very stars in heaven, some time during a long career.
You're treating resources as one single kind, where really there are many kinds with possible trades between teams
I think this is reasonably realistic. Let R signify money. Then R can buy other necessary resources.
But my point was exactly that there would be many teams who could form many different alliances. Assuming only two is unrealistic and just ignores what I was saying.
We can model N teams by letting them play two-player games in succession. For example, any two teams with nearly matched resources would cooperate with each other, producing a ...
I don't think you can get an everywhere-positive exchange rate. There are diminishing returns and a threshold after which exchanging more resources won't get you any more time. There are only 30 hours in a day, after all :)
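As a toy illustration of that kind of threshold, here is a minimal sketch in Python, assuming a simple saturating exchange function for converting money into usable time; the constants and the functional form are illustrative assumptions, not anything stated in the discussion:

```python
# Hypothetical saturating exchange: money (R) buys extra usable time,
# but with diminishing returns and a hard cap on hours per day.
import math

BASE_HOURS = 16.0   # assumed hours available without spending anything
CAP_HOURS = 30.0    # assumed hard ceiling on usable hours per day
SCALE = 50.0        # assumed amount of money over which returns flatten

def usable_hours(money: float) -> float:
    """Usable hours per day as a saturating function of money spent."""
    extra = (CAP_HOURS - BASE_HOURS) * (1.0 - math.exp(-money / SCALE))
    return BASE_HOURS + extra

for r in (0, 25, 50, 100, 200, 1000):
    print(f"R={r:5d} -> {usable_hours(r):5.2f} hours/day")
# Past a certain point, more R buys essentially no additional time:
# the exchange rate is positive at first, then effectively zero.
```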
Space (land or whatever is being used). Mass and energy. Natural resources. Computing power. Finite-supply money and luxuries if such exist. Or are you making an assumption that CEVs are automatically more altruistic or nice than non-extrapolated human volitions are?
These all have the property that you only need so much of them. If there is a sufficient amount for everybody, then there is no point in killing in order to get more. I expect CEVs not to be greedy just for the sake of greed. It's people's CEVs we're talking about, not paperclip maximizers'.
...A coalition of 80% of the population forms, which would like to kill the other 20% in order to get their resources
I have trouble thinking of a resource that would make even one person's CEV, let alone 80% of them, want to kill people just in order to have more of it.
The question of definition: who is to be included in the CEV? Or: who is considered sane?
This is easy, and does not need any special hardcoding. If someone is so insane that their beliefs are totally closed and impossible to move by knowledge and intelligence, then their CEV is undefined. Thus...
The resources are not scarce at all. But, there's no consensus of CEVs. The CEVs of 80% want to kill the rest.
The resources are not scarce, yet the CEVs want to kill? Why?
I meant that the AI that implements your version of CEV would forcibly update people's actual beliefs to match what it CEV-extrapolated for them.
It would do so only if everybody's CEVs agree that updating these people's beliefs is a good thing.
If you believed there were many such people, would you modify your solution, or is ignoring them however many they are fine by you?
Peo...
So you're OK with the FAI not interfering if they want to kill them for the "right" reasons?
I wouldn't like it. But if the alternative is, for example, to have FAI directly enforce the values of the minority on the majority (or vice versa) - the values that would make them kill in order to satisfy/prevent - then I prefer FAI not interfering.
"if we kill them, we will benefit by dividing their resources among ourselves"
If the resources are so scarce that dividing them is so important that even CEVs agree on the necessity of killin...
If a majority of humanity wishes to kill a minority, obviously there won't be a consensus to stop the killing, and AI will not interfere
The majority may wish to kill the minority for wrong reasons - based on false beliefs or insufficient intelligence. In which case their CEVs won't endorse it, and the FAI will interfere. "Fundamentally different" means their killing each other is endorsed by someone's CEV, not just by themselves.
But you said it would only do things that are approved by a strong human consensus.
Strong consensus of their CE...
Will it only interfere if a consensus of humanity allows it to do so? Will it not stop a majority from murdering a minority?
If the majority and the minority are so fundamentally different that their killing each other is not forbidden by the universal human CEV, then no. On what moral grounds would it do the prevention?
The first AGI that does not favor inaction will become a singleton, destroying the other AIs and preventing future new AIs
Until everybody agrees that this new AGI is not good after all. Then the original AGI will interfere and dismantl...
A FAI that never does anything except prevent existential risk - which, in a narrow interpretation, means it doesn't stop half of humanity from murdering the other half - isn't a future worth fighting for IMO. We can do so much better.
No one said you have to stop with that first FAI. You can try building another. The first FAI won't oppose it (non-interference). Or, better yet, you can try talking to the other half of the humans.
There are people who believe religiously that End Times must come
Yes, but we assume they are factually wrong, and so their...
The problems look like a kind of anti-Prisoner's Dilemma. An agent plays against an opponent, and gets a reward iff they played differently. Then any agent playing against itself is screwed.
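A minimal sketch of that game, assuming two toy deterministic agents (the agent definitions are made up for illustration):

```python
# "Anti-Prisoner's Dilemma": each player is rewarded iff the two
# players chose *different* actions. A deterministic agent paired
# with a copy of itself always produces matching actions and loses.

def payoff(a, b):
    """Reward 1 to each player iff they played differently."""
    return (1, 1) if a != b else (0, 0)

def always_cooperate():
    return "C"

def always_defect():
    return "D"

# Mixed pairing: the actions differ, so both players are rewarded.
print(payoff(always_cooperate(), always_defect()))   # (1, 1)

# Self-play: identical deterministic agents can never play differently.
for agent in (always_cooperate, always_defect):
    print(payoff(agent(), agent()))                   # (0, 0) both times
```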
I would be fine with FAI removing existential risks and not doing any other thing until everybody('s CEV) agrees on it. (I assume here that removing existential risks is one such thing.) And an FAI team that credibly precommitted to implementing CEV(humanity) instead of CEV(themselves) would probably get more resources and would finish first.
Well, my own proposed plan is also a contingent modification. The strongest possible claim about CEV would be:
There is a unique X, such that for all living people P, CEV(P) = X.
Assuming there is no such X, there could still be a plausible claim:
Y is not empty, where Y = Intersection{over all living people P} of CEV(P).
And then the AI would do well if it optimized for Y while interfering the least with other things (whatever that means exactly). This way, whatever "evolving" happens due to the AI's influence is at least agreed upon by everyone('s CEV).
Back here you said "Well, perhaps yes." I understand that to mean you agree with my point that it's wrong / bad for the AI to promote extrapolated values while the actual values are different and conflicting
I meant that "it's wrong/bad for the AI to promote extrapolated values while the actual values are different and conflicting" will probably be a part of the extrapolated values, and the AI would act accordingly, if it can.
...My position is that the AI must be guided by the humans' actual present values in choosing to steer human (s
Humans don't know which of their values are terminal and which are instrumental, and whether this question even makes sense in general. Their values were created by two separate evolutionary processes. In the boxes example, humans may not know about the diamond. Maybe they value blue boxes because their ancestors could always bring a blue box to a jeweler and exchange it for food, or something.
This is precisely the point of extrapolation - to untangle the values from each other and build a coherent system, if possible.
No, the "actual" values would tell it to give the humans the blue boxes they want, already.
the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person
If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values.]
If it extrapolates coherently, then it's a single concept, otherwise it's a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word "sound" appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of "sound", and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together, and then it becomes a global optimization problem.
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
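A rough sketch of that syntactic version, in Python with scikit-learn; the tiny corpus, the bag-of-words context vectors, and the fixed number of clusters are illustrative assumptions, and a real system would need a large corpus and a better co-occurrence metric:

```python
# Cluster the contexts in which "sound" occurs; each cluster is taken
# to represent one meaning of the word, and each mention maps to the
# cluster of its context.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

corpus = [
    "the sound of the violin filled the hall",
    "a loud sound woke the neighbours",
    "the argument is sound and the proof is valid",
    "a sound investment with little risk",
]

target = "sound"
# Context = the sentence with the target word removed.
contexts = [" ".join(w for w in sent.split() if w != target) for sent in corpus]

vectors = CountVectorizer().fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for sent, label in zip(corpus, labels):
    print(label, "|", sent)
# Mentions assigned the same label are treated as the same sense of "sound".
```

Doing this for many words at once, as suggested above, presumably turns it into a global optimization problem because each word's sense assignments affect the context representations of the others.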
Does it rely on true meanings of words, particularly? Why not on concepts? Individually, "vibrations of air" and "auditory experiences" can be coherent.
I think seeking and refining such plans would be a worthy goal. For one thing, it would make LW discussions more constructive. Currently, as far as I can tell, CEV is very broadly defined, and its critics usually point at some feature and cast (legitimate) doubt on it. Very soon, CEV is apparently full of holes, and one may wonder why it has not been thrown away already. But these may not be real holes, just places where we do not know enough yet. If these points are identified and stated in the form of questions of fact, which can be answered by future research, then a global plan, in the form of a decision tree, could be made and reasoned about. That would be definite progress, I think.
Why is it important that it be uncontroversial?
I'm not sure. But it seems a useful property to have for an AI being developed. It might allow centralizing the development. Or something.
Ok, you're right in that a complete lack of controversy is impossible, because there are always trolls, cranks, conspiracy theorists, etc. But is it possible to reach a consensus among all sufficiently well-informed, sufficiently intelligent people? Where "sufficiently" is not too high a threshold?
What I'm trying to do is find some way to fix the goalposts: find a set of conditions on CEV that would suffice. Whether such a CEV actually exists and how to build it are questions for later. Let's just pile up constraints until a sufficient set is reached. So, let's assume that:
would you say that running it is uncontroversial? If not, what other conditions are required?
I value the universe with my friend in it more than one without her.
Ok, but do you grant that running a FAI with "unanimous CEV" is at least (1) safe, and (2) uncontroversial? That the worst problem with it is that it may just stand there doing nothing - if I'm wrong about my hypothesis?
People are happy, by definition, if their actual values are fulfilled
Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?
Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of it...
VHEMT supports human extinction primarily because, in the group's view, it would prevent environmental degradation. The group states that a decrease in the human population would prevent a significant amount of man-made human suffering.
Obviously, human extinction is not their terminal value.
I believe there exist (extrapolated) wishes universal for humans (meaning, true for literally everyone). Among these wishes, I think there is the wish for humans to continue existing. I would like the AI to fulfill this wish (and other universal wishes, if there are any), while letting people decide everything else for themselves.
But he assumes that it is worse for me because it is bad for my friend to have died. Whereas, in fact, it is worse for me directly.
...People sometimes respond that death isn't bad for the person who is dead. Death is bad for the survivors. But I don't think that can be central to what's bad about death. Compare two stories.
Story 1. Your friend is about to go on the spaceship that is leaving for 100 Earth years to explore a distant solar system. By the time the spaceship comes back, you will be long dead. Worse still, 20 minutes after the ship takes off, all radio contact between the Earth and the ship will be lost until its return. You're losing all contact with your closest friend.
Stor
For extrapolation to be conceptually plausible, I imagine "knowledge" and "intelligence level" to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.
Yes, many religious people wouldn't want their beliefs erased, but only because they believe them to be true. They wouldn't oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved ...
Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV
Then we can label paperclipping as a "true" value too. However, I still prefer true human values to be maximized, not true clippy values.
...Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing th
What makes you give them such a label as "true"?
They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.
In your CEV future, the extrapolated values are maximized. Conflicting values, like the actual values held today by many or all people, are necessarily not maximized.
But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time. And I think CEV would try to synchronize this with the timing of its optimization process.
why extrapolate values at all
Extrapolated values are the true values. Whereas the current values are approximations, sometimes very bad and corrupted approximations.
they will suffer in the CEV future
This does not follow.
Errr. This is a question of simple fact, which is either true or false. I believe it's true, and build the plans accordingly. We can certainly think about contingency plans for what to do if the belief turns out to be false, but so far no one has agreed that the plan is good even in the case where the belief is true.
Dunno... propose to kill them quickly and painlessly, maybe? But why do you ask? As I said, I don't expect this to happen.
No, because "does CEV fulfill....?" is not a well-defined or fully specified question. But I think, if you asked "whether it is possible to build FAI+CEV in such a way that it fulfills the wish(es) of literally everyone while affecting everything else the least", they would say they do not know.
I'd think someone's playing a practical joke on me.
Aumann update works only if I believe you're a perfect Bayesian rationalist. So, no thanks.
Too bad. Let's just agree to disagree then, until the brain scanning technology is sufficiently advanced.
I've pointed out people who don't wish for the examples you gave
So far, I didn't see a convincing example of a person who truly wished for everyone to die, even in extrapolation.
Otherwise the false current beliefs will keep on being very relevant to them
To them, yes, but not to their CEV.
You could very easily build a much happier life for them just by allocating some resources (land, computronium, whatever) and going by their current values
Well... ok, let's assume a happy life is their single terminal value. Then, by the definition of their extrapolated values, you couldn't build a happier life for them by doing anything other than following their extrapolated values!
In all of their behavior throughout their lives, and in their own words today, they honestly have this value
This is the conditional that I believe is false when I say "they are probably lying, trolling, joking". I believe that when you use the brain scanner on those nihilists, and ask them whether they would prefer the world where everyone is dead to any other possible world, and they say yes, the brain scanner would show they are lying, trolling or joking.
Well, assuming EY's view of intelligence, the "cautionary position" is likely to be a mathematical statement. And then why not prove it? Given several decades? That's a lot of time.
Even if they do, it will be the best possible thing for them, according to their own (extrapolated) values.
we anticipate there will be no extrapolated wishes that literally everyone agrees on
Well, now you know there exist people who believe that there are some universally acceptable wishes. Let's do the Aumann update :)
Lots of people religiously believe...
False beliefs => irrelevant after extrapolation.
Some others believe that life in this world is suffering, negative utility, and ought to be stopped for its own sake (stopping the cycle of rebirth)
False beliefs (rebirth, existence of nirvana state) => irrelevant after extrapolation.
My conditional was "cautionary position is the correct one". I meant, provably correct.
How can we even start defining CEV without brain scanning technology able to do much more than answering the original question?
What is the chance some of them will try to seize first-mover advantage, and refuse to wait for another 30 years, and ignore Friendliness? I estimate high. The payoff is the biggest in human history: first-mover will potentially control a singleton that will rewrite to order the very laws of physics in its future light-cone, and prevent any other AGI from ever being built! This is beyond even "rule the world forever and reshape it in your image" territory. The greatest temptation ever. Do you seriously expect no-one would succumb to it?
Remembe...
I only proposed a hypothesis, which will become testable earlier than the time when CEV could be implemented.
I am confused about how Philosopher's stone could help with reviving Hermione. Does QQ mean to permanently transfigure her dead body into a living Hermione? But then, would it not mean that Harry could do it now, albeit temporarily? And, he wouldn't even need a body. He could then just temporarily transfigure any object into a living Hermione. Also, now that I think of it, he could transfigure himself a Feynman and a couple of Einsteins...