What's the general algorithm you can use to determine if something like "sound" is a "word" or a "concept"?
If it extrapolates coherently, then it's a single concept, otherwise it's a mixture :)
This may actually be doable, even at the present level of technology. You gather a huge text corpus, find the contexts where the word "sound" appears, and do the clustering using some word co-occurrence metric. The result is a list of different meanings of "sound", and a mapping from each mention to the specific meaning. You can also do this simultaneously for many words together; then it is a global optimization problem.
Of course, AGI would be able to do this at a deeper level than this trivial syntactic one.
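A rough sketch of the "trivial syntactic" version, in Python. This is only illustrative: the toy corpus, the context window size, the bag-of-words features, and the number of clusters are all placeholder assumptions, and scikit-learn's CountVectorizer and KMeans stand in for "some word co-occurrence metric".

```python
# Sketch: cluster occurrences of "sound" by the words that co-occur with them.
# Assumes scikit-learn is installed; the corpus, window size and k are toy choices.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

corpus = [
    "the falling tree made a loud sound that echoed through the forest",
    "the microphone picked up a faint sound of vibrating air",
    "she heard a strange sound and an eerie auditory experience followed",
    "the speaker produced a pure sound wave at a fixed frequency",
]

def context_of(sentence, target="sound", window=4):
    """Return the words within `window` positions of the target word."""
    words = sentence.split()
    i = words.index(target)
    return " ".join(words[max(0, i - window):i] + words[i + 1:i + 1 + window])

contexts = [context_of(s) for s in corpus]

# Bag-of-words vectors of the contexts stand in for a co-occurrence metric.
vectors = CountVectorizer().fit_transform(contexts)

# Cluster the contexts; each cluster is (hopefully) one meaning of "sound".
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for sentence, label in zip(corpus, labels):
    print(label, "-", sentence)
```

On a real corpus one would want better features (dependency contexts, embeddings) and a way to choose the number of senses per word, which is where the global-optimization framing comes in.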
Yeah, it's weird that Eliezer's metaethics and FAI seem to rely on figuring out "true meanings" of certain words, when Eliezer also wrote a whole sequence explaining that words don't have "true meanings".
For example, Eliezer's metaethical approach (if it worked) could be used to actually answer questions like "if a tree falls in the forest and no one's there, does it make a sound?", not just declare them meaningless :-) Namely, it would say that "sound" is not a confused jumble of "vibrations of air" and "auditory experiences", but a coherent concept that you can extrapolate by examining lots of human brains. Funny I didn't notice this tension until now.
Does it rely on true meanings of words, particularly? Why not on concepts? Individually, "vibrations of air" and "auditory experiences" can be coherent.
How is it worse for you directly?
I value the universe with my friend in it more than one without her.
This is completely wrong. People are happy, by definition, if their actual values are fulfilled; not if some conflicting extrapolated values are fulfilled. CEV was supposed to get around this by proposing (without saying how) that people would actually grow to become smarter etc. and thereby modify their actual values to match the extrapolated ones, and then they'd be happy in a universe optimized for the extrapolated (now actual) values. But you say you don't want to change other people's values to match the extrapolation. That makes CEV a very bad idea - most people will be miserable, probably including you!
People are happy, by definition, if their actual values are fulfilled
Yes, but values depend on knowledge. There was an example by EY, I forgot where, in which someone values a blue box because they think the blue box contains a diamond. But if they're wrong, and it's actually the red box that contains the diamond, then what would actually make them happy - giving them the blue or the red box? And would you say giving them the red box is making them suffer?
Well, perhaps yes. Therefore, a good extrapolated wish would include constraints on the speed of its own fulfillment: allow the person to take the blue box, then convince them that it is the red box they actually want, and only then present it. But in cases where this is impossible (example: the blue box contains horrible violent death), it is wrong to say that following the extrapolated values (withholding the blue box) is making the person suffer. Following their extrapolated values is the only way to allow them to have a happy life.
Isn't that included when he says "that is worse for you, too, since you care about your friend"?
But he assumes that it is worse for me because it is bad for my friend to have died. Whereas, in fact, it is worse for me directly.
People sometimes respond that death isn't bad for the person who is dead. Death is bad for the survivors. But I don't think that can be central to what's bad about death. Compare two stories.
Story 1. Your friend is about to go on the spaceship that is leaving for 100 Earth years to explore a distant solar system. By the time the spaceship comes back, you will be long dead. Worse still, 20 minutes after the ship takes off, all radio contact between the Earth and the ship will be lost until its return. You're losing all contact with your closest friend.
Story 2. The spaceship takes off, and then 25 minutes into the flight, it explodes and everybody on board is killed instantly.
Story 2 is worse. But why? It can't be the separation, because we had that in Story 1. What's worse is that your friend has died. Admittedly, that is worse for you, too, since you care about your friend. But that upsets you because it is bad for her to have died.
Actually, I think the universe is better for me with my friend being alive in it, even if I won't ever see her. My utility function is defined over the world states, not over my sensory inputs.
But assuming it can, why would it be controversial to fulfill the wish(es) of literally everyone, while affecting everything else the least?
Problems:
Extrapolation is poorly defined, and, to me, seems to go in either one of two directions: either you make people more as they would like to be, which throws any ideas of coherence out the window, or you make people 'better' along a specific axis, in which case you're no longer directing the question back at humanity in a meaningful sense. Even something as simple as removing wrong beliefs (as you imply) would automatically erase any but the very weakest theological notions. There are a lot of people in the world who would die to stop that from happening. So, yes, controversial.
Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things. Smarter, better-informed humans would still want a bunch of different, conflicting things. Trying to satisfy all of them won't work. Trying to satisfy the majority at the expense of the minorities might get incredibly ugly incredibly fast. I don't have a better solution at this time, but I don't think taking some kind of vote over the sum total of humanity is going to produce any kind of coherent plan of action.
For extrapolation to be conceptually plausible, I imagine "knowledge" and "intelligence level" to be independent variables of a mind, knobs to turn. To be sure, this picture looks ridiculous. But assuming, for the sake of argument, that this picture is realizable, extrapolation appears to be definable.
Yes, many religious people wouldn't want their beliefs erased, but only because they believe them to be true. They wouldn't oppose increasing their knowledge if they knew it was true knowledge. Cases of belief in belief would be dissolved if it was known that true beliefs were better in all respects, including individual happiness.
Coherence, one way or another, is unlikely to exist. Humans want a bunch of different things...
Yes, I agree with this. But, I believe there exist wishes universal for (extrapolated) humans, among which I think there is the wish for humans to continue existing. I would like for AI to fulfil this wish (and other universal wishes if there are any), while letting people decide everything else for themselves.
They are reflectively consistent in the limit of infinite knowledge and intelligence. This is a very special and interesting property.
Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV.
But people would change - gaining knowledge and intelligence - and thus would become happier and happier with time.
Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they'd never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they'd be very unhappy.
And I think CEV would try to synchronize this with the timing of its optimization process.
So you're saying it wouldn't modify the world to fit their new evolved values until they actually evolved those values? Then for all we know it would never do anything at all, and the burden of proof is on you to show otherwise. Or it could modify the world to resemble their partially-evolved values, but then it wouldn't be a CEV, just a maximizer of whatever values people happen to already have.
Paperclipping is also self-consistent in that limit. That doesn't make me want to include it in the CEV
Then we can label paperclipping as a "true" value too. However, I still prefer true human values to be maximized, not true clippy values.
Evidence please. There's a long long leap from ordinary gaining knowledge and intelligence through human life, to "the limit of infinite knowledge and intelligence". Moreover we're considering people who currently explicitly value not updating their beliefs in the face of knowledge, and basing their values on faith not evidence. For all I know they'd never approach your limit in the lifetime of the universe, even if it is the limit given infinite time. And meanwhile they'd be very unhappy.
As I said before, if someone's mind is that incompatible with truth, I'm ok with ignoring their preferences in the actual world. They can be made happy in a simulation, or wireheaded, or whatever the combined other people's CEV thinks best.
So you're saying it wouldn't modify the world to fit their new evolved values until they actually evolved those values?
No, I'm saying, the extrapolated values would probably estimate the optimal speed for their own optimization. You're right, though, it is all speculation, and the burden of proof is on me. Or on whoever will actually define CEV.
Logical Uncertainty as Probability
This post is a long answer to this comment by cousin_it:
Logical uncertainty is weird because it doesn't exactly obey the rules of probability. You can't have a consistent probability assignment that says axioms are 100% true but the millionth digit of pi has a 50% chance of being odd.
I'd like to attempt to formally define logical uncertainty in terms of probability. I don't know whether what results is in any way novel or useful, but here it is.
Let X be a finite set of true statements of some formal system F extending propositional calculus, like Peano Arithmetic. X is supposed to represent a set of logical/mathematical beliefs of some finite reasoning agent.
Given any X, we can define its "Obvious Logical Closure" OLC(X), an infinite set of statements producible from X by applying the rules and axioms of propositional calculus. An important property of OLC(X) is that it is decidable: for any statement S it is possible to find out whether S is true (S∈OLC(X)), false ("~S"∈OLC(X)), or uncertain (neither).
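To make the decidability claim concrete, here is a hedged sketch (not from the original post): formulas are encoded as nested tuples whose atoms are arithmetical statements treated as opaque, and membership in OLC(X) is tested by brute-force truth tables, relying on the completeness of propositional calculus so that derivability from X coincides with semantic entailment. The encoding and the helper names (atoms, evaluate, entails, status) are my own.

```python
from itertools import product

# Formulas: an atom is a string (an arithmetical statement treated as opaque);
# compound formulas are tuples ("not", A), ("and", A, B), ("or", A, B), ("implies", A, B).

def atoms(formula):
    """Collect the atomic statements occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(atoms(sub) for sub in formula[1:]))

def evaluate(formula, assignment):
    """Evaluate a formula under a truth assignment to its atoms."""
    if isinstance(formula, str):
        return assignment[formula]
    op, *args = formula
    if op == "not":
        return not evaluate(args[0], assignment)
    if op == "and":
        return evaluate(args[0], assignment) and evaluate(args[1], assignment)
    if op == "or":
        return evaluate(args[0], assignment) or evaluate(args[1], assignment)
    if op == "implies":
        return (not evaluate(args[0], assignment)) or evaluate(args[1], assignment)
    raise ValueError("unknown operator: %r" % op)

def entails(X, s):
    """True iff s holds under every truth assignment that makes all of X true."""
    all_atoms = sorted(set().union(atoms(s), *(atoms(x) for x in X)))
    for values in product([True, False], repeat=len(all_atoms)):
        assignment = dict(zip(all_atoms, values))
        if all(evaluate(x, assignment) for x in X) and not evaluate(s, assignment):
            return False
    return True

def status(X, s):
    """Classify s relative to OLC(X): 'true', 'false', or 'uncertain'."""
    if entails(X, s):
        return "true"
    if entails(X, ("not", s)):
        return "false"
    return "uncertain"

# Example: X knows "odd or even" and "not both", but neither disjunct alone is decided.
X = [("or", "digit_is_odd", "digit_is_even"),
     ("not", ("and", "digit_is_odd", "digit_is_even"))]
print(status(X, "digit_is_odd"))                           # uncertain
print(status(X, ("or", "digit_is_odd", "digit_is_even")))  # true
```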
We can now define the "conditional" probability P(*|X) as a function from {the statements of F} to [0,1] satisfying the axioms:
Axiom 1: Known true statements have probability 1:
P(S|X)=1 iff S∈OLC(X)
Axiom 2: The probability of a disjunction of mutually exclusive statements is equal to the sum of their probabilities:
"~(A∧B)"∈OLC(X) implies P("A∨B"|X) = P(A|X) + P(B|X)
From these axioms we can get all the expected behavior of the probabilities:
P("~S"|X) = 1 - P(S|X)
P(S|X)=0 iff "~S"∈OLC(X)
0 < P(S|X) < 1 iff S∉OLC(X) and "~S"∉OLC(X)
"A=>B"∈OLC(X) implies P(A|X)≤P(B|X)
"A<=>B"∈OLC(X) implies P(A|X)=P(B|X)
etc.
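For example, the first of these follows in one step: "S∨~S"∈OLC(X) and "~(S∧~S)"∈OLC(X) for any X, so Axiom 2 gives P("S∨~S"|X) = P(S|X) + P("~S"|X), while Axiom 1 gives P("S∨~S"|X) = 1; subtracting yields P("~S"|X) = 1 - P(S|X).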
This is still insufficient to calculate an actual probability value for any uncertain statement. Additional principles are required. For example, the Consistency Desideratum of Jaynes: "equivalent states of knowledge must be represented by the same probability values".
Definition: two statements A and B are indistinguishable relative to X iff there exists an isomorphism between OLC(X∪{A}) and OLC(X∪{B}), which is identity on X, and which maps A to B.
[Isomorphism here is a 1-1 function f preserving all logical operations: f(A∨B)=f(A)∨f(B), f(~A)=~f(A), etc.]
Axiom 3: If A and B are indistinguishable relative to X, then P(A|X) = P(B|X).
Proposition: Let X be the set of statements representing my current mathematical knowledge, translated into F. Then the statements "millionth digit of PI is odd" and "millionth digit of PI is even" are indistinguishable relative to X.
Corollary: P(millionth digit of PI is odd | my current mathematical knowledge) = 1/2.
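Spelling out the step: X contains both "odd ∨ even" and "~(odd ∧ even)" for the millionth digit of PI, so Axioms 1 and 2 give P(odd|X) + P(even|X) = 1, and Axiom 3 (via the proposition above) makes the two terms equal, hence each is 1/2.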
What you are saying indeed applies only "in cases where this is impossible". I further suggest that these are extremely rare cases when a superhumanly-powerful AI is in charge. If the blue box contains horrible violent death, the AI would build a new (third) box, put a diamond inside, paint it blue, and give it to the person.
If the AI could do this, then this is exactly what the extrapolated values would tell it to do. [Assuming some natural constraints on the original values].