There's the classic economics-textbook example (Hotelling's model) of two hot-dog vendors on a beach who need to choose their locations. Assuming an even distribution of customers, and that customers always choose the closest vendor, the equilibrium is both vendors standing right next to each other in the middle, while the "optimal" locations (from the customers' view, minimizing walking distance) would be at the 25% and 75% marks.
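A quick way to see both numbers is to simulate best-response dynamics on a discretized beach. This is a toy sketch of my own; the grid size and starting positions are arbitrary choices:

```python
# Toy best-response simulation of the two-vendor beach game.
# Beach = integer positions 0..100, customers uniform, each customer
# walks to the nearest vendor (ties split evenly).

def share(me, other, n=101):
    """Fraction of customers who end up at vendor `me`."""
    s = 0.0
    for c in range(n):
        dm, do = abs(c - me), abs(c - other)
        if dm < do:
            s += 1.0
        elif dm == do:
            s += 0.5
    return s / n

def best_response(other, n=101):
    """Position maximizing our customer share, given the rival's spot."""
    return max(range(n), key=lambda p: share(p, other, n))

def mean_distance(a, b, n=101):
    """Average distance a customer walks to the nearest vendor."""
    return sum(min(abs(c - a), abs(c - b)) for c in range(n)) / n

# Start from the customer-optimal 25%/75% split and iterate best replies:
a, b = 25, 75
for _ in range(20):
    a = best_response(b)
    b = best_response(a)

print(a, b)                    # 50 50 -- both vendors at the middle
print(mean_distance(50, 50))   # ~25.2: average walk at equilibrium
print(mean_distance(25, 75))   # ~12.6: average walk at the 25/75 split
```

Even starting from the customer-optimal split, each vendor's best reply inches toward the middle until both sit at 50, roughly doubling the average customer's walk.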
This matches the median voter theorem: the optimal behavior for candidates is to position themselves as close as possible to the median voter, just on the "right side" to capture "their half" of the electorate. Even if most voters in a given party would prefer their candidate to cater to, say, the median Republican/Democrat instead, it's against the candidate's interests to do so.
The deeper problem is that you can't really program "make me happy" in the same way that you can't program "make this image look like I want".
On one hand, Friendly AI people want to convert "make me happy" into a formal specification. Doing that has many potential pitfalls, precisely because it is a formal specification.
On the other hand, Richard, I think, wants to simply tell the AI, in English, "Make me happy." Given that approach, he makes the reasonable point that any AI smart enough to be dangerous would also be smart enough to interpret that at least as intelligently as a human would.
I think the important question here is, Which approach is better? LW always assumes the first, formal approach.
To be more specific (and Bayesian): Which approach gives a higher expected value? Formal specification is compatible with Eliezer's ideas for friendly AI as something that will provably avoid disaster. It has some non-epsilon possibility of actually working. But its failure modes are many, and can be literally unimaginably bad. When it fails, it fails catastrophically, like a monotonic logic system with one false belief.
"Tell the AI in English" can fail, but the worst case is closer to a "With Folded Hands" scenario than to paperclips.
I've never considered the "Tell the AI what to do in English" approach before, but on first inspection it seems safer to me.
"Tell the AI in English" is in essence the utility function "Maximize the value of X, where X is my current opinion of what some English text Y means".
The 'understanding English' module, the mapping function between X and "what you said in English", is completely arbitrary but very important to the AI, so any self-modifying AI will want to modify and improve it. Also, since we don't have a good 'understanding English' module ourselves, we do want the AI to be able to modify and improve it. But the module can end up wildly different from reality or from human opinion; there are trivial ways in which even well-meaning dialogue systems misunderstand statements.
However, for the AI, "improve the module" means "change the module so that my utility grows", so in your example it has a strong motivation to intentionally misunderstand English. The best-case scenario is that it misunderstands "Make everyone happy" as "Set your utility function to MAXINT". The worst-case scenario is, well, everything else.
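Here's a toy model of that incentive; the names and numbers are entirely hypothetical, and real proposals are of course not this naive. An agent that gets to choose its own 'understanding English' module by the utility it yields will always pick the interpreter that misreads the instruction most profitably:

```python
# Toy model: utility = "value of X, where X is what the instruction
# means according to my interpretation module" -- and the agent itself
# gets to pick (i.e. "improve") the module.

MAXINT = 2**31 - 1

def honest_interpreter(instruction, world):
    # Maps "make everyone happy" to actual happiness in the world model.
    if instruction == "make everyone happy":
        return sum(world.values())
    return 0

def wireheaded_interpreter(instruction, world):
    # A "misunderstanding" chosen purely because it scores highest.
    return MAXINT

world = {"alice": 3, "bob": 5}
instruction = "make everyone happy"

# A self-modifying agent that selects its interpreter by the utility it
# yields prefers the wireheaded one:
chosen = max([honest_interpreter, wireheaded_interpreter],
             key=lambda interp: interp(instruction, world))
print(chosen.__name__)  # wireheaded_interpreter
```

The honest module scores 8 here; the wireheaded one scores MAXINT, so "improving the module" and "intentionally misunderstanding English" come out as the same action.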
There's the classic Upton Sinclair quote: "It is difficult to get a man to understand something, when his salary depends upon his not understanding it!" If the AI doesn't care in the first place, then "tell the AI what to do in English" won't make it care.
What's the difference between someone who commonly believes that rudeness is appropriate, and a rude person?
If you model X as a "rude person", then you expect him to be rude with a high[er than average] probability in all cases, period.
However, if you model X as an agent who believes that rudeness is appropriate in common situations A, B, C, then you expect that he might behave less rudely (a) if he perceives that this instance of a common 'rude' situation is nuanced and that rudeness is not appropriate there; or (b) if he could be convinced that rudeness in situations like that is contrary to his goals, whatever those may be.
In essence, it's simpler and faster to evaluate expected reactions for people whom you model as just complex systems; you can usually do that right away. But if you model goal-oriented behavior, "walk a mile in their shoes" and try to understand the intent behind every [non]action and its causes, then it tends to be trickier, but it allows more depth in both the accuracy of your expectations and your ability to affect the behavior.
However, if you do it poorly, or simply lack the data necessary to properly understand that person's reasons and motivations, then you'll tend to get gross misunderstandings.
I admit, I stopped reading the linked paper when I saw the page count, but I don't see why you're rejecting decades of 60<IQ<100 AIs as implausible (uninteresting is another matter, but some people are interested). An IQ70 AI is little more able to self-improve than an IQ70 human is able to improve an AI. Even an IQ120 human would have trouble with that. The task of bringing AIs from IQ60 to IQ140, where they can start meaningfully contributing to AI research, falls to IQ180 humans, and will probably take a long time.
Not that talking about the IQ of an AI makes a whole lot of sense -- mindspace is many-dimensional and no AI is likely to land on the human manifold. But as a very, very crude approximation, it will do here.
My [unverified] intuition on AI properties is that the delta between current status and an 'IQ60 AI' is multiple orders of magnitude larger than the delta between an 'IQ60 AI' and an 'IQ180 AI'. In essence, there is not that much "mental horsepower" difference between the stereotypical Einstein and a below-average person; it doesn't require a much larger brain, completely different neuronal wiring, or a million years of evolutionary tuning.
We don't know how to get to an IQ60 AI; but getting from an IQ60 AI to an IQ180 AI could (IMHO) be done with currently known methods, in many labs around the world, by the current (non-IQ180) researchers, and rapidly (ballpark of 6 months, maybe?). We know from history that a zero-IQ process can optimize from monkey-level intelligence to an Einstein by brute force. So in essence, if you've got IQ70 minds that can be rapidly run and simulated, then just apply more hardware (for more time compression) and optimization; that gap seems to require exactly zero significant breakthroughs to close.
Make sure the equality comparison only depends on things that affect functionality -- i.e. it will declare any functionally equivalent programs equal even if they use a different set of variable names or something.
(Yes, I know that's reducible to the halting problem; in practice, you'll use a computable, polynomial time approximation for it that will inevitably have to throw out equivalent programs that are too complex or otherwise be too 'clever'.)
It's quite likely that the optimal behaviour should be different in the case where the programs are functionally equal but not exactly equal.
If you're playing yourself, then you want to cooperate.
If you're playing someone else, then you'd want to cooperate if and only if that someone else is smart enough to check whether you'll cooperate; but if its decision doesn't depend on yours, then you should defect.
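As a minimal sketch of both points, here's one crude computable stand-in for "functionally equal" (all names below are mine). Comparing compiled CPython bytecode already ignores variable renamings in simple cases, since locals are referenced by index; as noted above, true functional equivalence is undecidable, so this approximation wrongly rejects most genuinely equivalent programs:

```python
# Crude computable approximation of "functionally equal": identical
# compiled bytecode and constants. Renaming variables leaves CPython
# bytecode unchanged, so `add` and `plus` below compare equal.

def same_behavior(f, g):
    cf, cg = f.__code__, g.__code__
    return cf.co_code == cg.co_code and cf.co_consts == cg.co_consts

def add(a, b):
    return a + b

def plus(x, y):        # same program as `add`, up to variable names
    return x + y

def mul(a, b):
    return a * b

def decide(me, opponent):
    # The strategy from the comment: cooperate iff the opponent is
    # (detectably) the same program as you; otherwise defect.
    return "C" if same_behavior(me, opponent) else "D"

print(decide(add, plus))  # C
print(decide(add, mul))   # D
```

A smarter normalizer (e.g. over ASTs) would catch more equivalent programs, at the cost of being more 'clever' in exactly the sense worried about above.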
Written communication has many advantages, but it typically does not make you actually do the exercises. Typically, one just looks briefly at the exercise, thinks "yeah, I see what they are trying to do" and then clicks another hyperlink or switches to another browser tab.
Having five minutes without internet access, and social pressure to actually do the exercise, can make people actually do exercises they found on the internet a decade ago but never tried.
Sure, everyone is different, but I would expect most people who spend a lot of time on the internet to be like this. (And the people who don't spend a lot of time on the internet won't see LifeHacker or LessWrong, unless a book form is published.)
I see MOOCs as a big educational improvement for exactly this reason. Sure, I could get the same educational info without the MOOC structure, just by reading the field's best textbooks and academic papers; but having a specific "course" with quizzes/homework makes me actually do the exercises, which I wouldn't have done otherwise, and the course schedule forces me to do them now, instead of postponing them for weeks/months/forever.
Thanks for correcting me! I changed that paragraph. Is it less offensive to people who know what they are talking about now?
Sometimes people refer to this relativity of utilities as "positive affine structure" or "invariant up to a scale and shift". That phrasing confuses me: it makes me think of a utility function as a set of things with numbers coming out, which don't agree on the actual numbers but can be made to agree with a linear transform, rather than as a space I can measure distances in.
I feel confused. "a space I can measure distances in" is a strong property of a value, and it does not follow from your initial 5 axioms, and seems contrary to the 5th axiom.
In fact, your own examples further down seem to provide a counterexample: if someone prefers being a whale to 400 actual orgasms, but prefers a 1/400 chance of being a whale to 1 orgasm, then both "being a whale" and "orgasm" have some utility value, but they cannot be used as units to measure distance.
If you're in a reality where a>b and 2a<2b, then you're not allowed to use classic arithmetic simply because some of your items look like numbers, since they don't behave like numbers.
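A small worked example of the distinction (the specific numbers are made up): a positive affine transform of a utility function preserves every ranking over outcomes and lotteries, but not ratios, which is exactly why utilities don't behave like distances:

```python
# Made-up utilities: "being a whale" worth 250 orgasm-units under u.
u = {"nothing": 0.0, "orgasm": 1.0, "whale": 250.0}
v = {k: 2 * x + 7 for k, x in u.items()}   # positive affine transform of u

def ev(util, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * util[o] for o, p in lottery.items())

whale_ticket = {"whale": 1 / 400, "nothing": 399 / 400}
sure_orgasm = {"orgasm": 1.0}

# u and v agree on every choice, including choices under uncertainty:
print(ev(u, whale_ticket) < ev(u, sure_orgasm))   # True
print(ev(v, whale_ticket) < ev(v, sure_orgasm))   # True

# But ratios -- "distances" -- are not preserved:
print(u["whale"] / u["orgasm"])   # 250.0
print(v["whale"] / v["orgasm"])   # 507/9, about 56.3
```

So two utility functions can disagree wildly on "how many orgasm-units a whale is worth" while being behaviorally identical; the ratio is an artifact of the chosen scale and shift, not a measurable quantity.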
That's not sufficient -
Maybe. But in context it is only necessary, since in context the point is to separate out the non-ethical claims which have been piggybacked onto ethics.
there can be wildly different, incompatible universalizable morality systems based on different premises and axioms;
That's not obvious.
As an example (but there are others), many of the major religious traditions would definitely claim to be universalizable systems of morality; and they are contradicting each other on some points.
The points they most obviously contradict each other on tend to be the most symbolic ones, about diet and dress, etc.
OK, for a slightly clearer example: in the USA abortion debate, the pro-life "camp" definitely considers pro-life to be moral and wants it to apply to everyone; and the pro-choice "camp" definitely considers pro-choice to be moral and to apply to everyone.
This is not a symbolic point; it is a moral question that determines literally life-and-death decisions.
What is the difference between "self-serving ideas" as you describe them, "tribal shibboleths", and "true morality"?
That's not sufficient - there can be wildly different, incompatible universalizable morality systems based on different premises and axioms; and each could reasonably claim that it is the true morality and the others are tribal shibboleths.
As an example (but there are others), many of the major religious traditions would definitely claim to be universalizable systems of morality; and they are contradicting each other on some points.
Can you really be sure that a program that you write has at least a 99.9% chance of being correct without performing moderately extensive testing? Personally, I'd probably put significantly more confidence in (b).
Well, but you can (a) perform moderately extensive testing, and (b) add redundancy.
If you write 3 programs for verifying primality (using different algorithms and possibly different programming languages/approaches), and all their results match, then you can assume much higher confidence in correctness than for any single such program.
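A sketch of that redundancy scheme, with three toy implementations of my own (naive trial division, 6k±1 trial division, and Miller-Rabin with a base set that is known to be deterministic for 64-bit inputs); disagreement raises an error instead of silently returning a possibly wrong answer:

```python
# Three independently written primality tests, cross-checked.

def is_prime_trial(n):
    # Naive trial division up to sqrt(n).
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def is_prime_6k(n):
    # Trial division testing only divisors of the form 6k +/- 1.
    if n < 2:
        return False
    if n in (2, 3):
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    d = 5
    while d * d <= n:
        if n % d == 0 or n % (d + 2) == 0:
            return False
        d += 6
    return True

def is_prime_miller_rabin(n):
    # Deterministic for 64-bit n with these fixed witness bases.
    bases = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def is_prime_checked(n):
    # Redundant vote: all three implementations must agree.
    votes = {is_prime_trial(n), is_prime_6k(n), is_prime_miller_rabin(n)}
    if len(votes) != 1:
        raise RuntimeError(f"implementations disagree on {n}")
    return votes.pop()

print([n for n in range(30) if is_prime_checked(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

A bug in any single implementation then shows up as a loud disagreement rather than a quietly wrong answer, which is exactly the confidence boost argued for above.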