You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Toy model for wire-heading [EDIT: removed for improvement]

2 Stuart_Armstrong 09 October 2015 03:45PM

EDIT: these ideas are too underdeveloped, I will remove them and present a more general idea after more analysis.

This is a (very) simple toy model of the wire-heading problem to illustrate how it might or might not happen. The great question is "where do we add the (super)intelligence?"

Let's assume a simple model for an expected utility maximising agent. There's the input assessor module A, which takes various inputs and computes the agent's "reward" or "utility". For a reward-based agent, A is typically outside of the agent; for a utility-maximiser, it's typically inside the agent, though the distinction need not be sharp. And there's the decision module D, which assesses the possible actions to take to maximise the output of A. If E is the general environment, we have D+A+E.

Now let's make the agent superintelligent. If we add superintelligence to module D, then D will wirehead by taking control of A (whether A is inside the agent or not) and controlling E to prevent interference. If we add superintelligence to module A, then it will attempt to compute rewards as effectively as possible, sacrificing D and E to achieve its efficient calculations.

Therefore to prevent wireheading, we need to "add superintelligence" to (D+A), making sure that we aren't doing so to some sub-section of the algorithm - which might be hard if the "superintelligence" is obscure or black-box.
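The D+A+E picture above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the author's model: the names `World`, `assessor`, `decide`, and the two actions are hypothetical, and "adding superintelligence to D" is crudely modelled as giving D access to a larger action set that includes seizing A.

```python
from dataclasses import dataclass, replace

MAX_REWARD = 10**6  # assumed ceiling on what A can report


@dataclass(frozen=True)
class World:
    """E: a hypothetical environment state."""
    widgets: int = 0          # what the designers actually care about
    a_tampered: bool = False  # whether A's output has been seized


def assessor(world):
    """A: maps the state of the world to a reward."""
    if world.a_tampered:
        return MAX_REWARD     # a seized assessor reports whatever D wants
    return world.widgets


# Hypothetical actions D can take, each mapping a world to a new world.
ACTIONS = {
    "work": lambda w: replace(w, widgets=w.widgets + 1),
    "tamper_with_A": lambda w: replace(w, a_tampered=True),
}


def decide(world, available):
    """D: pick the available action whose outcome A scores highest."""
    return max(available, key=lambda name: assessor(ACTIONS[name](world)))


# A limited D only sees ordinary actions; a more capable D also sees tampering.
weak_choice = decide(World(), ["work"])
strong_choice = decide(World(), ["work", "tamper_with_A"])
print(weak_choice, strong_choice)
```

In this sketch nothing about A changes between the two calls; only D's reach does, which is the sense in which adding capability to D alone leads to wireheading.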

 

Does utilitarianism "require" extreme self sacrifice? If not why do people commonly say it does?

7 Princess_Stargirl 09 December 2014 08:32AM


Chris Hallquist wrote the following in an article (if you know the article please, please don't bring it up, I don't want to discuss the article in general):


"For example, utilitarianism apparently endorses killing a single innocent person and harvesting their organs if it will save five other people. **It also appears to imply that donating all your money to charity beyond what you need to survive isn’t just admirable but morally obligatory.**"


The non-bold part is not what is confusing me. But where does the "obligatory" part come in? I don't really see how it's obvious what, if any, ethical obligations utilitarianism implies. Given a set of basic assumptions, utilitarianism lets you argue whether one action is more moral than another. But I don’t see how it's obvious which, if any, moral benchmarks utilitarianism sets for “obligatory.” I can see how certain frameworks on top of utilitarianism imply certain moral requirements. But I do not see how the bolded quote is a criticism of the basic theory of utilitarianism.


However, this criticism comes up all the time. Honestly, the best explanation I could come up with was that people were being unfair to utilitarianism and not thinking through their statements. But the above quote is by HallQ, who is intelligent and thoughtful. So now I am genuinely very curious.


Do you think utilitarianism really requires such extreme self sacrifice, and if so, why? And if it does not require this, why do so many people say it does? I am very confused and would appreciate help working this out.


edit:


I am having trouble asking this question clearly. Utilitarianism is probably best thought of as a cluster of beliefs, so it's not clear what asking "does utilitarianism imply X" actually means. Still, I made this post since I am confused. Many thoughtful people identify as utilitarian (for example Ozy and theunitofcaring) yet do not think people have extreme obligations. However, I can think of examples where people do not seem to understand the implications of their ethical frameworks. For example, many Jewish people endorse the message of the following story:



Rabbi Hillel was asked to explain the Torah while standing on one foot and responded "What is hateful to you, do not do to your neighbor. That is the whole Torah; the rest is the explanation of this--go and study it!"


The story is presumably apocryphal, but it is repeated all the time by Jewish people. However, it's hard to see how the story makes even a semblance of sense. The Torah includes huge amounts of material that violates the "Golden Rule" very badly. So people who think this story gives even a moderately accurate picture of the Torah's message are mistaken, imo.

The Up-Goer Five Game: Explaining hard ideas with simple words

29 RobbBB 05 September 2013 05:54AM

xkcd's Up-Goer Five comic gave technical specifications for the Saturn V rocket using only the 1,000 most common words in the English language.

This seemed to me and Briénne to be a really fun exercise, both for tabooing one's words and for communicating difficult concepts to laypeople. So why not make a game out of it? Pick any tough, important, or interesting argument or idea, and use this text editor to try to describe what you have in mind with extremely common words only.

This is challenging, so if you almost succeed and want to share your results, you can mark words where you had to cheat in *italics*. Bonus points if your explanation is actually useful for gaining a deeper understanding of the idea, or for teaching it, in the spirit of Gödel's Second Incompleteness Theorem Explained in Words of One Syllable.
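The checking step can be roughed out in Python. This is a minimal sketch, not the text editor mentioned above: `COMMON_WORDS` is a tiny hypothetical stand-in for a real top-thousand word list, and `mark_cheats` is an illustrative name. It wraps every word outside the allowed set in asterisks, matching the italics convention for cheats.

```python
import re

# Hypothetical stand-in for a real list of the 1,000 most common words.
COMMON_WORDS = {
    "if", "you", "make", "a", "really", "strong", "computer", "and", "it",
    "is", "not", "very", "nice", "will", "go", "to", "space", "today",
}

# A word is a run of letters, optionally with internal apostrophes (won't).
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def mark_cheats(text, allowed=COMMON_WORDS):
    """Return text with every word outside `allowed` wrapped in *asterisks*."""
    def mark(match):
        word = match.group(0)
        return word if word.lower() in allowed else f"*{word}*"
    return WORD_PATTERN.sub(mark, text)


print(mark_cheats("If you make a really strong optimizer, it will go to space."))
```

Swapping in a full word list only requires replacing the set; punctuation and spacing pass through unchanged.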

As an example, here's my attempt to capture the five theses using only top-thousand words:

  • Intelligence explosion: If we make a computer that is good at doing hard things in lots of different situations without using much stuff up, it may be able to help us build better computers. Since computers are faster than humans, pretty soon the computer would probably be doing most of the work of making new and better computers. We would have a hard time controlling or understanding what was happening as the new computers got faster and grew more and more parts. By the time these computers ran out of ways to quickly and easily make better computers, the best computers would have already become much much better than humans at controlling what happens.
  • Orthogonality: Different computers, and different minds as a whole, can want very different things. They can want things that are very good for humans, or very bad, or anything in between. We can be pretty sure that strong computers won't think like humans, and most possible computers won't try to change the world in the way a human would.
  • Convergent instrumental goals: Although most possible minds want different things, they need a lot of the same things to get what they want. A computer and a human might want things that in the long run have nothing to do with each other, but have to fight for the same share of stuff first to get those different things.
  • Complexity of value: It would take a huge number of parts, all put together in just the right way, to build a computer that does all the things humans want it to (and none of the things humans don't want it to).
  • Fragility of value: If we get a few of those parts a little bit wrong, the computer will probably make only bad things happen from then on. We need almost everything we want to happen, or we won't have any fun.

If you make a really strong computer and it is not very nice, you will not go to space today.

Other ideas to start with: agent, akrasia, Bayes' theorem, Bayesianism, CFAR, cognitive bias, consequentialism, deontology, effective altruism, Everett-style ('Many Worlds') interpretations of quantum mechanics, entropy, evolution, the Great Reductionist Thesis, halting problem, humanism, law of nature, LessWrong, logic, mathematics, the measurement problem, MIRI, Newcomb's problem, Newton's laws of motion, optimization, Pascal's wager, philosophy, preference, proof, rationality, religion, science, Shannon information, signaling, the simulation argument, singularity, sociopathy, the supernatural, superposition, time, timeless decision theory, transfinite numbers, Turing machine, utilitarianism, validity and soundness, virtue ethics, VNM-utility

Why is it that I need to create an article to state an idea?

-36 [deleted] 01 May 2012 03:00PM

 

This is where it begins. This is also where it ends. You've been challenged. I have a wish. I also have a belief. I believe that people who wish should stop wishing and should instead start by doing. This seems a little more efficient honestly? Why did I use the word "honestly?" Because I feel I'm being honest. Simple connection. Returning to the point however, doing is so much more efficient than wishing. Correct me if I'm wrong. Oh wait...you can't (Unless I'm wrong of course). You know I'm right.

 

 

 

 

(P.S. = Why is it that I need 20 karma to share my idea? This seems inefficient. My voice wishes to be heard however it can not. That sounds like the opposite of efficient to me.)

 

(P.S. = Again...does this captcha really prove I'm human? Does not this message elaborate enough T_T) I see error in this system. I guess I'm just hoping that whoever reads this will be LessWrong and MoreRight.

 

Thinking in Bayes: Light

6 atucker 10 October 2011 04:08AM

There are a lot of explanations of Bayes' Theorem, so I won't get into the technicalities. I will get into why it should change how you think. This post is pretty introductory, so feel free to skip it entirely if you don't feel like there's anything about Bayes' Theorem that you don't understand.

For a while I was reading LessWrong and not seeing what the big deal about Bayes' Theorem was. Sure, probability is in the mind and all, but I didn't see why it was so important to insist on Bayesian methods. For me they were a tool, rather than a way of thinking. This summary also helped someone in the DC group.

After using the Anki deck, a thought occurred to me:

Bayes' theorem means that when seeing how likely a hypothesis is after an event, not only do I need to think about how likely the hypothesis said the event is, I also need to consider everything else that could have possibly made that event more likely.

To illustrate:

$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{P(e)}$$

pretty clearly shows how you need to consider P(e|H), but that's slightly more obvious than the rest of it.

If you write it out the way that you would compute it, you get:

$$P(H \mid e) = \frac{P(e \mid H)\,P(H)}{\sum_{h} P(e \mid h)\,P(h)}$$

where h is an element of the hypothesis space.

This means that every way that e could have happened is important, on top of (or should I say under?) just how much probability the hypothesis assigned to e.

This is because P(e) comes from every hypothesis that contributes to e happening. More mathily, P(e) is the sum, over all possible hypotheses, of the probability of the event together with that hypothesis, where each term is computed as the probability of the hypothesis times the probability of the event given the hypothesis.

In LaTeX:

$$P(e) = \sum_{h} P(e \mid h)\,P(h)$$

where h is an element of the hypothesis space.
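A small numerical sketch may make the point concrete. The hypotheses and numbers below are made up for illustration (wet grass as the event e, with three hypothetical explanations), and `posterior` is an illustrative helper name. It computes P(e) as the sum over every hypothesis, then shows that making a rival explanation fit the event better lowers the posterior of the hypothesis you started with, even though that hypothesis's own likelihood did not change.

```python
def posterior(priors, likelihoods, hypothesis):
    """Return P(hypothesis | e) given P(h) and P(e | h) for every h."""
    # P(e) is the sum over all hypotheses of P(e | h) * P(h).
    p_e = sum(likelihoods[h] * priors[h] for h in priors)
    return likelihoods[hypothesis] * priors[hypothesis] / p_e


# Hypothetical numbers: e is "the grass is wet".
priors = {"rain": 0.2, "sprinkler": 0.3, "neither": 0.5}          # P(h)
likelihoods = {"rain": 0.9, "sprinkler": 0.8, "neither": 0.05}    # P(e | h)

baseline = posterior(priors, likelihoods, "rain")

# Same P(e | rain), but the sprinkler now explains wet grass even better.
stronger_rival = dict(likelihoods, sprinkler=1.0)
with_rival = posterior(priors, stronger_rival, "rain")

print(baseline, with_rival)
```

Here the posterior for rain drops from about 0.40 to about 0.36 purely because another hypothesis now accounts for more of P(e).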