This is equivalent to $\Box(DR(B) = C) \to (DR(B) = C)$, since $DR(B) = C$ is an obvious lie.

Um... what? I don't see the relation between that and the previous line.

In response to
Intuitive cooperation

DR(B) = C is easily seen to be false, since DR always defects (by definition), and in general, ~X is logically equivalent to X -> False. So we take X to be the statement from the line above, and substitute DR(B) = C for False.

Sorry that wasn't more clear.

In response to
Ethics in a Feedback Loop: A Parable

Some ideas:

It might be good to have designated spaces where Green Martians can practice tickling participating humans. As more and more of these spaces become available, it becomes more and more socially unacceptable for Green Martians to tickle humans elsewhere. Obviously, for this to work, there would need to be a sufficiently high number of humans willing to participate.

For me, a large part of minor annoyances is anticipating minor annoyances. Imagining myself as a human, I might get really sick of being stung randomly throughout the day. It would be easier to deal with the occasional sting if they were constrained to certain times of the day or to certain environments. For example, I think it would be good to make a taboo against tickling at work. The Blue Martians can always tickle later, and the humans would be able to work without having to worry about getting stung.

Now imagining myself as a Green Martian, I feel like I would naturally feel bad about stinging humans, and try to avoid it for the most part. This anxiety would be significantly worse if the humans visibly disliked me for stinging them. But I would be desperate for someone who could help me become a Blue Martian. Learning the Blue Martian's techniques would be very tempting... Something that might be a good alternative would be if there were humans working with the Blue Martians, finding ways to make the techniques more robust.

Also, studying the transformation process more carefully would probably be very helpful for everyone.

In response to
Intuitive cooperation

This is really nicely written, but unfortunately the main lesson for me is that I don't understand Löb's Theorem, despite reading the linked PDF file a few times. So I guess I will use this place to ask you a few things, since I like your way of explaining.

First, I may have just ruined my math credentials by admitting that I don't understand Löb's Theorem, but it doesn't mean that I suck at maths completely. I feel pretty confident in elementary- and maybe even high-school math. Let's say that I am confident I would *never* tell someone that 2+2=3.

Second, if I understand the rules for implication correctly, if the premise is false, the implication is always true. Things like "if 2+2=3, then Eliezer will build a Friendly AI" should be pretty non-controversial even outside of LessWrong.

So it seems to me that I can safely say "if I tell you that 2+2=3, then 2+2=3". (Because I know I would never say that 2+2=3.)

But Mr. Löb insists that it means that I just claimed that 2+2=3.

And I feel like Mr. Löb is totally strawmanning me.

Please help!

Oh thanks; I hope I am able to explain it adequately. Your understanding of implication, and of what Löb's theorem would imply about yourself (assuming you are a formal mathematical system, of course), are both correct. More formally, for any formal system S at least containing Peano Arithmetic, if S can prove (if S proves that 2+2=3, then 2+2 really is 3), then S will also prove that 2+2=3. This is pretty bad, since 2+2=4, so we must declare the system inconsistent. This runs strongly against most people's intuition - but since it is *true*, it's your intuition that needs to be retrained.

The analogy to 'being suspicious' hopefully helps bring in some of the right connotations and intuition. Of course, it's not a perfect analogy, since as you mentioned, this is kind of an uncharitable way to interpret *people*.

Anyway, I feel less confident that I can explain the proof of Löb's theorem intuitively, but here's a stab at it:

Peano Arithmetic (PA) is surprisingly powerful. In much the same way that we can store and process all sorts of information on computers as binary numbers, we can write all sorts of statements (including statements about PA itself) as numbers. And similar to how all the processing on a computer is reduced to operations with simple transistors, we can encode all the rules of inference for these statements as arithmetic functions.

In *particular*, we can encode a statement L that says "If a proof of L exists (in PA), then X is true," where the X can be arbitrary.

Now PA can think, "well if I had a proof of L, then unpacking what L means, I could prove that (if a proof of L exists, then X is true)."

"I also know that if I had a proof of L, then I could prove that I can prove L."

"So I can just skip ahead and see that - if I had a proof of L, then I could prove that X is true."

A dishonest PA may then reason - "Since I *trust* myself, if I knew I had a proof of L, then X would just be true." @

"I can also prove this fact about myself - that if I had a proof of L, then X is true."

"But that's exactly what L says! So I can prove that I can prove L"

"So going back to @, I now know that X is true!"

But X can be literally *anything*! So this is really bad.
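For readers who prefer symbols, the informal steps above can be lined up with the usual textbook derivation. This is only a sketch: the notation $\Box P$ for "PA proves P" is an assumption (it does not appear in the comment above), and steps 3 to 5 lean on standard properties of provability that are taken on trust here.

```latex
\begin{align*}
1.\;& \vdash L \leftrightarrow (\Box L \to X) && \text{construction of } L \\
2.\;& \vdash \Box L \to \Box(\Box L \to X) && \text{from 1: unpacking what } L \text{ means, inside a proof} \\
3.\;& \vdash \Box L \to (\Box\Box L \to \Box X) && \text{from 2: provability distributes over } \to \\
4.\;& \vdash \Box L \to \Box\Box L && \text{a proof of } L \text{ can be checked, so its existence is provable} \\
5.\;& \vdash \Box L \to \Box X && \text{from 3 and 4 (the ``skip ahead'' step)} \\
6.\;& \vdash \Box X \to X && \text{the self-trust assumption, marked @ above} \\
7.\;& \vdash \Box L \to X && \text{from 5 and 6} \\
8.\;& \vdash L && \text{from 1 and 7: this is exactly what } L \text{ says} \\
9.\;& \vdash \Box L && \text{from 8: anything proved is provably provable} \\
10.\;& \vdash X && \text{from 7 and 9}
\end{align*}
```

The only place X is used is step 6, which is why the argument goes through for an arbitrary X once the system trusts itself about it.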

At a coarser level, you can think of $\Box X \to X$ as breathing life into things you imagine about yourself. You can try to be careful to only imagine things you can actually see yourself doing, but there are tricky statements like L that can use this to smuggle arbitrary statements out of your imagination into real life! L says, "if you imagine L, then X is true." Even though this is just a statement, and is neither true nor false *a priori*, you still have to be careful thinking about it. You need a solid barrier between the things you imagine about yourself, and the things you actually do, to protect yourself from getting 'hacked' by this statement.

Hopefully something in here helps the idea 'click' for you. Feel free to ask more questions!

This is an exposition of some of the main ideas in the paper Robust Cooperation. My goal is to make the ideas and proofs seem natural and intuitive - instead of some mysterious thing where we invoke Löb's theorem at the right place and the agents magically cooperate. Also I hope it is accessible to people without a math or CS background. Be warned, it is pretty cheesy ok.

In a small quirky town, far away from other cities or towns, the most exciting event is a game called (for historical reasons) The Prisoner's Dilemma. Everyone comes out to watch the big tournament at the end of Summer, and you (Alice) are especially excited because this year it will be your first time playing in the tournament! So you've been thinking of ways to make sure that you can do well.

The way the game works is this: Each player can choose to cooperate or defect with the other player. If you both cooperate, then you get two points each. If one of you defects, then that player will get three points, and the other player won't get any points. But if you both defect, then you each get only one point. You have to make your decisions separately, without communicating with each other - however, everyone is required to register the algorithm they will be using before the tournament, and you *can* look at the other player's algorithm if you want to. You also are allowed to use some outside help in your algorithm.

Now if you were a newcomer, you might think that no matter what the other player does, you can always do better by defecting. So the best strategy must be to always defect! Of course, you know better: if *everyone* tried that strategy, then they would end up defecting against each other, which is a shame since they would both be better off if they had just cooperated.
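The scoring rules and the newcomer's argument can be checked with a few lines of Python. This is just an illustrative sketch; the names `PAYOFFS`, `score`, and `defect_dominates` are made up for this example and are not part of the tournament.

```python
# Payoffs as (my points, their points), copied from the rules of the game.
PAYOFFS = {
    ("C", "C"): (2, 2),  # both cooperate
    ("C", "D"): (0, 3),  # I cooperate, they defect
    ("D", "C"): (3, 0),  # I defect, they cooperate
    ("D", "D"): (1, 1),  # both defect
}


def score(mine, theirs):
    """Return (my points, their points) for one round."""
    return PAYOFFS[(mine, theirs)]


def defect_dominates():
    """True if defecting beats cooperating against every fixed opponent move."""
    return all(score("D", t)[0] > score("C", t)[0] for t in ("C", "D"))


if __name__ == "__main__":
    print(defect_dominates())  # the newcomer's argument
    print(score("D", "D"), "versus", score("C", "C"))  # why it is still a shame
```

Defecting wins move by move, yet two defectors end up with one point each where two cooperators would have had two.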

But how can you do better? You have to be able to describe your algorithm in order to play. You have a few ideas, and you'll be playing some practice rounds with your friend Bob soon, so you can try them out before the actual tournament.

Your first plan:

I'll cooperate with Bob if I can tell from his algorithm that he'll cooperate with me. Otherwise I'll defect.

For your first try, you'll just run Bob's algorithm and see if he cooperates. But there's a problem - if Bob tries the same strategy, he'll have to run your algorithm, which will run his algorithm again, and so on into an infinite loop!
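Here is a small Python sketch of why the "just run Bob's algorithm" plan breaks down. The function names are hypothetical, and Python's recursion limit stands in for "runs forever": a real infinite loop would simply never return.

```python
def naive_fairbot(opponent):
    """Cooperate only if simulating the opponent against me shows cooperation."""
    return "C" if opponent(naive_fairbot) == "C" else "D"


def cooperate_rock(opponent):
    """A rock sitting on the cooperate button; it never looks at the opponent."""
    return "C"


def play(me, opponent):
    """Return my move, or a note if the simulation never bottoms out."""
    try:
        return me(opponent)
    except RecursionError:
        return "no answer (infinite regress)"


if __name__ == "__main__":
    print(play(naive_fairbot, cooperate_rock))
    print(play(naive_fairbot, naive_fairbot))
```

Against a rock the simulation finishes immediately, but against a copy of itself each simulation starts another one, so no move is ever produced.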

So you'll have to be a bit more clever than that... luckily you know a guy, Shady, who is good at these kinds of problems.

You call up Shady, and while you are waiting for him to come over, you remember some advice your dad Löb gave you.

(Löb's theorem) "If someone says you can trust them on X, well then they'll just tell you X."

If (someone tells you If [I tell you] X, then X is true)

Then (someone tells you X is true)

(See The Cartoon Guide to Löb's Theorem[pdf] for a nice proof of this)
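Written in symbols, Dad's advice looks roughly like this. The notation is an assumption made for this sketch: $\Box X$ stands for "someone who knows basic arithmetic tells you X" (formally, "X is provable in Peano Arithmetic"), and $\vdash$ marks things that person will actually say.

```latex
\begin{align*}
\text{If } \vdash \Box X \to X \text{, then } \vdash X.
\end{align*}
```

The version that can itself be stated and proved inside the system reads $\vdash \Box(\Box X \to X) \to \Box X$.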

Here's an example:

Sketchy watch salesman: Hey, if I tell you these watches are genuine then they *are* genuine!

You: Ok... so are these watches genuine?

Sketchy watch salesman: Of course!

It's a good thing to remember when you might have to trust someone. If someone you already trust tells you you can trust them on something, then you know that something must be true.

On the other hand, if someone says you can *always* trust them, well that's pretty suspicious... If they say you can trust them on everything, that means that they will never tell you a lie - which is logically equivalent to them saying that if they *were* to tell you a lie, then that lie must be true. So by Löb's theorem, they will lie to you. (Gödel's second incompleteness theorem)
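One hedged way to spell out that last step in symbols, assuming $\Box X$ means "they tell you X" and $\bot$ stands for any outright lie (both symbols are assumptions made for this sketch):

```latex
\begin{align*}
\neg\Box\bot \;&\equiv\; (\Box\bot \to \bot)
  && \text{``I will never lie'' is the same as ``if I said } \bot \text{, then } \bot\text{''} \\
\vdash \neg\Box\bot \;&\Longrightarrow\; \vdash \Box\bot \to \bot \;\Longrightarrow\; \vdash \bot
  && \text{L\"ob's theorem with } X = \bot
\end{align*}
```

So a system that asserts its own consistency ends up asserting $\bot$, which is the content of Gödel's second incompleteness theorem for consistent systems.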

Despite his name, you actually trust Shady quite a bit. He's never told you or anyone else anything that didn't end up being true. And he's careful not to make any suspiciously strong claims about his honesty.

So your new plan is to ask Shady if Bob will cooperate with you. If so, then you will cooperate. Otherwise, defect. (FairBot)

It's game time! You look at Bob's algorithm, and it turns out he picked the exact same algorithm! He's going to ask Shady if *you* will cooperate with *him*. Well, the first step is to ask Shady, "will Bob cooperate with me?"

Shady looks at Bob's algorithm and sees that if Shady says you cooperate, then Bob cooperates. He looks at your algorithm and sees that if Shady says Bob cooperates, then you cooperate. Combining these, he sees that if he says you *both* cooperate, then both of you will cooperate. So he tells you that you will both cooperate (your dad was right!)

Let A stand for "Alice cooperates with Bob" and B stand for "Bob cooperates with Alice".

From looking at the algorithms, $\Box A \to B$ and $\Box B \to A$.

So combining these, $\Box (A \wedge B) \to (A \wedge B)$.

Then by Löb's theorem, $\Box (A \wedge B)$.

Since that means that Bob will cooperate, you decide to actually cooperate.

Bob goes through an analogous thought process, and also decides to cooperate. So you cooperate with each other on the prisoner's dilemma! Yay!

That night, you go home and remark, "it's really lucky we both ended up using Shady to help us, otherwise that wouldn't have worked..."

Your dad interjects, "Actually, it *doesn't* matter - as long as they were both smart enough to count, it would work. This doesn't just say 'I tell you X', it's stronger than that - it actually says 'Anyone who knows basic arithmetic will tell you X'. So as long as they both know a little arithmetic, it will still work - *even* if one of them is pro-axiom-of-choice, and the other is pro-axiom-of-life. The cooperation is *robust*." That's really cool!

But there's another issue you think of. Sometimes, just to be tricky, the tournament organizers will set up a game where you have to play against a rock. Yes, literally just a rock holding the cooperate button down. If you played against a rock with your current algorithm, well, you start by asking Shady if the rock will cooperate with you. Shady is like, "well yeah, duh." So then you cooperate too. But you could have gotten *three* points by defecting! You're missing out on a totally free point!

You think that it would be a good idea to make sure the other player isn't a complete idiot before you cooperate with them. How can you check? Well, let's see if they would cooperate with a rock placed on the defect button (affectionately known as 'DefectRock'). If they know better than that, and they will cooperate with you, *then* you will cooperate with them.

The next morning, you excitedly tell Shady about your new plan. "It will be like before, except this time, I also ask you if the other player will cooperate with DefectRock! If they are dumb enough to do that, then I'll just defect. That way, I can still cooperate with other people who use algorithms like this one, or the one from before, but I can also defect and get that extra point when there's just a rock on cooperate."

Shady gets an awkward look on his face, "Sorry, but I can't do that... or at least it wouldn't work out the way you're thinking. Let's say you're playing against Bob, who is still using the old algorithm. You want to know if Bob will cooperate with DefectRock, so I have to check and see if I'll tell Bob that DefectRock will cooperate with him. I would have to say I would *never* tell Bob that DefectRock will cooperate with him. But by Löb's theorem, that means I *would* tell you this obvious lie! So that isn't gonna work."

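Notation: $X(Y) = C$ if X cooperates with Y in the prisoner's dilemma (or $= D$ if not).

You ask Shady, does $B(DR) = D$?

Bob's algorithm: $B(DR) = C$ only if $\Box(DR(B) = C)$.

So to say $B(DR) = D$, we would need $\neg\Box(DR(B) = C)$.

This is equivalent to $\Box(DR(B) = C) \to (DR(B) = C)$, since $DR(B) = C$ is an obvious lie.

By Löb's theorem, $\Box(DR(B) = C)$, which is a lie.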

<Extra credit: does the fact that Shady is the one explaining this mean you can't trust him?>

<Extra extra credit: find and fix the minor technical error in the above argument.>

Shady sees the dismayed look on your face and adds, "...*but*, I know a guy who can vouch for me, and I think maybe that could make your new algorithm work."

So Shady calls his friend T over, and you work out the new details. You ask *Shady* if Bob will cooperate with you, and you ask *T* if Bob will cooperate with DefectRock. So T looks at Bob's algorithm, which asks Shady if DefectRock will cooperate with him. Shady, of course, says no. So T sees that Bob will defect against DefectRock, and lets you know. Like before, Shady tells you Bob will cooperate with you, and thus you decide to cooperate! And like before, Bob decides to cooperate with you, so you both cooperate! Awesome! (PrudentBot)

If Bob is using your new algorithm, you can see that the same argument goes through mostly unchanged, and that you will still cooperate! And against a rock on cooperate, T will tell you that it will cooperate with DefectRock, so you can defect and get that extra point! This is really great!!
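The matchups described so far can be replayed in a short Python sketch. It rests on several assumptions that are not in the story itself: "Shady tells you X" is modeled as "X held at every earlier world" in a finite chain of worlds (a standard trick for evaluating modal agents), T is modeled as Shady plus the extra assumption that Shady is consistent (`level=1`), and the names `act`, `box`, `DEPTH`, and `outcome` are invented for this example. A depth of 8 is an arbitrary choice that is more than enough for these simple agents to settle.

```python
from functools import lru_cache

DEPTH = 8  # assumed number of worlds; the agents below settle well before this


@lru_cache(maxsize=None)
def act(me, opp, world):
    """True if `me` cooperates with `opp` at the given world."""
    return me(opp, world)


def box(stmt, world, level=0):
    """'Provable': stmt holds at every earlier world numbered >= level.

    level=0 models asking Shady; level=1 models asking T (an assumption).
    """
    return all(stmt(w) for w in range(level, world))


def cooperate_rock(opp, world):
    return True  # the rock on the cooperate button


def defect_rock(opp, world):
    return False  # DefectRock


def fair_bot(opp, world):
    # Cooperate iff Shady says the opponent cooperates with me.
    return box(lambda w: act(opp, fair_bot, w), world)


def prudent_bot(opp, world):
    # Cooperate iff Shady says the opponent cooperates with me
    # and T says the opponent defects against DefectRock.
    return (box(lambda w: act(opp, prudent_bot, w), world)
            and box(lambda w: not act(opp, defect_rock, w), world, level=1))


def outcome(a, b, depth=DEPTH):
    """Moves of a and b against each other, as a pair of 'C'/'D'."""
    return ("C" if act(a, b, depth) else "D",
            "C" if act(b, a, depth) else "D")


if __name__ == "__main__":
    for a, b in [(fair_bot, fair_bot), (fair_bot, cooperate_rock),
                 (prudent_bot, fair_bot), (prudent_bot, prudent_bot),
                 (prudent_bot, cooperate_rock), (prudent_bot, defect_rock)]:
        print(a.__name__, "vs", b.__name__, "->", outcome(a, b))
```

Because each agent only consults strictly earlier worlds, the evaluation always terminates, unlike the plan of running the opponent's algorithm directly.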

(ok now it's time for the *really* cheesy ending)

It's finally time for the tournament. You have a really good feeling about your algorithm, and you do really well! Your dad is in the audience cheering for you, with a really proud look on his face. You tell your friend Bob about your new algorithm so that he can also get that extra point sometimes, and you end up tying for first place with him!

A few weeks later, Bob asks you out, and you two start dating. Being able to cooperate with each other robustly is a good start to a healthy relationship, and you live happily ever after!

The End.

In principle, it is possible to simulate a brain on a computer

That's a hypothesis, unproven and untested. Especially if you claim the equivalence between the mind and the simulation -- which you have to do in order to say that the simulation delivers the "source code" of the mind.

you can think of something's source code as a (computable) mathematical description of that thing.

A mathematical description of my mind would be beyond the capabilities of my mind to understand (and so, know). Besides, my mind *changes* constantly both in terms of patterns of neural impulses and, more importantly, in terms of the underlying "hardware". Is neuron growth or, say, serotonin release part of my "source code"?

The laws of physics as we currently understand them are computable (not efficiently, but still), and there is no reason to hypothesize new physics to explain how the brain works. I'm claiming there is an isomorphism.

Dynamic systems have mathematical descriptions also...

This blog provides a cynical view on the workplace: http://michaelochurch.wordpress.com/

I like the cynicism, but I don't know how realistic it is.

Also, there is Stack Exchange: http://workplace.stackexchange.com/

In response to
Three questions about source code uncertainty

*P(B|A,X) or P(B|X)*P(A|B,X). But what if A is the proposition that we calculate the probability P(A,B|X) by using P(A|X)*P(B|A,X)? Then we will get different answers depending on how we do the calculation.

In response to
Three questions about source code uncertainty

I'm very confused about how that theory applies to people

It does not.

The concept of "source code" is of doubtful use when applied to wetware, anyway.

In principle, it is possible to simulate a brain on a computer, and I think it's meaningful to say that if you *could* do this, you would know your "source code". In general, you can think of something's source code as a (computable) mathematical description of that thing.

Also, the point of the post is to generalize the theory to this domain. Humans don't know their source code, but they do have models of other people, and use these to make complicated decisions. What would a formalization of this kind of process look like?

We'll be discussing Yvain's Positivism, Self Deception, and Neuroscience sequence: http://wiki.lesswrong.com/wiki/Positivism,_Self_Deception,_and_Neuroscience_(sequence)

In particular, we will be discussing the article Simultaneously Right and Wrong: http://lesswrong.com/lw/1d/simultaneously_right_and_wrong/

Please read the whole sequence and focus specifically on Simultaneously Right and Wrong. We'll be talking about things your brain does to deceive itself and how to recognize and avoid these pitfalls.

There will be snacks and games! Hope to see you then!

Here's a link to the Google Street View of the turn in to the apartment complex, since it's hard to find: https://www.google.com/maps/@33.817518,-84.267679,3a,75y,356.44h,70.19t/data=!3m4!1e1!3m2!1sPGiXiJ00M8mb5eELuqwxeA!2e0
