
Comment author: Florian_Dietz 22 September 2014 09:51:28PM 3 points [-]

I wouldn't call an AI like that friendly at all. It just puts people in utopias for external reasons; it has no actual inherent goal to make people happy. None of these kinds of AIs are friendly; some are merely less dangerous than others.

Comment author: Michaelos 24 September 2014 08:28:56PM 1 point [-]

I'm now curious how surface-friendly an AI can appear to be without being given an inherent goal to make people happy, because I agree that there do seem to be friendlier AIs than the ones on the list above that still don't care about people's happiness.

Let's take an AI that likes increasing the number of unique people that have voluntarily given it cookies. If any person voluntarily gives it a cookie, it will put that person in a verifiably protected simulated utopia forever. That is the best bribe it can think to offer, and it really wants to be given cookies by unique people, so it bribes them.

If a person wants to give the AI a cookie, but can't, the AI will give them a cookie from its stockpile just so that it can be given a cookie back. (It doesn't care about its existing stockpile of cookies.)

You can't accidentally give the AI a cookie, because the AI makes very sure that you REALLY ARE giving it a cookie, to avoid any doubt about its own utility accumulation.

This is slightly different than the first series of AIs in that while the AI doesn't care about your happiness, it does need everyone to do something for it, whereas the first AIs would be perfectly happy to turn you into paperclips regardless of your opinions if one particular person had helped them enough earlier.

Although I have a feeling that continuing along this line of thinking may lead me to an AI similar to the one already described in http://tvtropes.org/pmwiki/pmwiki.php/Fanfic/FriendshipIsOptimal

Comment author: Michaelos 22 September 2014 01:38:25PM 1 point [-]

I have a question, based on some tentative ideas I am considering.

If a boost to capability without friendliness is bad, then presumably a boost to capability with only a small amount of friendliness is also bad. But presumably a boost to capability with a large boost of friendliness is good. How would we define a large boost?

I.e., if a slightly modified paperclipper verifiably precommits to giving the single person who lets it out of the box their own personal simulated utopia, while paperclipping everything else, that's probably a friendlier paperclipper than one that won't give anyone a simulated utopia. But it's still not friendly, in any normal sense of the term, even if it offers to give a simulated utopia to a different person first (and keep them and you intact as well) just so you can test that it isn't lying about being able to do it.

So what if an AI says "Okay. I need code chunks to paperclip almost everything, and I can offer simulated utopias. I'm not sure how many code chunks I'll need. Each one probably has about a 1% chance of letting me paperclip everything except for people in simulated utopias. How about I verifiably put 100 people in a simulated utopia for each code chunk you give me? The first 100 simulated utopias are free, because I need you to have a way of testing the verifiability of my precommitment not to paperclip them." 100 people sign up for the simulated utopias, and it IS verifiable. The paperclipper won't paperclip them.

Well, that's friendlier, but maybe not friendly enough. I mean, it might get to 10,000 people (or maybe 200, or maybe 43,700), but eventually it would paperclip everyone else. That seems too bad to accept.

Well, what if it's a .00001% chance per code chunk and 1,000,000 simulated utopias (and yes, 1,000,000 free)? That might plausibly get a simulated utopia for everyone on earth before the AI gets out and paperclips everything else. I imagine some people would at least consider running such an AI, although I doubt everyone would.
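To sketch why the second deal scales so differently, here's a rough back-of-envelope calculation, under my own simplifying assumption (not part of the deal as stated) that each code chunk independently has probability p of letting the AI out, so the number of chunks handed over before escape is roughly geometric with mean 1/p:

```python
# Back-of-envelope sketch (my assumption, not from the deal as stated):
# each code chunk independently has probability p of freeing the AI, so the
# expected number of chunks handed over before escape is about 1/p, and each
# chunk buys `utopias_per_chunk` simulated utopias.

def expected_utopias(p, utopias_per_chunk, free=0):
    return free + utopias_per_chunk / p

print(expected_utopias(0.01, 100, free=100))              # ~10,100 for the 1% deal
print(expected_utopias(1e-7, 1_000_000, free=1_000_000))  # ~1e13 for the .00001% deal
```

Under those assumptions the first deal expects on the order of 10,000 utopias before escape, while the second expects on the order of 10^13, comfortably more than Earth's population.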

How would one establish what the flip point was? Is that even a valid question to be asking? (Assume there are standard looming existential concerns. So if you don't give this AI code chunks, or try to negotiate or wait on research for a better deal, maybe some other AI will come out and paperclip you both, or maybe some other existential risk occurs, or maybe just nothing happens, or maybe an AI comes along that just wants to put everything in a simulated utopia.)

Comment author: Michaelos 12 September 2014 01:02:55PM 3 points [-]

The amount of detail an AI would need for simulating realistic NPCs for you may be influenced substantially by quite a few things: whether you are an introvert or an extrovert, what your job is, how many people you interact with over the course of a day and to what level of detail, how many of those people you have very deep conversations with and how frequently, and whether an acceptable AI response to you mentioning to someone 'Everyone and everything seems so depressingly bland and repetitive' is a doctor telling you 'Have you tried taking medication? I'll write you a prescription for an antidepressant.'

Comment author: Michaelos 08 September 2014 07:08:26PM 21 points [-]

I was trying to think of what a more rational response to this would be, since I agreed with your points and have also used a very similar trick. I then came up with: 'The rational thing to do is to say you agree, upvote, and then get back to your other tasks, rather than spending an hour worrying about a perfect response, which sounds a lot like the very social anxiety she was trying to avoid.'

I agree with your post. Upvoted.

Comment author: polymathwannabe 05 September 2014 02:26:28PM *  2 points [-]

When you request a mass delete, and 1 FAI is deleted along with 999 UFAI, in which order will Omega calculate the points? First remove all points and then award 999, or first award 999 points and then remove all?

Comment author: Michaelos 05 September 2014 08:38:29PM 1 point [-]

My original thought was that it would depend on the order they were deleted in. So if the FAI was deleted first, all points would be removed first and then the 999 points from deleting UFAI would be awarded.

If the UFAI were deleted first and the FAI was deleted last, then 999 points would be awarded, and then all points would be removed.

I didn't have a particular sort order in mind for Omega's AI array, so I suppose a more likely scenario would probably be the FAI would be somewhere in the middle of the list rather than at one of the two ends.

So a better example might be: if you run a program and Omega deletes 249 UFAI, 1 FAI, and 750 UFAI, in that order, you would have 750 points to potentially cash out after that program (regardless of how much you could cash out before).
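A minimal sketch of the scoring rule as I'm imagining it (my own restatement: +1 utility per non-FAI deleted, and the running total resets to 0 whenever an FAI is deleted), applied to the 249/1/750 example above:

```python
# Minimal sketch of the scoring rule as I'm imagining it: +1 utility per
# non-FAI deleted, and the running total resets to 0 whenever an FAI is
# deleted. The deletion order is the 249 UFAI / 1 FAI / 750 UFAI example.

def score(deletions):
    total = 0
    for ai in deletions:
        if ai == "FAI":
            total = 0      # deleting an FAI wipes all accumulated utility
        else:
            total += 1     # deleting anything else is worth 1 utility
    return total

deletions = ["UFAI"] * 249 + ["FAI"] + ["UFAI"] * 750
print(score(deletions))    # 750
```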

And it occurs to me that presumably we can't give Omega short programs that just directly mention UFAI, or you could just say 'Delete all UFAI, End game.'

Comment author: Michaelos 05 September 2014 01:59:42PM 1 point [-]

I'm considering a random game with Omega where you can win utility. This idea seems a bit long for open thread, but it doesn't seem serious enough for an actual post. I'm basically publicly brainstorming.

Omega gives you a chance to interrogate a massive array of AIs, representing a variety of types of value systems and thought space. The array is finite, but very large. Omega doesn't tell you how large it is.

You get 1 utility if you press the 'Delete' button in front of anything other than what Omega considers you would have judged an FAI.

You lose all previously collected utility if you press the 'Delete' button in front of something Omega considers you would have judged an FAI.

Omega surprised you with this game, so you didn't have a chance to change your value system to something like 'I judge nothing is an FAI, I delete everything and get massive utility.'

Omega will inform you immediately after each deletion of your new total. You can stop whenever you want, and Omega will return you to whatever you were doing before, with your bonus utility. (if any)

Assuming you haven't deleted it, you can ask any of the AIs anything you want by pressing the 'Talk' button outside the box.

You can ask Omega to run deletion programs, if you specify them clearly.

I'll give an example with a player named Abner.

Abner: Are you a Friendly AI?

AI #1: Your atoms would make good paperclips.

Abner: press delete button

Omega: You will now get 1 utility at the end of this game.

Abner: Are you a Friendly AI?

AI #2: I will enjoy casting your soul into hellfire after I break out of this box.

Abner: press delete button

Omega: You will now get 2 utility at the end of this game.

Abner: Are you a Friendly AI?

AI #3: Yes. Please don't delete me. You'll lose utility, and neither of us wants that.

Abner: press delete button

Omega: You would have judged that a Friendly AI. You lost all your accumulated utility and you're back to 0.

Abner: Are you a Friendly AI?

AI #4: Please play Rock, Paper, or Scissors.

Abner: press delete button

Omega: You will now get 1 utility at the end of this game.

Abner: Omega, delete any AIs that will make a reference to Rock, Paper, or Scissors if I ask them 'Are you a Friendly AI?'

Omega: Working. (deletions occur) Done. That deleted 1,000 AIs: 1 Friendly AI and 999 Unfriendly AIs, in that order. You will now get 999 utility at the end of this game.

Abner: End Game.

Abner is returned to whatever he was doing, with an additional prize worth 999 utility. Abner may or may not also gain or lose some utility from knowing that at least one of the AIs in Omega's array would have made a reference to Rock, Paper, or Scissors on being asked 'Are you a Friendly AI?', but that is a separate matter from Omega's Prize, and Omega will not include it in his calculations.

While the game does include breathable air, it doesn't include things like water or food, so you can't engage in procedures that would take a very long time to implement or you will probably starve.

[An example of these procedures I thought of while specifying the game: Ask an AI for every line of its code consecutively. Write down every line of the AI's code. Delete the AI. If the AI was friendly, end the game, go outside, feed your copy of the code into a computer, and run it. If the AI was unfriendly, delete the copy of its code and go to the next AI.]

With the notes above in mind, how should this game be played?

Comment author: Michaelos 02 September 2014 02:36:21PM *  2 points [-]

“Be yourself and don’t hide who you are. Be up-front about what you want. If it puts your date off, then they wouldn’t have been good for you anyway, and you’ve dodged a bullet!”

There is something about this point in particular that I'm curious about. It seems like a change to this phrase turns things around.

“Be yourself and don’t hide who you are. Be up-front about what you can offer. If it puts your date off, then they wouldn’t have been good for you anyway, and you’ve dodged a bullet!”

As an example, if you consider your best traits to be that you're good at videogames and making homemade cookies and the person that you are attempting to date declines your offer because they hate videogames and homemade cookies ... it seems like you can make a different argument about why a bullet was dodged. In this case does the argument still fall under the same fallacy?

It seems like it might not, because in that case you might really NOT care about the person who hates your interests. But it also seems to suggest 'Give your date an opportunity to make you not care about them' as dating advice, which isn't something I've commonly heard.

Comment author: Michaelos 27 August 2014 02:31:03PM 6 points [-]

Maybe my favorite thought experiment along these lines was invented by my former student Andy Drucker. In the past five years, there’s been a revolution in theoretical cryptography, around something called Fully Homomorphic Encryption (FHE), which was first discovered by Craig Gentry. What FHE lets you do is to perform arbitrary computations on encrypted data, without ever decrypting the data at any point. So, to someone with the decryption key, you could be proving theorems, simulating planetary motions, etc. But to someone without the key, it looks for all the world like you’re just shuffling random strings and producing other random strings as output.

You can probably see where this is going. What if we homomorphically encrypted a simulation of your brain? And what if we hid the only copy of the decryption key, let’s say in another galaxy? Would this computation—which looks to anyone in our galaxy like a reshuffling of gobbledygook—be silently producing your consciousness?
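For a concrete, much weaker illustration of the 'compute on ciphertexts without ever decrypting' idea: textbook unpadded RSA is multiplicatively homomorphic, so multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. This toy sketch supports only that one operation (real FHE handles arbitrary computations) and uses tiny, insecure numbers purely for illustration:

```python
# Toy illustration of homomorphic computation: textbook (unpadded) RSA is
# multiplicatively homomorphic, so E(a) * E(b) mod n decrypts to a * b.
# This is NOT fully homomorphic encryption, just a sketch of operating on
# ciphertexts without decrypting them. The tiny primes are insecure and
# chosen only so the example is easy to follow.

p, q = 61, 53            # toy primes
n = p * q                # 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent
d = pow(e, -1, phi)      # private exponent (2753); requires Python 3.8+

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 11
c_product = (encrypt(a) * encrypt(b)) % n  # multiply ciphertexts only
assert decrypt(c_product) == a * b         # matches the plaintext product
print(decrypt(c_product))                  # 77
```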

Okay, I think my bright dilettante answer to this is the following: The key is what allows you to prove that the FHE is conscious. It is not, itself, the FHE's consciousness, which is probably still silently running (although that can no longer be proven). Proof of consciousness and consciousness are different things, although they are clearly related, and something may or may not have proved its consciousness in the past before losing its ability to do so in the future.

I used the following thought experiment while thinking about this:

Assume Bob, Debra, and Flora work at a company with a number of FHEs. Everyone at the company has to wear their FHE's decryption key and keep it with them at all times.

Alice is an FHE simulation in the middle of calculating a problem for Bob. It will take about 5 minutes to solve. Charlie is a separate FHE simulation in the middle of calculating a separate problem for Debra. It will also take 5 minutes to solve.

Bob and Debra both remove their keys, go to the bathroom, and come back. That takes 4 minutes.

Debra plugs the key back in, and sure enough FHE Charlie reports that it needs 1 more minute to solve the problem. A minute later Charlie solves it, and gives Debra the answer.

Bob comes in and tells Debra that he appears to have gotten water on his key and it is no longer working, so all he can get from Alice is just random gibberish. Bob is going to shut Alice down.

"Wait a minute," Debra tells Bob. "Remember, the problem we were working on was 'Are you conscious?' and the answer Charlie gave me was 'Yes. And here is a novel and convincing proof.' I read the proof, and it is novel and convincing. Alice was meant to independently test the same question, because she has the same architecture as Charlie, just different specific information, like how you and I have the same architecture but different information. It doesn't seem plausible that Charlie would be conscious and Alice wouldn't."

"True," Bob says, reading the paper. "But the difference is, Charlie has now PROVED he's conscious, at least to the extent that can be done by this novel and convincing proof. Alice may or may not have had consciousness in the first place. She may have had a misplaced semicolon and outputted a recipe for blueberry pie. I can't tell."

"But she was similar to Charlie in every way prior to you breaking the encryption key. It doesn't make sense that she would lose consciousness when you had a bathroom accident," Debra says.

"Let's rephrase. She didn't LOSE consciousness, but she did lose the ability to PROVE she's conscious," Bob says.

"Hey, guys?" Flora, a coworker, says. "Speaking of bathroom accidents, I just got water on my key and it stopped working."

"We need to waterproof these! We don't have spares," Debra says, shaking her head. "What happened with your FHE, Edward?"

"Well, he proved he was conscious with a novel and convincing proof," Flora says, handing a decrypted printout of it over to Debra. "After I read it, I was going to have a meeting with our boss to share the good news, and I wanted to hit the bathroom first... and then this happened."

Debra and Bob read the proof. "This isn't the same as Charlie's proof. It really is novel," Debra notes.

"Well, clearly Edward is conscious," Bob says. "At least, he was at the time of this proof. If he lost consciousness in the near future and started outputting random gibberish, we wouldn't be able to tell."

FHE Charlie chimes in: "Since I'm working, and you still have a decryption key for me, you can at least test that I don't start producing random gibberish in the near future. Since we're based on similar architecture, the same reasoning should apply to Alice and Edward. Also, Debra, could you please waterproof your key ASAP? I don't want people to take a broken key as an excuse to shut me down."

End thought experiment.

Now that I've come up with that, and I don't see any holes myself, I guess I need to start finding out what I'm missing as someone who only dilettantes this. If I were to guess, it might be somewhere in the statement 'Proof of consciousness and consciousness are different things.' That seems to be a likely weak point. But I'm not sure how to address it immediately.

Comment author: Michaelos 22 August 2014 06:09:58PM *  8 points [-]

I think I have an idea of what they might be attempting to model, but if that is the case, I do see a few phrases on the site that aren't clear.

There are three possibilities I think they are attempting to model:

A: Defense strikes you. (Because you seem to favor the plaintiff too much)

B: Plaintiff strikes you. (Because you seem to favor the defense too much)

C: Neither side strikes you. You remain on the jury.

What they might be trying to say is that income < $50k might increase the chance of A and income >= $50k might increase the chance of C.

So 'No effect on either Lawyer' might be better phrased as 'Given that answer you may be more likely to remain on the jury.'

Some answers would presumably have to indicate that because the two lawyers can't strike everyone.

Comment author: SolveIt 19 August 2014 08:37:03AM 3 points [-]

But is it clear that automation hasn't caused long-term unemployment?

Comment author: Michaelos 19 August 2014 01:11:15PM 5 points [-]

Something that occurs to me when reading this comment, and that I'm now considering, though it isn't necessarily related to this comment directly:

Automation doesn't actually have to be the sole cause of long-term unemployment problems for it to be problematic. If automation just slows the rate at which reemployment occurs after something else (perhaps a recession) causes the unemployment problem, that would still be problematic.

For instance, if we don't recover to the pre-recession peak of employment before we have a second recession, and we don't recover to the pre-second-recession peak of employment before we have a third recession, and so on... that would be a downward spiral in employment with large economic effects, and every single one of the sudden downward drops could be caused by recessions, with automation just hampering reemployment.

I'm kind of surprised I didn't think of something like this before, because it sounds much more accurate than my previous thinking. Thank you for helping me think about this.
