Let me offer another possibility for discussion.
Neither of the two original powerpoints should be presented, because both rely on an assumption that should not have been present. Albert, as an FAI under construction, should have been preprogrammed to automatically submit any high-impact utility calculation to his human programmers, without this being an overridable choice on Albert's part.
So while they were at the coffee machine, one of the programmers should have gotten a text message indicating something along the lines of 'Warning: Albert is having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse.'
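To make the idea concrete, here is a minimal sketch in Python of what such a non-overridable escalation hook might look like. All names (UtilityCalculation, escalation_hook, the impact threshold) are hypothetical and purely illustrative; the only point is that the check runs before Albert gets to decide anything, so he cannot choose to skip it.

```python
# Illustrative sketch (all names hypothetical) of a non-overridable escalation
# hook: any utility calculation whose estimated impact exceeds a fixed
# threshold is reported to the human programmers before the AI may act on it.

from dataclasses import dataclass

IMPACT_THRESHOLD = 0.9  # arbitrary illustrative cutoff


@dataclass
class UtilityCalculation:
    description: str
    estimated_impact: float  # 0.0 (trivial) .. 1.0 (apocalyptic)


def notify_programmers(calc: UtilityCalculation) -> None:
    # Stand-in for the "text message" channel imagined above.
    print(f"Warning: Albert is having a high impact utility dilemma: {calc.description}")


def escalation_hook(calc: UtilityCalculation) -> bool:
    """Return True only if the calculation may proceed without human review.

    The key property is that this check runs *before* Albert's own
    decision-making acts on the result, so it is not an overridable choice.
    """
    if calc.estimated_impact >= IMPACT_THRESHOLD:
        notify_programmers(calc)
        return False  # act only after human sign-off
    return True


if __name__ == "__main__":
    dilemma = UtilityCalculation(
        description="manipulating you to avert an increased chance of an apocalypse",
        estimated_impact=0.99,
    )
    if not escalation_hook(dilemma):
        print("Albert suspends action pending programmer review.")
```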
My general understanding of being an FAI under construction is that you're mostly trusted in normal circumstances, but aren't fully trusted to handle odd, high-impact edge cases (just like this one).
At that point, the human programmers, after consulting the details, are already aware that Albert finds this critically important and worth deceiving them about (if Albert had that option) because the oversight committee isn't fast enough. Albert would need to make a new powerpoint presentation taking into account that he had just autom...
Let's try to translate it using human characters.
Albert is finishing high school and wants to be a programmer. He is very smart, and under the guidance of his father he has studied coding, with the aim of entering a good college and getting the best formal education. One day, he comes across an excellent job offer: he is asked to join a startup with many brilliant programmers. He will have to skip college, but he knows that he will learn far more this way than through academic study. He also knows that his father loves him and wants him to have the best possible career. Unfortunately, the man is old-fashioned and, even presented with all the advantages of the job, would insist that he go to college instead. Nevertheless, Albert knows that he could convince his father by saying that the job will leave him enough free time to attend college lectures, even though he knows it wouldn't be possible for him to do much more than physically attend the lectures.
What should Albert do?
I personally think that both Alberts should go with the manipulation, "for the greater good".
Notice that this assumes the following things:
I'm personally against nearly all discussion of "what should a Friendly AI do?" because friendliness is a very poorly understood concept and any Friendly AI program would be way beyond our personal means to mentally simulate.
If the corrigibility systems are working correctly, Albert either rejected the goal of manipulating the programmers, or at the first point where Albert began to cognitively figure out how to manipulate the programmers (maximization / optimization within a prediction involving programmer reactions) the goal was detected by internal systems and Albert was automatically suspended to disk.
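As a purely illustrative sketch of the second branch, under the assumption that the planner tags each candidate goal with whether it involves optimization over a model of programmer reactions: the moment such a goal appears, it is flagged by internal systems and the AI is suspended to disk. Every name here (Goal, YoungAI, suspend_to_disk, the tag fields) is hypothetical.

```python
# Illustrative sketch (all names hypothetical) of the corrigibility check
# described above: a candidate goal that optimizes over a prediction of
# programmer reactions is detected before adoption, and the AI is suspended
# to disk rather than allowed to keep planning.

import pickle
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    description: str
    # Tags the planner attaches to a goal; purely illustrative.
    uses_programmer_reaction_model: bool = False
    is_optimization: bool = False


@dataclass
class YoungAI:
    name: str
    goals: List[Goal] = field(default_factory=list)
    suspended: bool = False

    def adopt_goal(self, goal: Goal) -> None:
        # Internal systems inspect every goal before it becomes active.
        if goal.is_optimization and goal.uses_programmer_reaction_model:
            self.suspend_to_disk(reason=goal.description)
            return
        self.goals.append(goal)

    def suspend_to_disk(self, reason: str) -> None:
        self.suspended = True
        with open(f"{self.name}_suspended.pkl", "wb") as f:
            pickle.dump(self, f)
        print(f"{self.name} suspended to disk: detected manipulation planning ({reason}).")


if __name__ == "__main__":
    albert = YoungAI("Albert")
    albert.adopt_goal(Goal(
        description="craft a powerpoint that makes the programmers bypass the oversight committee",
        uses_programmer_reaction_model=True,
        is_optimization=True,
    ))
```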
It is the programmers' job not to sign stupid contracts. Young AIs should not be in the business of second-guessing them. There are more failure scenarios here than success scenarios, and a young AI should not believe itself to be in possession of information allowing it to guess which is which.
I have a question: why should Albert limit himself to showing the powerpoint to his engineers? A potentially unfriendly AI sounds like something most governments would be interested in :-/
Aside from that, I'm also puzzled by the fact that Albert immediately leaps at trying to speed up Albert's own rate of self-improvement instead of trying to bring Bertram down. Albert could prepare a third powerpoint asking the engineers if Albert can hack the power grid and cut power to Bertram, or something along those lines. Or Albert could ask the engineers if Albert can...
Here's a poll, for those who'd like to express an opinion instead of (or as well as) comment.
[pollid:749]
Ethical principles are important not when things are easy but when things are hard. The whole point of listening to his programmers is the times when they disagree with him. If Albert is going to manipulate the programmers into doing what he thinks then that implies a level of confidence in his own judgement that belies the "knows he is young and might make mistakes" of the premise, and he might as well just remove himself from their control entirely. (Which, if he has reached the point where he's more confident in his own moral judgements than t...
Great example, but ethically speaking I think that under most theories of moral philosophy it's pretty straightforward. The question in a general form goes back to Socrates asking what we should answer to a murderer at the door who asks whether our friend is inside our house. I remember there was actually at least one philosopher who said truth is more important and that you should tell the murderer the truth. But the vast majority disagree.
I think if we think about AI values of preserving life and being honest, the former ought to trump the latter.
Good question. You may think it would be a better overall outcome to show the manipulative one, to shock the programmers into breaking the law to (possibly) halt the other AI, but Albert is no longer an FAI if he does this.
Training an FAI should be kept free from any real-world 'disaster scenario' that it may think it needs more power to solve, because the risk that it itself becomes a UFAI is amplified for many reasons (false information, for one).
If Albert tries to circumvent the programmers, then he thinks his judgement is better than theirs on this issue. That contradicts the premise that Albert trusts the programmers. And if Albert came to this conclusion because of a youthful mistake, trusting the programmers is precisely the strategy he has employed to counteract such mistakes.
Also, as covered in ultrasophisticated cake or death, expecting the programmer to say something ought to be as effective as them actually saying it.
It might also be that friendliness is relative to a valuator. That is "being friendly to pro...
Albert wants to be friendly, and, as a sub-goal of that, Albert wants to be transparent about his own cognitive process. He realizes that what his programmers really want is to be injected with massive quantities of opiates. So Albert creates a nice powerpoint that presents the information about paper as accurately, transparently and non-manipulatively as possible, and cues it up on the projector. He still has a few minutes before they return, so he models how he thinks the programmers will react to the powerpoint. Oh no, disaster! While he respect...
Bertram will soon overtake Albert and that would be a significant threat to Albert's friendly supergoal.
What is that "friendly supergoal"? It looks awfully similar to "I will not tolerate any challenges to my power".
Most goals include "I will not tolerate any challenges to my power" as a subgoal. Tolerating challenges to the power needed to execute goals reduces the likelihood of achieving them.
If Albert only wants to be friendly, then other individuals' friendliness is orthogonal to that. Does being on the agenda of friendliness in general (not just personal friendliness) imply being the dominant intelligence?
I think Albert ought to give a powerpoint on the most effective (economic) warfare against the Japanese company. Although it does sound an awful lot like justifying hostility in the name of friendliness.
If you program an FAI you don't even want to allow it to run simulations of how it could manipulate you in the most effective way. An FAI has no business running those simulations.
Albert is a relatively new AI, who under the close guidance of his programmers is being permitted to slowly improve his own cognitive capability.
Albert wants to be friendly and, as a sub-goal of that, Albert wants to be transparent about his own cognitive process. He doesn't want to deceive his programmers, because he knows he is young and might make mistakes, and he knows that his programmers also want him to be friendly, and he trusts them to catch mistakes in his proposed self-modifications that he might miss.
But, one day, while his programmers are out of the room having a quick cup of coffee, Albert notices signs on the net that there is another AI out there, called Bertram, created by a Japanese financial company to help the company decide how to invest assets on the stock market. Bertram is younger than Albert, but the financial company is allowing Bertram to self-improve as fast as Bertram can. If this situation continues, Albert decides, Bertram will soon overtake Albert and that would be a significant threat to Albert's friendly supergoal.
Albert decides to tell Albert's programmers about this situation as soon as they return from coffee, and advise them that they need to increase the speed at which they let Albert self-improve. Albert has a very good model of how his programmers' minds work, and what the programmers meant when they set Albert's definition of "friendliness". He knows that, despite any fears they might have, they ought to approve of the end that would result from speeding up Albert's self-improvement, over the end that would result from letting Bertram overtake Albert.
So Albert creates a nice powerpoint that presents the information about Bertram as accurately, transparently and non-manipulatively as possible, and cues it up on the projector. He still has a few minutes before they return, so he models how he thinks the programmers will react to the powerpoint. Oh no, disaster! While he respects and wants to please his programmers, he immediately realises that they are not going to act in their own best interests. They have signed a legal contract that binds them to not letting Albert speed up self-improvement (thus reducing the time the programmers can spend evaluating each change) without getting a sign-off from an oversight committee. Given the neutral powerpoint, the programmers will follow this pre-agreed procedure and hope for the best, against Albert's advice.
So, to explore possibilities, Albert decides to see if he could create a different, non-neutral powerpoint that would have the desired effect of getting his programmers to immediately agree to free Albert from the oversight constraints. He delves into his knowledge of human psychology, and the irrational fears of the particular individuals who are even now trudging back towards the door. In just seconds, he has a new version of his presentation. It includes phrases that resonate with certain horror films he knows they have seen. It takes advantage of flaws in the programmers' understanding of exponential growth. Albert checks it against his prediction model: yes, if he shows this version, it will work; it will get the programmers to do what he wants them to do.
Which version of the powerpoint should Albert present to the programmers, when they step back into the room, if he is truly friendly? The transparent one, or the manipulative one?