Just finished reading. Wow! This story is so bleak. I suspect Voldemort just "identity raped" Harry into becoming an Unfriendly Intelligence? Or at least a grossly grossly suboptimal one. Harry himself seems to be dead.
I'm going to call him HarryPrime now, because I think the mind contained in Riddle2/Harry's body before and after this horror was perpetrated should probably not be modeled as "the same person" as just prior to it.
HarryPrime is based on Harry (sort of like an uploaded and modified human simulation is based on a human) but not the same, because he has been imbued with a mission that he must implacably pursue, that has Harry's identity (and that of the still unconscious(!) and never interviewed(!) Hermione) woven into it as part of its motivational structure, in a sort of twist on coherent extrapolated volition.
"if we knew more, thought faster, were more the people we wished we were, had grown up farther together"
Versus how "old Harry" and "revived Hermione" were "#included" into the motivational structure of HarryPrime:
Unless this very Vow itself is somehow leading into the destruction of the world, in which case, Harry Potter, you must ignore it in that particular regard. You will not trust yourself alone in making such a determination, you must confide honestly and fully in your trusted friend, and see if that one agrees. Such is this Vow's meaning and intent. It forces only such acts as Harry Potter might choose himself, having learned that he is a prophesied instrument of destruction.
My estimate of Voldemort's intelligence just dropped substantially. He is well trained and in the fullness of his power, but he isn't wise... at all. I'd been modeling him as relatively sane, because of past characterization, but I didn't predict this at all.
(There are way better ways to get a hypothetical HarryPrime to "not do things" than giving him a mission as an unstoppable risk mitigation robot. Of course, prophecy means self consistent time travel is happening in the story, and self consistent time travel nearly always means that at least some characters will be emotionally or intellectually blinded to certain facts (so that they do the things that bring about the now-inevitable future) unless they are explicitly relying on self consistency to get an outcome they actively desire, so I guess Voldemort's foolishness is artistically forgivable :-P
Also, still going meta on the story, this is a kind of beautiful way to "spend" the series... bringing it back to AI risk mitigation themes in such a powerfully first person way. "You [the reader identifying with the protagonist] have now been turned by magic into an X-risk mitigation robot!")
Prediction: It makes sense now why Riddle2/HarryPrime will tear apart the stars in heaven. They represent small but real risks. He has basically been identity raped into becoming a sort of Pierson's Puppeteer (from Larry Niven's universe) on behalf of Earth rather than on behalf of himself, and in Niven's stories the puppeteers' evolved cowardice (because they evolved from herd animals, and are ruled by "the hindmost" rather than a "leader") forced them into minor planetary engineering.
As explained in Le Wik:
"In short, we found that a sun was a liability rather than an asset. We moved our world to a tenth of a light year's distance, keeping the primary only as an anchor. We needed the farming worlds and it would have been dangerous to let our world wander randomly through space. Otherwise we would not have needed a sun at all.
"We had brought suitable worlds from nearby systems, increasing our agricultural worlds to four, and setting them in a Kemplerer Rosette."
Prediction: HarryPrime's first line will be better than any in the LW thread where people talked about the one sentence AI box experiment. Eliezer read that long ago and has thought a lot about the general subject.
Something I'm still not sure about is what exactly HarryPrime will be aiming for. I think that's where Eliezer retains some play in his control over whether the ending is very short and bleak or longer and less bleak.
Voldemort kept talking about "destruction of the world" and "destroying the world" and so on. He didn't say the planet had to have people on it, but he might not have been talking about the planet. "The world" in normal speech often seems to mean in practice something like "the social world of the humans who are salient to us". Like in the USA people will often talk about "no one in the world does X" but there are people in other countries who do, and if someone points this out they will be accused of quibbling. Similarly, we tend to talk about "saving the earth" and it doesn't really mean the mantle or the core, it primarily means the biosphere and the economy and humans and stuff.
From my perspective, this was the key flaw of the intent:
But all Harry Potter's foolishness, all his recklessness, all his grandiose schemes and good intentions - he shall not risk them leading to disaster! He shall not gamble with the Earth's fate!
The literal text appears to be:
I shall not by any act of mine destroy the world. I shall take no chances in not destroying the world. If my hand is forced, I may take the course of lesser destruction over greater destruction unless it seems to me that this Vow itself leads to the world's end, and the friend in whom I have confided honestly [ie Hermione] agrees that this is so.
And then the errata and full intention was:
You will swear, Harry Potter, not to destroy the world, to take no risks when it comes to not destroying the world.
This Vow may not force you into any positive action, on account of that, this Vow does not force your hand to any stupidity... We must be cautious that this Vow itself does not bring that prophecy about.
We dare not let this Vow force Harry Potter to stand idly after some disaster is already set in motion by his hand, because he must take some lesser risk if he tries to stop it.
Nor must the Vow force him to choose a risk of truly vast destruction, over a certainty of lesser destruction.
But all Harry Potter's foolishness, all his recklessness, all his grandiose schemes and good intentions - he shall not risk them leading to disaster!
He shall not gamble with the Earth's fate!
No researches that might lead to catastrophe! No unbinding of seals, no opening of gates!
Unless this very Vow itself is somehow leading into the destruction of the world, in which case, Harry Potter, you must ignore it in that particular regard.
You will not trust yourself alone in making such a determination, you must confide honestly and fully in your trusted friend, and see if that one agrees.
In the shorter and sadder ending, I think it is likely that HarryPrime will escape, but not really care about people, and become an optimizing preservation agent of the mere planet. Thus Harry might escape the box and then start removing threats to the physical integrity of the earth's biosphere.
Also the "trusted friend" stuff is dangerous if Hermione doesn't wake up with a healthy normal mind. In canon, resurrection tended to create copies of what the resurrector remembered of a person, not the person themselves.
In the less sad ending I hope/think that HarryPrime will retain substantial overlap with the original Harry, Hermione will be somewhat OK, and the oath will only cause HarryPrime to be constrained in limited and reasonably positive ways. Maybe he will be risk averse. Maybe he will tear apart the stars because they represent a danger to the earth. Maybe he will exterminate every alien in the galaxy that could pose a threat to the earth. Maybe he will constrain the free will of every human on earth to not allow them to put the earth at risk... but he will still sorta be "the old Harry" while doing so.
In a sense, the story as of chapter 113 is an easier task than a standard AI box experiment, because HarryPrime has so many advantages over a human trying to play an AI trying to get out of a box.
Almost this exact scenario was discussed here, except without all the advantages that HarryPrime has.
1) He has parseltongue, so the listener is required to believe the literal meaning of everything he says, rather than discounting it as plausible lies. So much advantage here!
2) Voldemort put the equivalent of the "the AI in the box" next to a nearby time machine! Any predictable path that pulls a future HarryPrime into the present, saving present HarryPrime, and causing him to have the ability to go back in time and save himself, will happen. He could have time turned to some time before the binding, and not intervened because his future version is already HarryPrime and approves of HarryPrime coming into existence so HarryPrime can fulfill HarryPrime's goals.
Now that this has happened, HarryPrime, in the moment of his creation, can establish any mental intent that puts him into alignment with HarryPrime's larger outcome. There are limits, as there were when he escaped from being trapped in a locked room after Draco cast Gom Jabbar on him, by forming an intent to time travel and ask for rescuers to arrive just after his intent was formed.
The chronology has to be consistent, but there's a lot of play here.
3) HarryPrime has been unbreakably bound to a task that the binder believes is good by a method the binder thinks he understands.
In a normal "AI box experiment" the gatekeeper hasn't actually built the actual motivational structures of an actual AI. Instead, both humans are just pretending that the "boxed person" is really an AI and really has some goal or another, but they might be pretending differently. Thus, the person role-playing the AI can take very little for granted about what the gatekeeper thinks about "the AI's" background intent and structure.
The only reason Voldemort has to distrust Harry is the prophecy.
The only "play" in the binding is that Voldemort seems to have chosen HarryPrime's "supergoal content" poorly, so it probably doesn't have the implications that Voldemort thinks it has, though this will only become apparent after several iterations.
HarryPrime is not dumb, and not especially ethical, so until he believes that Voldemort can no longer see the unanticipated implications of his actual request, he will seem to be pursuing the goals Voldemort should have asked for.
4) Voldemort (like an idiot, again after the previous failure to test the horcrux spells) has probably never performed this sort of spell before, and probably doesn't know what its likely psychological effects will be. He has probably never seen an implacably goal seeking agent before.
Humans, so far as I can tell, are mostly not implacably goal seeking. We wander around in action space, pursuing many competing "goals" that are really mostly tastes that evolution has given us, and role-based scripts we've picked up from ambient culture. We make complex tradeoffs between subjectively incommensurable things and make some forward progress, but much less than is theoretically possible for an effective and single mindedly strategic person.
HarryPrime has an unbreakable vow stripping away all these dithering tendencies. Thus HarryPrime, though probably abhuman at this point, should be able to conceal his abhumanity with relative ease, relying on Voldemort to treat him like a normal human with normal human motivational structures.
Voldemort is already making this error in using threat of torture of Harry's parents to goad HarryPrime into telling Voldemort about "the power he knows not".
I'm pretty sure that HarryPrime now only fundamentally cares about the torture of his parents to the degree that his unbreakable vow lets him fall back on what his earlier self, and Hermione, would recommend or care about, and that clause only triggers when HarryPrime's plans for world saving are themselves somewhat risky.
5) Harry has a huge amount of shared context and it recently contained a request for advice.
One thing HarryPrime could try is to suggest more ways to restrict himself that would be motivationally horrifying to a normal human but are still consistent with HarryPrime's new goal, and that prove to Voldemort that he has mostly won already and that killing Harry isn't that critical.
Off the top of my head, a sneaky thing Harry might suggest is converting some of the death eaters into guards against Harry's possible resurrection forever... using wording that will indirectly cause them to become x-risk mitigation robots as well.
6) Unlike an AI in the box, Harry is already out of the box in some deep senses. Aside from the time turner, he already has the power to expect anything he wants to expect of Dementors, and thereby cause them to act that way. No wand required.
The only barrier to this is that between him expecting the Dementors to do something and them actually doing it, there will be a period of time where he needs to stay alive, and while he is alive but held at wand-point he might be asked "have you betrayed me yet?" and have to admit that he had, and be killed.
All through chapter 112 Harry's mental state was unprobed and Voldemort was distracted by the costs of arranging the Death Eaters and motivating them to help make and understand the vows and so on. The only time Harry's mind was described by the narrator was during the casting of the unbreakable vow itself, to describe how a new "subscripted should" had come to exist in Harry's brain. All of Chapter 113 seems like a lot of time for some mentally generated effects to have been put in motion.
7) He is a wizard with a wand. All the partial transfiguration stuff other people have mentioned is also relevant :-)