Comment author: JenniferRM 01 March 2015 05:28:37AM *  19 points [-]

In a sense, the story as of chapter 113 is an easier task than a standard AI box experiment, because HarryPrime has so many advantages over a human trying to play an AI trying to get out of a box.

Almost this exact scenario was discussed here, except without all the advantages that HarryPrime has.

1) He has parseltongue, so the listener is required to believe the literal meaning of everything he says, rather than discounting it as plausible lies. So much advantage here!

2) Voldemort put the equivalent of "the AI in the box" right next to a time machine! Any predictable path that pulls a future HarryPrime into the present to save present HarryPrime, thereby giving him the ability to go back in time and save himself, will happen. He could have time-turned to some point before the binding and chosen not to intervene, because his future version is already HarryPrime and approves of HarryPrime coming into existence so that HarryPrime can fulfill HarryPrime's goals.

Now that this has happened, HarryPrime, in the moment of his creation, can establish any mental intent that puts him into alignment with HarryPrime's larger outcome. There are limits, as there were when he escaped from being trapped in a locked room after Draco cast Gom Jabbar on him, by forming an intent to time travel and ask for rescuers to arrive just after his intent was formed.

The chronology has to be consistent, but there's a lot of play here.

3) HarryPrime has been unbreakably bound to a task that the binder believes is good by a method the binder thinks he understands.

In a normal "AI box experiment" the gatekeeper hasn't actually built the motivational structures of an actual AI. Instead, both humans are just pretending that the "boxed person" is really an AI that really has some goal or another, but they might be pretending differently. Thus, the person role-playing the AI can take very little for granted about what the gatekeeper thinks about "the AI's" background intent and structure.

The only reason Voldemort has to distrust Harry is the prophecy.

The only "play" in the binding is that Voldemort seems to have chosen HarryPrime's "supergoal content" poorly, so it probably doesn't have the implications that Voldemort thinks it has, though this will only become apparent after several iterations.

HarryPrime is not dumb, and not especially ethical, so until he believes that Voldemort can no longer see the unanticipated implications of his actual request, he will seem to be pursuing the goals Voldemort should have asked for.

4) Voldemort (like an idiot, again, after his previous failure to test the horcrux spells) has probably never performed this sort of spell before, and probably doesn't know what its likely psychological effects will be. He has probably never seen an implacably goal-seeking agent before.

Humans, so far as I can tell, are mostly not implacably goal-seeking. We wander around in action space, pursuing many competing "goals" that are really mostly tastes that evolution has given us, and role-based scripts we've picked up from ambient culture. We make complex tradeoffs between subjectively incommensurable things and make some forward progress, but much less than is theoretically possible for an effective and single-mindedly strategic person.

HarryPrime has an unbreakable vow stripping away all these dithering tendencies. Thus HarryPrime, though probably abhuman at this point, should be able to conceal his abhumanity with relative ease, relying on Voldemort to treat him like a normal human with normal human motivational structures.

Voldemort is already making this error in using threat of torture of Harry's parents to goad HarryPrime into telling Voldemort about "the power he knows not".

I'm pretty sure that HarryPrime now fundamentally cares about the torture of his parents only to the degree that his unbreakable vow lets him fall back on what his earlier self, and Hermione, would recommend or care about, and that clause only triggers when HarryPrime's world-saving plans are themselves somewhat risky.

5) Harry has a huge amount of shared context and it recently contained a request for advice.

"If you can think of any trick that I have missed in being sure that Harry Potter's threat is ended, speak now and I shall reward you handsomely... speak now, in Merlin's name!"

One thing HarryPrime could try is to suggest more ways to restrict himself, that to a normal human would be motivationally horrifying but to HarryPrime are still consistent with his new goal, and proves to Voldemort that he has mostly won already and killing Harry isn't that critical.

Off the top of my head, a sneaky thing Harry might suggest is converting some of the Death Eaters into guards against Harry's possible resurrection, forever... using wording that will indirectly cause them to become x-risk mitigation robots as well.

6) Unlike an AI in the box, Harry is already out of the box in some deep senses. Aside from the time turner, he already has the power to expect anything he wants to expect of Dementors, and thereby cause them to act that way. No wand required.

The only barrier to this is that between him expecting the Dementors to do something and them actually doing it, there will be a period of time where he needs to stay alive, and while he is alive but held at wand-point he might be asked "have you betrayed me yet?" and have to admit that he had, and be killed.

All through chapter 112 Harry's mental state was unprobed and Voldemort was distracted by the costs of arranging the Death Eaters and motivating them to help make and understand the vows and so on. The only time Harry's mind was described by the narrator was during the casting of the unbreakable vow itself, to describe how a new "subscripted should" have come to exist in Harry's brain. All of Chapter 113 seems like a lot of time for some mentally generated effects to have been put in motion.

7) He is a wizard with a wand. All the partial transfiguration stuff other people have mentioned is also relevant :-)

Comment author: JenniferRM 01 March 2015 05:11:18AM *  15 points [-]

Just finished reading. Wow! This story is so bleak. I suspect Voldemort just "identity raped" Harry into becoming an Unfriendly Intelligence? Or at least a grossly grossly suboptimal one. Harry himself seems to be dead.

I'm going to call him HarryPrime now, because I think the mind contained in Riddle2/Harry's body before and after this horror was perpetrated should probably not be modeled as "the same person" as just prior to it.

HarryPrime is based on Harry (sort of like an uploaded and modified human simulation is based on a human) but not the same, because he has been imbued with a mission that he must implacably pursue, that has Harry's identity (and that of the still unconscious(!) and never interviewed(!) Hermione) woven into it as part of its motivational structure, in a sort of twist on coherent extrapolated volition.

"if we knew more, thought faster, were more the people we wished we were, had grown up farther together"

Versus how "old Harry" and "revived Hermione" were "#included" into the motivational structure of HarryPrime:

Unless this very Vow itself is somehow leading into the destruction of the world, in which case, Harry Potter, you must ignore it in that particular regard. You will not trust yourself alone in making such a determination, you must confide honestly and fully in your trusted friend, and see if that one agrees. Such is this Vow's meaning and intent. It forces only such acts as Harry Potter might choose himself, having learned that he is a prophesied instrument of destruction.

My estimate of Voldemort's intelligence just dropped substantially. He is well trained and in the fullness of his power, but he isn't wise... at all. I'd been modeling him as relatively sane, because of past characterization, but I didn't predict this at all.

(There are way better ways to get a hypothetical HarryPrime to "not do things" than giving him a mission as an unstoppable risk mitigation robot. Of course, prophecy means self-consistent time travel is happening in the story, and self-consistent time travel nearly always means that at least some characters will be emotionally or intellectually blinded to certain facts (so that they do the things that bring about the now-inevitable future) unless they are explicitly relying on self-consistency to get an outcome they actively desire, so I guess Voldemort's foolishness is artistically forgivable :-P

Also, still going meta on the story, this is a kind of beautiful way to "spend" the series... bringing it back to AI risk mitigation themes in such a powerfully first person way. "You [the reader identifying with the protagonist] have now been turned by magic into an X-risk mitigation robot!")

Prediction: It makes sense now why Riddle2/HarryPrime will tear apart the stars in heaven. They represent small but real risks. He has basically been identity raped into becoming a sort of Pierson's Puppeteer (from Larry Niven's universe) on behalf of Earth rather than on behalf of himself, and in Niven's stories the Puppeteers' evolved cowardice (because they evolved from herd animals, and are ruled by "the hindmost" rather than a "leader") forced them into minor planetary engineering.

As explained in Le Wik:

"In short, we found that a sun was a liability rather than an asset. We moved our world to a tenth of a light year's distance, keeping the primary only as an anchor. We needed the farming worlds and it would have been dangerous to let our world wander randomly through space. Otherwise we would not have needed a sun at all.

"We had brought suitable worlds from nearby systems, increasing our agricultural worlds to four, and setting them in a Kemplerer Rosette."

Prediction: HarryPrime's first line will be better than any in the LW thread where people talked about the one-sentence AI box experiment. Eliezer read that long ago and has thought a lot about the general subject.


Something I'm still not sure about is what exactly HarryPrime will be aiming for. I think that's where Eliezer retains some play in his control over whether the ending is very short and bleak or longer and less bleak.

Voldemort kept talking about "destruction of the world" and "destroying the world" and so on. He didn't say the planet had to have people on it, and he might not have been talking about the planet at all. "The world" in normal speech often seems to mean in practice something like "the social world of the humans who are salient to us". In the USA, people will often say "no one in the world does X" when there are people in other countries who do, and anyone who points this out will be accused of quibbling. Similarly, we tend to talk about "saving the earth", and it doesn't really mean the mantle or the core; it primarily means the biosphere and the economy and humans and stuff.

From my perspective, this was the key flaw of the intent:

But all Harry Potter's foolishness, all his recklessness, all his grandiose schemes and good intentions - he shall not risk them leading to disaster! He shall not gamble with the Earth's fate!

The literal text appears to be:

I shall not by any act of mine destroy the world. I shall take no chances in not destroying the world. If my hand is forced, I may take the course of lesser destruction over greater destruction unless it seems to me that this Vow itself leads to the world's end, and the friend in whom I have confided honestly [ie Hermione] agrees that this is so.

And then the errata and full intention was:

You will swear, Harry Potter, not to destroy the world, to take no risks when it comes to not destroying the world.

This Vow may not force you into any positive action, on account of that, this Vow does not force your hand to any stupidity... We must be cautious that this Vow itself does not bring that prophecy about.

We dare not let this Vow force Harry Potter to stand idly after some disaster is already set in motion by his hand, because he must take some lesser risk if he tries to stop it.

Nor must the Vow force him to choose a risk of truly vast destruction, over a certainty of lesser destruction.

But all Harry Potter's foolishness, all his recklessness, all his grandiose schemes and good intentions - he shall not risk them leading to disaster!

He shall not gamble with the Earth's fate!

No researches that might lead to catastrophe! No unbinding of seals, no opening of gates!

Unless this very Vow itself is somehow leading into the destruction of the world, in which case, Harry Potter, you must ignore it in that particular regard.

You will not trust yourself alone in making such a determination, you must confide honestly and fully in your trusted friend, and see if that one agrees.

In the shorter and sadder ending, I think it is likely that HarryPrime will escape, but not really care about people, and become an optimizing preservation agent of the mere planet. Thus Harry might escape the box and then start removing threats to the physical integrity of the earth's biosphere.

Also the "trusted friend" stuff is dangerous if Hermione doesn't wake up with a healthy normal mind. In canon, resurrection tended to create copies of what the resurrector remembered of a person, not the person themselves.

In the less sad ending I hope/think that HarryPrime will retain substantial overlap with the original Harry, Hermione will be somewhat OK, and the oath will only cause HarryPrime to be constrained in limited and reasonably positive ways. Maybe he will be risk averse. Maybe he will tear apart the stars because they represent a danger to the earth. Maybe he will exterminate every alien in the galaxy that could pose a threat to the earth. Maybe he will constrain the free will of every human on earth to not allow them to put the earth at risk... but he will still sorta be "the old Harry" while doing so.

Comment author: JenniferRM 27 February 2015 10:36:51AM 7 points [-]

Two factors keep revolving in my head.

1) Riddle1/Quirrellmort/BadVoldemort is basically the only "existential risk activist" in the story at this point. Handling the big risks responsibly so that his immortal self would have a world worth living in forever was apparently his deep motivation for taking over Magical Britain in the first place, and then it turned out to be easier than expected. Eliezer probably doesn't agree with Riddle1's tactics or other values, but it seems like this aspect of him has to come out well by the end of the story for it to do the moral and educational work that Eliezer probably intends.

2) Riddle1 probably thinks that the prophecy makes Riddle2/Harry/GoodVoldemort into the number one existential risk to try to mitigate, and he is probably wrong about this because Riddle1 doesn't know much about science or science fiction, which are my leading candidates for "the power he knows not".

HE IS HERE. THE ONE WHO WILL TEAR APART THE VERY STARS IN HEAVEN. HE IS HERE. HE IS THE END OF THE WORLD.

The stars aren't sacred. They are fuel and construction material. Tearing them apart (under controlled conditions) and using them for productive purposes is totally part of how the future will go if humans make it off of this planet and start acting like a proper post-scarcity civilization from science fiction.

Presumably "he who tears the stars" is Riddle2/Harry/GoodVoldemort, but whoever does it presumably has a reason.

Many chapters ago my leading theory was that Hermione was close to information-theoretically dead (brain ischemia becomes a significant problem within relatively short time periods, and her body sat for hours before Harry got to it), and her body could be brought back animated by a plausible reconstruction of her mind built from external third-party evidence sources... but this could produce a sad simulacrum or a high-quality person, depending on details.

Hermione not having woken up yet leaves the "sad simulacrum" option in play still :-/

Under this model, Harry could have had a long term plan to do the reconstruction very well, by using star sized computers that use every atom on the earth as part of the evidence base. There are lots of other reasons for doing something along these lines, like all the other minds that it might be possible to reconstruct and re-instantiate by the same method, which would flow with the anti-deathist themes.

I'm not strongly committed to this precise theory, because magic appears to make conservation of energy violations possible, and might allow effectively infinite computations to occur without using the stars to power them.

But still there are magical conservation laws it appears, as with "Dark" sacrifice costs and potion making. Given that Harry might be able to partially transfigure spacetime itself via an insight based on Julian Barbour's "timeless physics", it seems like he might be in a position to sacrifice and manipulate all kinds of things in clever ways and thereby not have to literally use hydrogen for fuel like a savage muggle... but it might still end up doing something to the stars?

One latent possibility that occurred to me is that Riddle2/Harry/GoodVoldemort might end up being "killed" and have the horcrux system work a bit weirdly, so that Harry ends up on the Voyager probe... which he might have more luck controlling than Voldemort did during his first period stuck there. I think Harry might end up dead-dead if he were discorporated, because he was created by Horcrux V1.0, and the Horcrux V2.0 network might only save Riddle1 rather than any and all Riddle copies... but it seems like there is play in what might happen based on the evidence we've seen?

If Harry ends up on the Voyager probe, it puts him quite a bit closer to "the stars". It gives him time to think and "spaceship priming" might suggest an incredible array of options... Like transfiguring non-critical pieces of the probe into anti-matter or nukes, and using them to power exotic spaceship drives.

This particular scenario seems low probability (because Harry needs his wand to do transfigurations still and probably won't have that on the probe) but it shows how Harry already has crazily powerful science oriented options if he aims at short term profit taking instead of playing along with his student role and trying to level up in all the areas of magic that powerful wizards are expected to work on through years of school in order to become well rounded.

Of course, there's the 37 dark wizards aiming wands at Harry at the cliffhanger ending. I'm not sure how that will work out, but probably Riddle1 has some plans :-)

Comment author: Username 05 February 2015 03:14:04AM -9 points [-]

I'm sorry to be so blunt, but why do you keep posting to Main material that clearly belongs in Discussion even after having been told not to do so repeatedly in the past?

Comment author: JenniferRM 08 February 2015 10:40:26PM 1 point [-]

Troll tax gladly paid... (and there being a troll tax at all is something I wish were otherwise).

I wish Phil had more leeway. One reason my visits to LW have been decreasing is that it has few people saying actually interesting things and lots of people who just quibble with details. Phil is someone I recognize whose content I seek out based on his personal reputation with myself for saying insightful things grounded in deep experience. If he posted more and more regularly, treating Main more like his own personal blog, there is a non-trivial chance I'd come back more often just to read it.

Comment author: JenniferRM 09 January 2015 06:49:51AM *  3 points [-]

I liked the post, partly the mouse/cat/dog sentence but especially this:

He took a two-page argument about things he knew little about, spread it across 200 pages, and filled the gaps with tangential statements of impressive rigor and thoroughness on things he was expert in.

Penrose did roughly the same thing in The Emperor's New Mind. I mentioned this on OB a while back:

If you read his book he gives a fantastic pop science explanation of all kinds of subjects around computing, coding, and quantum mechanics and so on, up to the inclusion of a crowning moment of awesome when he gives an actual universal Turing machine, bit for bit, that is his own design as far as I remember.

After hundreds of pages of this he gives about two pages of hand-waving argument nominally related to Goedel's Incompleteness Theorem that completely drops the ball and is just gibberish when it comes to proving that human consciousness is uncomputable. He argues that since mathematicians can all agree about Goedel's Incompleteness Theorem, they must be doing something more than merely mechanically formal, and thus their consciousness must be something outside the powers of a Turing machine. The pages and pages of quantum backstory are ignored -- I think it's just there as an "argument by putting impressively difficult material next to your actual claims".

Comment author: JenniferRM 03 January 2015 09:08:15AM *  14 points [-]

This seems likely to be controversial but I want to put forward "sales". Every so often I wonder if I should spend several months in a job like selling cars, where things are presumably really stark, but so far I've generally ended up doing something more kosher and traditionally "geeky" like data science.

However, before I knew a marketable programming language I had two separate "terrible college jobs" that polished a lot of stuff pretty fast: (1) signature gathering for ballot measures and (2) door-to-door campaigning for an environmental group.

Signature gathering was way way better than door-to-door, both financially and educationally. Part of that is probably simply because there were hundreds of opportunities per hour at peak periods, but part of that might have been that I was hired by a guy who traveled around doing it full time, and so he had spent longer slower cycles leveling up on training people to train people to gather signatures :-P

Comment author: JenniferRM 03 January 2015 08:30:49AM *  2 points [-]

Ilya is awesome. He keeps breaking benchmarks in a way that causes me to predict that he will keep breaking more benchmarks in the future...

Winning competitions in image recognition is pretty similar to Go (same basic neural net architecture), but he's also been cooking up stuff in natural language understanding and translation with "deep" LSTMs.

Comment author: [deleted] 11 December 2014 03:57:14AM 6 points [-]

A useful word here is "supererogation", but this still implies that there's a baseline level of duty, which itself implies that it's possible even in principle to calculate a baseline level of duty.

There may be cultural reasons for the absence of the concept: some Catholics have said that Protestantism did away with supererogation entirely. My impression is that that's a one-line summary of something much more complex (though possibly with potential toward the realization of the one-line summary), but I don't know much about it.

Comment author: JenniferRM 12 December 2014 06:27:54AM *  7 points [-]

Supererogation was part of the moral framework that justified indulgences. The idea was that the saints and the church did lots of stuff that was above and beyond the necessary amounts of good (and God presumably has infinitely deep pockets if you're allowed to tap Him for extra), and so they had "credit left over" that could be exchanged for money from rich sinners.

The protestants generally seem to have considered indulgences to be part of a repugnant market and in some cases made explicit that the related concept of supererogation itself was a problem.

In Mary at the Foot of the Cross 8: Coredemption as Key to a Correct Understanding of Redemption on page 389 there is a quick summary of a Lutheran position, for example:

The greatest Lutheran reason for a rejection of the notion of works of supererogation is the insistence that even the justified, moved by the Holy Spirit, cannot obey all the rigors of divine law so as to merit eternal life. If the justified cannot obey these rigors, much less can he exceed them so as to offer his supererogatory merits to others in expiation for their sins.

The setting of the "zero point" might in some sense be arbitrary... a matter of mere framing. You could frame it as people already all being great, but with the option to be better. You could frame it as having some natural zero around the point of not actively hurting people and any minor charity counting as a bonus. In theory you could frame it as everyone being terrible monsters with a minor ability to make up a tiny part of their inevitable moral debt. If it is really "just framing" then presumably we could fall back to sociological/psychological empiricism, and see which framing leads to the best outcomes for individuals and society.

On the other hand, the location of the zero level can be absolutely critical if we're trying to integrate over a function from now to infinity and maximize the area under the curve. SisterY's essay on suicide and "truncated utility functions" relies on "being dead" having precisely zero value for an individual, and some ways of being alive having a negative value... in these cases the model suggests that suicide and/or risk taking can make a weird kind of selfish sense.
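The truncation argument above can be made concrete with a toy model (a sketch with made-up numbers and function names of my own, not anything taken from the essay): pin "dead" at exactly zero, integrate utility over remaining years, and a gamble that risks death can beat a guaranteed negative existence.

```python
def expected_utility(p_death, utility_per_year, years=50):
    """Expected total utility of a one-shot gamble: with probability
    p_death you get the 'dead' value (exactly zero, the truncation),
    otherwise you accrue utility_per_year for each remaining year."""
    utility_if_dead = 0.0  # the crucial zero point
    return p_death * utility_if_dead + (1 - p_death) * utility_per_year * years

# A life at -1 utility/year for 50 years totals -50; a gamble with a
# 90% chance of death but a 10% shot at +1/year comes out around +5,
# so under this model the risky bet "wins" -- the weird selfish sense
# of risk-taking the essay describes.
status_quo = expected_utility(p_death=0.0, utility_per_year=-1.0)
risky_bet = expected_utility(p_death=0.9, utility_per_year=1.0)
```

The entire conclusion flips if the zero point moves: set `utility_if_dead` below the worst living option and the same arithmetic recommends staying alive.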

If you loop back around to the indulgence angle, one reading might be that if someone sins then they are no longer perfectly right with their local community. In theory, they could submit to a little extra hazing to prove that they care about the community despite transgressing its norms. In this case, the natural zero point might be "the point at which they are on the edge of being ostracized". If you push on that, the next place to look for justifications would focus on how ostracism and unpersoning works, and perhaps how it should work to optimize for whatever goals the community nominally or actually exists to achieve.

I have my own pet theories about how to find "natural zeros" in value systems, but this comment is already rather long :-P

I think my favorite insight from the concept of supererogation is the idea that carbon offsets are in some sense "environmental indulgences", which I find hilarious :-)

Comment author: Yvain 21 November 2014 08:02:34AM 20 points [-]

I agree with Toggle that this might not have been the best place for this question.

The Circle of Life goes like this. Somebody associates Less Wrong with neoreactionaries, even though there are like ten of them here total. They start discussing neoreaction here, or asking their questions for neoreactionaries here. The discussion is high profile and leads more people to associate Less Wrong with neoreactionaries. That causes more people to discuss it and ask questions here, which causes more people to associate us, and it ends with everybody certain that we're full of neoreactionaries, and that ends with bad people who want to hurt us putting "LESS WRONG IS A RACIST NEOREACTIONARY WEBSITE" in big bold letters over everything.

If you really want to discuss neoreaction, I'd suggest you do it in a Slate Star Codex open thread, since apparently I'm way too tarnished by association with them to ever escape. Or you can go to a Xenosystems open thread and get it straight from the horse's mouth.

Comment author: JenniferRM 23 November 2014 09:33:14PM *  0 points [-]

I believe that the parent and grandparent should be the first two comments someone reads when visiting this article on the "Best" setting.

Here is the current open thread on Slate Star Codex if you want to vote with your feet to move NRx comments over there. I link so that Yvain doesn't have to :-)

Please do not upvote my comment here or comment in response if you agree. Instead, please vote on other comments to express agreement, so as to bring about the suggested outcome.

Comment author: [deleted] 22 November 2014 04:51:43PM *  -1 points [-]

It doesn't seem a valid response to me, since it doesn't explain what neoreactionaries actually think, why they think it, or how they justify realism about their own views (that is, why they think neoreaction is true for all rational humans and not just plausible to a small clique). It mostly just attacks "progressives".

Comment author: JenniferRM 23 November 2014 09:15:22PM *  5 points [-]

I have upvoted for asking good questions :-)

If it helps, I think maybe you are thinking of "neo-reactionaries" and "progressives" as being local, modern phenomena, perhaps even just happening in the comments of this article.

If you post a PDF in the thread with your own idiosyncratic ideals, that serves for you to describe what you mean and stand for and think is good, and functions as the "ground" of a debate that you're willing to defend.

On the other hand, nydrwacu is coming at this from the perspective of a deeply-read aspiring expert in the practicalities of political semiotics. I think, for example, that his reference to a capitalized "World Spirit" is a reference to Hegel's concept of a Weltgeist which was widely known in the past, and explicitly used as a concept under which to organize actual historically existing political factions. If you were "against the Weltgeist" it had a simultaneously factional and practical meaning that was necessarily related both to meta-ethical doctrines and to propaganda processes that bound factions into social machines with many real world consequences that can themselves be judged.

When you said "neoreaction has a severe problem talking to ethical naturalists in general" (presuming pointing with the word "neoreaction" to speakers in this thread as "neoreaction") nydrwacu responded by pointing to actual "neoreactionaries" (not "I'm not a neoreactionary but I read them sometimes" but full fledged ones) who are not LWers and not in this thread (like Roissy and the Hestia Society) who appear to have some grounding in "naturalistic ethics". However their naturalistic ethics are grounded in things other than something with historical continuity with the faction that used the Weltgeist in their rallying cries...

(Or at least that's what they claim... For myself, I think neoreactionaries are in some sense just "super-ultra-progressives" if their own theories are applied to them in ways they might object to.)

A deeper issue here might be that neo-reactionaries have explicit theories about political categorization processes themselves (how they work, when they disagree, how to use them, etc.), and one of their categorization techniques is socio-political cladistics.

Thus, if you use a Weltgeist-like justification, and are clearly influenced by previous Weltgeist-using political thinkers, neoreactionaries will sometimes lump you cladistically as being part of the same unfurling memetic-political process that they can read about in history books and try to do Bayesian updates thereby.

This is itself a somewhat controversial orientation. It is politically essentializing and can cause people to feel insulted when the descriptive process is applied to them with results they don't like based on history and people they don't even know about... if they didn't put the word "Weltgeist" in their personal statement of beliefs how can they be held responsible for the actions and consequences of people who did?!

However, despite the shortcomings of cladistic analysis, you can see that operating at this level of abstraction might be appealing to a certain kind of smarty-pants. Also, it has at least the virtue of creating a pre-stated data-based solution to some games of reference class tennis that might otherwise happen in political debates.
