cousin_it comments on The Sword of Good - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (292)
No matter how good a person the Lord may be, if he's human, I'd have tried to stop the spell.
Given how misrepresented the official story is supposed to be, the part about personally ruling the fabric of the World can be assumed to be twisted as well.
Nope, they didn't get that part wrong.
Look, you should know me well enough by now to know that I don't keep my stories on nice safe moral territory.
A happy ending here is not guaranteed. But think about this very carefully. Are you sure you'd have turned the Sword on Vhazhar? They don't have the same options we do.
He's going to be the emperor. He could implement Parliament, he could create jury trials. He could even put Dolf and Selena on trial for their crimes.
It's interesting that Hirou holds the world accountable to his own moral code, which assumes power corrupts. Then, at the last moment, he grants absolute power to Vhazhar. So in the middle of choosing to use our world's morality, which is built upon centuries of learning to doubt human nature, in the middle of that - Vhazhar's good intentions are so good that they justify granting him absolute power. Lesson not learned.
his own moral code, which assumes power corrupts
Hold on. How can a moral code say anything about questions of fact, such as whether or not power corrupts?
Because "corrupt" is a morally-loaded term.
It seems to me that "power corrupts" means "power changes goal content," and that's a purely factual claim.
It doesn't mean that. It means something more like "power changes the empowered's utility function in a way others deem immoral". (ETA simplified)
ETA: Just to make the point clearer, there are many things that change an individual's goal content but are not considered corrupting. For example, trying new foods will generally make you divert more effort to finding one kind of food (that you didn't know you liked). Having children of your own makes you more favorable to children in general. But we don't say, and people generally don't believe, "having children corrupts" or "trying new foods corrupts".
Okay, but that's still a factual claim underneath the moral one.
It's a bit of argumentum ad webcomicum, but http://www.agirlandherfed.com/comic/?375 is not something I find particularly implausible. There was Marcus Aurelius.
Also: it seems like a really poor plan, in the long term, for the fate of the entire plane to rest on the sanity of one dude. If Hirou kept the sword, he could maybe try to work with the wizards -- ask them to spend one day per week healing people, make sure the crops do okay, etc. Things maybe wouldn't be perfect, but at least he wouldn't be running the risk of everybody-dies.
Link's broken. Is this guess the page in question?
And then there are those of us who take moral claims to be factual claims.
Okay, but in any case, regarding the issue at hand, "power corrupts" is not a purely factual claim. (And I thought that hybrid claims get counted as moral by default, since that's the most useful for discussion, but I could be wrong.)
What's the evolutionary explanation for power not corrupting?
I think my concern about "power corrupts" is this: humans have a strong drive to improve things. We need projects, we need challenges. When this guy gets unlimited power, he's going to take two or three passes over everything and make sure everybody's happy, and then I'm worried he's going to get very, very bored. With an infinite lifespan and unlimited power, it's sort of inevitable.
What do you do, when you're omnipotent and undying, and you realize you're going mad with boredom?
Does "unlimited power" include the power to make yourself not bored?
If Vhazhar has the option of editing the nasty bits out of reality and then stepping down from power, I'd help him. If he must personally become a ruler for all eternity, I'd kill him, then smash the goddamn device, then try to somehow ensure that future aspiring Dark Lords also get killed in time.
This could be how the 'balance' mythology and the prophecy got started. Perhaps the hero decided long ago that it wasn't worth the risk, and wanted to make sure future heroes kill the Dark Lord.
I assume that the sword tests the correspondence of person's intentions (plan) to their preference. If the sword uses a static concept of preference that comes with the sword instead, why would Vhazhar be interested in sword's standard of preference? Thus, given that the Vhazhar's plan involves control over the fabric of the World, the plan must be sound and result in correct installation of Vhazhar's preference in the rules of the world. This excludes the technical worries about the failure modes of human mind in wielding too much power (which is how I initially interpreted "personal control" -- as a recipe for failure modes).
I'm not sure what it means for the other people's preferences (and specifically mine). I can't exclude the possibility that it's worse than the do-nothing option, but it doesn't seem obviously so either, given psychological unity of humans. From what I know, on the spot I'd favor Vhazhar's personal preference, if the better alternative is unlikely, given that this choice instantly wards off existential risk and lack of progress.
No, it's the Sword of GOOD. It tests whether you're GOOD, not any of this other stuff.
It should be obvious that the sword doesn't test how well your plans correspond to what you think you want! Otherwise Hirou would have been vaporized.
Wasn't it established that this world's conception of "good" and "evil" are messed up? Why should he trust that the sword really works exactly as advertised?
Only assuming that the sword is impulsive. If you take into account Hirou's overall role in the events, this role could be judged good, if only by the final decision.
If the sword judges not plans but preference, then failing 9 out of 10 people means that it's pretty selective among humans, and the people it selects and their values probably aren't representative of (don't act in the interests of) humanity as a whole.
If the Sword of Good tested whether you're good, Hirou would have been vapourized, because he was obviously not good. He was at the very least an accomplice to murderers, a racist, and a killer. The Sword of Good may not have vapourized Charles Manson, Richard Nixon, Hitler, or most suicide bombers, either. The Sword of Good tests whether you think you are good, not whether your actions are good.
Strangely, the sword kills nine out of ten people who try to wield it. However, if you knew the sword could only be wielded by a good person, you'd only try to pick it up if you thought you were good, which happens to be the criterion you must fulfil in order to pick up the sword. Essentially, if you think you can wield the Sword of Good, you can.
Well, he was clearly redeemable, at least. It didn't take very much for him to let go of his assumptions, just a few words from someone he thought was an enemy. Making dumb mistakes, even ones with dire consequences, doesn't necessarily make you not Good.
What, realistically, does it mean to be irredeemable? Was Dolf irredeemable? Selena? Is the difference between them and Hirou simply the fact that Hirou realized he was doing bad, and they didn't? Why should that be sufficient to redeem him? Mistakes are not accidents; mistakenly killing someone is still murder.
Surely if awareness and repentance of the immoral nature of your actions makes you Good, the reverse - lack of awareness - means animals that kill other animals without regret are more evil than people who kill other people and regret it.
No, it's manslaughter.
If you believe someone is evil, hunt them down and kill them, and afterward realize they weren't, it was a mistake. It was also murder. It's not as though you killed in self defense or accidentally dropped an air conditioner on them. Manslaughter is not a defense that can be employed simply because you changed your mind.
Perhaps I should clarify: I don't mean "mistake" in that "he mistook his wife for a burglar and killed her". That's manslaughter. I mean "mistake" in that "he mistakenly murdered a good person instead of a bad one". Ba gur bgure unaq, jura Uvebh xvyyrq Qbys ng gur raq, ur jnfa'g znxvat n zvfgnxr (ubjrire, V fgvyy guvax vg jnf zheqre).
Doing a bad thing does not necessarily make one a bad person. Though it helps.
You are using two definitions of "good" - how much good your actions cause, and how good you believe yourself to be. Neither of those is used by the sword; rather, some sort of virtue-ethics definition - I suspect motive.
So a sincerely evil person would pass with flying colors?
I assumed the sword tested compliance with the current CEV of the human race.
Why just the human race? Orcs are people too (at least in this story).
Good catch. Yes, of course.
Presumably, actual mutants are unlikely, with most "evil" people actually just holding mistaken (about their actual preference) moral beliefs. If the sword is an external moral authority, it's harder to see why one would consult it.
On the other hand, sword checks soundness of the plan against some preference, which is an important step that is absent if one doesn't consult the sword, which can justify accepting a somewhat mismatched preference if that allows to use the test.
This passes the choice of mismatching preferences to a different situation. If the sword tests person's preference, then protagonist's choice is between lack of progress or unlikely good outcome and (if Vhazhar's plan is sound) verified installation of Vhazhar's preference, with the latter presumably close to others' preference, thus being a moderately good option. If the sword tests some kind of standard preference, this standard preference is presumably also close to Vhazhar's preference, thus Vhazhar faces a choice between trying to install his own preference through unverified process, which can go through all kinds of failure modes, and using the sword to test the reliability of his plan.
The fact that Vhazhar is willing to use the sword to test the soundness of his plan, when the failed test means his death, shows that he prefers leaving the rest of the world be to incorrectly changing it. This is a strong signal that should've been part of the information given to protagonist for making the decision.
Something that occurred to me along these lines. (not directly the same, but "close enough" that some of the moral judgments would be equivalent)
Let's say, next week, someone actually solved the mind uploading problem. They have a decision to make: go for it themselves, find someone as trustworthy as possible, forget about the plan and simply wait however long for the FAI math to be solved, etc...
What would you advise? Should they go for it themselves, try to then work out how to incrementally upgrade themselves without absolute disaster, forget it, etc etc etc...? (If nothing else, assume they already have the raw computing power to run a human at a vast speedup)
It's not an identical problem, but it's probably the closest thing.
What, you mean try to self-modify? Oh hell no. Human brain not designed for that. But you would have a longer time to try to solve FAI. You could maybe try a few non-self-modifications if you could find volunteers, but uploading and upload-driven-upgrading is fundamentally a race between how smart you get and how insane you get.
The modified people can be quite a bit smarter than you are too, so long as you can see their minds and modify them. Groves et al managed to mostly control the Manhattan project despite dozens of its scientists being smarter than any of their supervisors and many having communist sympathies. If he actually shared their earlier memories and could look inside their heads... There's a limit to control, you still won't control an adversarial super intelligence this way, but a friendly human who appreciates your need for power over them? I bet they can have a >50 IQ point advantage, maybe even >70. Schoolteachers control children who have 70 IQ points on them with the help of institutions.
Is it relevant that IQ is correlated with obedience to authority?
And how dumb do you think schoolteachers are? Bottom of those with BAs. I'd guess 100. And correlated with their pupils.
Estimations from SAT scores imply that the IQ of teachers and education majors is below average. Conscientious, hardworking students can graduate from most high schools and colleges with good grades, even if they are fairly stupid, as long as they stay away from courses which demand too much of them, and there are services available for those who are neither hardworking nor conscientious.
Education major courses are somewhat notorious for demanding little of students, and it is a stereotypically common choice for students seeking MRS degrees.
I'd like to imagine that the system would at least filter out individuals who are borderline retarded or below, but experience suggests to me that even this is too optimistic.
I don't buy the conversion in the first link, which is also a dead link. That Ed majors have an SAT score of 950 sounds right. That is 37th percentile among "college-bound seniors." If this population, which I assume means people taking the SAT, were representative of the general population, that would be an IQ of 95, but they aren't. I stand by my estimate of 100.
I doubt you have much experience with people with an IQ of 85, let alone the borderline retarded.
What makes you doubt I have much experience with either? IQ 85 is one standard deviation below average; close to 16 percent of the population has an IQ at least that low. The lower limit of borderline retardation, that is, the least intelligent you can be before you are no longer borderline, is two standard deviations below the mean, meaning that about one person in fifty is lower than that.
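For anyone who wants to check the normal-curve arithmetic in this exchange, here is a minimal sketch. It assumes IQ is modeled as a normal distribution with mean 100 and standard deviation 15 (the usual convention, but an assumption here); the helper names `share_below` and `iq_at_percentile` are made up for illustration.

```python
from statistics import NormalDist

# Assumption: IQ ~ Normal(mean=100, sd=15). This is the conventional scaling,
# not something stated in the thread.
IQ = NormalDist(mu=100, sigma=15)


def share_below(iq_score):
    """Fraction of the modeled population with an IQ below iq_score."""
    return IQ.cdf(iq_score)


def iq_at_percentile(percentile):
    """IQ score sitting at the given percentile (0-100) of the modeled population."""
    return IQ.inv_cdf(percentile / 100)


below_85 = share_below(85)      # one standard deviation below the mean
below_70 = share_below(70)      # two standard deviations below the mean
iq_37th = iq_at_percentile(37)  # the 37th-percentile figure quoted for Ed majors

print(f"Share below IQ 85: {below_85:.1%}")
print(f"Share below IQ 70: {below_70:.1%}")
print(f"IQ at the 37th percentile: {iq_37th:.0f}")
```

Under that assumption, roughly one person in six falls below IQ 85, a little over 2 percent fall below IQ 70, and the 37th percentile lands at about IQ 95. The last figure only carries over to education majors if SAT takers were representative of the general population, which is the point being disputed above.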
As it happens, I've spent a considerable amount of time with special needs students, some of whom suffer from learning disabilities which do not affect their reasoning abilities, but some of whom are significantly below borderline retarded.
At the public high school I attended, more than 95% of the students in my graduating year went on to college. While the most mentally challenged students in the area were not mainstreamed and didn't attend the same school, there was no shortage of <80 IQ students.
An average IQ of 100 for education majors would be within the error bars for the aforementioned projection, but some individuals are going to be considerably lower.
Those two sentences are not very compatible.
The rates at which students progress to college have a lot more to do with parental expectations, funding, and the school environment than the intelligence of the students in question. My school had very good resources to support students in the admissions process, and students who didn't take it for granted that they were college bound were few and far between.
It seems unrealistic to assume that we'll be able to literally read the intentions of the first upload; I'd think that we'd start out not knowing any more about them than we would about an organic person through external scanning.
You won't be able to evaluate their thoughts exactly, but there's a LOT that you should be able to tell about what a person is thinking if you can perfectly record all of their physiological reactions and every pattern of neural activation with perfect resolution, even with today's knowledge. Koch and Crick even found grandmother neurons, more or less.
I'd still expect it to be hard to tell the difference between someone thinking about or wanting to kill someone/take over the world and someone actually intending to. But I can imagine at least being able to reliably detect lies with that kind of information, so I'll defer to your knowledge of the subject.
Eliezer, I'm with you that a properly designed mind will be great, but mere uploads will still be much more awesome than normal humans on fast forward.
Without hacking on how your mind fundamentally works, it seems pretty likely that being software would allow a better interface with other software than mouse, keyboard and display does now. Hacking on just the interface would (it seems to me) lead to improvements in mental capability beyond mere speed. This sounds like mind hacking to me (software enhancing a software mind will likely lead to blurry edges around which part we call "the mind"), and seems pretty safe.
Some (pretty safe*) cognitive enhancements:
All of which is just to say that I don't think you've tried very hard to think of safe self-modifications. I'm pretty confident that you could come up with more, and better, and safer than I have.
* Where "pretty safe" means "safe enough to propose to the LW community, but not safe enough to try before submitting for public ridicule"
*blinks* I understand your "oh hell no" reaction to self modification and "use the speedup to buy extra time to solve FAI" suggestion.
However, I don't quite understand why you think "attempted upgrading of other" is all that much better. If you get that one wrong in a "result is super smart but insane" way (or, more precisely, very sane but with the goal architecture all screwed up), doesn't one end up with the same potential paths to disaster? At that point, if nothing else, what would stop the target from then going down the self modification path?
Non-self-modification is by no means safe, but it's slightly less insanely dangerous than self-modification.
Ooooh, okay then. That makes sense.
Hrm... given your suggested scenario, though, why the need to start with looking for other volunteers? ie, if the initial person is willing to be modified under the relevant constraints, why not just, well, spawn off another instance of themselves, one the modifier and one the modifiee?
EDIT: whoops, just noticed that Vladimir suggested the same thing too.
If insane happens before super-smart, you can stop upgrading the other.
Well, fair enough, there is that.
Perhaps you mean to say that we're not particularly trustworthy in our choices of what we modify ourselves to do or prefer?
Human brains, after all, are most exquisitely designed for modifying themselves, and can do it quite autonomously. They're just not very good at predicting the broader implications of those modifications, or at finding the right things to modify.
We're talking about direct explicit low level self modification. ie, uploading, then using that more convenient form to directly study one's own internal workings until one decides to go "hrm... I think I'll reroute these neural connections to... that, add a few more of this other kind of neuron over here and..."
Recall that the thing doing all that reasoning is the thing that's being affected by these modifications.
Yes, but that would be the stupidest possible way of doing it, when there are already systems in place to do structured modification at a higher level of abstraction. Doing it at an individual neuron level would be like trying to... well, I would've said "write a property management program in Z-80 assembly," except I know a guy who actually did that. So, let's say, something about 1000 times harder. ;-)
What I find extremely irritating is when people talk about brain modification as if it's some sort of 1) terribly dangerous thing that 2) only happens post-uploading and 3) can only be done by direct hardware (or simulated hardware) modification. The correct answer is, "none of the above".
Well, we're talking about the kind of modifications that ordinary, non-invasive, high-level methods, acting through the usual sensory channels, don't allow. For example, no amount of ordinary self-help could make someone unable to feel physical pain, or let you multiply large numbers extremely quickly in the manner of a savant. Changing someone's sexual orientation is also, at best, extremely difficult and at worst impossible. We can't seem to get rid of confirmation bias, or cure schizophrenia, or change an autistic brain into a neurotypical brain (or vice versa). There are lots of things that one might want to do to a brain that simply don't happen as long as that brain is sitting inside a skull only receiving input through normal human senses.
Lists like that have a good chance of canceling out. That is, there are a bunch of ways people disagree with you because they're talking about something else.
You can make volunteers out of your own copies. As long as the modified people aren't too smart, it's safe to keep them in a sandbox and look through the theoretical work they produce on overdrive.
AI boxes are pretty dangerous.
(I agree that "as long as the modified people aren't too smart" you're safe, but we are hacking on minds that will probably be able to hack on themselves, and possibly recursively self-improve if they decide, for instance, that they don't want to be shut down and deleted at the end of the experiment. I'm pretty strongly motivated not to risk insanity by trying dangerous mind-hacking experiments, but I'm not going to be deleted in a few minutes.)
Difficult question. I believe those links are relevant, but your formulation also implies the threat of an arms race.
My best shot for now would be this: avoid self-modification. The top priority right now is defending people from the potential harmful effects of this thing you created, because someone less benevolent might stumble upon it soon. Find people who share this sentiment and use the speedup together to think hard about the problem of defense.
Perhaps an "anti arms race" would be a more accurate notion. ie, in one sense, waiting for the mathematics of FAI to be solved would be preferable. Would be safer to get to a point that we can mathematically ensure that the thing will be well behaved.
On the other hand, while waiting, how many will suffer and die irretrievably? If the cost for waiting was much smaller, then the answer of "wait for the math and construct the FAI rather than trying to patchwork update a spaghetti coded human mind" would be, to me, the clearly preferable choice.
Even given avoiding self modification, massive speedup would still correspond to a significant amount of power. We already know how easily humans... change... with power. And when sped up, obviously people not sped up would seem different, "lesser"... helping to reinforce the "I am above them" sense. One might try to solve this by figuring out how to self modify enough to, well, not do that. But self modification itself being a starting point for, if one does not do it absolutely perfectly, potential disaster, well...
Anyways, so your suggestion would basically be "only use the power to, well, defend against the power" rather than use it to actually try to fix some of the annoying little problems in the world (like... death and and and and and... ?)
FAI is one possible means of defense, there might be others.
You shouldn't just wait for FAI, you should speed up FAI developers too because it's a race.
I think the strategy of developing a means of defense first has higher expected utility than fixing death first, because in the latter case someone else who develops uploading can destroy/enslave the world while you're busy fixing it.