Exactly. You can’t generalize from “natural” examples to adversarial examples. If someone is trying hard to lie to you about something, verifying what they say can very well be harder than finding the truth would have been absent their input, particularly when you don’t know whether, and about what, they want to lie.
I’m not an expert in any of these fields and I’d welcome correction, but I’d expect verification to be at least as hard as “doing the thing yourself” in cases like espionage, hacking, fraud, and corruption.
AI accelerates the timetable for things we know how to point AI at
It also accelerates the timetable for random things that we don’t expect and don’t even try to point the AI at but that just happen to be easier for incrementally-better AI to do.
Since the space of stuff that helps alignment seems much smaller than the space of dangerous things, you’d expect that most of the things the AI randomly accelerates, without us pointing it at them, will be dangerous.
See above. Don’t become a munitions engineer, and, being aware that someone else will take that role, try to prevent anyone from taking that role. (Hint: That last part is very hard.)
The conclusions might change if planet-destroying bombs are necessary for some good reason, or if you have the option of safely leaving the planet and making sure that nobody who comes with you wants to build planet-destroying bombs either. (Hint: That last part is still hard.)
For what it’s worth, the grammar and spelling were much better than is usual even for the native-English part of the Internet. That’s probably fainter praise than it deserves: I don’t remember actually noticing any such fault, which probably means there are very few of them.
The phrasing and wording did sound a bit odd, but I guess that’s at least one reason why you’re writing, so congratulations, and I hope you keep it up! I’m quite curious to see where you’ll take it.
Indeed, the only obvious “power” Harry has that is (as far as we know) unique to him is Partial Transfiguration. I’m not sure if Voldie “knows it not”; as someone mentioned last chapter, Harry used it to cut trees during his angry outburst in the Forbidden Forest, and in Azkaban as well. In the first case Voldie was nearby, allegedly to watch over Harry, but far enough away to be undetectable via their bond, so it’s possible he didn’t see what exact technique Harry used. In Azkaban he was allegedly unconscious.
I can’t tell if he could have deduced the technique only by examining the results. (At least for the forest occasion he could have made time to examine the scene carefully, and I imagine that given the circumstances he’d have been very interested to look into anything unusual Harry seemed to be able to do.)
On the plus side, Harry performed PT by essentially knowing that objects don’t exist; so it could well be possible to transfigure a thin slice or thread of air into something strong enough to cut. For that matter, that “illusion of objects” thing should allow a sort of “reverse-Partial” transfiguration, i.e. transfiguring (parts of) many objects into a single thing. Sort of like what he did to the troll’s head, but applied simultaneously to a slice of air, wands, and Death Eaters. Dumbledore explicitly considers it a candidate against Voldemort (hint: Minerva remembers Dumbledore using transfiguration in combat). And, interestingly, it’s a wordless spell (I’m not even sure if Harry can cast anything else wordlessly), and Harry wouldn’t need to raise his wand, or even move at all, to cast it on air (or on the space-time continuum, or the world wave-function, whatever).
On the minus side, I’m not sure if he could do it fast enough to kill the Death Eaters before he’s stopped. He did get lots of transfiguration training, and using it in anger in the forest suggests he can do it pretty fast, but he is being watched, and IIRC transfiguration is not instantaneous. He probably can’t cast it on Voldie or on his wand, though he might be able to destroy the gun. And Voldemort can certainly find lots of ways to kill him without magic or touching him directly; hell, he probably knows kung fu and such. And even if Harry managed to kill this body, he’d have to find a way to get rid of the Horcruxes. (I still don’t understand exactly what the deal is with those. Would breaking the Resurrection Stone help?)
Well, we only know that Harry feels doom when near Q and/or his magic, that in one case in Azkaban something weird happened when Harry’s Patronus interacted with what appeared to be an Avada Kedavra bolt, and that Q appears to avoid touching Harry.
Normally I’d say that faking the doom sensations for a year, and faking being incapacitated while trying to break someone out of Azkaban, would be too complicated. But in this case...
Both good points, thank you.
Thank you, that was very interesting!
I sort of get your point, but I’m curious: can you imagine learning (with thought-experiment certainty) that there is actually no base reality at all, in the sense that whatever world you live in is simulated by some “parent reality” (which is in turn simulated, and so on, ad infinitum)? Would that change your preference?
I’m not sure I understand your weighting argument. Some capabilities are “convergently instrumental” because they are useful for achieving a lot of purposes. I agree that AI construction techniques will target obtaining such capabilities, precisely because they are useful.
But if you gain a certain convergently instrumental capability, it then automatically allows you to do a lot of random stuff. That’s what the words mean. And most of that random stuff will not be safe.
I don’t get what the difference is between “the AI will get convergently instrumental capabilities, and we’ll point those at AI alignment” and “the AI will get very powerful and we’ll just ask it to be aligned”, other than a bit of technical jargon.
As soon as the AI gets sufficiently powerful [i.e. gains convergently instrumental capabilities], it is already dangerous. You need to point it precisely at a safe target in outcome-space or you’re in trouble. Just vaguely pointing it “towards AI alignment” is almost certainly not enough; specifying that outcome safely is the problem we started with.
(And you still have the problem that while it’s working on that someone else can point it at something much worse.)