This approach works under the assumption that the AI knows everything there is to know about its off switch.
And an AI that would kill everyone if it had an off switch is one that desperately needs a (public) off switch on it.
The approach assumes that the AI knows everything there is to know about off switches in general, or at least knows what its creators know about off switches.
If the AI can guess that its creators would install an off switch, it will attempt to work around as many classes of off switches as it can, and depending on how much of off-switch space it can outsmart simultaneously, whichever approach the creators chose might be useless.
Such an AI desperately needs more FAI mechanisms behind it; saying it desperately needs an off switch assumes that off switches help.
Would this agent be able to reason about off switches? Imagine an AI getting out, reading this paper on the internet, and deciding that it should kill all humans before they realize what's happening, just in case they installed an off switch it cannot know about. Or perhaps put them into lotus eater machines, in case they installed a dead man's switch.
Following the usual monthly linkfest on SSC, I stumbled upon an interesting paper by Scott Aaronson.
Basically, he and Adam Yedidia constructed a Turing machine whose halting can neither be proved nor disproved from ZFC (it does in fact run forever, assuming a theory stronger than ZFC).
It is already known from Chaitin's incompleteness theorem that every formal system has a complexity limit beyond which it cannot prove or disprove certain assertions. The interesting, perhaps surprising, part of the result is that said Turing machine has 'only' 7918 states, i.e. a state register less than two bytes long.
This small complexity is already sufficient to evade the grasp of ZFC.
You can easily sloganize this result by saying that the value of BB(7918) (the 7918th Busy Beaver number) cannot be determined (whispering immediately after "... by ZFC").
Huh. I expected the smallest number of states of a TM with ZFC-independent halting to be, like, about 30. Consider how quickly BB diverges, after all.
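For intuition about what's being measured here, a minimal sketch (my own illustration, not from the paper): it simulates the standard 2-state, 2-symbol busy-beaver champion and counts steps and ones; the transition-table encoding is just one convenient choice.

```python
# Minimal sketch: simulate the 2-state, 2-symbol busy beaver champion.
# Rules map (state, read symbol) -> (symbol to write, head move, next state).
# 'H' is the halt state.  This machine halts after 6 steps with 4 ones on the
# tape, which are exactly the known values S(2) = 6 and Sigma(2) = 4.
RULES = {
    ('A', 0): (1, +1, 'B'), ('A', 1): (1, -1, 'B'),
    ('B', 0): (1, -1, 'A'), ('B', 1): (1, +1, 'H'),
}

def run(rules, max_steps=1_000_000):
    tape, head, state, steps = {}, 0, 'A', 0
    while state != 'H' and steps < max_steps:
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(run(RULES))  # -> (6, 4)
```

For scale, the known step counts climb fast: S(1)=1, S(2)=6, S(3)=21, S(4)=107, and S(5) is at least 47,176,870.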
like a sad dumb little puppy
I'm curious; did you choose that analogy on purpose?
Anyway: yes, I agree; I think Eugine thinks that lack of enthusiasm for bigotry => denial of biological realities => stupid irrationality => doesn't belong on LW, and that's part of what's going on here. But I am pretty sure that Eugine or anyone else would search in vain[1] for anything I've said on LW that denies biological realities, and that being wrong about one controversial topic doesn't by any means imply stupid irrationality -- and I think it's at least partly the personal-vendetta thing, and partly a severe case of political mindkilling, that stops him noticing those things.
[1] And not only because searching for anything on LW is a pain in the (ahahahaha) posterior.
Reminds me of:
"You have refused death," said Dumbledore, "and if I destroyed your body, your spirit would only wander back, like a dumb animal that cannot understand it is being sent away."
I am keeping an eye on the individuals, at any rate. It will be interesting if he's adopting a new tactic of -not- talking about the same tired talking points. It would suggest a level of cleverness he thus far has not demonstrated.
And once we get the tools in place to start tracking downvote patterns, that game will be up, too.
Go for it. If we had listened to cranks more, we could have finished that Tower of Babel.
User account "Lamp" is banned for being eugine_nier. This is an update in case anyone was wondering.
so far accounts have been:
(that I know of, I think there were more in between too that I forgot.)
If I could send this guy a message, it would be this: You are quite literally wasting our time. And by "our" I mean the moderators and the people who could be spending their time improving the place, coding and implementing a better place; instead they are spending their time getting rid of you over and over. DON'T COME BACK. You are literally killing LW.
I don't even want to get into the community's time, or the time of the people you debate with, or the time of anyone who reads this post here. That time also adds up. Seriously.
Would they have used their time improving LW's code? I feel like its problems could be solved with far less programmer-time than has already been lost to LW not being improved, but nobody's doing it because of procrastination/it isn't fun/akrasia/ugh fields.
There's a fair bit on decision theory and on Bayesian thinking, both of which are instrumental rationality. There's not much on heuristics or how to deal with limited capacity. Perhaps intentionally - it's hard to be rigorous on those topics.
Also, I think there's an unstated belief (which should be stated and debated) that instrumental rationality without epistemic rationality is either useless or harmful. Certainly that's the FAI argument, and there's no reason to believe it wouldn't apply to humans. As such, a focus on epistemic rationality first is the correct approach.
That is, don't try to improve your ability to meet goals unless you're very confident in those goals.
Why not? If you haven't yet decided what your goals are, being able to meet many goals is useful.
The AGI argument is that its goals might not be aligned with ours. Are you saying that we should make sure our future self's goals are aligned with our current goals?
For example, if I know I am prone to hyperbolic discounting, I should take power from my future self so it will act according to my wishes rather than its own?
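To make the hyperbolic-discounting example concrete, here is a minimal sketch (my illustration, not from the thread) using Mazur's hyperbolic discount form V = A / (1 + k*D); the amounts, delays, and the rate k are arbitrary made-up parameters.

```python
# Illustration of preference reversal under hyperbolic discounting.
# Mazur's form: present value of amount A delayed by D days is V = A / (1 + k*D).
# k is a hypothetical discount rate; amounts and delays are made up.

def hyperbolic_value(amount, delay_days, k=1.0):
    return amount / (1.0 + k * delay_days)

# Choice: $100 at some delay vs. $110 one day later than that.
for base_delay in (10, 0):
    small_soon = hyperbolic_value(100, base_delay)
    large_late = hyperbolic_value(110, base_delay + 1)
    pick = "larger-later" if large_late > small_soon else "smaller-sooner"
    print(f"delay {base_delay:>2} days: $100 -> {small_soon:.2f}, "
          f"$110 -> {large_late:.2f}, prefer {pick}")

# With k=1: at delays 10 vs 11 the larger-later reward wins (9.17 > 9.09),
# but at delays 0 vs 1 the smaller-sooner reward wins (100 > 55).
```

With these numbers the agent prefers the larger-later reward while both options are far off, then flips to the smaller-sooner one when the choice becomes immediate -- exactly the kind of preference reversal that "taking power from my future self" (precommitment) is meant to block.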