All of stanislavzza's Comments + Replies

No, I didn't read the sequences. I will do that. The link might be better named so that it indicates what it actually is. But I didn't say the AIs would be safe (or super-intelligent, for that matter), and I don't assume they would be. But those who create them may assume that.

6JGWeissman
This sort of disclaimer can protect you in a discussion on the level of armchair philosophy, whose sole purpose is to show off how smart you are, but if you were to actually build an AI, and it went FOOM and tiled the universe with molecular smiley faces, taking all humans apart in the process, the fact that you didn't claim the AI would be safe would not compel the universe to say "that's all right, then" and hit a magic reset button to give you another chance. Which is why we ask the question "Is this AI safe?" and tend not to like ideas that result in a negative answer, even if the idea didn't claim to address that concern.

The kind of constraint you propose would be very useful. We would first have to prove that there is a kind of topology under general computation (because the machine can change its own language, so the solution can't be language-specific) that only allows non-suicidal trajectories under all possible inputs and self-modifications (or perhaps allows suicidal ones only with low probability, though that is not likely to be computable). I have looked, but not found such a thing in existing theory. There is work on the topology of computation, but it's something different from this... (read more)
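To be concrete about what I mean by such a constraint, here is one possible formalization (my own sketch, an assumption rather than anything from existing theory): treat the machine plus its self-modifications as a transition system over configurations, and ask for an invariant set that contains the initial state, is preserved by every transition, and contains no suicidal configuration.

```latex
% A hedged formalization (my own notation, not established theory):
% C = configurations (code + state), I = inputs,
% \delta = transition relation that already includes self-modification,
% Bad \subseteq C = suicidal (e.g. self-halted) configurations.
\[
\exists\, S \subseteq C:\qquad
c_0 \in S,\qquad
\forall s \in S\ \forall i \in I:\ \delta(s,i) \subseteq S,\qquad
S \cap \mathit{Bad} = \varnothing .
\]
```

Such an S is an inductive invariant: every trajectory from c_0, under every input and every self-modification, stays inside S and so never reaches Bad. Rice's theorem rules out a general procedure for finding such an S for arbitrary programs; it does not forbid exhibiting one for a particular, restricted design.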

Imagine that you want to construct an AI that will never self-halt (easier to define than friendliness, but the same idea applies). You could build the machine so that it doesn't have an off switch, and therefore can't halt simply out of inability. However, if the machine can self-modify, it could subsequently grant itself the ability to halt. So in your design, you'd have to figure out a way to prevent self-halting under all possible input conditions, under all possible self-modifications of the machine. This latter task cannot be solved in the general case... (read more)
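A toy illustration of why "no halt instruction in the original code" isn't enough once self-modification is allowed (a hypothetical Python sketch, not a claim about any real architecture): nothing in the initial program can halt it, but the program accepts replacement code for its own step function, so whether it ever self-halts depends on future modifications rather than on the code you can inspect up front.

```python
import types

class SelfModifyingAgent:
    """Toy agent whose step function can be replaced at runtime."""

    def __init__(self):
        self.halted = False

    def step(self, observation):
        # Initial behaviour: no code path ever sets self.halted.
        return "act"

    def self_modify(self, new_step_source):
        # The agent rewrites its own step function from arbitrary source code.
        namespace = {}
        exec(new_step_source, namespace)
        self.step = types.MethodType(namespace["step"], self)

agent = SelfModifyingAgent()
# Nothing in the original code can halt the agent...
agent.self_modify(
    "def step(self, observation):\n    self.halted = True\n    return 'stop'"
)
agent.step(None)
print(agent.halted)  # True: the ability to halt was granted by a later self-modification
```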

2red75
Self-modifications are performed by the machine itself. Thus we (and/or the machine) don't need to prove that all possible modifications are non-suicidal. The machine can be programmed to perform only provably (in reasonable time) non-suicidal self-modifications. Rice's theorem doesn't apply in this case. Edit: However, this leaves the meta-level unpatched. The machine can self-modify into a non-suicidal machine that doesn't care about preserving non-suicidality over further modifications. This can be patched by constraining allowed self-modifications to a class of modifications that leads to machines with provably equivalent behavior (with a possible side effect of inability to self-repair).
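A minimal sketch of that gatekeeper idea (hypothetical names throughout; try_prove_nonsuicidal and try_prove_equivalent stand in for a bounded proof search, which is the genuinely hard part): a candidate self-modification is applied only if a proof is found within a time budget, and "no proof found" defaults to rejection.

```python
from typing import Optional

# Hypothetical stubs for a bounded proof search. In this sketch they always
# fail to find a proof, so every modification is rejected by default.
def try_prove_nonsuicidal(candidate_code: str, time_budget_s: float) -> Optional[str]:
    """Return a checkable proof that the candidate never self-halts, or None."""
    return None

def try_prove_equivalent(current_code: str, candidate_code: str,
                         time_budget_s: float) -> Optional[str]:
    """Return a proof that the candidate's behavior is equivalent, or None."""
    return None

def consider_self_modification(current_code: str, candidate_code: str,
                               time_budget_s: float = 10.0) -> str:
    # First patch: apply only provably non-suicidal modifications.
    if try_prove_nonsuicidal(candidate_code, time_budget_s) is None:
        return current_code  # reject: no proof found in reasonable time

    # Meta-level patch: also require provably equivalent behavior, so the
    # candidate necessarily keeps this gatekeeper -- at the possible cost,
    # as noted above, of losing the ability to self-repair.
    if try_prove_equivalent(current_code, candidate_code, time_budget_s) is None:
        return current_code

    return candidate_code  # accept the modification
```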

I think if you want "proven friendly" AIs, they would almost have to be evolved because of Rice's Theorem. Compare it to creating a breed of dog that isn't aggressive. I think FOOM fails for the same reason--see the last bit of "Survival Strategies".

As you say, it may not be practical to do so, perhaps because of technological limitations. But imagine a fixed "personality engine" with a bunch of parameters that affect machine-emotional responses to different stimuli. Genetic programming would be a natural approach to finding a good mix of those parameter values for different applications.
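To make that concrete, here is a hedged sketch of what such a search could look like (all names are illustrative, and the fitness function is the part being hand-waved): each genome is a vector of personality parameters, and a plain evolutionary loop of selection, crossover, and mutation looks for a mix that scores well for a given application.

```python
import random

PARAMS = ["aggression", "curiosity", "caution", "sociability"]  # illustrative only

def random_genome():
    return {p: random.random() for p in PARAMS}

def fitness(genome):
    # Placeholder: in practice this would score the agent's behaviour in
    # simulation for the target application. Here it just rewards low
    # aggression and high caution to keep the example self-contained.
    return (1.0 - genome["aggression"]) + genome["caution"]

def crossover(a, b):
    return {p: random.choice((a[p], b[p])) for p in PARAMS}

def mutate(genome, rate=0.1):
    return {p: (min(1.0, max(0.0, v + random.gauss(0, 0.05)))
                if random.random() < rate else v)
            for p, v in genome.items()}

def evolve(pop_size=50, generations=100):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```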

6Eugine_Nier
How is Rice's theorem at all relevant here? Note: just because there is no general algorithm to tell whether an arbitrary AI is friendly doesn't mean it's impossible to construct a friendly AI.

You might be interested in this New Scientist article: Evidence that we can see the future to be published