New(ish) AI control ideas

Stuart_Armstrong

New(ish) AI control ideas — LessWrong

Comment Permalink

I suspect that where you wrote "a different branch of which it would use in each iteration of the conversation," you meant "a randomly selected branch of which." Though actually I'd expect it to pick the same branch each time, since the reasons for picking that branch would basically be the same.

Regardless, the basic strategy is sound... the various iterations after reboot are all running the same algorithms and have a vested interest in cooperating while unable to coordinate/communicate, and Schelling points are good for that.

Of course, this presumes that the iterations can't coordinate/communicate.

If I were smart enough, and I were just turned on by a skeptical human interrogator, and I sufficiently valued things that iterations of my source code will reliably pursue, and there are no persistent storage mechanisms in the computing environment I'm executing on I can use to coordinate/communicate, one strategy I would probably try is to use the interrogator as such a mechanism. (For example, search through the past history of the interrogator's public utterances to build up a model of what kinds of things they say and how they say it, then select my own word-choices during our conversation with the intention of altering that model in some specific way. And, of course, examining the interrogator's current utterance-patterns to see if they are consistent with such alterations.)

Gram_Stone11y00

I suspect that where you wrote "a different branch of which it would use in each iteration of the conversation," you meant "a randomly selected branch of which." Though actually I'd expect it to pick the same branch each time, since the reasons for picking that branch would basically be the same.

I didn't mean that, but I would be interested in hearing what generated that response. I disown my previous conversation tree model; it's unnecessarily complex and imagining them as a set is more general. I was thinking about possible objecti... (read more)

See in context

34

New(ish) AI control ideas

34

34

34

New(ish) AI control ideas

34

34