Open thread, June 19 - June 25, 2017

Elo

If it's worth saying, but not worth its own post, then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Uncheck the two options "Notify me of new top level comments on this article" and "Make this post available under..." before submitting.

The idea is not intended as a primary means of AI control, but as a last-resort option for turning the AI off. I describe it in a lengthy text I am currently writing, which explores all possible ways of AI boxing, under the title "Catching treacherous turn: confinement and circuit breaker system to prevent AI revolt, self-improving and escape".

It will also work only if the reward function is not presented as plain text in the source code, but as a separate black box (created using cryptography or physical isolation). The stop code is, in effect, the solution to a hard cryptographic problem used in this cryptographic reward function.
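To make the black-box idea concrete, here is a rough toy sketch in Python. Everything in it is my assumption for illustration: the class name `BlackBoxReward`, the use of a SHA-256 digest as the "cryptographic problem", the size of the super-reward, and the choice to return zero reward after the halt. A real system would need actual isolation, which a Python object does not provide.

```python
import hashlib
import hmac


class BlackBoxReward:
    """Toy stand-in for an isolated reward module (hypothetical)."""

    def __init__(self, stop_code_digest: str, super_reward: float = 1e9):
        # Only the SHA-256 digest of the stop code is stored, so reading
        # this object's state does not reveal the stop code itself.
        self._digest = stop_code_digest
        self._super_reward = super_reward
        self.halted = False

    def reward(self, task_reward: float, submitted_code: bytes = b"") -> float:
        # Assumption: once halted, no further reward is ever paid out.
        if self.halted:
            return 0.0
        candidate = hashlib.sha256(submitted_code).hexdigest()
        # Constant-time comparison of the two hex digests.
        if hmac.compare_digest(candidate, self._digest):
            self.halted = True
            return self._super_reward
        # Any other input leaves the ordinary task reward unchanged.
        return task_reward


# The operators keep the stop code; the box only ever sees its digest.
stop_code = b"operator-held secret"
box = BlackBoxReward(hashlib.sha256(stop_code).hexdigest())
```

The point of storing only the digest is that the AI cannot find the stop code by inspecting the reward module; it has to be given the code by the operators.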

I agree that running subagents may be a problem. We still don't have a theory of AI halting. It is probably better to use such a super-reward before many subagents have been created.

Your last objection is more serious, as it shows that such a mechanism could turn a safe AI into a dangerous "addict".