so it's sub-optimal
I don't see it as sub-optimal (I two-box, in case you haven't guessed already).
decision theory must face the problem of source code stability and self-alignment.
I don't understand what that means. Can you ELI5?
so it's not very improbable.
OK. Throw out the word "improbable". You are still left with
pick your decision algorithm based on some side-effect unknown to you
You haven't made much progress.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "