A few thoughts:
It sounds like he's being rebellious. Separate the rebelliousness from the question of profanity, and discuss them separately. You might say something like "Asking questions if you genuinely want to understand something better is great, but asking questions to try to frustrate or annoy me is not. I'm getting the sense that you're doing the latter."
If he persists, put your foot down -- but be really clear that it's because of his intent to annoy you, rather than because he's asking questions in an attempt to honestly understand something.
It may also help to ask him, openly and gently, why he's being rebellious. Sometimes rebelliousness comes from a perception that the rules are arbitrary or unfair. If you understand what he's feeling, you can be in a better position to address those underlying causes. For example, Larks's suggestion of sharing the reasons you don't want him to swear may help. And maybe it would also help to explain to him that this is a subjective issue, highly dependent on things like tone and social context, and that perfectly clear rules are unfortunately impossible.
Yeah, there are plenty of examples of dictators who go to great lengths to inflict tremendous amounts of pain on many people. It's terrifying to think of someone like that in control of an AGI.
Granted, people like that probably tend to be less likely than the average head-of-state to find themselves in control of an AGI, since brutal dictators often have unhealthy economies, and are therefore unlikely to win an AGI race. But it's not like they have a monopoly on revenge or psychopathy either.
Practical problem #3: The agency successfully understands your intentions, and is willing to implement them, but not able to implement them.
For example, a fast intelligence explosion removes their capability of doing so before they can pull the plug. Or a change in their legal environment makes it illegal for them to pull the plug (and they aren't willing to put themselves at legal risk to do so).
Uh, plenty of people are born into worse-than-death situations already, at least by our standards, yet they generally make a go of their lives instead of committing suicide. We call many of them our "ancestors."
Can you elaborate? Your statement seems self-contradictory. By definition, situations "worse than death" would be the ones in which people prefer to kill themselves rather than continue living.
In the context of the original post, I take "worse-than-death" to mean (1) enough misery that a typical person would rather not continue living, and (2) an inability to commit suicide. While I agree many of our ancestors have had a rough time, relatively few of them have had it that hard.
I don't put any stock in the scary scenarios where an evil Omega tortures a gazillion of my revived clones for eternity.
Could you elaborate on this? I'd be curious to hear your reasoning.
Does "don't put any stock" mean P(x) = 0? 0.01? 1e-10?
I'm not sure "status conflict" is the only possibility here; for example, the terminal value might be something like autonomy, or feeling genuinely listened to.