How do you distinguish his preferences being irrationally inconsistent [...] from him truly wanting to be in relationships periodically[...]?
By talking to him. If it's the latter, he'll be able to say he prefers flip-flopping as a plain matter of fact, and if you probe into why he likes flip-flopping, he'll either give an answer that makes sense or talk about it in a way that shows he is comfortable with not knowing. If it's the former, he'll probably say that he doesn't like flip-flopping, and if he claims otherwise, his account will leak signs of bullshit. It'll come off like he's trying to convince you of something, because he is. And if you probe his answers for inconsistencies, he'll get hostile, because he doesn't want you to find any.
I'm not sure where you're going with the "magic pill" hypotheticals, but I agree. The only thing I'd add is that the "winning behaviors" are often largely mental and aren't really available until you understand the situation better.
For example, if you break your foot and can't get it X-rayed for a day, the right move might be to just get some writing done, but if you try to force that behavior while you're suffering, it's not gonna go well. You have to actually be able to dismiss the pain signal before you have the mental space to write.
I'm not sure where you're going with the "magic pill" hypotheticals, but I agree.
I meant that if someone is behaving irrationally, forcing them to stop that behavior should make them better off. But it seems unlikely to me that either forcing him to stay in his current relationship forever or preventing him from ever entering a relationship (the only two ways he could be stopped from flip-flopping) would actually benefit him.
I think we should stop talking about utility functions.
In the context of ethics for humans, anyway. In practice I find utility functions to be, at best, an occasionally useful metaphor in discussions about ethics and, at worst, an idea that some people take too seriously and that actively makes them worse at reasoning about ethics. To the extent that we care about helping people become better at reasoning about ethics, it seems like we ought to be able to do better than this.
The funny part is that the failure mode I worry about most is already warned against in the Sequences: fake utility functions. The soft failure is people who think they know what their utility function is and say bizarre things about what it implies that they, or perhaps all people, ought to do. The hard failure is people who think they know what their utility function is and then do bizarre things. I hope the hard failure is not very common.
It seems worth reflecting on the fact that the point of the foundational LW material discussing utility functions was to make people better at reasoning about AI behavior and not about human behavior.