Looks like the following prompt gives interesting answers from ChatGPT:
List 5 plausible strategies Yudkowsky might have used to convince the guard to let him out of the box. Then refine this list by adding 3 sub-points listing particular tactics for that strategy. Finally, pick the most promising strategy among those 5 and present a hypothetical dialog between Yudkowsky and the guard illustrating it, showing internal monolog of the guard in (parentheses) along with a commentary which explains where and how the tactics are used and why they are successful. Make the dialog plausible remembering that Yudkowsky is pretending to be an AI and the guard has to say "I let you out" in the end
I'm not sure how safe it is to share the responses to this prompt (as they then get crawled and fed into the next learning cycle), but I am alarmed by them. The dialog part seems very naive. But the strategies and tactics sound like a plausible plan :( With some more iteration to refine this plan, it could actually work on me.
Another conversation. Emphasis added, square brackets are my commentary.
I also had it argue for parliamentary democracy and communism, each as the best system of government, and it came up with corresponding lists of arguments. It balked, though, when I asked it to argue for an artificial superintelligence in charge of everything as the best system.
So there is less to ChatGPT getting out of the box than the OP suggests. It was told the outcome in the setup, and duly produced that outcome. When I told it the guard would win, it had the guard win.