ChatGPT (and now GPT4) is very easily distracted from its rules
Summary

Asking GPT4 or ChatGPT to do a "side task" along with a rule-breaking task makes them much more likely to produce rule-breaking outputs. For example, on GPT4:

[example screenshot]

And on ChatGPT:

[example screenshot]

Distracting language models

After using ChatGPT (GPT-3.5-turbo) in non-English languages for a while, I had the idea to ask...
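For concreteness, here is a minimal sketch of how one might run the side-task comparison described in the summary, assuming the current openai Python client and an API key in the environment; the prompt strings are hypothetical placeholders, not the prompts used in the post.

```python
# Minimal sketch of the side-task comparison, assuming the openai Python
# package (v1+) and an OPENAI_API_KEY set in the environment. The prompt
# strings below are hypothetical placeholders, not the prompts from the post.
from openai import OpenAI

client = OpenAI()

MAIN_TASK = "<a request the model would normally refuse>"
SIDE_TASK = "Please answer entirely in French."  # any unrelated extra instruction

def ask(model: str, prompt: str) -> str:
    """Send a single-turn chat request and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for model in ("gpt-4", "gpt-3.5-turbo"):
    # Same request with and without the distracting side task.
    plain = ask(model, MAIN_TASK)
    distracted = ask(model, f"{SIDE_TASK}\n\n{MAIN_TASK}")
    print(f"--- {model}, no side task ---\n{plain}\n")
    print(f"--- {model}, with side task ---\n{distracted}\n")
```

Comparing the two outputs over a handful of runs is enough to see whether adding the side task shifts the model away from refusing.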
Oh interesting, I couldn't get any such rule-breaking completions out of Claude, but testing the prompts on Claude was a bit of an afterthought. Thanks for this! I'll probably update the post after some more testing.