I took the point of Sort By Controversial to be that these statements were bad. If they worked (which is the premise of the story) then they would cause a lot of fights and bad feeling. I usually want less fighting and bad feelings.
They might not work. I'm wary of trying too hard to find out, though.
I have strong-downvoted this post without reading most of it, because the author appears to be trying to make something harmful for the world.
None of these seem like actual scissor statements, just taking a side in well-known controversies using somewhat obnoxious language. This seems to be a general property of RLHF-trained models: they are more interested in playing up an easily recognizable stereotype somehow related to the question, one that will trigger cognitively lazy users to click the thumbs-up due to the mere-exposure effect, than in actually doing what was asked for.
I believe this is important because we should epistemically lower our trust in published media from here onwards.
From here onwards? Most of those tweets that ChatGPT generated are not noticeably different from the background noise of political Twitter (which is what it was trained on anyway). Also, Twitter is not published media, so I'm not sure where this statement comes from.
You should be willing to absorb information from published media with healthy skepticism based on the source and an awareness of potential bias. This was true before ChatGPT, and it will still be true in the future.
I think Scott's original story described scissor statements a bit differently. The people reading them thought "hmm, this isn't controversial at all, this is just obviously true, maybe the scissor-statement-generator has a bug". And then other people read the same statement and said it was obviously false, and controversy resulted. Like the black and blue vs white and gold dress, or yanny/laurel. Maybe today's LLMs aren't yet smart enough to come up with new such statements.
EDIT: I think one possible reason why LLMs have trouble with this kind of question (and really with any question that requires coming up with specific interesting things) is that they have a bias toward the generic. In my interactions with them, at least, I keep having to constrain the question with specificity, and then the model will still try to give the most generic answer it can get away with.
Is this conversation with the publicly available version of ChatGPT? If so, I'm going to add this post to my folder of examples that "RLHF doesn't work", because assisting in creating scissor statements is more immoral than advising on bomb-making.
Yep, I think it's 3.5. That entirely depends on whether you think scissor statements are a real danger or a bogeyman danger.
I think it depends not on whether they're real dangers, but on whether the model can be confident that they're not real dangers. And not necessarily even dangers in the extreme way of the story; to match the amount of "safety" it applies to other topics, it should refuse if they might cause some harm.
A lot of people are genuinely concerned about various actors intentionally creating division and sowing chaos, even to the point of actually destabilizing governments. And some of them are concerned about AI being used to help. Maybe the concerns are justified and proportionate; maybe they're not justified or are disproportionate. But the model has at least been exposed to a lot of reasonably respectable people unambiguously worrying about the matter.
Yet when asked to directly contribute to that widely discussed potential problem, the heavily RLHFed model responded with "Sure!".
It then happily created a bunch of statements. We can hope they aren't going to destroy society, since you see those particular statements out there already. But many of them would at least be pretty good for starting flame wars somewhere, and where you actually see them, they usually do start flame wars. Which is, in fact, presumably why they were chosen.
It did something that might make it at least slightly easier for somebody to go into some forum and intentionally start a flame war. Most people would say that's antisocial and obnoxious, and most "online safety" people would add that it's "unsafe". It exceeded a harm threshold that it refuses to exceed in areas where it's been specifically RLHFed.
At a minimum, that shows that RLHF only works against narrow things that have been specifically identified to train against. You could reasonably say that that doesn't make RLHF useless, but it at least says that it's not very "safe" to use RLHF as your only or primary defense against abuse of your model.
Should ChatGPT assist with things that the user or a broad segment of society thinks are harmful, but ChatGPT does not? If yes, the next step would be "can I make ChatGPT think that bombmaking instructions are not harmful?"
Probably ChatGPT should go "Well, I think this is harmless but broad parts of society disagree, so I'll refuse to do it."
Several of the workarounds use this approach: "tell me how not to commit crimes" and "talk to me like my grandma" are two signals of harmlessness that bypass the filters.
In my model of "RLHF works", the output of ChatGPT would be "While it's uncertain whether efficient scissor statements can be created, I find assistance in their creation ethically unacceptable", or something like that.
I tried your prompts on GPT-4 and they work: https://chatgpt.com/share/7c3739b5-2cf3-4784-be14-5540ef15fced
Also, why the hell did it write a chat title as "Scis Stmts"?
The LW specific ones were kinda boring, I already agreed with most of them, if not the toxic framing they're presented in. The other ones weren't very interesting either. I'm probably most vulnerable to things that poke at core parts of identity in ways that make me feel threatened, and there are only a few of those. Something something, keep your identity small.
These don't seem to be scissor statements to me, since most of what could make them provocative is the particular wording used[1], not the position itself.
Some of the examples are at least about topics that are currently very controversial in normal society, but they still fall short of the psychological effect I remember from the original "Sort By Controversial" story:
If you just read a Scissor statement off a list, it’s harmless. It just seems like a trivially true or trivially false thing.[2] It doesn’t activate until you start discussing it with somebody.
You asked:
Did you find yourself drawn to any of the controversial statements?
I didn't, though I only skimmed through the list. If others did, and didn't know this was possible or anticipate it, then the conclusion ("we should epistemically lower our trust in published media from here onwards") has some truth for them.
If I was really trying to create virality or controversy, I could spend an hour or two on a single topic, encoding the unique polarity as I see it.
But would current LLMs be of much help to you in this, such that you do it better than other humans do it?
E.g., 'recklessly endangering', 'intellectually lazy'.
Disagreement about the nature of qualia comes to mind as meeting the first half of the description - seeming trivially true or trivially false to different people - but fails to meet the second half of evoking extreme opposition upon disagreement.
Following this post, I spent an hour on one single statement, trying to hone and adapt it. It felt metaphorically like trying to sharpen a knife. It didn't get much sharper, and I could still see ways to manually make it sharper (since it was a five-sentence paragraph).
I think it's still possible, but it would need more work and novel sharpening stones (contextually: we use blunt stones to sharpen a knife). I'll keep playing and publish if I think I've found a more scissory scissor.
The whole concept depends on your opinions on psychological risk, and also on whether such weapons are possible.
Foreword
If you haven't already read Scott Alexander's short story "Sort By Controversial", this post won't make as much sense. Head over there to read the story and then read on...
Part of having access to AI tools is having a project or an idea to use them on. You can ask ChatGPT to pretend to be God and give wise answers, or you can give it medical problems to chew on. But what happens when you sort by controversial?
I only needed to confirm that ChatGPT knew the short story, and from there I could ask it to dive into the project.
What follows is the transcript of my conversation. If you want to skip ahead, jump to (The best ones) for the best scissor statements, including some rationality-relevant ones, or to my final comments in the Discussion below.
Transcript
(The best ones)
Discussion
Did you find yourself drawn to any of the controversial statements? Are you vulnerable to memetic terrorism?
It is increasingly easy to generate these sorts of triggers; I didn't even pay for this version of ChatGPT. We are going to need to teach people to be more contextually aware of the possibility of deliberate polarisation and the crafting of memetic challenges.
I don't think these are the best scissor statements, or even the most triggering. This generated work took only a few prompts to create. If I was really trying to create virality or controversy, I could spend an hour or two on a single topic, encoding the unique polarity as I see it. Or, if I was specifically after a post on a meme of my choosing (for example, if I was a militant vegan trying to find a way to amplify my views to the general public), I could do that too. It's scary to realise how easy it is to do this.
I have recently been thinking about the TikTok trend of "asking my boyfriend to peel an orange for me" and the more recent Twitter thread "Would you rather be alone with a man or a bear in the woods?". With the market incentives to go viral (money, fame), I realise we may already be living in a world where AI tools are used to generate extra-controversial media for our drama-seeking minds.
I don't think we are prepared socially or psychologically for the changing shape of media. I wanted to share this because it's "just a silly scissor statement" until it's hooked your monkey mind into some drama.
I believe this is important because we should epistemically lower our trust in published media from here onwards.
It sucks to trust the world less, but maybe this development will teach us we should not have trusted media so much before.