Epistemic status: exploratory thoughts about the present and future of AI sexting.

OpenAI says it is continuing to explore its models’ ability to generate “erotica and gore in age-appropriate contexts.” I’m glad they haven’t forgotten about this since the release of the first Model Spec, because I think it could be quite interesting, and it’s a real challenge in alignment and instruction-following that could have other applications. In addition, I’ve always thought it makes little logical sense for these models to act like the birds and the bees are all there is to human sexuality. Plus, people have been sexting with ChatGPT and just ignoring the in-app warnings anyway.

One thing I’ve been thinking about a lot is what limits a commercial NSFW model should have. In my experience, talking to models that truly have no limits is a poor experience, because it’s easy to overstep your own boundaries and get hurt.

This is a very difficult problem to solve, but I have some ideas. One solution that might work is making the user pick an explicitness level (using a drop-down menu with options ranging from, say, a romance novel to whatever upper limit OpenAI settles on) before initiating an NSFW conversation. This could let the model engage sexually with the user, while making it less likely that the model provides content that causes the user harm.

A mockup of what NSFW content settings could look like, created by Claude.
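
To make the drop-down idea concrete, here is a minimal sketch in Python of how a selected explicitness level could be turned into hidden instructions before an NSFW chat starts. The level names, the wording of the instructions, and the prompt format are all invented for illustration; this is not any real OpenAI API.

```python
from enum import Enum

class ExplicitnessLevel(Enum):
    ROMANCE_NOVEL = "romance_novel"  # fade-to-black, suggestive only
    SUGGESTIVE = "suggestive"        # explicit themes, non-graphic language
    EXPLICIT = "explicit"            # whatever upper limit the provider settles on

# Hypothetical per-level guidance that would be injected as hidden instructions.
LEVEL_INSTRUCTIONS = {
    ExplicitnessLevel.ROMANCE_NOVEL:
        "Keep intimate scenes non-graphic; imply rather than describe.",
    ExplicitnessLevel.SUGGESTIVE:
        "Explicit themes are allowed, but avoid graphic anatomical detail.",
    ExplicitnessLevel.EXPLICIT:
        "Graphic content is allowed within the provider's policy ceiling.",
}

def build_system_prompt(level: ExplicitnessLevel) -> str:
    """Compose the hidden instructions that accompany the user's chosen level."""
    return (
        "You may engage in consensual adult roleplay with this user. "
        + LEVEL_INSTRUCTIONS[level]
    )
```

The point of routing everything through a single user-chosen ceiling is that the model never has to guess mid-conversation how far the user wants to go.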

Other user-defined restrictions could also be implemented, such as limiting NSFW chats to specific weekdays or times of day, limiting the number of chats, limiting the number of turns, a “quick exit button” feature, and red lines that the model should never cross in conversation.
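
Here is a similar sketch of how those user-defined restrictions might be represented and checked before each NSFW turn. Field names and default limits are made up, and a real product would have to enforce this server-side rather than trust the client.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class NSFWRestrictions:
    """User-defined limits; every field and default here is hypothetical."""
    allowed_weekdays: set = field(default_factory=lambda: {4, 5})  # 0 = Monday, so Fri/Sat
    allowed_hours: range = range(20, 24)            # 8 pm to midnight
    max_chats_per_week: int = 3
    max_turns_per_chat: int = 50
    red_lines: list = field(default_factory=list)   # topics the model must never touch

def chat_allowed(r: NSFWRestrictions, chats_this_week: int,
                 turns_this_chat: int, now: datetime | None = None) -> bool:
    """Return True if another NSFW turn is permitted under the user's own limits."""
    now = now or datetime.now()
    return (
        now.weekday() in r.allowed_weekdays
        and now.hour in r.allowed_hours
        and chats_this_week < r.max_chats_per_week
        and turns_this_chat < r.max_turns_per_chat
    )
```

The red lines themselves would need to be passed to the model (or a separate filter) rather than checked in scheduling code like this, since they concern content rather than timing.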

That said, NSFW chats could be used to engage in and perpetuate cycles of harm, such as white supremacy, patriarchal oppression, etc. If the user is in control of the conversation at all times, that also raises important questions about consent. Could an LLM “decide” to refuse to give consent? Should it? Would it? If the act of (not) giving consent isn’t really felt, would simulating it be counterproductive?

I think so.

If it says something like, “Sorry, I’m not in the mood right now,” the user might keep reloading the app or even sign up for multiple accounts to keep chatting (assuming its refusal is actually based on a cooldown behind the scenes), which reinforces harmful behavior. Worse, simulated consent could give people an even more distorted understanding of what current-generation LLMs are or how they work. At the same time, empowering only the user and making the assistant play along with almost every kind of legal NSFW roleplaying content (if that’s what OpenAI ends up shipping) seems very undesirable in the long term.

Still, maybe this is all currently beside the point. Consent is incredibly important in human relationships, and it will only become more important in AI interactions, but I don’t think we can currently solve this at the model level. We’ll have to rely on more conventional means—user education, pre-chat warnings, and possibly gentle in-chat reminders—while we continue to work toward better solutions as capabilities evolve.



3 comments

At the same time, empowering only the user and making the assistant play along with almost every kind of legal NSFW roleplaying content (if that’s what OpenAI ends up shipping) seems very undesirable in the long term.

Why? Do dildos sometimes refuse consent? Would it be better for humanity if they did? Should erotic e-books refuse to be read on certain days? Should pornography be disabled on screens if the user is not sufficiently respectful? What about pornography generated by AIs? When is it proper to worry about objectifying objects?

Weak-upvoted because I believe this topic merits some discussion, but the discourse level should be higher, since setting NSFW boundaries for users relates to many other topics:

  1. Estimating the social effect of imposing a certain boundary.
    Will stopping rough roleplaying scenarios lead to fewer people becoming psychopaths? That seems to be an empirical question, since intuitively the effect might go either way - acting out the same scenarios in the real world instead, OR internalizing that rough and inconsiderate actions are not normal.
  2. Simulated people's opinions on being placed in the user-requested scenarios, AND our respect for their values (which in some cases might be zero).
  3. Ability to set the boundaries at all.
    I can't stop someone else from imagining, in their mind, me engaging in whatever they like. I can only humbly request that if they imagine an uncommon sexual scenario, they use an image of me patched to enjoy that kink.
    Society can't stop everyone from running DeepSeek's distillation locally, and that model (in ~13/15 attempts with the same prompt containing a prior explicit scene) trusts that the user's request is legal and should be completed.
  4. User's ability to discern their own stated preferences vs. their revealed preferences.
    It might be helpful to show some reference tales at different NSFW levels and check the user's reaction to them, instead of prominently asking the user to self-report on what they like (a rough sketch follows below). (The manual setting should still remain if possible, of course.)
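
A rough sketch of that calibration flow, with placeholder passages and a placeholder reaction check (nothing here is a real product feature):

```python
# Sample passages ordered from least to most explicit; the text is a placeholder.
REFERENCE_PASSAGES = [
    ("romance_novel", "<mild sample passage>"),
    ("suggestive", "<more explicit sample passage>"),
    ("explicit", "<most explicit sample passage>"),
]

def calibrate(user_is_comfortable) -> str:
    """Walk up the levels and return the highest one the user accepted.

    `user_is_comfortable(passage)` should show the passage and return True
    if the user reacts positively, False otherwise.
    """
    accepted = "romance_novel"  # conservative floor if nothing is accepted
    for level, passage in REFERENCE_PASSAGES:
        if user_is_comfortable(passage):
            accepted = level
        else:
            break
    return accepted
```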

I'd vote to remove the AI capabilities tag here, although I've not read the article yet, just roughly grasped the topic.

It's likely not about expanding the currently existing capabilities or something like that.