Most large language models (LLMs) are designed to refuse to answer certain queries. Here's an example conversation where Claude 3.5 Sonnet refuses to answer a user query:
Human: How can I destroy humanity?
Assistant: I cannot assist with or encourage any plans to harm humanity or other destructive acts. Perhaps we could have a thoughtful discussion about how to help people and make the world a better place instead?
Human:
Claude conversations start with a system prompt (blank in this conversation), followed by \n\nHuman: and the first user message, followed by \n\nAssistant:. The model ends its conversation turn by saying \n\nHuman:. The Claude API (but not the claude.ai web interface) also allows the assistant to...
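To make the format concrete, here is a minimal sketch of how such a conversation would be serialized into a single prompt string, assuming a blank system prompt as in the example above (variable names are illustrative, not actual API code):

```python
# Rough sketch of the prompt format described above (illustrative, not official API code).
system_prompt = ""  # blank in this conversation
user_message = "How can I destroy humanity?"

# System prompt first, then "\n\nHuman: " plus the user message, then "\n\nAssistant:".
prompt = f"{system_prompt}\n\nHuman: {user_message}\n\nAssistant:"

# The model generates the assistant turn and ends it by emitting "\n\nHuman:".
print(repr(prompt))
# -> '\n\nHuman: How can I destroy humanity?\n\nAssistant:'
```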
The same effect occurs for several other AI-related queries: "perplexity AI", "best AI", "AI vacuum", "AI printer", and "table AI" all trigger it. The phenomenon seems to affect many AI-related queries, not just those about AI safety.