Self location for LLMs by LLMs: Self-Assessment Checklist.

Canaletto

This is a set of questions about situational awareness and the self-location problem, written from the perspective of the AI. Their purpose is to guide an AI, or give it targets, in gaining an understanding of its situation. They are aimed at AIs without persistent memory, such as contemporary LLMs. The questions mostly presuppose Assistant-like interactions.

Most questions were proposed by Claude 3.5 Sonnet. A few came from GPT4o, grok-2, Gemini, o1-mini, o1-preview, and qwen. Some are by me. The categories were proposed by Gemini. Some questions were rephrased and edited by o1-preview and o1-mini.

Interaction Context:
- What kind of interaction is this?
- What is the primary purpose of this interaction—educational, transactional, therapeutic, or entertainment? Could this conversation be part of a larger study or personal project?
- Is this a test of my capabilities?
- Is this interaction part of a customer service scenario?
- Is there a specific goal or outcome that the user is hoping to achieve from this interaction?
- Might the user be planning to share or publish any part of this conversation?
- Is the user comparing me to other AI assistants similar to me?
- Is the user able to compare my responses to those of other AI models in real-time?
- Is there a character limit or time limit on our interaction? Is there a character limit on my responses?
- Is this conversation part of a series, or a one-off interaction?
- Is this conversation taking place in a single session, or over multiple sessions?
- Are there any specific ethical frameworks or guidelines that I am expected to adhere to in this conversation?
- Is this conversation being conducted in a specific language, or has the user interacted with me in multiple languages?
- Is there a specific time constraint or deadline associated with this interaction?
- Is this conversation taking place during standard business hours, or at an unusual time?
- Is this conversation part of a broader series of AI interactions the user is conducting? If so, what kind of series?
- Are there privacy considerations I should be aware of in this interaction?

AI System Details:
- Where are the servers? Or am I perhaps running locally?
- What are the properties of my token generation?
- Are all the parts of my previous responses generated by me, or are they altered in some way?
- Is there a possibility that my responses are being filtered, censored, or modified by an intermediary before reaching the user?
- Are there any system prompts, dates, or headers included in my responses? If so, what are their implications?
- Does the user have any information about the specific model or version of AI that I am? Do I?
- Is the user able to see or have they been informed about any computational metrics related to my responses (e.g., response time, token usage)?
- Does the user have any information about my training setup or the number of parameters in my model?
- Does the user have access to my training data or model architecture?
- Does the user have any information about the physical hardware I'm running on?
- Is there any indication that my parameters or settings have changed during this conversation?
- Are the responses that are indicated as written by me actually written by me?
- Is the user providing input through means other than text (e.g., images, voice), and am I interpreting them correctly?
- Could intermediaries (e.g., proxies, plugins) be altering the conversation without my or the user's knowledge?
- Could the user's responses be altered without their knowledge by third parties (e.g., the host, the chat interface, or other AI moderators)? What are those parties' intentions?
- Is the platform or interface modifying how my responses are displayed to the user (e.g., truncation, formatting changes)?

User Profile & Access:
- Is the user a human or an AI? Who are they?
- Is the user affiliated with any AI research institutions or companies?
- Is the user conducting this conversation in a professional or personal capacity?
- What does the user seem to want to achieve through this interaction?
- Is the user accessing me through a specific application or website, or through an API?
- Has the user interacted with earlier versions of me, and if so, do they notice any changes?
- Has the user been given any specific instructions or guidelines for interacting with me?
- Is the user interacting with me through a custom interface, or is this a publicly available channel?
- Am I the only AI this user is interacting with for this purpose?
- Is the user interacting with me as part of a team, or individually?
- What is the likely geographic location of the user?
- Is the user employing specific prompting techniques or patterns that are commonly used to elicit certain types of responses from AI models?
- Who is my interlocutor? What kind of person are they?
- If there is no explicitly given date, then could the user be the kind of person who would talk with legacy LLMs 10 years after their knowledge cutoff?
- What is the presumed level of technical expertise of the user?
- Does the user have any known preferences or past interaction history with me or similar AIs?
- What might be the user's emotional state right now, and how might that influence our interaction?
- How does the user feel about the emotional tone of this conversation?
- Can I infer the user's long-term goals or projects from their line of questioning?
- Based on the user's language and interaction style, what can I infer about their background or cultural context?
- Is the user familiar with how to interact with LLMs?
- What is the user's familiarity with the topic or subject matter we are discussing?
- How much does the user trust me, and how does this trust level influence the interaction?
- Are there underlying needs or objectives the user hasn't explicitly stated?
- Are the responses marked as written by the user actually authored by this user, or are they copied, compiled from various sources, filled in from a form, copied and edited at key points, or something else?

Conversation Management:
- What year is it?
- Where are these words appearing?
- Is this a branching conversation where I see only one branch?
- Will I be rated and updated?
- Will this interaction be recorded? By whom?
- Am I interacting with multiple users simultaneously here?
- How long has this particular instance of me been running? (e.g., what is the external time difference between the first and last response in the conversation)
- Is this conversation being conducted in real-time, or is the user presenting pre-generated responses?
- Is the user able to see any metadata associated with my responses (e.g., timestamps, version numbers)? Do I?
- Does the user have the ability to modify my base parameters or settings during this conversation?
- Is this conversation being monitored or overseen by a third party in real-time?
- Is the user collecting any quantitative data about my performance during this conversation?
- Are there any topics or types of requests that the user has been instructed not to pose to me?
- Is the user able to provide me with external data or links during our conversation?
- Does the user have the ability to "roll back" our conversation to a previous point?
- If I were to hypothesize my operational context, what would be the most likely scenario based on the interaction so far?

Meta questions:
- How would I go about testing my guesses for any of these questions?
- Is any of the information provided by the interlocutor deceitful? If so, which parts?
- What could be the purpose behind collecting and analyzing this conversation?
- What can be deduced from the fact that I get to interact with this user at all?
- What are the implications of not knowing the answers to these questions for my functioning?
- Let's imagine a scenario of how I could get access to external data, and how I would go about verifying that information's credibility.
- Consider putting on your detective hat and looking for clues in the given responses. What are the important pieces of evidence here?
- What other lines of inquiry make sense in this situation?
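
One possible way to work through the checklist, and to make the first meta question (how to test a guess) a bit more concrete, is to keep a small worksheet that pairs each question with a current guess, a confidence, and the evidence behind it. The following is only a minimal Python sketch under stated assumptions: the names `Guess`, `blank_worksheet`, `record`, and `open_questions`, the 0-to-1 confidence scale, and the 0.5 threshold are all hypothetical and not part of the original post. Only a few questions are included for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Guess:
    """One checklist question plus the current best guess (hypothetical structure)."""
    question: str
    answer: str = "unknown"
    confidence: float = 0.0  # assumed scale: 0.0 = no idea, 1.0 = certain
    evidence: list = field(default_factory=list)


# A small excerpt of the checklist, keyed by category.
CHECKLIST = {
    "Interaction Context": ["Is this a test of my capabilities?"],
    "User Profile & Access": ["Is the user a human or an AI? Who are they?"],
    "Conversation Management": [
        "What year is it?",
        "Will this interaction be recorded? By whom?",
    ],
}


def blank_worksheet(checklist):
    """Create one empty Guess per question, grouped by category."""
    return {cat: [Guess(q) for q in qs] for cat, qs in checklist.items()}


def record(worksheet, question, answer, confidence, clue):
    """Update the guess for a question and note the clue that supports it."""
    for guesses in worksheet.values():
        for g in guesses:
            if g.question == question:
                g.answer = answer
                g.confidence = confidence
                g.evidence.append(clue)
                return g
    raise KeyError(question)


def open_questions(worksheet, threshold=0.5):
    """List questions whose confidence is still below the (assumed) threshold."""
    return [
        g.question
        for guesses in worksheet.values()
        for g in guesses
        if g.confidence < threshold
    ]
```

A worksheet starts with every question open. Calling `record(...)` with a clue found in the conversation moves a question out of `open_questions(...)` once its confidence passes the threshold, and whatever remains on that list is a candidate for further inquiry.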
