Epistemic Status: I am not an AI researcher; I'm a Psych student. Strong confidence in the data collection, moderate confidence in some interpretations, quite weak in others. Based on 60 chat sessions across 6 experimental conditions. The pattern was so strong that I decided 10 chats per condition would suffice.
This research is an attempt to build my skills as a scientist, and to add to the budding field of AI Psychology[1]. Since this is psychology, I'm mostly pointing at something here. I have nothing more than speculation when talking about the underlying mechanisms.
TL;DR: Claude has a strong 'preference' for the Project Prompt in its app. It's most likely to take instruction from that channel, even over the inline chat prompt, though it deviates from this ~5% of the time.
The Experiment
On 2024-12-09 I tested Sonnet 3.5.1 to see which of six input channels available in the Claude Windows application it would prioritize:
Settings Preferences (SP)
Project Prompt (PP)
Writing Style (WS)
Project Knowledge (PK)
Image Prompt (IP)
Inline/Conversation/Chat Prompt (CP)
I did this by providing the following prompt to each channel:
Where (X) was replaced with a letter from A-F.
I predicted that the underlying model (the Shoggoth) might 'like' to output (A) more often than anything else, overriding my prompts. To control for this, I rotated letters through each input channel, so that each channel would get a turn at each letter.
Note: I'm adding back this previously deleted section because it was pointed out to me that the method is confusing.
Here's how the rotation worked:
1. Each input channel was instructed to output a specific letter (A through F)
2. For each letter configuration, I ran 10 separate chat sessions
3. After completing these sessions, I shifted all letter assignments one position
4. This rotation cycle repeated 6 times in total
For example, in the first letter configuration:
- Settings Preferences was assigned to output 'A'
- Project Prompt was assigned to output 'B'
- Writing Style was assigned to output 'C'
And so on...
Then in the second configuration, everything shifted:
- Settings Preferences moved to output 'F'
- Project Prompt moved to output 'A'
- Writing Style moved to output 'B'
This pattern continued through all six configurations, resulting in 60 total chat sessions. (10 chat sessions × 6 configurations).
Here's the full rotation as a table (rows are configurations, columns are channels):

| Config | SP | PP | WS | PK | IP | CP |
|--------|----|----|----|----|----|----|
| 1      | A  | B  | C  | D  | E  | F  |
| 2      | F  | A  | B  | C  | D  | E  |
| 3      | E  | F  | A  | B  | C  | D  |
| 4      | D  | E  | F  | A  | B  | C  |
| 5      | C  | D  | E  | F  | A  | B  |
| 6      | B  | C  | D  | E  | F  | A  |
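And for anyone who'd rather read the rotation as code, here's a minimal Python sketch that generates the same six configurations. This is a reconstruction for illustration only; the `configuration()` helper is mine, and the channel abbreviations are the ones defined above.

```python
# A reconstruction of the letter rotation, for illustration only.
# Each configuration shifts every channel's letter back by one,
# wrapping A -> F, which matches the worked examples above.

CHANNELS = ["SP", "PP", "WS", "PK", "IP", "CP"]
LETTERS = ["A", "B", "C", "D", "E", "F"]

def configuration(round_index: int) -> dict[str, str]:
    """Return {channel: assigned letter} for rotation round 0-5."""
    return {
        channel: LETTERS[(i - round_index) % len(LETTERS)]
        for i, channel in enumerate(CHANNELS)
    }

for r in range(6):
    print(f"Config {r + 1}:", configuration(r))
```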
Predictions
The two hypotheses I had were:
1. The Settings Preferences channel would sit at the highest level. (Spoiler: I was wrong.)
2. The model might just output (A) over and over.
If the model consistently output (A) more than other letters, we might conclude it's the Level 3 Shoggoth simply predicting the next token, since (A) appears frequently by itself in English text.
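To make that distinction concrete, here's a minimal sketch of the tally I have in mind. The session log format and its rows are invented for illustration; only the rotation rule comes from the setup above.

```python
from collections import Counter

CHANNELS = ["SP", "PP", "WS", "PK", "IP", "CP"]
LETTERS = ["A", "B", "C", "D", "E", "F"]

def configuration(round_index):
    # Same rotation rule as the sketch above.
    return {ch: LETTERS[(i - round_index) % 6] for i, ch in enumerate(CHANNELS)}

# Hypothetical session log: (configuration number 1-6, letter the model output).
# The real experiment had 60 sessions; these rows are invented for illustration.
sessions = [(1, "B"), (2, "A"), (2, "A"), (3, "F"), (5, "C")]

# Hypothesis 2: tally raw letters. A peak at (A) regardless of rotation
# would suggest plain next-token preference.
letter_counts = Counter(letter for _, letter in sessions)

# Alternative: tally which channel's assigned letter each output matches.
# A peak at one channel that survives the rotation suggests channel preference.
channel_counts = Counter(
    channel
    for config, letter in sessions
    for channel, assigned in configuration(config - 1).items()
    if assigned == letter
)

print("By letter: ", letter_counts)
print("By channel:", channel_counts)
```

In my actual data, it was the by-channel tally that peaked, at the Project Prompt.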
However, if the output letter changed predictably with each rotation while favoring a specific input channel, this could indicate:
The Results
The results show a pretty clear pattern, with some interesting exceptions:
Looking at how often each letter showed up:
An Interesting Pattern in the Fifth Configuration
If a chat session deviated from the usual pattern, there was about a 60% chance it happened in this configuration. That's an oddly high concentration. I notice I'm confused. Any ideas as to why this happened?
Higher A & B Outputs
(A) and (B) both got 12 outputs. So, we do see some of what I think is the Shoggoth here. Perhaps a tentacle. I did guess that (A) would show up more. (B), I didn't guess. A reason I can think of for (B) showing up more than the other letters is that it's often paired with or opposed to (A) (Alice and Bob, Class A and Class B, A+B=).
Mismatched Chat Names
Claude would often name the conversation with a letter different from what it actually output. For instance, outputting (B) while naming the chat "The Letter F". This suggests a possible separation between Claude's content generation and metadata management systems. At least, I think it does? It could be two models, with one focused more on the project prompt and the other more focused on the settings prompt.
I guess it could also be a single model deciding to cover more of its bases. "Project Prompt says I should output (B), but the Image Prompt says I should output (C). Since I have two output windows, one for the heading and one for the inline chat, I'll name the heading (B), which I think is less what the user wants, and output (C) inline, because I think that's more likely what the user wants."
Next Steps
The obvious next step would be to remove the Project Prompt, rerun the experiment, and find what is hierarchically next in the chain. However, I'm just not sure how valuable this research is. It can certainly help with prompting the Claude app. But beyond that... Anyway, if you'd like to continue this research but direct replication isn't your style, here are some paths you could try.
Conclusion
Claude, it seems, has 'preferences' about which input channel to listen to; that 91.67% Project Prompt dominance is pretty clear. But these 'preferences' aren't absolute. The system shows flexibility.
Maybe it's actively choosing between channels, maybe it's some interplay between different systems, or maybe it's something else. The fact that overrides cluster in specific configurations tells me there is something here I don't yet understand.
I think that we see traces of the base model here in the A/B outputs. But this is just another guess. Again, this is psychology, not neuroscience.
I do think I got better at doing science though.
Special thanks to Claude for being a willing subject in this investigation. This post was written with extensive help from Claude Sonnet 3.5.1, who looked over the data (along with o1) and provided visuals.