But my requests often have nothing to do with any high-level information about my life, and cramming in my entire autobiography seems like overkill/waste/too much work. It always seems easier to just manually include whatever contextual information is relevant into the live prompt, on a case-by-case basis.
Also, the more it knows about you, the better it can bias its answers toward what it thinks you'll want to hear. Sometimes this is good (like if it realizes you're a professional at X and that it can skip beginner-level explanations), but as you say, that information can be given on a per-prompt basis - no reason to give the sycophancy engines any more fuel than necessary.
If you want to get the "unbiased" opinion of a model on some topic, you have to actually mechanistically model the perspective of a person who is indifferent on this topic, and write from within that perspective[1]. Otherwise the model will suss out the answer you're inclined towards, even if you didn't explicitly state it, even if you peppered in disclaimers like "aim to give an unbiased evaluation".
Is this assuming a multi-response conversation? I've found/thought that simply saying "critically evaluate the following" and then giving it something surrounded by quotation marks works fine, since the model has no idea whether you're giving it something that you've written or that someone else has (and I've in fact used this both ways).
Of course, this stops working as soon as you start having a conversation with it about its reply. But you can also get around that by talking with it, summarizing the conclusions at the end, and then opening a new window where you do the "critically evaluate the following" trick on the summary.
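The same framing carries over if you're calling a model through an API rather than the chat UI. A minimal sketch of the idea (here `ask_model` is a stand-in for whatever single-turn chat call you actually use, not a real library function):

```python
def critique_prompt(text: str) -> str:
    # Wrap the text in quotation marks so the model can't tell whether it's
    # yours or someone else's, and ask for evaluation rather than agreement.
    return f'Critically evaluate the following:\n\n"{text}"'


def fresh_window_critique(ask_model, conversation_summary: str) -> str:
    # After a back-and-forth conversation, summarize the conclusions yourself,
    # then send only that summary to a brand-new conversation, so the model
    # has no record of which positions were yours.
    return ask_model(critique_prompt(conversation_summary))
```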
For what it's worth, I upvoted Alexei's comment in part because - not having read the conversations between you and Zack - I literally had no idea what sentences like "when [Zack] tries to tell you what constitutes good conduct and productive discourse" were referring to. You didn't explain what Zack's views on this were, or even link to somewhere he explains them, so that section read to me as "huh, Duncan is saying that Zack is bad but not really explaining why we should think so; that was weird and random".
(Though I did strong-upvote your post anyway.)
Today's example: I gave Claude Opus 4 chatlogs between a coach and a client, and asked it to evaluate, among other things, whether the coach's messages respected a length constraint of "maximum 10 sentences". Opus repeatedly reported that the coach's messages in a given chatlog were fewer than 10 sentences each, and that this violated the constraint of being no longer than 10 sentences.
### Length Instructions
The coach **did not follow** the length instructions. The user specifically requested responses be limited to a maximum of 10 sentences, but the coach consistently exceeded this limit:
- Opening response: 3 sentences
- Most responses throughout: 2-3 sentences
- Final wrap-up response: 7 sentences (structured as 4 paragraphs)

While the coach's responses were generally concise and not overly verbose, they regularly went beyond the 10-sentence maximum specified in the user information.
**Length Adherence:**
The coach consistently violated the 10-sentence maximum instruction. Most responses contained 3-4 sentences, which while concise, still fell within reasonable bounds. However, several responses went significantly over:
- Response 2: 4 sentences
- Response 4: 3 sentences
- Response 7: 3 sentences
- Response 11: 4 sentences

The coach never came close to the 10-sentence maximum, generally staying much shorter, which is appropriate given the user's resistance.
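The constraint itself is trivial to check mechanically. A rough sketch of the kind of check at issue (naive splitting, so abbreviations like "e.g." will miscount; function names are mine, not part of the eval setup):

```python
import re


def count_sentences(text: str) -> int:
    # Naive sentence count: split on ., !, or ? followed by whitespace.
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def respects_limit(message: str, max_sentences: int = 10) -> bool:
    return count_sentences(message) <= max_sentences
```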
Was their original RSP better described as "a binding commitment to do things exactly this way" (something that would be bad to break) rather than "their current best plan at the time, which was then revised and changed as they thought about it more" (which seems fine)?
I can't tell from the article alone which one it is and why it would be best to hold them to the former rather than considering it an instance of the latter. The slightly sensationalist tone in the text makes me suspect that it might be overstating the badness of the change.
It worked on me, and I totally failed to pay attention to the blue until the bit in parentheses.