Comments

Yes, my understanding is that the system prompt isn't really privileged in any way by the LLM itself, just in the scaffolding around it.

But regardless, this sounds to me less like maintaining or forming a sense of purpose, and more like retrieving information from the context window.

That is, if the LLM has previously seen (through the system prompt, the first instruction, or whatever) "your purpose is to assist the user", and later sees "what is your purpose?", an answer saying "my purpose is to assist the user" doesn't seem like evidence of purposefulness. The same goes if you run the exercise with "flurbles are purple", and later ask "what color are flurbles?" and get the answer "purple".

#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you.

Isn't this just because the system prompt is always saying something along the lines of "your purpose is to assist the user"?

by saying their name aloud: [...] …but it’s a lot more difficult to use active recall to remember people’s names.

I'm confused, isn't saying their name in a sentence an example of active recall?

Finding two bugs in a large codebase doesn't seem especially suspicious to me.

Entries 1a and 1b are obviously not relevant to the OP, which is mainly about the sense in 3b (maybe a little bit the 3a sense too, since it is "merged with or coloured by sense 3b").

Entry 3b looks (to me) sufficiently broad and vague that it doesn't really rule anything out. Do you think it contradicts anything that's in the OP?

The OED defines ‘gender’, excluding obsolete meanings, as follows:

Okay? Why are you telling us this?

Maybe if you solve for equilibrium you get that after releasing the tool, the tool is defeated reasonably quickly?

I believe it's already known that running the text through another (possibly smaller and cheaper) LLM to reword it can remove the watermarking. So for catching cheaters it's only a tiny bit stronger than searching for "as a large language model" in the text.
