npostavs - LessWrong

What about tuning the fiddle strings down 1 tone?

You say this:

If you’re thinking, “Wait no, I’m pretty sure my group is fundamentally about X, which is fundamentally good,” then you’re probably still in Red or Blue.

But you also say this:

First, the Grey tribe is about something, [...] things that people already think are good in themselves.

Doesn't the first statement completely undermine the second one?

Hopeful hypothesis, the Persona Jukebox.

npostavs1mo10

I guess you meant jukebox, not jutebox. Unless there is some kind of record-playing box made of jute fiber that I haven't heard of...

RohanS's Shortform

npostavs3mo30

but I recently tried again to see if it could learn at runtime not to lose in the same way multiple times. It couldn't. I was able to play the same strategy over and over again in the same chat history and win every time.

I wonder if having the losses in the chat history would instead be training/reinforcing it to lose every time.

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

npostavs4mo32

Yes, my understanding is that the system prompt isn't really priviledged in any way by the LLM itself, just in the scaffolding around it.

But regardless, this sounds to me less like maintaining or forming a sense of purpose, and more like retrieving information from the context window.

That is, if the LLM has previously seen (through system prompt or first instruction or whatever) "your purpose is to assist the user", and later sees "what is your purpose?" an answer saying "my purpose is to assist the user" doesn't seem like evidence of purposefulness. Same if you run the exercise with "flurbles are purple", and later "what color are flurbles?" with the answer "purple".

LLM chatbots have ~half of the kinds of "consciousness" that humans believe in. Humans should avoid going crazy about that.

npostavs4mo83

#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you.

Isn't this just because the system prompt is always saying something along the lines of "your purpose is to assist the user"?