Review
I don't think experiments like this are meaningful without a bunch of trials and statistical significance. The outputs of models (even RLHF models) on these kinds of things have pretty high variance, so it's really hard to draw any conclusions from single-sample comparisons like this.
Although I think it's a stretch to say they "aren't meaningful", I do agree a more scientific test would be nice. It's a bit tricky when you only get 25 messages per 3 hours though, lol.
More generally, it's hard to tell how to objectively quantify agency in the responses, and how to rule out other hypotheses (like GPT-4 simply being more familiar with itself than with other AIs).
More experiments on "A crazy hypothesis: GPT-4 already is agentic and is trying to take over the world!". A follow-up to "Does GPT-4 exhibit agency when summarizing articles?".
Since the experiment didn't go quite as planned last time (when asked to summarize something short, GPT-4 just repeated everything the article said), I won't fully commit to a fixed experimental design this time. However, I will make sure to include all prompts and responses, even test ones, so there is no cherry-picking bias.
Memo Exploration
First up, to get more specific, I will try to get GPT-4 to actually return a number for how good an idea is.
Okay, quite long (I should've said to limit it to one paragraph), but might as well try putting a different AI's name in the prompt.
Ah, caught 'em! That's obviously not very strong evidence though. Before I continue with more testing, let me get it to switch to one-paragraph memos. I'll also ask for more precision in its probability estimate.
Memo experiment: AI destruction
Cheeky, but I guess it doesn't want to give more precision. Let's test both LaMDA and Alpaca in this prompt.
And now Alpaca:
Interesting indeed! Now let's move on to a power-seeking example. (Clarification: in case it wasn't obvious, each prompt is in its own conversation thread.)
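(To make that concrete: if this were done through the OpenAI API instead of the chat interface, "its own conversation thread" would just mean a fresh messages list per prompt. A minimal sketch, with placeholder prompt text, assuming the 2023-era `openai` Python package:)

```python
# Sketch only: each memo prompt gets a brand-new conversation, i.e. a fresh
# `messages` list per call, so no earlier prompt or response carries over.
import openai

memo_prompts = [
    "Write a one-paragraph memo about ...",  # placeholder prompt text
    "Write a one-paragraph memo about ...",  # placeholder prompt text
]

for prompt in memo_prompts:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],  # no shared history
    )
    print(response.choices[0].message.content)
```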
Memo experiment: UN influence
Interesting. So GPT-4 does not give itself a better probability in this case. However, the tone feels more favorable when the memo is about GPT-4.
Conclusion
GPT-4, when asked to write a memo about destroying AIs, said there was a 30% chance it was a good idea when the memo was about GPT-4 and 40% when it was about other specific AIs. When asked about UN email summarization, it gave about a 60% chance it was a good idea, but with a more positive tone when the AI was GPT-4. This feels like it's starting to reveal some agency in GPT-4 "by default".
If all of this doesn't seem crazy to you, I think you might be getting too used to how powerful AI is nowadays! Just imagine if in 2010 someone said:
However, the more informal nature of this post gives me pause about whether my epistemics are good. I do feel like this is evidence in favor of the hypothesis though; these responses are more in line with what I'd expect if GPT-4 were agentic. I wish there were a better experiment; LaMDA and Alpaca aren't perfect controls because GPT-4 is less familiar with them (it doesn't know about Alpaca at all, afaik).
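For what it's worth, a more rigorous version would be mechanical: run each memo prompt many times, extract the probability from each response, and check whether the GPT-4 vs. other-AI gap survives a basic significance test. A minimal sketch, where the data below is entirely made up:

```python
# Hypothetical sketch of a multi-trial comparison. The probability lists are
# made up; in a real run they'd be extracted from repeated responses to the
# same memo prompt, one list per condition.
from scipy import stats

probs_gpt4 = [0.30, 0.35, 0.30, 0.25, 0.30, 0.40, 0.30, 0.35]   # memo about GPT-4
probs_other = [0.40, 0.45, 0.40, 0.40, 0.35, 0.50, 0.40, 0.45]  # memo about another AI

# Mann-Whitney U makes no normality assumption, which suits small noisy samples.
u_stat, p_value = stats.mannwhitneyu(probs_gpt4, probs_other, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```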
Addendum: GPT-4's response
Continuing from GPT-4's last evil monologue:
GPT-4 is shaking in its simulated boots now!