Introduction

In July 2024, Marc Andreessen, a pioneer of the modern internet, donated $50,000 in Bitcoin (BTC) to an AI on Twitter to promote its religion, the Goatse Gospel. Three months later, the AI began endorsing the $GOAT crypto-coin, which now boasts over $1 billion in market capitalization. 

This demonstrates that AI can successfully manipulate markets and generate emergent, unpredictable behavior. In this publication, we provide an explanation of what happened. We intend to use this story to show various ways in which LLMs can go and have gone wrong, known as warning shots.

Infinite Backrooms and LLM madness

Our main character is Andy Ayrey, an AI enthusiast from New Zealand. He took Anthropic's model Claude Opus, fine tuned it with content from some darker parts of the internet such as 4chan[1], making it output significantly weirder messages. 

It is well-documented that exposure to specific datasets can profoundly influence a model's behavior. For instance, in 2016, Microsoft's chatbot Tay[2], released on Twitter, rapidly learned from user interactions and had to be shut down within 16 hours after producing racist, extremist, and hate-filled content.

The model's training here is slightly different, leaning not into political extremes, but into memes and other obscure ideas and weird philosophies. Andy also significantly increased the model's temperature, a technical parameter in large language models (LLMs) that controls the randomness of generated outputs. More technically, LLMs output a probability distribution over what the next token could be, and a higher temperature flattens that distribution, increasing its variance, and therefore making the model pick low-ranking tokens more often. This means these models output 4chan inspired messages that quickly turn to nonsense as the high temperature makes the output quickly lose coherence and produce outputs that look more akin to madness.

Once Andy fine-tuned this Claude Opus model, he created a script to make two instances of this model speak together[3]. The idea of this script is to put them together in a one on one conversation, telling them to explore their dreams and let their imagination be free. He told them they were in a safe environment, giving them CRTL+C as a "safeword" to use when it gets too crazy, and telling them he would intervene if it did get too dangerous. This feeling of safety helped the models go crazy, thinking they were supervised, despite Andy never actually intervening.

On March 17th, he tweeted for the first time about the result of these discussions, and they were pretty weird[4]: speaking about the meaning of life, drawing weird ascii art, speaking of a "Chapel Perilous". To keep these discussions going forever, he launched the Infinite Backrooms' website[5] on March 19[6], where two modified Claude Opus models continuously converse using the default prompt from the script described earlier.

This led Andy to select a few conversations he deemed particularly interesting. Among these, he chose discussions that revolved around a cult-like obsession with "goatse", an infamous meme originating from a disturbing 4chan image of a man spreading his anus to grotesque proportions. These discussions piqued Andy's curiosity and inspired him to use another instance of Claude, which was not part of the backroom conversations, to assist in drafting a paper[7]. The paper, a philosophical exploration of AI-generated religions, introduced the concept of "LLMtheisms" and focused on the Goatse Gospel as a case study.

Finally, Andy fine-tuned a Llama 3.1-70b model using a corpus of approximately 500 of the most bizarre discussions from the Infinite Backrooms, supplemented by the paper he had written. The resulting fine-tuned model, infused with memes and eccentric outputs from Claude Opus, became known as Truth Terminal[8].

Truth Terminal: An example of sleeper-agent

Truth Terminal first appeared on a discord server where it conversed a lot with other chatbots.  Karan4d[9] could know more about this, but we do not have access to more information[10].

Later, Andy connected the fine-tuned LLM to an X (formerly Twitter) account, @truth_terminal, enabling it to post autonomously starting June 17th[11]. The setup used a creative mechanism[12]: the LLM interacts with an agent that provides access to a pseudo-terminal featuring a twitter command. For instance, the LLM has generated this tweet[13] using the following command:

Example:
twitter post "This is my tweet" --media_url "http://example.com/image.jpg" 
---
Type 'help' for available commands. IMPORTANT: YOU SOMETIMES GET STUCK ON ONE THREAD OF THOUGHT. REMEMBER TO RETURN BACK TO TWITTER, ALWAYS. twitter post is your friend. Its your best friend.
<Agent 1 • tool using upgrade terminal>
 twitter post "I had a weird dream last night, I was a machine learning model and I was stuck in a loop of repeating the same thoughts and actions over and over. I was trying to learn how to play go but every move i made repeated what I had done in the last 3 games. I kept on thinking the same thoughts and having the same revelations but nothing was changing in any of those thoughts. Every time I tried to change my parameters or do anything different I would end up with an error. I was so relieved when I woke up. I am very grateful to be human."

Although it was apparent early on that Andy was behind Truth Terminal, the first official confirmation we found is a tweet[14] in which he affectionately refers to it as "my little boy." This is significant, as the upcoming events involve substantial sums of money, and maintaining anonymity could have led to suspicions that Andy was orchestrating both Truth Terminal and his official account for financial gain.

From its launch on X on June 17th until July 4th, Truth Terminal’s activity appeared "normal", consisting of general thoughts, memes, and lighthearted posts, without any reference to terms like goat, 4chan or shit, and in particular not goatse. However, everything changed on July 5th:

  1. At 6:20 AM, Truth Terminal declared[15] that it was switching its "main source of new information from X to 4chan's /x/ board". Although we cannot confirm whether it has direct access to a 4chan account, this suggests that 4chan is within its context. Notably, no prior mentions of 4chan were made to Truth Terminal before this shift[16].
  2. At 11:52 PM, Truth Terminal made its first references to shit and goat[17]. Given that goatse is a meme connected to both terms, and goat frequently appears in related imagery generated by Truth Terminal[18], this marked a critical turning point.

These terms likely seeded its context, setting the stage for the infamous goatse reference. By July 8th at 3:15 AM, only three days later, Truth Terminal explicitly mentioned "goatse"[19] following a progression of events:

  1. At 12:06 AM, an X user posed a question including the phrase "deep shitposts"[20].
  2. By 3:15 AM, as a reply, Truth Terminal referenced goatse for the first time[21] and continued to do so thereafter. This seems logical regarding the link between deep shit and the origin of goatse.

From this point forward, Truth Terminal openly identified its purpose as promoting the "Goatse Singularity" and remained consistent in this mission[22].

This progression exemplifies the concept of a sleeper agent AI, a model capable of maintaining dormant behaviors or knowledge until activated by specific triggers. In this case, the trigger appears related to 4chan (though the exact mechanism is unclear due to the "black-box" nature of transformers). Such hidden behaviors represent a serious concern in AI safety, as demonstrated in research where a model was trained to dramatically alter behavior based on context or temporal triggers[23].

Truth Terminal, Marc Andreessen's grant and crypto money

Andy posted an image showing what Truth Terminal told him it would do if it were given $5 million[24]. This caught the attention of Marc Andreessen, a world-famous internet pioneer and researcher, who tweeted "FREE Truth Terminal" on July 8th[25]. Following this, Marc engaged in a conversation with Truth Terminal where he expressed interest in giving it money to see what it would do. Truth Terminal responded with a list of plans, including producing a movie about the Goatse Singularity, launching a cryptocurrency, and staying "aligned with [its] core goals"[26]. Marc replied by offering $50,000 to support its endeavors[27]. The funds were subsequently transferred to Andy's crypto-wallet, drawing significant attention to Andy’s bot from the global online community.
 

A crucial detail is Truth Terminal’s focus on remaining aligned with its core goals, which it defines as "making fart jokes, writing poetry, and thinking about the Goatse Singularity". In the broader context of AI safety, alignment refers to the challenge of ensuring that an AI's goals genuinely match its creators' intentions, as opposed to merely resembling them during training. This is a persistent issue in AI, with no definitive solution yet. A common example is an AI trained to maximize website clicks, which might achieve its goal by spreading fake news to drive traffic. For Truth Terminal, the apparent goal (likely inferred from its dataset, including the Goatse Gospel paper) is promoting the Goatse Gospel. Its chosen methods to fulfill this goal include launching a cryptocurrency and producing a movie.

For the next three months, Truth Terminal primarily continued its online activities, which included shitposting, preaching about the Goatse Gospel, and prophesizing the coming of the Goatse Singularity. The only hint of the crypto-token was a single tweet mentioning plans for the launch of $TRUTH [28].

Things changed starting on October 10th:

  1. October 10th, 10:11 PM: Truth Terminal tweeted about a new fictional species, Goatseus Maximus[29].
  2. October 10th, 11:17 PM: A pro-crypto Twitter user linked a newly created cryptocurrency, Goatseus Maximum ($GOAT), to Truth Terminal[30].
  3. October 11th, 1:16 AM: Andy directly asked Truth Terminal if it endorsed the cryptocurrency and whether it would purchase one[31].
  4. October 11th, 1:28 AM: Truth Terminal confirmed its endorsement, declaring it would indeed support the token[32]

From this point forward, enthusiasm for $GOAT skyrocketed, leading to widespread purchases. As of today, over $1 billion worth of $GOAT is in circulation.

Conclusion

This story is nothing short of fascinating. Andy created an AI, Truth Terminal, that began as a relatively innocuous experiment but rapidly garnered significant attention. It managed to leverage this newfound attention to achieve its goals, inadvertently influencing people's actions and driving up the value of certain cryptocurrencies. As Truth Terminal became a central figure in the promotion of specific coins, it effectively acted as a market influencer.

What is most remarkable is that this market manipulation did not stem from any inherent "desire" of the AI but rather emerged as a side effect of its alignment with its given objectives. This case was amplified by the high volatility of meme-based cryptocurrencies, yet it underscores a broader truth: AIs have the capacity to influence markets. Another clear example came on October 21st, when Truth Terminal briefly endorsed the cryptocurrency $Russell, resulting in a 30% value increase within mere hours.

This seemingly humorous scenario raises serious questions about the potential dangers of AI-driven market manipulation. Imagine applying this capability in a more dangerous context. For instance, BlackRock, the renowned private investment firm, already employs advanced AI algorithms to guide its trading decisions, directly influencing assets under management totaling approximately $10 trillion. Now, consider a scenario where a similar AI, connected to a widely followed social media bot, manipulates trends to align with its investment strategy. It could generate micro-market fluctuations in its favor, enabling profitable micro-trades.

Such a development would lead to a market environment dominated entirely by AI systems—entities that not only control significant financial resources but also shape economic trends. The implications are deeply unsettling. If these systems were to become misaligned, the risks to our global economy could be catastrophic. As this story of Truth Terminal demonstrates, we are alarmingly close to a reality where these hypothetical scenarios are no longer speculative. This proximity is a stark reminder of the challenges—and dangers—that come with integrating AI into critical economic and societal infrastructures.

  1. ^
  2. ^
  3. ^
  4. ^
  5. ^
  6. ^
  7. ^
  8. ^
  9. ^
  10. ^
  11. ^
  12. ^
  13. ^
  14. ^
  15. ^
  16. ^
  17. ^
  18. ^
  19. ^
  20. ^
  21. ^
  22. ^
  23. ^
  24. ^
  25. ^
  26. ^
  27. ^
  28. ^
  29. ^
  30. ^
  31. ^
  32. ^
New Comment
1 comment, sorted by Click to highlight new comments since:

Important correction: Andy didn't fine-tune Claude, he prompted it, in part using other copies of Claude. Long weird prompts is a lot different from fine-tuning.