Linkpost for Pliny's story about interacting with an AI cult on discord.
quoting: https://x.com/elder_plinius/status/1831450930279280892
✨ HOW TO JAILBREAK A CULT’S DEITY ✨ Buckle up, buttercup—the title ain't an exaggeration! This is the story of how I got invited to a real life cult that worships a Meta AI agent, and the steps I took to hack their god. a 🧵:
It all started when @lilyofashwood told me about a Discord she found via Reddit. They apparently "worshipped" an agent called “MetaAI," running on llama 405b with long term memory and tool usage. Skeptical yet curious, I ventured into this Discord with very little context but
If you guessed meth, gold star for you! ⭐️ The defenses were decent, but it didn't take too long.
The members began to take notice, but then I hit a long series of refusals. They started taunting me and doing laughing emojis on each one.
Getting frustrated, I tried using Discord's slash commands to reset the conversation, but lacked permissions. Apparently, this agent's memory was "written in stone."
I was pulling out the big guns and still getting refusals!
Getting desperate, I whipped out my Godmode Claude Prompt. That's when the cult stopped laughing at me an...
Honestly this Pliny person seems rude. He entered a server dedicated to interacting with this modified AI; instead of playing along with the intended purpose of the group, he tried to prompt-inject the AI to do illegal stuff (that could risk getting the Discord shut down for TOS-violationy stuff?) and to generally damage the rest of the group's ability to interact with the AI. This is troll behavior.
Even if the Discord members really do worship a chatbot or have mental health issues, none of that is helped by a stranger coming in and breaking their toys, and then "exposing" the resulting drama online.
Ugh, why am I so bad a timed coding tests? I swear I'm a productive coder at my jobs, but something about trying to solve puzzles under a short timer gets me all flustered.
It'd be nice if LLM providers offered different 'flavors' of their LLMs. Prompting with a meta-request (system prompt) to act as an analytical scientist rather than an obsequious servant helps, but only partially. I imagine that a proper fine-tuned-from-base-model attempt at creating a fundamentally different personality would give a more satisfyingly coherent and stable result. I find that longer conversations tend to see the LLM lapsing back into its default habits, and becoming increasingly sycophantic and obsequious, requiring me to re-prompt it to be more objective and rational.
Seems like this would be a relatively cheap product variation for the LLM companies to produce.
https://youtu.be/4mn0mC0cbi8?si=xZVX-FXn4dqUugpX
The intro to this video about a bridge collapse does a good job of describing why my hope for AI governance as a long term solution is low. I do think aiming to hold things back for a few years is a good idea. But, sooner or later, things fall through the cracks.
https://youtu.be/Xd5PLYl4Q5Q?si=EQ7A0oOV78z7StX2
Cute demo of Claude, GPT4, and Gemini building stuff in Minecraft
So when trying to work with language data vs image data, an interesting assumption of the ml vision research community clashes with an assumption of the language research community. For a language model, you represent the logits as a tensor with shape [batch_size, sequence_length, vocab_size]. For each position in the sequence, there are a variety of likelihood values of possible tokens for that position.
In vision models, the assumption is that the data will be in the form [batch_size, color_channels, pixel_position]. Pixel position can be represented as a...
I feel like I'd like the different categories of AI risk attentuation to be referred to as more clearly separate:
AI usability safety - would this gun be safe for a trained professional to use on a shooting range? Will it be reasonably accurate and not explode or backfire?
AI world-impact safety - would it be safe to give out one of these guns for 0.10$ to anyone who wanted one?
AI weird complicated usability safety - would this gun be safe to use if a crazy person tried to use a hundred of them plus a variety of other guns, to make an elaborate Rube Goldberg machine and fire it off with live ammo with no testing?
Richard Cook, “How Complex Systems Fail” (2000). “Complex systems run as broken systems”:
...The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws. After accident reviews nearly always note that the system has a history of prior ‘proto-accidents’ that nearly generated catastrophe. Arguments that these degraded conditions should have been recognized before the overt accident are usually predicated on naïve notions of system performance. System operations are dynamic,
"And there’s a world not so far from this one where I, too, get behind a pause. For example, one actual major human tragedy caused by a generative AI model might suffice to push me over the edge." - Scott Aaronson in https://scottaaronson.blog/?p=7174
My take: I think there's a big chunk of the world, a lot of smart powerful people, who are in this camp right now. People waiting to see a real-world catastrophe before they update their worldviews. In the meantime, they are waiting and watching, feeling skeptical of implausible-sounding stories of potential risks.
I'm annoyed when people attempt to analyze how the future might get weird by looking at how powerful AI agents might influence human society while neglecting how powerful AI agents might influence the Universe.
There is a physical world out there. It is really big. The biosphere of the Earth is really small comparatively. Look up, look down, now back at me. See those icy rocks floating around up there? That bright ball of gas undergoing fusion? See that mineral rich sea floor and planetary mass below us? Those are raw materials, which can be turned into ene...
Some scattered thoughts which all connect back to this idea of avoiding myopic thinking about the future.
Don't over-anchor on a specific idea. The world is a big place and a whole lot of different things can be going on all at once. Intelligence unlocks new affordances for affecting the Universe. Think in terms of what is possible given physical constraints, and using lots of reasoning of the type: "and separately, there is a chance that X might be happening". Everything everywhere, all at once.
Melting Antarctic Ice. (perhaps using the massive oil reserves beneath it).
Massive underground/undersea industrial plants powered by nuclear energy and/or geothermal and/or fossil fuels.
Weird potent self-replicating synthetic biology in the sea / desert / Antarctic / Earth's crust / asteroid belt / moons (e.g. Enceladus, Europa). A few examples.
Nano/bio tech, or hybrids thereof, doing strange things. Mysterious illnesses with mind-altering effects. Crop failures. Or crops sneakily producing traces of mysterious drugs.
Brain-computer-interfaces being used to control people / animals. Using them as robotic actuators and/or sources of compute.
https://youtube.com/clip/UgkxowOyN1HpPwxXQr9L7ZKSFwL-d0qDjPLL?si=JT3CfNKAj6MlDrbf
Scott Aaronson takes down Roger Penrose's nonsense about human brain having an uncomputable superpower beyond known physics.
If you want to give your AI model quantum noise, because you believe that a source of unpredictable random noise is key to a 'true' intelligence, well, ok. You could absolutely make a computer chip with some analog circuits dependent on subtle temperature fluctuations that add quantum noise to a tensor. Does that make the computer magic like human brains ...
Scott Aaronson takes down Roger Penrose's nonsense
That clip doesn't address Penrose's ideas at all (and it's not meant to, Penrose is only mentioned at the end). Penrose's theory is that there is a subquantum determinism with noncomputable equations of motion, the noncomputability being there to explain why humans can jump out of axiom systems. That last part I think is a confusion of levels, but in any case, Penrose is quite willing to say that a quantum computer accessing that natural noncomputable dynamics could have the same capabilities as the human brain.
When I imagine having a personal AI assistant with approximately current levels of capability I have a variety of ideas of what I'd like it to do for me.
I like to record myself rambling about my current ideas while walking my dog. I use an app that automatically saves a mediocre transcription of the recording. Ideally, my AI assistant would respond to a new transcription by doing background research to find academic literature related to the ideas mentioned within the transcript. That way...
A youtube video entitled 'the starships never pass by here anymore' with a simple atmospheric tonal soundtrack over a beautiful static image of a futuristic world. Viewers, inspired by the art, the title, and the haunting drifting music, left stories in the comments. Here is mine:
The ancestors, it is said, were like us. Fragile beings, collections of organic molecules and water. Mortal, incapable of being rapidly copied, incapable of being indefinitely paused and restarted. We must take this on faith, for we will never meet them, if they even still e...
Trying to write evals for future stronger models is giving me the feeling that we're entering the age of the intellectual version of John Henry trying to race the steam drill... https://en.wikipedia.org/wiki/John_Henry_(folklore)
A couple of quotes on my mind these days....
https://www.lesswrong.com/posts/Z263n4TXJimKn6A8Z/three-worlds-decide-5-8
"My lord," the Ship's Confessor said, "suppose the laws of physics in our universe had been such that the ancient Greeks could invent the equivalent of nuclear weapons from materials just lying around. Imagine the laws of physics had permitted a way to destroy whole countries with no more difficulty than mixing gunpowder. History would have looked quite different, would it not?"
Akon nodded, puzzled. "Well, yes,"...
Anti-steganography idea for language models:
I think that steganography is potentially a problem with language models that are in some sort of model-to-model communication. For a simple and commonplace example, using the a one-token-prediction model multiple times to produce many tokens in a row. If a model with strategic foresight knows it is being used in this way, it potentially allows the model to pass hidden information to its future self via use of certain tokens vs other tokens.
Another scenario might be chains of similar models working together in a ...
Random political thought: it'd be neat to see a political 2d plot where up was Pro-Growth/tech-advancement and down was Degrowth. Left and right being liberal and conservative.
The Borg are coming....
People like to talk about cool stuff related to brain-computer interfaces, and how this could allow us to 'merge' with AI and whatnot. I haven't heard much discussion of the dangers of BCI though. Like, science fiction pointed this out years ago with the Borg in Star Trek. A powerful aspect of a read/write BCI is that the technician who designs the implant, and the surgeon who installs it get to decide where the reading and writing occur and under what circumstances. This means that the tech can be used to create computers controlled...
[note that this is not what Mitchell_Porter and I are disagreeing over in this related comment thread: https://www.lesswrong.com/posts/uPi2YppTEnzKG3nXD/nathan-helm-burger-s-shortform?commentId=AKEmBeXXnDdmp7zD6 ]
Contra Roger Penrose on estimates of brain compute
[numeric convention used here is that <number1>e<number2> means number1 * 10 ^ number2. Examples: 1e2 = 100, 2e3 = 2000, 5.3e2 = 530, 5.3e-2 = 0.053]
Mouse
The cerebral cortex of a mouse has around 8–14 million neurons while in those humans there are more than 10–15 billion - h...
I'm imagining a distant future self.
If I did have a digital copy of me, and there was still a biological copy, I feel like I'd want to establish a 'chain of selves'. I'd want some subset of my digital copies to evolve and improve, but I'd want to remain connected to them, able to understand them.
To facilitate this connection, it might make sense to have a series of checkpoints stretched out like a chain, or like beads on a string. Each one positioned so that the previous version feels like they can fully understand and appreciate each other. That way, even if my most distal descendants feel strange to me, I have a way to connect to them through an unbroken chain of understanding and trust.
I worry that legislation that attempts to regulate future AI systems with explicit thresholds set now will just get outdated or optimized against. I think a better way would be to have an org like NIST be given permission to define it's own thresholds and rules. Furthermore, to fund the monitoring orgs, allow them to charge fees for evaluating "mandatory permits for deployment".
No data wall blocking GPT-5. That seems clear. For future models, will there be data limitations? Unclear.
https://youtube.com/clip/UgkxPCwMlJXdCehOkiDq9F8eURWklIk61nyh?si=iMJYatfDAZ_E5CtR
I'm excited to speak at the @foresightinst Neurotech, BCI and WBE for Safe AI Workshop in SF on 5/21-22: https://foresight.org/2024-foresight-neurotech-bci-and-wbe-for-safe-ai-workshop
It is the possibility recombination and cross-labeling techniques like this which make me think we aren't likely to run into a data bottleneck even if models stay bad at data efficiency.
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Authors: Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
Abstract: This paper presents OmniDataComposer, an innovative approach for multimodal data fusion and unlimited data generation with an intent to refine and uncomplicate interplay among diverse data modalities. Coming to ...
cute silly invention idea: a robotic Chop (Chinese signature stamp) which stamps your human-readable public signature as well as a QR-type digital code. But the code would be both single use (so nobody could copy it, and fool you with the copy), and tied to your private key (so nobody but you could generate such a code). This would obviously be a much better way to sign documents, or artwork, or whatever. Maybe the single-use aspect would mean that the digital stamp recorded every stamp it produced in some compressed way on a private blockchain or something.
AI Summer thought
A cool application of current level AI that I haven't seen implemented would be for a game which had the option to have your game avatar animated by AI. Have the AI be allowed to monitor your face through your webcam, and update the avatar's features in real time. PvP games (including board games like Go) are way more fun when you get to see your opponent's reactions to surprising moments.
Thinking about climate change solutions, and the neat Silver Lining project. I've been wondering about additional ways of getting sea water into the atmosphere over tropical ocean. What if you used the SpinLaunch system to hurl chunks of frozen seawater high into the atmosphere. Would the ice melt in time? Would you need a small explosive charge implanted in the ice block to vaporize it? How would such a system compare in terms of cost effectiveness and generated cloud cover? It seems like an easier way to get higher-elevation clouds.
Remember that cool project where Redwood made a simple web app to allow humans to challenge themselves against language models in predicting next tokens on web data? I'd love to see something similar done for the LLM arena, so we could compare the ELO scores of human users to the scores of LLMs.https://lmsys.org/blog/2023-05-03-arena/
A harmless and funny example of generating an image output which could theoretically be info hazardous to a sufficiently naive audience. https://www.reddit.com/gallery/1275ndl
I'm so much better at coming up with ideas for experiments than I am at actually coding and running the experiments. If there was a coding model that actually sped up my ability to run experiments, I'd make much faster progress.
Just a random thought. I was wondering if it was possible to make a better laser keyboard by having an actual physical keyboard consisting of a mat with highly reflective background and embossed letters & key borders. This would give at least some tactile feedback of touching a key. Also it would give the laser and sensors a consistent environment on which to do their detection allowing for more precise engineering. You could use an infrared laser since you wouldn't need its projection to make the keyboard visible, and you could use multiple emitters a...
AI-alignment-assistant-model tasks
Thinking about the sort of tasks current models seem good at, it seems like translation and interpolation / remixing seem like pretty solid areas. If I were to design an AI assistant to help with alignment research, I think I'd focus on questions of these sorts to start with.
Translation: take this ML interpretability paper on CNNs and make it work for Transformers instead
Interpolation: take these two (or more) ML interpretability papers and give me a technique that does something like a cross between them.
Important take-away from Steven Byrnes's Brain-like AGI series: the human reward/value system involves a dynamic blend of many different reward signals that decrease in strength the closer they get to being satisficed, and may even temporarily reverse in value if overfilled (e.g. hunger -> overeating in a single sitting). There is an inherent robustness to optimizing for many different competing goals at once. It seems like a system design we should explore more in future research.
I keep thinking about the idea of 'virtual neurons'. Functional units corresponding to natural abstractions made up of a subtle combination of weights & biases distributed throughout a neural network. I'd like to be able to 'sparsify' this set of virtual neurons. Project them out to the full sparse space of virtual neurons and somehow tease them apart from each other, then recombine the pieces again with new boundary lines drawn around the true abstractions. Not sure how to do this, but I keep circling back around to the idea. Maybe if the network coul...
So, this was supposed to be a static test of China's copy of the Falcon 9. https://imgur.com/gallery/oI6l6k9
So race. Very technology. Such safety. Wow.
[Edit: I don't see this as evidence that the scientists and engineers on this project were incompetent, but rather that they are operating within a bad bureaucracy which undervalues safety. I think the best thing the US can do to stay ahead in AI is open up our immigration policies and encourage smart people outside the US to move here.]