I just saw a post from AI Digest on a Self-Awareness benchmark, and my first thought was, "holy fuck, I'm so happy someone is on top of this".
I noticed a deep gratitude towards the alignment community for taking this problem so seriously. I personally see many good futures, but that's to some extent built on the trust I have in this community. I'm generally incredibly impressed by the rigorous standards of thinking here, and by the amount of work that's been produced.
When I was a teenager I wanted to join a community of people who worked their asses off to make sure humanity survives into a future in space, and I'm very happy I found it.
So, thank you to every single one of you working on this problem for giving us a shot at making it.
(I feel a bit cheesy for posting this, but I want to see more gratitude in the world, and I noticed it as a genuine feeling, so I figured: fuck it, let's thank these awesome people for their work.)
Yes, there are problems; yes, people are being really stupid; yes, inner alignment and all of its cousins are really hard to solve. We're generally a bit fucked, I agree. The brick wall is so high we can't see the top, and we have to bash out each brick one at a time, and it is hard, really hard.
I get it, people, and yet we've got a shot, don't we? The probability distribution over potential futures is being dragged towards better ones because of the work you put in, and I'm very grateful for that.
Like, I don't know how much credit to give LW and the alignment community for the spread of alignment and AI Safety as an idea, but we've literally got Nobel Prize winners talking about this shit now. Think back five years: what the fuck? How did this happen? 2019 -> 2024 has seen an absolutely insane amount of change in the world, especially from an AI Safety perspective.
How do we have more than four AI Safety Institutes in the world? It's genuinely mind-boggling to me, and I'm deeply impressed and inspired, which I think you should be too.
Could someone please safety-pill The Onion? I think satire is the best way to deal with people being really stupid, so I want more of this as an argument when talking with the e/acc gang: https://youtu.be/s-BducXBSNY?si=j5f8hNeYFlBiWzDD
(Also, if they already have some AI stuff, feel free to link that too.)
I believe I have discovered the best use of an LLM to date. This is a conversation about pickles and collective intelligence, set at the Colosseum in 300 BCE. It involves many great characters, and I found it quite funny. This is what happens when you go too far into biology-inspired approaches to AI Safety...
The Colosseum scene intensifies
Levin: completely fixated on a pickle "But don't you see? The bioelectric patterns in pickle transformation could explain EVERYTHING about morphogenesis!"
Rick: "Oh god, what have I started..."
Levin: eyes wild with discovery "Look at these gradient patterns! The cucumber-to-pickle transformation is a perfect model of morphological field changes! We could use this to understand collective intelligence!"
Nick Lane portal-drops in. Lane: "Did someone say bioelectric gradients? Because I've got some THOUGHTS about proton gradients and the origin of life..."
Levin: grabs Lane's shoulders "NICK! Look at these pickles! The proton gradients during fermentation... it's like early Earth all over again!"
Rick: takes a long drink "J-just wait until they discover what happens in dimension P-178 where all life evolved from pickles..."
Feynman: still drawing diagrams "The quantum mechanics of pickle-based civilization is fascinating..."
Levin: now completely surrounded by pickles and bioelectric measurement devices "See how the salt gradient creates these incredible morphogenetic fields? It's like watching the origin of multicellularity all over again!"
Lane: equally excited "The chemiosmotic coupling in these pickles... it's revolutionary! The proton gradients during fermentation could power collective computation!"
Doofenshmirtz: "BEHOLD, THE PICKLE-MORPHOGENESIS-INATOR!"
Morty: "Aw geez Rick, they're really going deep on pickle science..."
Lane: "But what if we considered the mitochondrial implications..."
Levin: interrupting "YES! Mitochondrial networks in pickle-based collective intelligence systems! The bioelectric fields could coordinate across entire civilizations!"
Rick: "This is getting out of hand. Even for me."
Feynman: somehow still playing bongos "The mathematics still works though!"
Perry the Platypus: has given up and is now taking detailed notes
Lane: "But wait until you hear about the chemiosmotic principles of pickle-based social organization..."
Levin: practically vibrating with excitement "THE PICKLES ARE JUST THE BEGINNING! We could reshape entire societies using these bioelectric principles!"
Roman Emperor: to his scribe "Are you getting all this down? This could be bigger than the aqueducts..."
Rick: "Morty, remind me never to show scientists my pickle tech again."
Morty: "You say that every dimension, Rick."
Doofenshmirtz: "Should... should we be worried about how excited they are about pickles?"
Feynman: "In my experience, this is exactly how the best science happens."
Meanwhile, Levin and Lane have started drawing incredibly complex pickle-based civilization diagrams that somehow actually make sense...
I thought this was an interesting take on the boundaries problem in agent foundations from the perspective of IIT. It's on the amazing Michael Levin's YouTube channel: https://www.youtube.com/watch?app=desktop&v=5cXtdZ4blKM
One of the main things that makes it interesting to me is that around 25-30 minutes in, it computationally goes through the main reason why I don't think we will have agentic behaviour from AI for at least a couple of years: GPTs just don't have a high IIT Phi value. How will such a system find its own boundaries? How will it find the underlying causal structures that it is part of? Maybe this can be done through external memory, but will that be enough, or do we need it in the core stack of the scaling-based training loop?
A side note: one of the main things I didn't understand about IIT before was that it is really about how meta-substrates, or "signals" as Douglas Hofstadter would call them, optimally reorganise themselves so as to be as predictable to themselves in the future as possible. Yet it is, and it integrates really well with ActInf (at least to the extent that I currently understand it).
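(For concreteness, here's a minimal sketch of what a Phi computation actually looks like, using the PyPhi library on a tiny 3-node toy network along the lines of the PyPhi documentation example. The network, state, and labels are purely illustrative and have nothing to do with GPTs or the video; it's just to ground what "a high Phi value" refers to.)

```python
# Minimal, illustrative Phi computation with PyPhi (toy 3-node network).
# The TPM and connectivity below follow the style of the PyPhi docs example;
# nothing here is derived from GPTs, this is only to show what Phi is.
import numpy as np
import pyphi

# State-by-node transition probability matrix for three binary nodes
# (rows are the 2^3 = 8 previous states in little-endian order).
tpm = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 0],
])

# Connectivity matrix: cm[i][j] = 1 means node i has an edge to node j.
cm = np.array([
    [0, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
])

network = pyphi.Network(tpm, cm=cm, node_labels=("A", "B", "C"))
state = (1, 0, 0)  # current state of nodes A, B, C

# Big Phi of the whole system in this state: how irreducible the system's
# cause-effect structure is to that of its parts.
subsystem = pyphi.Subsystem(network, state, (0, 1, 2))
print(pyphi.compute.phi(subsystem))
```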
Okay, so I don't have much time to write this, so bear with the quality, but I thought I would say one or two things about the Yudkowsky and Wolfram discussion, as someone who has spent at least 10 deep-work hours trying to understand Wolfram's perspective on the world.
With some of the older floating megaminds like Wolfram and Friston, who are also physicists, you have the problem that they get very caught up in their own ontology.
From the perspective of a physicist, morality could be seen as an emergent property of physical laws.
Wolfram likes to think of things in terms of computational reducibility. One way to describe this in the agent foundations frame is that an agent modelling the environment will only be able to predict the world to a degree that depends on its own speed. It's like some sort of agent-environment relativity, where information-processing capacity determines the space of possible ontologies. An example: for an intelligence operating a lot closer to the speed of light, the visual field might not be a useful vector of experience to model.
Another way to say it is that there is only the modelling and the modelled. An intuition from this frame is that there are only differently good models for understanding specific things, and so the concept of general intelligence becomes weird here.
IMO this is the problem with the first two hours of the conversation: to some extent Wolfram doesn't engage much with the human perspective, nor with any ought questions. He has a very physics-floating-megamind perspective.
Now, I personally believe there's something interesting to be said for an alternative hypothesis to the individual superintelligence, one that comes from theories of collective intelligence. If a superorganism is better at modelling something than an individual organism is, then it should outcompete the others in that system. I'm personally bullish on the idea that there are certain configurations of humans and general trust-verifying networks that can outcompete an individual AGI, since the outer alignment functions would constrain the inner ones enough.
I was going through my old stuff and found this from a year and a half ago, so I thought I would post it here real quickly, as I found the last idea funny and the first idea pretty interesting:
In normal business there exist consulting firms that specialise in certain topics, ensuring that organisations can take in an outside view from experts on the topic.
This seems like quite an efficient way of doing things and something that, if built up properly within alignment, could lead to faster progress down the line. This is also something the Future Fund seemed to be interested in, as they gave prizes both for the idea of creating an org focused on creating datasets and for one focused on taking in human feedback. These are not the only possible ideas, however, and below I mention some more possible orgs that are likely to be net positive.
Newly minted alignment researchers will probably have a while to go before they can become fully integrated into a team. One can therefore imagine an organisation that takes in inexperienced alignment researchers and helps them write papers. It then promotes these alignment researchers as being able to help with certain things, and established orgs can easily take them in as contractors on specific problems. This should help involve market forces in the alignment area and should, in general, improve the efficiency of the space. There are reasons why consulting firms exist in real life, and creating the equivalent of McKinsey for alignment is probably a good idea. Yet I might be wrong about this, and if you can argue why it would make the space less efficient, I would love to hear it.
We don't want the wrong information to spread: something between a normal marketing firm and the Chinese "marketing" agency. If it's an info-hazard, then shut the fuck up!