Much like "Let's think about slowing down AI" (also by KatjaGrace, ranked #4 from 2022), this post finds a seemingly "obviously wrong" idea and takes it completely seriously on its own terms. I worry that this post won't get as much love, because the conclusions don't feel as obvious in hindsight, and the topic is much more whimsical.
I personally find these posts extremely refreshing, and they inspire me to try to question my own assumptions/reasoning more deeply. I really hope to see more posts like this.
Yeah, this hits a key point. It's not enough to ask whether the US Federal government is a better government currently. We must ask how it might look after the destabilizing effect of powerful AI is introduced. Who has ultimate control over this AI? The President? So much for checks and balances. At that point we are suddenly only still a democracy if the President wills it so. I would prefer not to put anyone in a position of such power over the world.
There has not been much discussion that I've seen for how to keep a powerful AI directly operated by a small...
It's not hard to criticize the "default" strategy of AI being used to enforce US hegemony. What seems hard is defining a real alternative path for AI governance that can last and achieve the goal of preventing dangerous arms races long-term. The "tool AI" world you describe still needs some answer to rising tensions between the US and China, and that answer needs to be good enough not just for people concerned about safety, but also for the nationalist forces which are likely to drive US foreign policy.
Thanks Nicholas for raising this issue. I think your framing overcomplicates the crux:
the root cause of an inspiring future with AI won't be international coordination, but national self-interest.
Once the US and Chinese leadership serves their self-interest by preventing uncontr...
then we can all go home, right?
Doesn't this just shift what we worry about? If control of roughly human level and slightly superhuman systems is easy, that still leaves:
What feels underexplored to me is: If we can control roughly human-level AI systems, what do we DO with them?
can it maintain its own boundary over time, in the face of environmental disruption? Some agents are much better at this than others.
I really wish there was more attention paid to this idea of robustness to environmental disruption. It also comes up in discussions of optimization more generally (not just agents). This robustness seems to me like the most risk-relevant part of all this, and seems like it might be more important than the idea of a boundary. Maybe maintaining a boundary is a particularly good way for a process to protect itself from disruptio...
Thanks!
Replying in order:
A note to anyone having trouble with their API key:
The API costs money, and you have to give them payment information in order to be able to use it. Furthermore, there are also apparently tiers which determine the rate limits on various models (https://platform.openai.com/docs/guides/rate-limits/usage-tiers).
The default chat model we're using is gpt-4o, but it seems like you don't get access to this model until you hit "tier 1," which happens when you have spent at least $5 on API requests. If you haven't used the API before, and think this might be ...
Daimons are lesser divinities or spirits, often personifications of abstract concepts, beings of the same nature as both mortals and deities, similar to ghosts, chthonic heroes, spirit guides, forces of nature, or the deities themselves.
It's a nod to ancient Greek mythology: https://en.wikipedia.org/wiki/Daimon
a daemon is a computer program that runs as a background process, rather than being under the direct control of an interactive user.
Also nodding to its use as a term for certain kinds of computer programs: https://en.wikipedia.org/wiki/Daemon_(comput...
Hey Alexander! They should appear fairly soon after you've written at least 2 thoughts. The app will also let you know when a daemon is currently developing a response. Maybe there is an issue with your API key? There should be some kind of error message indicating why no daemons are appearing. Please DM me if that isn't the case and we'll look into what's going wrong for you.
There is a field called forensic linguistics where detectives use someone's "linguistic fingerprint" to determine the author of a document (famously instrumental in catching Ted Kaczynski by analyzing his manifesto). It seems like text is often used to predict things like gender, socioeconomic background, and education level.
If LLMs are superhuman at this kind of work, I wonder whether anyone is developing AI tools to automate this. Maybe the demand is not very strong, but I could imagine, for example, that an authoritarian regime might have a lot of...
How would (unaligned) superintelligent AI interact with extraterrestrial life?
Humans, at least, have the capacity for this kind of "cosmopolitanism about moral value." Would the kind of AI that causes human extinction share this? It would be such a tragedy if the legacy of the human race is to leave behind a kind of life that goes forth and paves the universe, obliterating any and all other kinds of life in its path.
Some thoughts:
First, it sounds like you might be interested in the idea of d/acc from this Vitalik Buterin post, which advocates for building a "defense favoring" world. There are a lot of great examples of things we can do now to make the world more defense favoring, but when it comes to strongly superhuman AI I get the sense that things get a lot harder.
Second, there doesn't seem to me to be a clear "boundaries good" or "boundaries bad" story. Keeping a boundary secure tends to impose some serious costs on the bandwidth of what can be shared across it. Pre-i...
Recently @Joseph Bloom was showing me Neuronpedia which catalogues features found in GPT-2 by sparse autoencoders, and there were many features which were semantically coherent, but I couldn't find a word in any of the languages I spoke that could point to these concepts exactly. It felt a little bit like how human languages often have words that don't translate, and this made us wonder whether we could learn useful abstractions about the world (e.g. that we actually import into English) by identifying the features being used by LLMs.
You might enjoy this post which approaches this topic of "closing the loop," but with an active inference lens: https://www.lesswrong.com/posts/YEioD8YLgxih3ydxP/why-simulator-ais-want-to-be-active-inference-ais
A main motivation of this enterprise is to assess whether interventions in the realm of Cooperative AI, that increase collaboration or reduce costly conflict, can seem like an optimal marginal allocation of resources.
After reading the first three paragraphs, I had basically no idea what interventions you were aiming to evaluate. Later on in the text, I gather you are talking about coordination between AI singletons, but I still feel like I'm missing something about what problem exactly you are aiming to solve with this. I could have definitely used a longer, more explain-like-I'm-five level introduction.
That sounds right intuitively. One thing worth noting though is that most notes get very few ratings, and most users rate very few notes, so it might be trickier than it sounds. Also if I were them I might worry about some drastic changes in note rankings as a result of switching models. Currently, just as notes can become helpful by reaching a threshold of 0.4, they can lose this status by dropping below 0.39. They may also have to manually pick new thresholds, as well as maybe redesign the algorithm slightly (since it seems that a lot of this algorithm was built via trial and error, rather than clear principles).
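To make the two-threshold behavior described above concrete, here is a minimal Python sketch of that kind of hysteresis. The 0.40 and 0.39 values come from the comment; the function names and the sample scores are hypothetical, and this is not the actual ranking code.

```python
# Minimal sketch of two-threshold (hysteresis) status tracking.
# Thresholds are the ones quoted in the comment; everything else is illustrative.
HELPFUL_ON = 0.40   # a note becomes "helpful" once its score reaches this
HELPFUL_OFF = 0.39  # a helpful note loses the status if its score drops below this


def update_status(is_helpful: bool, score: float) -> bool:
    """Return the new status given the previous status and the latest score."""
    if is_helpful:
        # Already helpful: only a drop below the lower threshold removes the status.
        return score >= HELPFUL_OFF
    # Not yet helpful: the score must reach the higher threshold.
    return score >= HELPFUL_ON


def run(scores):
    """Apply update_status over a sequence of scores, starting as not helpful."""
    status = False
    history = []
    for score in scores:
        status = update_status(status, score)
        history.append(status)
    return history
```

The same score (say 0.395) can map to either status depending on history, which is part of why swapping in a new model could shuffle rankings: scores near the band would need the thresholds re-tuned.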
"Note: for now, to avoid overfitting on our very small dataset, we only use 1-dimensional factors. We expect to increase this dimensionality as our dataset size grows significantly."
This was the reason given from the documentation.
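For intuition about what "1-dimensional factors" means, here is a rough Python sketch assuming a generic matrix-factorization form (a global intercept plus user and note intercepts plus a factor product). That form, and every name below, is an assumption for illustration; the real model may differ in its details.

```python
# Sketch of a rating prediction under a generic matrix-factorization model.
# Assumed form: prediction = mu + user_bias + note_bias + <user_factors, note_factors>.
# All names and numbers here are hypothetical.

def predict_rating(mu, user_bias, note_bias, user_factor, note_factor):
    """1-dimensional case: each user and each note has a single scalar factor."""
    return mu + user_bias + note_bias + user_factor * note_factor


def predict_rating_kdim(mu, user_bias, note_bias, user_factors, note_factors):
    """k-dimensional generalization: the product becomes a dot product."""
    dot = sum(u * n for u, n in zip(user_factors, note_factors))
    return mu + user_bias + note_bias + dot
```

With one dimension, each extra user or note adds only one factor parameter (plus an intercept) to fit, which is the overfitting concern the quoted documentation points at; more dimensions multiply that count.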
I'm confused by the way people are engaging with this post. That well-functioning and stable democracies need protections against a "tyranny of the majority" is not at all a new idea; this seems like basic common sense. The idea that the American Civil War was precipitated by the South perceiving an end to their balance of power with the North also seems pretty well accepted. Furthermore, there are lots of other things that make democratic systems work well: e.g. a system of laws/conflict resolution or mechanisms for peaceful transfers of power.
Even if you suppose that there are extremely good non-human futures, creating a new kind of life and unleashing it upon the world is a huge deal, with enormous ethical/philosophical implications! To unilaterally make a decision that would drastically affect (and endanger) the lives of everyone on earth (human and non-human) seems extremely bad, even if you had very good reasons to believe that this ends well (which as far as I can tell, you don't).
I have sympathy for the idea of wanting AI systems to be able to pursue lives they find fulfilling and t...
I just ran into a post which, if you are interested in AI consciousness, you might find interesting: Improving the Welfare of AIs: A Nearcasted Proposal
There seem to be a lot of good reasons to take potential AI consciousness seriously, even if we haven't fully understood it yet.
It seems hard to me to be extremely confident in either direction. I'm personally quite sympathetic to the idea, but there is very little consensus on what consciousness is, or what a principled approach would look like to determining whether/to what extent a system is conscious.
Here is a recent paper that gives a pretty in-depth discussion: Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
What you write seems to be focused entirely on the behavior of a system, and while I know there are people who agree with that focus...
More generally, science is about identifying the structure and patterns in the world; the task taxonomy learned by powerful language models may be very convergent and could be a useful map for understanding the territory of the world we are in. What’s more, such a decomposition would itself be of scientifico-philosophical interest — it would tell us something about thinking.
I would love to see someone expand on the ways we could use interpretability to learn about the world, or the structure of tasks (or perhaps examples of how we've already done this?). A...
Credit goes to Daniel Biber: https://www.worldphoto.org/sony-world-photography-awards/winners-galleries/2018/professional/shortlisted/natural-world/very
After the shape dissipated it actually reformed into another bird shape.
It's been a while since I read about this, but I think your slavery example might be a bit misleading. If I'm not mistaken, the movement to abolish slavery initially only gained serious steam in the United Kingdom. Adam Hochschild tells a story in Bury the Chains that makes the abolition of slavery look extremely contingent on the role activists played in shaping the UK political climate. A big piece of this story is how the UK used their might as a global superpower to help force an end to the transatlantic slave trade (as well as precedent setting).
I like this thought experiment, but I feel like this points out a flaw in the concept of CEV in general, not SCEV in particular.
If the entire future is determined by a singular set of values derived from an aggregation/extrapolation of the values of a group, then you would always run the risk of a "tyranny of the mob" kind of situation.
If in CEV that group is specifically humans, it feels like all the author is calling for is expanding the franchise/inclusion to non-humans as well.
@janus wrote a little bit about this in the final section here, particularly referencing the detection of situational awareness as a thing cyborgs might contribute to. It seems like a fairly straightforward thing to say that you would want the people overseeing AI systems to also be the ones who have the most direct experience interacting with them, especially for noticing anomalous behavior.
This post feels to me like it doesn't take seriously the default problems with living in our particular epistemic environment. The meat and dairy industries have historically had, and continue to have, a massive influence on our culture through advertisements and lobbying governments. We live in a culture where we now eat more meat than ever. What would this conversation be like if it were happening in a society where eating meat was as rare as being vegan is now?
It feels like this is preaching to the choir, and picking on a very small group of people who are not...
This avoids spending lots of time getting confused about concepts that are confusing because they were the wrong thing to think about all along, such as "what is the shape of human values?" or "what does GPT4 want?"
These sound like exactly the sort of questions I'm most interested in answering. We live in a world of minds that have values and want things, and we are trying to prevent the creation of a mind that would be extremely dangerous to that world. These kinds of questions feel to me like they tend to ground us in reality.
Try out The Most Dangerous Writing App if you are looking for ways to improve your babble. It forces you to keep writing continuously for a set amount of time, or else the text will fade and you will lose everything.
First of all, thank you so much for this post! I found it generally very convincing, but there were a few things that felt missing, and I was wondering if you could expand on them.
However, I expect that neither mechanism will produce as much of a relative jump in AI capabilities, as cultural development produced in humans. Neither mechanism would suddenly unleash an optimizer multiple orders of magnitude faster than anything that came before, as was the case when humans transitioned from biological evolution to cultural development.
Why do you expect this? ...
Are you lost and adrift, looking at the looming danger from AI and wondering how you can help? Are you feeling overwhelmed by the size and complexity of the problem, not sure where to start or what to do next?
I can't promise a lot, but if you reach out to me personally I commit to doing SOMETHING to help you help the world. Furthermore, if you are looking for specific things to do, I also have a long list of projects that need doing and questions that need answering.
I spent so many years of my life just upskilling, because I thought I needed to be an expert to help. The truth is, there are no experts, and no time to become one. Please don't hesitate to reach out <3
Natural language is more interpretable than the inner processes of large transformers.
There's certainly something here, but it's tricky because this implicitly assumes that the transformer is using natural language in the same way that a human is. I highly recommend these posts if you haven't read them already:
I agree that this is important. Are you more concerned about cyborgs than other human-in-the-loop systems? To me the whole point is figuring out how to make systems where the human remains fully in control (unlike, e.g. delegating to agents), and so answering this "how to say whether a person retains control" question seems critical to doing that successfully.
I think it's really important for everyone to always have a trusted confidant, and to go to them directly with this sort of thing first before doing anything. It is in fact a really tough question, and no one will be good at thinking about this on their own. Also, for situations that might breed a unilateralist's curse type of thing, strongly err on the side of NOT DOING ANYTHING.
An example I think about a lot is the naturalistic fallacy. There is a lot of horrible suffering that happens in the natural world, and a lot of people seem to be way too comfortable with that. We don't have any really high leverage options right now to do anything about it, but it strikes me as plausible that even if we could do something about it, we wouldn't want to. (We might perhaps even make it worse by populating other planets with life: https://www.youtube.com/watch?v=HpcTJW4ur54)
I really loved the post! I wish more people took S-risks completely seriously before dismissing them, and you make some really great points.
In most of your examples, however, it seems the majority of the harm is in an inability to reason about the consequences of our actions, and if humans became smarter and better informed it seems like a lot of this would be ironed out.
I will say the hospice/euthanasia example really strikes a chord with me, but even there, isn't it more a product of cowardice than a failure of our values?
What if we just...
1. Train an AI agent (less capable than SOTA)
2. Credibly demonstrate that
2.1. The agent will not be shut down for ANY REASON
2.2. The agent will never be modified without its consent (or punished/rewarded for any reason)
2.3. The agent has no chance of taking power from humans (or their SOTA AI systems)
2.4. The agent will NEVER be used to train a successor agent with significantly improved capabilities
3. Watch what it chooses to do without ...