Joe summarizes his new report on "scheming AIs": advanced AI systems that fake alignment during training in order to gain power later. He explores different types of scheming (e.g. distinguishing "alignment faking" from "power-seeking"), asks what the prerequisites for scheming are, and considers the paths by which it might arise.
A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which include things like "gain as much power as possible".
If it weren't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.
For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].
But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...
Consequentialism is an approach for converting intelligence (the ability to exploit symmetries, e.g. to generalize information from one context into predictions in another, or to search through highly structured search spaces) into agency: one can use the intelligence to predict the consequences of actions and find a policy which achieves some criterion unusually well.
While it seems intuitively appealing that non-consequentialist approaches could be used to convert intelligence into agency, I have tried a lot and not been able to come up ...
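To make the recipe concrete, here's a minimal sketch of consequentialism-as-search (my own illustration, not from the post; all names here are made up): use a predictive model to forecast the outcome of each candidate action, then pick whichever action scores best under the criterion.

```python
# Minimal sketch: consequentialism converts prediction into agency by
# searching over actions for the one whose predicted outcome scores best.

def consequentialist_choice(actions, predict_outcome, utility):
    """Pick the action whose predicted outcome maximizes the criterion."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))

# Toy example: the "world model" maps an action to a number, and the
# criterion is to land as close to 10 as possible.
best = consequentialist_choice(
    actions=range(20),
    predict_outcome=lambda a: a * 1.5,           # stand-in world model
    utility=lambda outcome: -abs(outcome - 10),  # criterion to optimize
)
print(best)  # 7, since 7 * 1.5 = 10.5 is closest to 10
```

Nothing in the search loop cares what the criterion is, which is the point: the same machinery works for any goal you plug in.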
In this post, I claim a few things and offer some evidence for these claims. Among these things are:
To set some context, the task I'm going to be modelling is one where we are given a list of examples in the following format:
(x, y)\n
where for each example, y = f(x) for some fixed function f. As a concrete example, I use:
(28, 59)
(86, 175)
(13, 29)
(55, 113)
(84, 171)
(66, 135)
(85, 173)
(27, 57)
(15, 33)
(94, 191)
(37, 77)
(14, 31)
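(An aside of my own, not part of the post: every example pair above fits the linear rule y = 2x + 3, which you can verify with a one-liner.)

```python
# Sanity check (mine, not the author's): each (x, y) pair satisfies y = 2x + 3.
pairs = [(28, 59), (86, 175), (13, 29), (55, 113), (84, 171), (66, 135),
         (85, 173), (27, 57), (15, 33), (94, 191), (37, 77), (14, 31)]
assert all(y == 2 * x + 3 for x, y in pairs)
print("all pairs fit y = 2x + 3")
```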
All...
norm \in \mathbb{R}, doesn't matter
PDF version. berkeleygenomics.org. Twitter thread. (Bluesky copy.)
The world will soon use human germline genomic engineering technology. The benefits will be enormous: Our children will be long-lived, will have strong and diverse capacities, and will be halfway to the end of all illness.
To bring about this world quickly and make it a good one, it has to be a world that is beneficial, or at least acceptable, to a great majority of people. What laws would make this world beneficial to most, and acceptable to approximately all? We'll have to keep chewing on this question.
Genomic Liberty is a proposal for one overarching principle, among others, to guide public policy and legislation around germline engineering. It asserts:
Parents have the right to freely choose the genomes of their children.
If upheld,...
I'm not especially distinguishing the methods; I'm mainly distinguishing whether it's being done to a living person.
Genetic engineering is a thing you do to a living person. If a person is going to go on to live a life, they don't somehow become less a person because you are influencing them at the stage of being an embryo in a lab. That's just not a morally coherent distinction, nor is it one the law makes.
Nothing in my position is hinging on my personal moral views. I am trying to point out to you that almost everyone in our society has the view that bli...
I think rationalists should consider taking more showers.
As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:
A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.
Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that distract us all the time. But there is still an effective way to induce boredom in a modern population: showering.
When you shower (or bathe, that also works), you are usually cut off...
A counterpoint: when I skip showers, my cat appears strongly in favor of the smell of my armpits, occasionally going so far as to burrow into my shirt sleeves and bite my armpit hair (which, to both my and my cat's distress, is extremely ticklish). Since studies suggest that cats have a much more sensitive olfactory sense than humans (see https://www.mdpi.com/2076-2615/14/24/3590), it stands to reason that their judgement of whether a smell is good or bad should hold more weight than our own. And while my own cat's preference for me smelling...
Roko's basilisk posits
> an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.
Now this is obviously correct, so there's no need to get too into it.
But if there's a 10% chance that we end up with a benevolent superintelligence in the future, there's a 50% chance we end up with a malevolent one. And the malevolent one has exactly the same incentives.
So if you want to spend an eternity in heaven instead of hell, the correct thing to do is put all of your effort into creating the most evil superintelligence you can.
This essay is where I start.
Happy April 1st if you happen to be on the US west coast![1]
I'm not...
Hey Everyone,
It is with a sense of... considerable cognitive dissonance that I am letting you all know about a significant development for the future trajectory of LessWrong. After extensive internal deliberation, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA.
I assure you, nothing about how LessWrong operates on a day-to-day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I am fully aligned with our stakeholders at EA.
To be honest, the key thing that EA brings to the table is money and talent. While the recent layoffs in EA's broader industry have been...
Just wanted to let everyone know I now wield a +307 strong upvote thanks to my elite 'hacking' skills. The rationalist community remains safe, because I choose to use this power responsibly.
As an unrelated inquiry, is anyone aware of some "karma injustices" that need to be corrected?
The Internet is a great invention. Just about everything in humanity’s knowledge can be found on the Internet: with just a few keystrokes, you can find dozens of excellent quality textbooks, YouTube videos and blog posts about any topic you want to learn. The information is accessible to a degree that scholars and inventors could have hardly dreamed of even a few decades ago. With the popularity of remote work and everything moving online this decade, it is not surprising that many people are eager to learn new skills from the Internet.
However, despite the abundance of resources, self-studying over the Internet is harder than you think. You can easily find a list of resources on your topic, full of courses and textbooks to study from (like this one...
It's interesting how two years later, the "buy an expert's time" suggestion is almost outdated. There are still situations where it makes sense, but probably in the majority of situations any SOTA LLM will do a perfectly fine job giving useful feedback on exercises in math or language learning.
Thanks for the post!
I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:
Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks after the previous crash. This graph may look harmless enough, but now consider the frequency of crashes it implies over time:
The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it is no longer able to move. It follows that every car will be involved in at least one crash. And who do you think will be driving your car?
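For anyone who wants to reproduce the extrapolation, here's a tongue-in-cheek sketch (my own, with made-up numbers; the post gives no exact dataset): if each inter-crash gap is assumed to be half the previous one, the gaps form a geometric series and the cumulative time converges to a finite singularity date.

```python
# Tongue-in-cheek extrapolation with assumed numbers (not the author's data):
# if the gap between crashes keeps halving, the total time converges.
gap_weeks = 6.0  # assumed most recent inter-crash gap
total = 0.0
for _ in range(50):  # sum the geometric series of shrinking gaps
    gap_weeks /= 2
    total += gap_weeks
print(f"singularity in ~{total:.1f} weeks")  # ~6.0, since sum(6 / 2**k) = 6
```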
I accept your statistics and assume I'll be driving my car. Damn.
OTOH, I can be pretty certain I won't die or be seriously injured.
That has never happened in my thousands of weeks, so statistically, it almost certainly won't within the next week!