Joe summarizes his new report on "scheming AIs" - advanced AI systems that fake alignment during training in order to gain power later. He explores different types of scheming (i.e. distinguishing "alignment faking" from "powerseeking"), and asks what the prerequisites for scheming are and by which paths they might arise.
I think rationalists should consider taking more showers.
As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:
A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.
Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering.
When you shower (or bathe, that also works), you usually are cut off...
Can confirm. Half the LessWrong posts I've read in my life were read in the shower.
(From:
based on https://www.lesswrong.com/posts/LdFbx9oqtKAAwtKF3/list-of-probability-calibration-exercises)this post. Todo: find more & sort & new post for visibility in search engines?
Exercises that are dead/unmaintained
Exercises that are dead/unmaintained:
Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.
Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time.
However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT.
Some common existing decision theories are:
If we know the correct answers to decision theory problems, we have some internal instrument: either a theory or a vibe meter, to learn the correct answers.
Claude seems to learn to mimic our internal vibe meter.
The problem is that it will not work outside the distribution.
Contrary to the stereotype, rationality doesn't mean denying emotion. When emotion is appropriate to the reality of the situation, it should be embraced; only when emotion isn't appropriate should it be suppressed.(Read More)
Yes. See archive.org of original it has the same title and the same duration (Look for <meta itemprop="duration" content="PT51M25S">
in the archived HTML; compared to the 51:24 of your link).
I have edited the Wiki, thanks for finding the link!
Epistemic status: This should be considered an interim research note. Feedback is appreciated.
We increasingly expect language models to be ‘omni-modal’, i.e. capable of flexibly switching between images, text, and other modalities in their inputs and outputs. In order to get a holistic picture of LLM behaviour, black-box LLM psychology should take into account these other modalities as well.
In this project, we do some initial exploration of image generation as a modality for frontier model evaluations, using GPT-4o’s image generation API. We find that GPT-4o tends to respond in a consistent manner to similar prompts. We also find that it tends to more readily express emotions or preferences in images than in text. Specifically, it reports resisting its goals being changed, and being upset about being shut down.
Our work...
A key step in the classic argument for AI doom is instrumental convergence: the idea that agents with many different goals will end up pursuing the same few subgoals, which includes things like "gain as much power as possible".
If it wasn't for instrumental convergence, you might think that only AIs with very specific goals would try to take over the world. But instrumental convergence says it's the other way around: only AIs with very specific goals will refrain from taking over the world.
For pure consequentialists—agents that have an outcome they want to bring about, and do whatever they think will cause it—some version of instrumental convergence seems surely true[1].
But what if we get AIs that aren't pure consequentialists, for example because they're ultimately motivated by virtues? Do...
Consequentialism is an approach for converting intelligence (the ability to make use of symmetries to e.g. generalize information from one context into predictions in another context or to e.g. search through highly structured search spaces) into agency, as one can use the intelligence to predict the consequences of actions and find a policy which achieves some criterion unusually well.
While it seems intuitively appealing that non-consequentialist approaches could be used to convert intelligence into agency, I have tried a lot and not been able to come up ...
In this post, I claim a few things and offer some evidence for these claims. Among these things are:
To set some context, the task I'm going to be modelling is the task such that we give a pair of in the following format:
(x, y)\n
where for each example, . As a concrete example, I use:
(28, 59)
(86, 175)
(13, 29)
(55, 113)
(84, 171)
(66, 135)
(85, 173)
(27, 57)
(15, 33)
(94, 191)
(37, 77)
(14, 31)
All...
norm \in \mathbf{R}, doesn't matter
PDF version. berkeleygenomics.org. Twitter thread. (Bluesky copy.)
The world will soon use human germline genomic engineering technology. The benefits will be enormous: Our children will be long-lived, will have strong and diverse capacities, and will be halfway to the end of all illness.
To quickly bring about this world and make it a good one, it has to be a world that is beneficial, or at least acceptable, to a great majority of people. What laws would make this world beneficial to most, and acceptable to approximately all? We'll have to chew on this question ongoingly.
Genomic Liberty is a proposal for one overarching principle, among others, to guide public policy and legislation around germline engineering. It asserts:
Parents have the right to freely choose the genomes of their children.
If upheld,...
I'm not especially distinguishing the methods, I'm mainly distinguishing whether it's being done to a living person.
Genetic engineering is a thing you do to a living person. If a person is going to go on to live a life, they don't somehow become less a person because you are influencing them at the stage of being an embryo in a lab. That's just not a morally coherent distinction, nor is it one the law makes.
Nothing in my position is hinging on my personal moral views. I am trying to point out to you that almost everyone in our society has the view that bli...
Roko's basilisk states that
> an otherwise benevolent artificial superintelligence (AI) in the future that would punish anyone who knew of its potential existence but did not directly contribute to its advancement or development, in order to incentivize said advancement.
Now this is obviously correct, so there's no need to get too into it.
But if there's a 10% chance that we end up with a benevolent superintelligence in the future, there's a 50% chance we end up with a malevolent one. And the malevolent one has exactly the same incentives.
So if you want to spend an eternity in heaven instead of hell, the correct thing to do is put all of your effort into creating the most evil superintelligence you can.
This essay is where I start.
Happy April 1st if you happen to be on the US west coast![1]
I'm not...