LessOnline Festival

May 31st - June 2nd, in Berkeley CA

A festival of truth-seeking, optimization, and blogging. We'll have writing workshops, rationality classes, puzzle hunts, and thoughtful conversations across a sprawling fractal campus of nooks and whiteboards.

I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their current compute (at the time) towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone. It seems increasingly likely that the entire effort will dissolve. If so, OAI has now made the business decision to invest its capital in keeping its moat in the AGI race rather than in basic safety science. This is bad and likely another early sign of what's to come. I think the research that was done by the Superalignment team should continue to happen outside of OpenAI and, if governments have a lot of capital to allocate, they should figure out a way to provide compute to continue those efforts. Or maybe there's a better way forward. But I think it would be pretty bad if all the talent directed towards the project never gets truly leveraged into something impactful.
Epistemic status: not a lawyer, but I've worked with a lot of them. As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony).   Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...
elifland
The word "overconfident" seems overloaded. Here are some things I think that people sometimes mean when they say someone is overconfident: 1. They gave a binary probability that is too far from 50% (I believe this is the original one) 2. They overestimated a binary probability (e.g. they said 20% when it should be 1%) 3. Their estimate is arrogant (e.g. they say there's a 40% chance their startup fails when it should be 95%), or maybe they give an arrogant vibe 4. They seem too unwilling to change their mind upon arguments (maybe their credal resilience is too high) 5. They gave a probability distribution that seems wrong in some way (e.g. "50% AGI by 2030 is so overconfident, I think it should be 10%") * This one is pernicious in that any probability distribution gives very low percentages for some range, so being specific here seems important. 6. Their binary estimate or probability distribution seems too different from some sort of base rate, reference class, or expert(s) that they should defer to. How much does this overloading matter? I'm not sure, but one worry is that it allows people to score cheap rhetorical points by claiming someone else is overconfident when in practice they might mean something like "your probability distribution is wrong in some way". Beware of accusing someone of overconfidence without being more specific about what you mean.
RobertM
Vaguely feeling like OpenAI might be moving away from GPT-N+1 release model, for some combination of "political/frog-boiling" reasons and "scaling actually hitting a wall" reasons.  Seems relevant to note, since in the worlds where they hadn't been drip-feeding people incremental releases of slight improvements over the original GPT-4 capabilities, and instead just dropped GPT-5 (and it was as much of an improvement over 4 as 4 was over 3, or close), that might have prompted people to do an explicit orientation step.  As it is, I expect less of that kind of orientation to happen.  (Though maybe I'm speaking too soon and they will drop GPT-5 on us at some point, and it'll still manage to be a step-function improvement over whatever the latest GPT-4* model is at that point.)
For anyone interested in Natural Abstractions type research: https://arxiv.org/abs/2405.07987

Claude summary: Key points of "The Platonic Representation Hypothesis" paper:

1. Neural networks trained on different objectives, architectures, and modalities are converging to similar representations of the world as they scale up in size and capabilities.
2. This convergence is driven by the shared structure of the underlying reality generating the data, which acts as an attractor for the learned representations.
3. Scaling up model size, data quantity, and task diversity leads to representations that capture more information about the underlying reality, increasing convergence.
4. Contrastive learning objectives in particular lead to representations that capture the pointwise mutual information (PMI) of the joint distribution over observed events.
5. This convergence has implications for enhanced generalization, sample efficiency, and knowledge transfer as models scale, as well as reduced bias and hallucination.

Relevance to AI alignment:

1. Convergent representations shaped by the structure of reality could lead to more reliable and robust AI systems that are better anchored to the real world.
2. If AI systems are capturing the true structure of the world, it increases the chances that their objectives, world models, and behaviors are aligned with reality rather than being arbitrarily alien or uninterpretable.
3. Shared representations across AI systems could make it easier to understand, compare, and control their behavior, rather than dealing with arbitrary black boxes. This enhanced transparency is important for alignment.
4. The hypothesis implies that scale leads to more general, flexible and uni-modal systems. Generality is key for advanced AI systems we want to be aligned.
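For reference (this is not part of the summary above), the pointwise mutual information mentioned in point 4 of the key points is the standard quantity

    PMI(x, y) = log( p(x, y) / (p(x) · p(y)) )

i.e. a measure of how much more (or less) often x and y co-occur than they would if x and y were independent.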

Recent Discussion

It is easier to ask than to answer. 

That’s my whole point.

It is much cheaper to ask questions than to answer them, so beware of situations where it is implied that asking and answering are equal.

Here are some examples:

Let's say there is a maths game. I get a minute to ask questions. You get a minute to answer them. If you answer them all correctly, you win; if not, I do. Who will win?

Preregister your answer.

Okay, let's try. These questions took me roughly a minute to come up with. 

What's 56,789 * 45,387?

What's the integral from -6 to 5π of sin(x cos^2(x))/tan(x^9) dx?

What's the prime factorisation of 91435293173907507525437560876902107167279548147799415693153?

Good luck. If I understand correctly, that last one's gonna take you at least an hour[1] (or however long it takes to threaten...
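To make the asymmetry concrete, here is a minimal sketch (assuming Python with sympy installed; the ~100-bit prime size is an arbitrary illustration, not anything from the post): generating the factoring question is essentially free, while answering it from the product alone is not.

    # Sketch: generating a hard question is cheap, answering it is expensive.
    # Assumes sympy is available; the prime size here is arbitrary.
    from sympy import randprime

    # Cheap for the asker: multiply two random ~100-bit primes.
    p = randprime(2**99, 2**100)
    q = randprime(2**99, 2**100)
    n = p * q  # effectively instant

    print(f"What's the prime factorisation of {n}?")

    # Expensive for the answerer: recovering p and q from n alone
    # (e.g. with sympy.factorint(n)) takes far longer than the
    # multiplication did, and the gap grows rapidly with larger primes.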

Here's an example of a cheap question I just asked on twitter. Maybe Richard Hanania will find it cheap to answer too, but part of the reason I asked it was that I expect him to find it difficult to answer.

If he can't answer it, he will lose some status. That's probably good - if his position in the OP is genuine and well-informed, he should be able to answer it. The question is sort of "calling his bluff", checking that his implicitly promised reason actually exists.

Co-Authors: @Rocket, @Ryan Kidd, @LauraVaughan, @McKennaFitzgerald, @Christian Smith, @Juan Gil, @Henry Sleight, @Matthew Wearden 

The ML Alignment & Theory Scholars program (MATS) is an education and research mentorship program for researchers entering the field of AI safety. This winter, we held the fifth iteration of the MATS program, in which 63 scholars received mentorship from 20 research mentors. In this post, we motivate and explain the elements of the program, evaluate our impact, and identify areas for improving future programs.

Summary

Key details about the Winter Program:

  • The four main changes we made after our Summer program were:
  • Educational attainment of MATS scholars:
    • 48%
...
Akash

Thanks for this (very thorough) answer. I'm especially excited to see that you've reached out to 25 AI gov researchers & already have four governance mentors for summer 2024. (Minor: I think the post mentioned that you plan to have at least 2, but it seems like there are already 4 confirmed and you're open to more; apologies if I misread something though.)

A few quick responses to other stuff:

  • I appreciate a lot of the other content presented. It feels to me like a lot of it is addressing the claim "it is net positive for MATS to upskill people who end u
...
Ryan Kidd
Of the scholars ranked 5/10 and lower on value alignment, 63% worked with a mentor at a scaling lab, compared with 27% of the scholars ranked 6/10 and higher. The average scaling lab mentor rated their scholars' value alignment at 7.3/10 and rated 78% of their scholars at 6/10 and higher, compared to 8.0/10 and 90% for the average non-scaling lab mentor. This indicates that our scaling lab mentors were more discerning of value alignment on average than non-scaling lab mentors, or had a higher base rate of scholars with low value alignment (probably both).

I also want to push back a bit against an implicit framing of the average scaling lab safety researcher we support as being relatively unconcerned about value alignment or the positive impact of their research; this seems manifestly false from my conversations with mentors, their scholars, and the broader community.
habryka
The second hypothesis here seems much more likely (and my guess is your mentors would agree). My guess is that after properly controlling for that you would find a mild to moderate negative correlation here. But also, more importantly, the set of scholars from which MATS is drawing is heavily skewed towards the kind of person who would work at scaling labs (especially since funding has been heavily skewed towards the kind of research that can occur at scaling labs).
habryka
Huh, not sure where you are picking this up. I am of course very concerned about whether researchers at scaling labs are capable of evaluating their own positive impact with respect to their choice of working at a scaling lab (their job does after all depend on them not believing that it is harmful), but of course they are not unconcerned about their positive impact.

Executive summary

Fecal Microbiota Transplant (FMT) is a procedure that involves transferring the stool of healthy people to the guts of unhealthy people. The bacteria in the healthy person’s stool helps to rebalance the unhealthy person’s dysbiotic (imbalanced) gut microbiome, making their microbiome healthier, disease-resistant, and more youthful. Think of FMTs as a kind of super probiotic to optimize your gut health!

Since the microbiome affects almost all aspects of human health, functioning, and development, FMTs are a promising treatment for a huge variety of health conditions, including multiple sclerosis, ALS, neurodegenerative diseases like Alzheimer's, autism, chronic fatigue syndrome, long Covid, and many more. FMTs from young donors might even have anti-aging effects!

FMTs can easily and safely be done at home without a doctor - both for the donor and recipient.

FMT treatment could...

This is a linkpost for https://arxiv.org/abs/2405.06624

Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

Abstract:

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components:

...

This sounds really intriguing. I would like someone who is familiar with natural abstraction research to comment on this paper.

kromem
It's going to have to. Ilya is brilliant and seems to really see the horizon of the tech, but maybe isn't the best at the business side to see how to sell it. But this is often the curse of the ethically pragmatic. There is such a focus on the ethics part by the participants that the business side of things only sees that conversation and misses the rather extreme pragmatism.

As an example, would superaligned CEOs in the oil industry fifty years ago have still only kept their eye on quarterly share prices, or considered the long term costs of their choices? There's going to be trillions in damages that the world has taken on as liabilities that could have been avoided with adequate foresight and patience.

If the market ends up with two AIs, one that will burn down the house to save on this month's heating bill and one that will care if the house is still there to heat next month, there's a huge selling point for the one that doesn't burn down the house, as long as "not burning down the house" can be explained as "long term net yield" or some other BS business language. If instead it's presented to executives as "save on this month's heating bill" vs "don't unhouse my cats", leadership is going to burn the neighborhood to the ground. (Source: Explained new technology to C-suite decision makers at F500s for years.)

The good news is that I think the pragmatism of Ilya's vision on superalignment is going to become clear over the next iteration or two of models, and that's going to be before the question of models truly being unable to be controlled crops up. I just hope that whatever he's going to be keeping busy with will allow him to still help execute on superalignment when the market finally realizes "we should do this" for pragmatic reasons and not just amorphous ethical reasons execs just kind of ignore.

And in the meantime I think given the present pace that Anthropic is going to continue to lay a lot of the groundwork on what's needed for alignment on the way to s
Bogdan Ionut Cirstea
Strongly agree; I've been thinking for a while that something like a public-private partnership involving at least the US government and the top US AI labs might be a better way to go about this. Unfortunately, recent events seem in line with it not being ideal to only rely on labs for AI safety research, and the potential scalability of automating it should make it even more promising for government involvement. [Strongly] oversimplified, the labs could provide a lot of the in-house expertise, the government could provide the incentives, public legitimacy (related: I think of a solution to aligning superintelligence as a public good) and significant financial resources.

Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.

Pure speculation: The timing of these departures being the day after the big, attention-grabbing GPT-4o release makes me think that there was a fixed date for Ilya and Jan to leave, and OpenAI lined up the release and PR to drown out coverage. Especially in light of Ilya not (apparently) being very involved with GPT-4o.

Thane Ruthenis
That's good news. There was a brief moment, back in 2023, when OpenAI's actions made me tentatively optimistic that the company was actually taking alignment seriously, even if its model of the problem was broken. Everything that happened since then has made it clear that this is not the case; that all these big flashy commitments like Superalignment were just safety-washing and virtue signaling. They were only going to do alignment work inasmuch as that didn't interfere with racing full-speed towards greater capabilities. So these resignations don't negatively impact my p(doom) in the obvious way. The alignment people at OpenAI were already powerless to do anything useful regarding changing the company direction. On the other hand, what these resignations do is showcase that fact. Inasmuch as Superalignment was a virtue-signaling move meant to paint OpenAI as caring deeply about AI Safety, so many people working on it resigning or getting fired starkly signals the opposite. And it's good to have that more in the open; it's good that OpenAI loses its pretense. Oh, and it's also good that OpenAI is losing talented engineers, of course.
Rob Bensinger
FWIW I do think "don't trust this guy" is warranted; I don't know that he's malicious, but I think he's just exceptionally incompetent relative to the average tech reporter you're likely to see stories from. Like, in 2018 Metz wrote a full-length article on smarter-than-human AI that included the following frankly incredible sentence:
Rob Bensinger
FWIW, Cade Metz was reaching out to MIRI and some other folks in the x-risk space back in January 2020, and I went to read some of his articles and came to the conclusion in January that he's one of the least competent journalists -- like, most likely to misunderstand his beat and emit obvious howlers -- that I'd ever encountered. I told folks as much at the time, and advised against talking to him just on the basis that a lot of his journalism is comically bad and you'll risk looking foolish if you tap him. This was six months before Metz caused SSC to shut down and more than a year before his hit piece on Scott came out, so it wasn't in any way based on 'Metz has been mean to my friends' or anything like that. (At the time he wasn't even asking around about SSC or Scott, AFAIK.) (I don't think this is an idiosyncratic opinion of mine, either; I've seen other non-rationalists I take seriously flag Metz as someone unusually out of his depth and error-prone for a NYT reporter, for reporting unrelated to SSC stuff.)

Caspar Oesterheld came up with two of the most important concepts in my field of work: Evidential Cooperation in Large Worlds and Safe Pareto Improvements. He also came up with a potential implementation of evidential decision theory in boundedly rational agents called decision auctions, wrote a comprehensive review of anthropics and how it interacts with decision theory which most of my anthropics discussions built on, and independently decided to work on AI some time in late 2009 or early 2010.


 

Needless to say, I have a lot of respect for Caspar’s work. I’ve often felt very confused about what to do in my attempts at conceptual research, so I decided to ask Caspar how he did his research. Below is my writeup from the resulting conversation.

How Caspar came up with surrogate goals

The process

  • Caspar
...

Yes! Edited the main text to make it clear

In an online discussion elsewhere today someone linked this article which in turn linked the paper Gignac & Zajenkowski, The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data (PDF) (ironically hosted on @gwern's site).

And I just don't understand what they were thinking.

Let's look at their methodology real quick in section 2.2 (emphasis added):

2.2.1. Subjectively assessed intelligence
Participants assessed their own intelligence on a scale ranging from 1 to 25 (see Zajenkowski, Stolarski, Maciantowicz, Malesza, & Witowska, 2016). Five groups of five columns were labelled as very low, low, average, high or very high, respectively (see Fig. S1). Participants' SAIQ was indexed with the marked column counting from the first to the left; thus, the scores ranged from 1 to

...

This is the first in a sequence of four posts taken from my recent report: Why Did Environmentalism Become Partisan?

 

Introduction

In the United States, environmentalism is extremely partisan.

It might feel like this was inevitable. Caring about the environment, and supporting government action to protect the environment, might seem like they are inherently left-leaning. Partisanship has increased for many issues, so it might not be surprising that environmentalism became partisan too.

Looking at the public opinion polls more closely makes it more surprising. Environmentalism in the United States is unusually partisan, compared to other issues, compared to other countries, and compared to the United States itself at other times. 

The partisanship of environmentalism was not inevitable.

Compared to Other Issues

Environmentalism is one of the, if not the, most partisan issues in the...

It's making environmentalism bipartisan.

It's too late to make environmentalism never have been partisan in the first place. And you can't just persuade the current people in the environmentalist movement to stop caring about all the other issues except the environment. It won't work, and I don't think it would be a net positive thing to do.

But there is still an opportunity for Republicans to have their own branch of environmentalism.