A book review examining Elinor Ostrom's "Governing the Commons" in light of Eliezer Yudkowsky's "Inadequate Equilibria." Are successful local institutions for governing common pool resources possible without government intervention? Under what circumstances can such institutions emerge spontaneously to solve coordination problems?

robo
Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish.

The "Big Stupid" of the AI doomers 2013-2023 was that AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs." Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche. When the public turned out to be somewhat receptive to the idea of regulating AIs, doomers were unprepared.

Take: the "Big Stupid" of right now is still the same thing. (We've not corrected enough.) Between now and transformative AGI we are likely to encounter a moment where 40% of people realize AIs really could take over (say, if every month another 1% of the population loses their job). If 40% of the world were as scared of AI loss-of-control as you, what could the world do? I think a lot! Do we have a plan for then?

Almost every LessWrong post on AI is about analyzing AIs. Almost none are about how, given widespread public support, people and governments could stop bad AIs from being built. [Example: if 40% of people were as worried about AI as I am, the US would treat GPU manufacture like uranium enrichment. And fortunately GPU manufacture is hundreds of times harder than uranium enrichment! We should be nerding out researching integrated circuit supply chains, choke points, foundry logistics in jurisdictions the US can't unilaterally sanction, that sort of thing.]

TL;DR: stopping deadly AIs from being built needs less research on AIs and more research on how to stop AIs from being built.*

*My research included 😬
On the OpenPhil / OpenAI Partnership

Epistemic note: the implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil. (Both the title and this note have been edited; cheers to Ben Pace for very constructive feedback.)

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved in Open Phil's recommendation of a $30 million grant to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future. To go one step further, potentially every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."
Please help me find research on aspiring AI Safety folk!

I am two weeks into the strategy development phase of my movement building and almost ready to start ideating some programs for the year. But I want these programs to solve the biggest pain points people experience when trying to have a positive impact in AI Safety. Has anyone seen any research that looks at this in depth? For example, through an interview process and then a survey to quantify how painful the pain points are?

Some examples of pain points I've observed so far through my interviews with technical folk:

  • I often felt overwhelmed by the vast amount of material to learn.
  • I felt there wasn’t a clear way to navigate learning the required information.
  • I lacked an understanding of my strengths and weaknesses in relation to different AI Safety areas (i.e. personal fit / comparative advantage).
  • I lacked an understanding of my progress after I got started (e.g. am I doing well? Poorly? Fast enough?).
  • I regularly experienced fear of failure.
  • I regularly experienced fear of wasted effort / sunk cost.
  • Fear of admitting mistakes or starting over might prevent people from making necessary adjustments.
  • I found it difficult to identify my desired role / job (i.e. the end goal).
  • Even when I thought I knew my desired role, identifying the specific skills and knowledge it required was difficult.
  • There is no clear career pipeline: do X and then Y and then Z and then you have an A% chance of getting role B.
  • Finding time to upskill while working is difficult.
  • I found the funding ecosystem opaque.
  • Upskilling required a lot of discipline and motivation over potentially long periods.
  • I felt like nobody gave me realistic expectations as to what the journey would be like.
Why no prediction markets for large infrastructure projects?

I've been reading this excellent piece on why prediction markets aren't popular. It argues that without subsidies prediction markets won't be large enough, because the information value of prediction markets is often not high enough.

Large infrastructure projects undertaken by governments and other large actors often go over budget, often hilariously so: 3x, 5x, 10x or more is not uncommon, and indeed often even the standard. One reason is that government officials deciding on billion-dollar infrastructure projects don't have enough skin in the game. Politicians are often not in office long enough to care on the time horizons of large infrastructure projects. Contractors don't gain by being efficient or delivering on time; to the contrary, infrastructure projects are huge cash cows. Another problem is that there are often far too many veto-stakeholders. All too often the initial bid is wildly overoptimistic. Similar considerations apply to other government projects like defense procurement or IT projects.

Okay, how to remedy this situation? Internal prediction markets could theoretically prove beneficial: all stakeholders and decision-makers are endowed with vested equity with which they are forced to bet on building timelines and other key performance indicators. External traders may also enter the market, buying and selling the contracts. The effective subsidy could be quite large, and key decisions could save billions. In this world, government officials could gain a large windfall, which may be difficult to explain to voters; this is a legitimate objection.

A very simple mechanism would ask people to estimate the cost C and the timeline T for completion. Your eventual payout would be proportional to how close you ended up to the realized C, T compared to the other bettors. [Something something log scoring rule is proper.]
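A minimal sketch of that distance-based payout, assuming a fixed prize pool shared in proportion to inverse distance from the realized outcome. The names and the 1/(1 + distance) weighting are illustrative assumptions, and a fully proper scoring rule (the bracketed log-score note above) would elicit full distributions rather than point estimates:

```python
# Minimal sketch: bettors submit point estimates (cost C, timeline T) and
# share a prize pool, with closer guesses earning larger shares. The
# 1/(1 + distance) weighting is an illustrative choice, not a proper
# scoring rule.

def payouts(bets, realized_cost, realized_time, pool=1_000_000.0):
    """bets: bettor -> (cost_estimate, time_estimate)."""
    def distance(cost_est, time_est):
        # Relative error on each axis, so dollars and years are comparable.
        return (abs(cost_est - realized_cost) / realized_cost
                + abs(time_est - realized_time) / realized_time)

    weights = {who: 1.0 / (1.0 + distance(c, t))
               for who, (c, t) in bets.items()}
    total = sum(weights.values())
    return {who: pool * w / total for who, w in weights.items()}

# Example: a project bid at $1B / 4 years that realizes at $3.2B / 7 years.
print(payouts({"official_a": (1.0e9, 4.0), "trader_b": (3.0e9, 8.0)},
              realized_cost=3.2e9, realized_time=7.0))
```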
If your endgame strategy involves relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.

Popular Comments

Recent Discussion

Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands.

Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman’s temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI.

Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost sight of its mission on safety and has become increasingly culturally hostile to it. He says the superalignment team was starved for resources, with its explicit public compute commitments dishonored, and...

Go all-in on lobbying the US and other governments to fully prohibit the training of frontier models beyond a certain level, in a way that OpenAI can't route around (so probably block Altman's foreign chip factory initiative, for instance).

Linch
This has been OpenAI's line (whether implicitly or explicitly) for a while iiuc. I referenced it on my Open Asteroid Impact website, under "Operation Death Star." 
HiddenPrior
Super helpful! Thanks!
HiddenPrior
I am limited in my means, but I would commit to a fund for strategy 2. My thoughts were on strategy 2, and it seems likely to do the most damage to OpenAI's reputation (and therefore funding) out of the above options. If someone is really protective of something, like their public image/reputation, that probably indicates that it is the most painful place to hit them.

This is a quickly-written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion.

 

Some arguments that OpenAI is making, simultaneously:

  1. OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).
  2. OpenAI cares a lot about safety (good for public PR and government regulations).
  3. OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).
  4. OpenAI doesn’t need to spend many resources on safety, and implementing safe AI won’t put it at any competitive disadvantage (important for investors who own most of the company).
  5. Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).
  6. People at OpenAI have thought long and
...

Meta’s messaging is clearer.

“AI development won’t get us to transformative AI, we don’t think that AI safety will make a difference, we’re just going to optimize for profitability.”

 

So, Meta's messaging is actually quite inconsistent. Yann LeCun says (when speaking to certain audiences, at least) that current AI is very dumb, and AGI is so far away it's not worth worrying about all that much. Mark Zuckerberg, on the other hand, is quite vocal that their goal is AGI and that they're making real progress towards it, suggesting 5+ year timelines.

You can think of a scorable function as a "box" that can produce large clusters of complex forecasts. Credit to Dall-E 3.

Introduction

Imagine if a forecasting platform had estimates for things like:

  1. "For every year until 2100, what will be the probability of a global catastrophic biological event, given different levels of biosecurity investment and technological advancement?"
  2. "What will be the impact of various AI governance policies on the likelihood of developing safe and beneficial artificial general intelligence, and how will this affect key indicators of global well-being over the next century?"
  3. "How valuable is every single project funded by Open Philanthropy, according to a person with any set of demographic information, if they would spend 1000 hours reflecting on it?"

These complex, multidimensional questions are useful for informing decision-making and resource...
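As a toy illustration of what one such "box" might look like, here is a guess at a minimal scorable function for question 1 above. The function shape, its parameters, and the log-score details are illustrative assumptions, not the post's actual design:

```python
import math

# One submitted "box": a function covering an entire parameter space of
# forecasts, rather than a single number per question.
def my_forecast(year: int, biosecurity_investment_usd: float) -> float:
    """Annual probability of a global catastrophic biological event."""
    base = 0.01 * (1 + (year - 2024) / 100)              # slow secular rise
    mitigation = math.exp(-biosecurity_investment_usd / 50e9)
    return min(max(base * mitigation, 1e-6), 1 - 1e-6)

# The platform can then score the whole function on whichever parameter
# combinations eventually resolve, e.g. with a log score.
def log_score(forecast_fn, resolved):
    """resolved: list of ((args...), outcome_bool) pairs; higher is better."""
    return sum(math.log(forecast_fn(*args) if happened
                        else 1 - forecast_fn(*args))
               for args, happened in resolved)

# Two hypothetical resolved data points.
print(log_score(my_forecast, [((2030, 10e9), False), ((2040, 2e9), False)]))
```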

This post was written by Peli Grietzer, inspired by internal writings by TJ (tushant jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment.

The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing. In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we’ve come to notice what we think are serious conceptual problems with the prevalent vocabulary of ‘AI alignment.’ This essay will discuss some of the major ways in which we think the concept of ‘alignment’ creates bias and confusion, as well as our own search for clarifying concepts. 

At AOI, we try to...

This is a linkpost for https://ailabwatch.org

I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly.

It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff.

(It's much better on desktop than mobile — don't read it on mobile.)

It's in beta—leave feedback here or comment or DM me—but I basically endorse the content and you're welcome to share and discuss it publicly.

It's unincorporated, unfunded, not affiliated with any orgs/people, and is just me.

Some clarifications and disclaimers.

How you can help:

  • Give feedback on how this project is helpful and how it could change to be much more helpful
  • Tell me what's wrong/missing; point me to sources
...
Wei Dai
Suggest having a row for "Transparency", to cover things like whether the company encourages or discourages whistleblowing, does it report bad news about alignment/safety (such as negative research results) or only good news (new ideas and positive results), does it provide enough info to the public to judge the adequacy of its safety culture and governance, etc.

Thanks.

Any takes on what info a company could publish to demonstrate "the adequacy of its safety culture and governance"? (Or recommended reading?)

Ideally criteria are objectively evaluable / minimize illegible judgment calls.

Martin Vlach
So is the Alignment program to be updated to 0 for OpenAI now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )

Thanks to Taylor Smith for doing some copy-editing on this.

In this article, I tell some anecdotes and present some evidence, in the form of research artifacts, about how easy it is for me to work hard when I have collaborators. If you are in a hurry, I recommend skipping to the research artifact section.

Bleeding Feet and Dedication

During AI Safety Camp (AISC) 2024, I was working with somebody on how to use binary search to approximate a hull that would contain a set of points, only to knock a glass off of my table. It splintered into a thousand pieces all over my floor.

A normal person might stop and remove all the glass splinters. I just spent 10 seconds picking up some of the largest pieces and then decided...

Emrik

I think both "jog, don't sprint" and "sprint, don't jog" are too low-dimensional as advice. It's good to try to spend 100% of one's resources on doing good—sorta tautologically. What allows Johannes to work as hard as he does, I think, is not (just) that he's obsessed with the work; it's rather that he understands his own mind well enough to navigate around its limits. And that self-insight is also what enables him to aim his cognition at what matters—which is a trait I care more about than the ability to work hard.

People who are good at aiming their cognition at ... (read more)

mesaoptimizer
I think what quila is pointing at is their belief in the supposed fragility of thoughts at the edge of research questions. From that perspective I think their rebuttal is understandable, and your response completely misses the point: you can be someone who spends only four hours a day working and the rest of the time relaxing, but also care a lot about not losing the subtle and supposedly fragile threads of your thought when working. Note: I have a different model of research thought, one that involves a systematic process towards insight, and because of that I also disagree with Johannes' decisions.
Mo Putera
I think you're right that I missed their point, thanks for pointing it out. I have had experiences similar to Johannes' anecdote re: ignoring broken glass to not lose fragile threads of thought; they usually entailed extended deep work periods past healthy thresholds for unclear marginal gain, so the quotes above felt personally relevant as guardrails. But also my experiences don't necessarily generalize (as your hypothetical shows). I'd be curious to know your model, and how it compares to some of John Wentworth's posts on the same IIRC.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Linch
Do we know if @paulfchristiano or other ex-lab people working on AI policy have non-disparagement agreements with OpenAI or other AI companies? I know Cullen doesn't, but I don't know about anybody else. I know NIST isn't a regulatory body, but it still seems like standards-setting should be done by people who have no unusual legal obligations.  To be clear, I want to differentiate between Non-Disclosure Agreements, which are perfectly sane and reasonable in at least a limited form as a way to prevent leaking trade secrets, and non-disparagement agreements, which prevents you from saying bad things about past employers. The latter seems clearly bad to have for anybody in a position to affect policy. Doubly so if the existence of the non-disparagement agreement itself is secretive.
gwern
Sam Altman appears to have been using non-disparagements at least as far back as 2017-2018, even for things that really don't seem to have needed such things at all, like a research nonprofit arm of YC. It's unclear if that example is also a lifetime non-disparagement (I've asked), but nevertheless, given that track record, you should assume the OA research non-profit also tried to impose it, followed by the OA LLC (obviously), and so Paul F. Christiano (and all of the Anthropic founders) would presumably be bound. This would explain why Anthropic executives never say anything bad about OA, and refuse to explain exactly what broken promises by Altman triggered their failed attempt to remove Altman and subsequent exodus. (I have also asked Sam Altman on Twitter, since he follows me, apropos of his vested equity statement, how far back these NDAs go, if the Anthropic founders are still bound, and if they are, whether they will be unbound.)
habryka
It seems really quite bad for Paul to work in the U.S. government on AI legislation without having disclosed that he is under a non-disparagement clause with the biggest entity in the space that the regulator he is working at oversees. And if he signed an NDA that prevents him from disclosing it, then it was IMO his job not to accept a position in which such a disclosure would obviously be required. I am currently at something like 50% that Paul has indeed signed such a lifetime non-disparagement agreement, so I don't think a "presumably" is appropriate here (though I am not that far away from it).
gwern

It would be bad, I agree. (An NDA about what he worked on at OA, sure, but then being required to never say anything bad about OA forever, as a regulator who will be running evaluations etc...?) Fortunately, this is one of those rare situations where it is probably enough for Paul to simply say his OA NDA does not cover that - then either it doesn't and can't be a problem, or he has violated the NDA's gag order by talking about it and when OA then fails to sue him to enforce it, the NDA becomes moot.

by Lucius Bushnaq, Jake Mendel, Kaarel Hänni, Stefan Heimersheim.

A short post laying out our reasoning for using integrated gradients as an attribution method. It is intended as a stand-alone post based on our LIB papers [1] [2]. This work was produced at Apollo Research.

Context

Understanding circuits in neural networks requires understanding how features interact with other features. There are a lot of features, and their interactions are generally non-linear. A good starting point for understanding the interactions might be to just figure out how strongly each pair of features in adjacent layers of the network interacts. But since the relationships are non-linear, how do we quantify their 'strength' in a principled manner that isn't vulnerable to common and simple counterexamples? In other words, how do we quantify how much the...
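For concreteness, below is a minimal generic sketch of integrated gradients (the standard Sundararajan et al. 2017 formulation), which attributes the change in a function's output to its inputs by averaging gradients along a straight path from a baseline. This is an illustrative sketch, not the LIB papers' implementation; in their setting the "inputs" would be features in one layer and the "output" a feature in the next.

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=100):
    """Attribute f(x) - f(baseline) to the coordinates of x by averaging
    grad_f along the straight path from baseline to x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad  # one attribution per input coordinate

# Toy non-linear interaction: f(a, b) = a*b + a^2, with its exact gradient.
f = lambda v: v[0] * v[1] + v[0] ** 2
grad_f = lambda v: np.array([v[1] + 2 * v[0], v[0]])
attr = integrated_gradients(f, grad_f, x=np.array([2.0, 3.0]),
                            baseline=np.zeros(2))
print(attr, attr.sum(), f(np.array([2.0, 3.0])))  # attributions sum to f(x)-f(0)
```

The final check illustrates the Completeness axiom (attributions summing exactly to f(x) - f(baseline)), the property questioned in the comments below.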

[Not very confident, but just saying my current view.]

I'm pretty skeptical about integrated gradients.

As far as why, I don't think we should care about the derivative at the baseline (zero or the mean).

As far as the axioms, I think I get off the train on "Completeness" which doesn't seem like a property we need/want.

I think you just need to eat that there isn't any sensible way to do something reasonable that gets Completeness.

The same applies with attribution in general (e.g. in decision making).

ryan_greenblatt
Maybe I'm confused, but isn't integrated gradients strictly slower than an ablation to a baseline?
tailcalled
Maybe? One thing I've been thinking about a lot recently is that building tools to interpret networks on individual datapoints might be more relevant than attributing over a dataset. This applies if the goal is to make statistical generalizations, since a richer structure on an individual datapoint gives you more to generalize with, but it also applies if the goal is the inverse, to go from general patterns to particulars, since this would provide a richer method for debugging, noticing exceptions, etc. And basically the trouble with a lot of work that attempts to generalize is that some phenomena are very particular to specific cases, so one risks losing a lot of information by only focusing on the generalizable findings. Either way, cool work; it seems like we've thought along similar lines, but you've put in more work.

Scarlett Johansson makes a statement about the "Sky" voice, a voice for GPT-4o that OpenAI recently pulled after less than a week of prime time.

tl;dr: OpenAI made an offer last September to Johansson; she refused. They offered again two days before the public demo. Scarlett Johansson claims that the voice was so similar that even friends and family noticed. She hired legal counsel to ask OpenAI to "detail the exact process by which they created the ‘Sky’ voice," which resulted in OpenAI taking the voice down.

Full statement below:

Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech...

gwern

Sam Altman has apparently provided a statement to NPR apropos of https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her , quoted on Twitter by the NPR journalist (second):

...In response, Sam Altman has now issued a statement saying “Sky is not Scarlett Johansson’s, and it was never intended to resemble hers.”

“We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms

... (read more)

LessOnline Festival

May 31st to June 2nd, Berkeley CA