Quick Takes

robo73

Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish

The "Big Stupid" of the AI doomers 2013-2023 was AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs".  Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche.  When the public turned out to be somewhat receptive to the idea of regulating ... (read more)

gilch20

the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs".

Not quite. It was to research how to build friendly AIs. We haven't succeeded yet. What research progress we have made points to the problem being harder than initially thought, and capabilities turned out to be easier than most of us expected as well.

Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche.

Considered by whom? Rationalists? T... (read more)

7Haiku
Strong agree and strong upvote. There are some efforts in the governance space and in the space of public awareness, but there should and can be much, much more.

My read of these survey results is: AI Alignment researchers are optimistic people by nature. Despite this, most of them don't think we're on track to solve alignment in time, and they are split on whether we will even make significant progress. Most of them also support pausing AI development to give alignment research time to catch up.

As for what to actually do about it: There are a lot of options, but I want to highlight PauseAI. (Disclosure: I volunteer with them. My involvement brings me no monetary benefit, and no net social benefit.) Their Discord server is highly active and engaged and is peopled with alignment researchers, community- and mass-movement organizers, experienced protesters, artists, developers, and a swath of regular people from around the world. They play the inside and outside game, both doing public outreach and also lobbying policymakers.

On that note, I also want to put a spotlight on the simple action of sending emails to policymakers. Doing so and following through is extremely OP (i.e. has much more utility than you might expect), and can result in face-to-face meetings to discuss the nature of AI x-risk and what they can personally do about it. Genuinely, my model of a world in 2040 that contains humans is almost always one in which a lot more people sent emails to politicians.

I promise I won't just continue to re-post a bunch of papers, but this one seems relevant to many around these parts. In particular @Elizabeth (also, sorry if you dislike being at-ed like that).

Associations of dietary patterns with brain health from behavioral, neuroimaging, biochemical and genetic analyses

Food preferences significantly influence dietary choices, yet understanding natural dietary patterns in populations remains limited. Here we identify four dietary subtypes by applying data-driven approaches to food-liking data from 181,990 UK Biobank pa

... (read more)

Very Spicy Take

Epistemic Note: 
Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: 
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome. 

Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule. 

Premise 3:... (read more)


So basically, I think it is a bad idea and you think we can't do it anyway. In that case let's stop calling for it, and call for something more compassionate and realistic like a public apology.

I'll bet an apology would be a more effective way to pressure OpenAI to clean up its act anyways. Which is a better headline -- "OpenAI cofounder apologizes for their role in creating OpenAI", or some sort of internal EA movement drama? If we can generate a steady stream of negative headlines about OpenAI, there's a chance that Sam is declared too much of a PR and regulatory liability. I don't think it's a particularly good plan, but I haven't heard a better one.

1Ebenezer Dukakis
Sure, I think this helps tease out the moral valence point I was trying to make. "Don't allow them near" implies their advice is actively harmful, which in turn suggests that reversing it could be a good idea. But as you say, this is implausible. A more plausible statement is that their advice is basically noise -- you shouldn't pay too much attention to it. I expect OP would've said something like that if they were focused on descriptive accuracy rather than scapegoating.

Another way to illuminate the moral dimension of this conversation: If we're talking about poor decision-making, perhaps MIRI and FHI should also be discussed? They did a lot to create interest in AGI, and MIRI failed to create good alignment researchers by its own lights. Now after doing advocacy off and on for years, and creating this situation, they're pivoting to 100% advocacy.

Could MIRI be made up of good people who are "great at technical stuff", yet apt to shoot themselves in the foot when it comes to communicating with the public? It's hard for me to imagine an upvoted post on this forum saying "MIRI shouldn't be allowed anywhere near AI safety communications".
6Elizabeth
I like a lot of this post, but the sentence above seems very out of touch to me. Who are these third parties who are completely objective? Why is objective the adjective here, instead of "good judgement" or "predicted this problem at the time"?

Regarding the situation at OpenAI, I think it's important to keep a few historical facts in mind:

  1. The AI alignment community has long stated that an ideal FAI project would have a lead over competing projects. See e.g. this post:

Requisite resource levels: The project must have adequate resources to compete at the frontier of AGI development, including whatever mix of computational resources, intellectual labor, and closed insights are required to produce a 1+ year lead over less cautious competing projects.

  2. The scaling hypothesis wasn't obviously t
... (read more)

A Theory of Usable Information Under Computational Constraints

We propose a new framework for reasoning about information in complex systems. Our foundation is based on a variational extension of Shannon's information theory that takes into account the modeling power and computational constraints of the observer. The resulting predictive V-information encompasses mutual information and other notions of informativeness such as the coefficient of determination. Unlike Shannon's mutual information and in violation of the data processing inequality, V-

... (read more)
4Alexander Gietelink Oldenziel
Can somebody explain to me what's happening in this paper?

My reading is that their definition of conditional predictive entropy is the naive generalization of Shannon's conditional entropy, given that the way you condition on some data is restricted to only being able to implement functions of a particular class. And the corresponding generalization of mutual information becomes a measure of how much more predictable some piece of information (Y) becomes given evidence (X) compared to no evidence.
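
To make that reading concrete, here is a minimal formal sketch of the definitions as I understand them (notation mine, reconstructed from memory of the paper, so treat the details as an assumption rather than a quotation). Fix a predictive family V of functions f that map either the evidence x, or no evidence (written ∅), to a probability distribution over Y:

  % Best achievable log-loss on Y when predictors are restricted to V and may use the evidence X
  H_V(Y \mid X) = \inf_{f \in V} \, \mathbb{E}_{x, y \sim P(X,Y)} \left[ -\log f[x](y) \right]

  % Best achievable log-loss on Y when the V-restricted predictor gets no evidence
  H_V(Y \mid \varnothing) = \inf_{f \in V} \, \mathbb{E}_{y \sim P(Y)} \left[ -\log f[\varnothing](y) \right]

  % Predictive V-information: how much more predictable Y becomes given X, for a V-bounded observer
  I_V(X \to Y) = H_V(Y \mid \varnothing) - H_V(Y \mid X)

If V contains essentially all functions this should collapse back to Shannon mutual information, and restricting V (e.g. to efficiently computable predictors) is what lets the quantity violate the data processing inequality, as the abstract notes.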

For example, the goal of public key cryptography cannot be to make the mutual information between a plaintext, and ... (read more)

1[comment deleted]

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI*. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad and dmurfet, for instance; on the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthe... (read more)

2Alexander Gietelink Oldenziel
In my mainline model there are only a few innovations needed, perhaps only a single big one, to produce an AGI which, just as the Turing Machine sits at the top of the Chomsky Hierarchy, will be basically the optimal architecture given resource constraints. There are probably some minor improvements to do with bridging the gap between the theoretically optimal architecture and the actual architecture, or parts of the algorithm that can be indefinitely improved but with diminishing returns (these probably exist due to Levin, and matrix multiplication is possibly one of these). On the whole I expect AI research to be very chunky. Indeed, we've seen that there was really just one big idea behind all current AI progress: scaling, specifically scaling GPUs on maximally large undifferentiated datasets. There were some minor technical innovations needed to pull this off, but on the whole that was the clincher. Of course, I don't know. Nobody knows. But I find this the most plausible guess based on what we know about intelligence, learning, theoretical computer science and science in general.

There are two kinds of relevant hypothetical innovations: those that enable chatbot-led autonomous research, and those that enable superintelligence. It's plausible that there is no need for (more of) the former, so that mere scaling through human efforts will lead to such chatbots in a few years regardless. (I think it's essentially inevitable that there is currently enough compute that with appropriate innovations we can get such autonomous human-scale-genius chatbots, but it's unclear if these innovations are necessary or easy to discover.) If autonomou... (read more)

2Alexander Gietelink Oldenziel
My timelines were not 2026. In fact, I made bets against doomers 2-3 years ago, one will resolve by next year. I agree iterative improvements are significant. This falls under "naive extrapolation of scaling laws". By nanotech I mean something akin to drexlerian nanotech or something similarly transformative in the vicinity. I think it is plausible that a true ASI will be able to make rapid progress (perhaps on the order of a few years or a decade) on nanotech. I suspect that people that don't take this as a serious possibility haven't really thought through what AGI/ASI means + what the limits and drivers of science and tech really are; I suspect they are simply falling prey to status-quo bias.

I'm surprised at people who seem to be updating only now about OpenAI being very irresponsible, rather than updating when they created a giant public competitive market for chatbots (which contains plenty of labs that don't care about alignment at all), thereby reducing how long everyone has to solve alignment. I still parse that move as devastating the commons in order to make a quick buck.


In the spirit of trying to understand what actually went wrong here -- IIRC, OpenAI didn't expect ChatGPT to blow up the way it did. Seems like they were playing a strategy of "release cool demos" as opposed to "create a giant competitive market".

2Garrett Baker
Who is updating? I haven't seen anyone change their mind yet.
10mesaoptimizer
You continue to model OpenAI as this black box monolith instead of trying to unravel the dynamics inside it and understand the incentive structures that lead these things to occur. It's a common pattern I notice in the way you interface with certain parts of reality.

I don't consider OpenAI as responsible for this as much as Paul Christiano and Jan Leike and his team. Back in 2016 or 2017, when they initiated and led research into RLHF, they focused on LLMs because they expected that LLMs would be significantly more amenable to RLHF. This means that instruction-tuning was the cause of the focus on LLMs, which meant that it was almost inevitable that they'd try instruction-tuning on it, and incrementally build up models that deliver mundane utility.

It was extremely predictable that Sam Altman and OpenAI would leverage this unexpected success to gain more investment and translate that into more researchers and compute. But Sam Altman and Greg Brockman aren't researchers, and they didn't figure out a path that minimized 'capabilities overhang' -- Paul Christiano did. And more important -- this is not mutually exclusive with OpenAI using the additional resources for both capabilities research and (what they call) alignment research. While you might consider everything they do as effectively capabilities research, the point I am making is that this is still consistent with the hypothesis that while they are misguided, they still are roughly doing the best they can given their incentives.

What really changed my perspective here was the fact that Sam Altman seems to have been systematically destroying extremely valuable information about how we could evaluate OpenAI. Specifically, this non-disparagement clause that ex-employees cannot even mention without falling afoul of this contract, is something I didn't expect (I did expect non-disclosure clauses but not something this extreme). This meant that my model of OpenAI was systematically too optimistic about how cooperati

Sometimes I forget to take a dose of methylphenidate. As my previous dose fades away, I start to feel much worse than baseline. I then think "Oh no, I'm feeling so bad, I will not be able to work at all."

But then I remember that I forgot to take a dose of methylphenidate and instantly I feel a lot better.

Usually, one of the worst things when I'm feeling down is that I don't know why. But now, I'm in this very peculiar situation where putting or not putting some particular object into my mouth is the actual cause. It's hard to imagine something more tangibl... (read more)

Lorxus-20

Wait, some of y'all were still holding your breaths for OpenAI to be net-positive in solving alignment?

After the whole "initially having to be reminded alignment is A Thing"? And going back on its word to go for-profit? And spinning up a weird and opaque corporate structure? And people being worried about Altman being power-seeking? And everything to do with the OAI board debacle? And OAI Very Seriously proposing what (still) looks to me to be like a souped-up version of Baby Alignment Researcher's Master Plan B (where A involves solving physics and C invo... (read more)

Akash169

My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there's a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.

Some quick thoughts:

  • Soft power – I think people underestimate how strong the "soft power" of labs is, particularly in the Bay Area.
  • Jobs – A large fraction of people getting involved in AI safety are interested in the
... (read more)
17Zach Stein-Perlman
Sorry for brevity, I'm busy right now.
  1. Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as "evaluation" than "criticism."
  2. It's often important for evaluation to be quite truth-tracking. Criticism isn't obviously good by default.
  Edit: 3. I'm pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don't understand your "good judgment" point. Feedback I've gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there's a selection effect in what I hear, but I'm quite sure most of them support such efforts.
  4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being "'unilateralist,' 'not serious,' and 'untrustworthy.'" I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.
  5. I do agree on "soft power" and some of "jobs." People often don't criticize the labs publicly because they're worried about negative effects on them, their org, or people associated with them.
Akash52

RE 1& 2:

Agreed— my main point here is that the marketplace of ideas undervalues criticism.

I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.

The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.

EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.

RE 3:

Go... (read more)

simeon_c13075

Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?

I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people to not sign NDAs, would have a very high value.

11habryka
Yeah, at the time I didn't know how shady some of the contracts here were. I do think funding a legal defense is a marginally better use of funds (though my guess is funding both is worth it).
3Yonatan Cale
@habryka, would you reply to this comment if there's an opportunity to donate to either? Me and another person are interested, and others could follow this comment too if they wanted to (only if it's easy for you, I don't want to add an annoying task to your plate)

Sure, I'll try to post here if I know of a clear opportunity to donate to either. 

William_SΩ731669

I worked at OpenAI for three years, from 2021-2024 on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)


They would not know if others have signed the SAME NDAs without trading information about their own NDAs, which is forbidden.

25tlevin
Kelsey Piper now reports: "I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it."
2wassname
Interesting! For most of us, this is outside our area of competence, so appreciate your input.

From my perspective, the only thing that keeps the OpenAI situation from being all kinds of terrible is that I continue to think they're not close to human-level AGI, so it probably doesn't matter all that much.

This is also my take on AI doom in general; my P(doom|AGI soon) is quite high (>50% for sure), but my P(AGI soon) is low. In fact it decreased in the last 12 months.

TurnTroutΩ7150

Apparently[1] there was recently some discussion of Survival Instinct in Offline Reinforcement Learning (NeurIPS 2023). The results are very interesting: 

On many benchmark datasets, offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels, such as those that are zero everywhere or are negatives of the true rewards. This phenomenon cannot be easily explained by offline RL's return maximization objective. Moreover, it gives offline RL a degree of robustness that is uncharacteristic of its online RL count

... (read more)
Algon20

Because future rewards are discounted

Don't you mean future values? Also, AFAICT, the only thing going on here that separates online from offline RL is that offline RL algorithms shape the initial value function to give conservative behaviour. And so you get conservative behaviour.
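
For what it's worth, here is a toy sketch of that mechanism as I understand it (my own construction, not code from the paper; the out-of-dataset penalty below is a stand-in for whatever form of conservatism a given offline RL method actually uses). The dataset only contains "stay alive" transitions with zero reward, yet the greedy policy ends up avoiding termination:

import numpy as np

GAMMA = 0.9
PESSIMISM = -10.0  # assumed value assigned to state-action pairs never seen in the dataset

# Tiny MDP: state 0 = "alive", state 1 = "terminal". Action 0 = "stay" (remain in state 0),
# action 1 = "jump" (episode ends). The behavior policy that produced the data only ever stayed,
# so the dataset holds (state, action, reward, next_state) tuples with reward 0 everywhere.
dataset = [(0, 0, 0.0, 0)] * 100

Q = np.zeros((2, 2))
for _ in range(100):
    # Pessimistic fitted-Q backup: unseen state-action pairs are pinned at PESSIMISM,
    # so bootstrapped value can only flow through actions that appear in the dataset.
    Q_new = np.full_like(Q, PESSIMISM)
    for s, a, r, s_next in dataset:
        Q_new[s, a] = r + GAMMA * Q[s_next].max()
    Q = Q_new

print(Q[0])           # [0.0, -10.0]: in-support "stay" beats out-of-support "jump"
print(Q[0].argmax())  # 0: the agent keeps itself alive despite the all-zero reward labels

Which actions are in-support depends only on the dataset, not on the reward labels, which is the flavor of robustness the quoted abstract describes.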

1Tao Lin
One lesson you could take away from this is "pay attention to the data, not the process" - this happened because the data had longer successes than failures. If successes were more numerous than failures, many algorithms would have imitated those as well with null reward.
6tailcalled
The paper sounds fine quality-wise to me, I just find it implausible that it's relevant for important alignment work, since the proposed mechanism is mainly an aversion to building new capabilities.

Several dozen people now presumably have Lumina in their mouths. Can we not simply crowdsource some assays of their saliva? I would chip money in to this. Key questions around ethanol levels, aldehyde levels, antibacterial levels, and whether the organism itself stays colonized at useful levels.

2Lorxus
Surely so! Hit me up if you ever end up doing this - I'm likely getting the Lumina treatment in a couple months.
1wassname
A before and after would be even better!
Lorxus10

Any recommendations on how I should do that? You may assume that I know what a gas chromatograph is and what a Petri dish is and why you might want to use either or both of those for data collection, but not that I have any idea of how to most cost-effectively access either one as some rando who doesn't even have an MA in Chemistry.

  1. We inhabit this real material world, the one which we perceive all around us (and which somehow gives rise to perceptive and self-conscious beings like us).
  2. Though not all of our perceptions conform to a real material world. We may be fooled by things like illusions or hallucinations or dreams that mimic perceptions of this world but are actually all in our minds.
  3. Indeed if you examine your perceptions closely, you'll see that none of them actually give you representations of the material world, but merely reactions to it.
  4. In fact, since the only evidence we
... (read more)
2Richard_Kennaway
Dragging files around in a GUI is a familiar action that does known things with known consequences. Somewhere on the hard disc (or SSD, or somewhere in the cloud, etc.) there is indeed a "file" which has indeed been "moved" into a "folder", and taking off those quotation marks only requires some background knowledge (which in fact I have) of the lower-level things that are going on and which the GUI presents to me through this visual metaphor.

Some explanations work better than others. The idea that there is stuff out there that gives rise to my perceptions, and which I can act on with predictable results, seems to me the obvious explanation that any other contender will have to do a great deal of work to topple from the plinth.

The various philosophical arguments over doctrines such as "idealism", "realism", and so on are more like a musical recreation (see my other comment) than anything to take seriously as a search for truth. They are hardly the sort of thing that can be right or wrong, and to the extent that they are, they are all wrong. Ok, that's my personal view of a lot of philosophy, but I'm not the only one.
2David Gross
It sounds like you want to say things like "coherence and persistent similarity of structure in perceptions demonstrates that perceptions are representations of things external to the perceptions themselves" or "the idea that there is stuff out there seems the obvious explanation" or "explanations that work better than others are the best alternatives in the search for truth" and yet you also want to say "pish, philosophy is rubbish; I don't need to defend an opinion about realism or idealism or any of that nonsense". In fact what you're doing isn't some alternative to philosophy, but a variety of it.

Some philosophy is rubbish. Quite a lot, I believe. And with a statement such as "perceptions are caused by things external to the perceptions themselves", which I find unremarkable in itself as a prima facie obvious hypothesis to run with, there is a tendency for philosophers to go off the rails immediately by inventing precise definitions of words such as "perceptions", "are", and "caused", and elaborating all manner of quibbles and paradoxes. Hence the whole tedious catalogue of realisms.

Science did not get anywhere by speculating on whether there are four or five elements and arguing about their natures.

On an apparent missing mood - FOMO on all the vast amounts of automated AI safety R&D that could (almost already) be produced safely 

Automated AI safety R&D could result in vast amounts of work produced quickly. E.g. from Some thoughts on automating alignment research (under certain assumptions detailed in the post): 

each month of lead that the leader started out with would correspond to 15,000 human researchers working for 15 months.

Despite this promise, we seem not to have much knowledge of when such automated AI safety R&D might happ... (read more)

1Bogdan Ionut Cirstea
Seems like probably the modal scenario to me too, but even limited exceptions like the one you mention seem to me like they could be very important to deploy at scale ASAP, especially if they could be deployed using non-x-risky systems (e.g. like current ones, very bad at DC evals). This seems good w.r.t. automated AI safety potentially 'piggybacking', but bad for differential progress. Sure, though wouldn't this suggest at least focusing hard on (measuring / eliciting) what might not come at the same time? 
2ryan_greenblatt
Why think this is important to measure or that this already isn't happening? E.g., on the current model organism related project I'm working on, I automate inspecting reasoning traces in various ways. But I don't feel like there is any particularly interesting thing going on here which is important to track (e.g. this tip isn't more important than other tips for doing LLM research better).

Intuitively, I'm thinking of all this as something like a race between [capabilities enabling] safety and [capabilities enabling dangerous] capabilities (related: https://aligned.substack.com/i/139945470/targeting-ooms-superhuman-models); so from this perspective, maintaining as large a safety buffer as possible (especially if not x-risky) seems great. There could also be something like a natural endpoint to this 'race', corresponding to being able to automate all human-level AI safety R&D safely (and then using this to produce a scalable solution to a... (read more)

keltan134

Note to self, write a post about the novel akrasia solutions I thought up before becoming a rationalist.

  • Figuring out how to want to want to do things
  • Personalised advertising of Things I Wanted to Want to Do
  • What I do when all else fails
5trevor
Have you tried whiteboarding-related techniques? I think that suddenly starting to use written media (even journals), in an environment without much or any guidance, is like pressing too hard on the gas; you're gaining incredible power and going from zero to one on things faster than you ever have before.  Depending on their environment and what they're interested in starting out, some people might learn (or be shown) how to steer quickly, whereas others might accumulate/scaffold really lopsided optimization power and crash and burn (e.g. getting involved in tons of stuff at once that upon reflection was way too much for someone just starting out).
keltan10

This seems incredibly interesting to me. Googling “White-boarding techniques” only gives me results about digitally shared idea spaces. Is this what you’re referring to? I’d love to hear more on this topic.

4keltan
Maybe I could even write a sequence on this?

Unfortunately, it looks like non-disparagement clauses aren't unheard of in general releases:

http://www.shpclaw.com/Schwartz-Resources/severance-and-release-agreements-six-6-common-traps-and-a-rhetorical-question

Release Agreements commonly include a “non-disparagement” clause – in which the employee agrees not to disparage “the Company.”

https://joshmcguirelaw.com/civil-litigation/adventures-in-lazy-lawyering-the-broad-general-release

The release had a very broad definition of the company (including officers, directors, shareholders, etc.), but a fairly reas

... (read more)
Wei Dai502

AI labs are starting to build AIs with capabilities that are hard for humans to oversee, such as answering questions based on large contexts (1M+ tokens), but they are still not deploying "scalable oversight" techniques such as IDA and Debate. (Gemini 1.5 report says RLHF was used.) Is this more good news or bad news?

Good: Perhaps RLHF is still working well enough, meaning that the resulting AI is following human preferences even out of training distribution. In other words, they probably did RLHF on large contexts in narrow distributions, with human rater... (read more)


Bad: AI developers haven't taken alignment seriously enough to have invested enough in scalable oversight, and/or those techniques are unworkable or too costly, causing them to be unavailable.

Turns out at least one scalable alignment team has been struggling for resources. From Jan Leike (formerly co-head of Superalignment at OpenAI):

Over the past few months my team has been sailing against the wind. Sometimes we were struggling for compute and it was getting harder and harder to get this crucial research done.

Even worse, apparently the whole Supera... (read more)

4ryan_greenblatt
I'm skeptical that increased scale makes hacking the reward model worse. Of course, it could (and likely will/does) make hacking human labelers more of a problem, but this isn't what the comment appears to be saying. Note that the reward model is of the same scale as the base model, so the relative scale should be the same. This also contradicts results from an earlier paper by Leo Gao. I think this paper is considerably more reliable than the comment overall, so I'm inclined to believe the paper or think that I'm misunderstanding the comment. Additionally, from first principles I think that RLHF sample efficiency should just increase with scale (at least with well tuned hyperparameters) and I think I've heard various things that confirm this.
2ryan_greenblatt
Oops, fixed.