Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.
However, several times recently, in the context of Covid forecasting and AI progress, I noticed that I had missed some crucial feature of a development I was interested in getting right, and it felt like I could have seen it coming if only I had tried a little harder. (Some others probably did better, but I imagine I wasn't the only one who got things wrong.)
Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.
First, here are the examples that prompted me to take notice:
Predicting the course of the Covid pandemic:
Hindsight is 20/20. I think you're underemphasizing how our current state of affairs is fairly contingent on social factors, like the actions of people concerned about AI safety.
For example, I think this world is actually quite plausible, not incongruent:
A world where AI capabilities progressed far enough to get us to something like ChatGPT, but somehow this didn't cause a stir or a wake-up moment for anyone who wasn't already concerned about AI risk.
I can easily imagine a counterfactual world in which:
This post is the result of a two-week research sprint project during the training phase of Neel Nanda's MATS stream.
Since the feature activation is just the dot product (plus encoder bias) of the concatenated z vector and the corresponding column of the encoder matrix, we can rewrite this as the sum of n_heads dot products, allowing us to look at the direct contribution from each head.
Nice work. But I have one comment.
The feature activation is the output of ReLU applied to this dot product plus the encoder bias, and ReLU is a non-linear function. So it is not clear that we can find the contribution of each head to the feature activation.
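The exchange above can be made concrete with a small numerical sketch. This is an illustration with made-up dimensions (`n_heads`, `d_head`, and the random values are all assumptions, not taken from the post): the pre-activation does decompose exactly into one dot product per head plus the encoder bias, but applying ReLU afterwards breaks that linearity, so per-head "contributions" to the post-ReLU activation are not well-defined in general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only -- not the dimensions used in the post.
n_heads, d_head = 4, 8
z = rng.normal(size=(n_heads, d_head))       # per-head attention outputs
w_enc = rng.normal(size=(n_heads, d_head))   # one encoder column, viewed per head
b_enc = 0.1                                  # encoder bias for this feature

# The pre-activation decomposes linearly: one dot product per head, plus bias.
head_contribs = np.einsum("hd,hd->h", z, w_enc)
pre_act = head_contribs.sum() + b_enc

# Same number computed without the per-head split, as a sanity check.
pre_act_flat = z.flatten() @ w_enc.flatten() + b_enc

# The feature activation applies ReLU, which is non-linear:
# relu(sum of parts) != sum(relu(parts)) in general, which is the
# commenter's point about per-head attribution.
feature_act = max(pre_act, 0.0)
sum_of_rectified_parts = np.maximum(head_contribs, 0.0).sum() + max(b_enc, 0.0)
```

So the per-head decomposition is exact for the pre-ReLU value (a common move is to attribute heads only through the pre-activation), while any attribution of the post-ReLU activation requires an extra convention for how to split the non-linearity.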
I expect it would be useful when developing an understanding of the language used on LW.
What do you mean by "cited"? Do you mean "articles referenced in other articles on LW" or "articles cited in academic journals" or some other definition?
It’s happening. The race is on.
Google and OpenAI both premiered the early versions of their fully multimodal, eventually fully integrated AI agents. Soon your phone experience will get more and more tightly integrated with AI. You will talk to your phone, or your computer, and it will talk back, and it will do all the things. It will hear your tone of voice and understand your facial expressions. It will remember the contents of your inbox and all of your quirky preferences.
It will plausibly be a version of Her, from the hit movie ‘Are we sure about building this Her thing, seems questionable?’
OpenAI won this round of hype going away, because it premiered, and for some modalities released, the new GPT-4o. GPT-4o is tearing up the Arena...
Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum
Abstract:
...Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components:
I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.
Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.
Reasons are unclear (as usual when safety people leave OpenAI).
The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway.
OpenAI announced Sutskever's departure in a blogpost.
I agree it's not a large commitment in some absolute sense. I think it'd still be instructive to see whether they're able to hit this (not very high) bar.
For cells to become cancerous, they must have mutations that cause uncontrolled replication and mutations that prevent that uncontrolled replication from triggering apoptosis. Because cancer requires several mutations, it often begins with damage to mutation-preventing mechanisms. As such, cancers often carry many mutations not required for their growth, which often cause changes to the structure of some surface proteins.
The modified surface proteins of cancer cells are called "neoantigens". An approach to cancer treatment that's currently being researched is to identify some specific neoantigens of a patient's cancer, and create a personalized vaccine to cause their immune system to recognize them. Such vaccines would use either mRNA or synthetic long peptides. The steps required are as follows:
That new Amgen drug targets a human protein that's mostly only used during embryonic development. I think it's expressed by most cancer cells in maybe around 0.2% of cancer cases. In many of those cases, some of the cancer cells will stop producing it.
Most potential targets have worse side effects and/or are less common.
Epic Lizka post is epic.
Also, I absolutely love the word "shard", but my brain refuses to use it because then it feels like we won't get credit for discovering these notions by ourselves. Well, also just because the words "domain", "context", "scope", "niche", "trigger", "preimage" (wrt a neural function/policy / "neureme") adequately serve the same purpose and are currently more semantically/semiotically granular in my head.
trigger/preimage ⊆ scope ⊆ domain
"niche" is a category in function space (including domain, operation, and codomain), "domain" is a set.
"scope" is great because of programming connotations and can be used as a verb. "This neural function is scoped to these contexts."
Contra this post from the Sequences
In Eliezer's sequence post, he makes the following (excellent) point:
I can’t find any theorem of probability theory which proves that I should appear ice-cold and expressionless.
This debunks the then-widely-held view that rationality is counter to emotions. He then goes on to claim that emotions have the same epistemic status as the beliefs they are based on.
For my part, I label an emotion as “not rational” if it rests on mistaken beliefs, or rather, on mistake-producing epistemic conduct. “If the iron approaches your face, and you believe it is hot, and it is cool, the Way opposes your fear. If the iron approaches your face, and you believe it is cool, and it is hot, the Way opposes your calm.”
I think Eliezer is...
(Off the top of my head; maybe I'll change my mind if I think about it more or see a good point.) What can be destroyed by truth, shall be. Emotions and beliefs are entangled. If you don't think about how high p(doom) actually is because in the back of your mind you don't want to be sad, you end up working on things that don't reduce p(doom).
As long as you know the truth, emotions matter only insofar as your terminal values say they do. But many feelings feed into what we end up believing, via motivated cognition and the like.