LessOnline Festival

May 31st - June 2nd, in Berkeley, CA

A festival of truth-seeking, optimization, and blogging. We'll have writing workshops, rationality classes, puzzle hunts, and thoughtful conversations across a sprawling fractal campus of nooks and whiteboards.

keltan
Note to self: write a post about the novel akrasia solutions I thought up before becoming a rationalist.

* Figuring out how to want to want to do things
* Personalised advertising of Things I Wanted to Want to Do
* What I do when all else fails
Several dozen people now presumably have Lumina in their mouths. Can we not simply crowdsource some assays of their saliva? I would chip money into this. Key questions are around ethanol levels, aldehyde levels, antibacterial levels, and whether the organism itself stays colonized at useful levels.
niplav
Just checked who from the authors of the Weak-To-Strong Generalization paper is still at OpenAI:

* Collin Burns
* Jan Hendrik Kirchner
* Leo Gao
* Bowen Baker
* Yining Chen
* Adrien Ecoffet
* Manas Joglekar
* Jeff Wu

Gone are:

* Ilya Sutskever
* Pavel Izmailov[1]
* Jan Leike
* Leopold Aschenbrenner

1. Reason unknown
quila
(Personal) On writing and (not) speaking

I often struggle to find words and sentences that match what I intend to communicate. Here are some problems this can cause:

1. Wordings that are odd or unintuitive to the reader, but that are at least literally correct.[1]
2. Not being able to express what I mean, and having to choose between not writing it, or risking miscommunication by trying anyway. I tend to choose the former unless I'm writing to a close friend. Unfortunately this means I am unable to express some key insights to a general audience.
3. Writing taking lots of time: I usually have to iterate many times on words/sentences until I find one which my mind parses as referring to what I intend. In the slowest cases, I might finalize only 2-10 words per minute. Even after iterating, my words are often interpreted in ways I failed to foresee.

These apply to speaking, too. If I speak what would be the 'first iteration' of a sentence, there's a good chance it won't create an interpretation matching what I intend to communicate. In spoken language I have no chance to constantly 'rewrite' my output before sending it. This is one reason, but not the only reason, that I've had a policy of trying to avoid voice-based communication.

I'm not fully sure what caused this relationship to language. It could be that it's just a byproduct of being autistic. It could also be a byproduct of out-of-distribution childhood abuse.[2]

1. ^ E.g., once I couldn't find the word 'clusters,' and wrote a complex sentence referring to 'sets of similar' value functions, each corresponding to a common alignment failure mode / ASI takeoff training story. (I later found a way to make it much easier to read.)
2. ^ (Content warning) My primary parent was highly abusive, and would punish me for using language in the intuitive 'direct' way about particular instances of that. My early response was to try to euphemize and phrase things differently, in a way that was less in conflict with the power dynamic / social reality she enforced. Eventually I learned to model her as a deterministic system and stay silent / fawn.
Epistemic status: not a lawyer, but I've worked with a lot of them. As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...

Popular Comments

Recent Discussion

This is a linkpost for https://arxiv.org/abs/2405.06624

Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum

Abstract:

Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components:

...
habryka
I am quite interested in takes from various people in alignment on this agenda. I've engaged with both Davidad's and Bengio's stuff a bunch in the last few months, and I feel pretty confused (and skeptical) about a bunch of it, and would be interested in reading more of what other people have to say.

I wrote up some of my thoughts on Bengio's agenda here.

TLDR: I'm excited about work on trying to find any interpretable hypothesis which can be highly predictive on hard prediction tasks (e.g. next-token prediction).[1] From my understanding, the Bayesian aspect of this agenda doesn't add much value.

I might collaborate with someone to write up a more detailed version of this view which engages in detail and is more clearly explained. (To make it easier to argue against and to exist as a more canonical reference.)

As far as Davidad, I think the "manually bui...

This post is the result of a 2-week research sprint project during the training phase of Neel Nanda’s MATS stream.

Executive Summary

  • We replicate Anthropic's MLP Sparse Autoencoder (SAE) paper on attention outputs and it works well: the SAEs learn sparse, interpretable features, which gives us insight into what attention layers learn. We study the second attention layer of a two layer language model (with MLPs).
    • Specifically, rather than training our SAE on attn_output, we train our SAE on “hook_z” concatenated over all attention heads (aka the mixed values aka the attention outputs before a linear map - see notation here). This is valuable as we can see how much of each feature’s weights come from each head, which we believe is a promising direction to investigate attention head
...
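As a concrete illustration of the setup described above (a minimal sketch with hypothetical shapes and names, not the authors' actual code): an SAE whose input is hook_z concatenated across heads, so that any feature's encoder weights can be sliced per head to see how much weight comes from each head.

```python
import torch
import torch.nn as nn

class ConcatZSAE(nn.Module):
    """Sketch of a sparse autoencoder over hook_z concatenated across attention heads."""

    def __init__(self, n_heads: int, d_head: int, d_sae: int):
        super().__init__()
        d_in = n_heads * d_head
        self.n_heads, self.d_head = n_heads, d_head
        self.W_enc = nn.Parameter(torch.randn(d_in, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_in) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_in))

    def forward(self, z_concat: torch.Tensor):
        # z_concat: [batch, n_heads * d_head], the per-head outputs concatenated together
        acts = torch.relu(z_concat @ self.W_enc + self.b_enc)  # sparse feature activations
        recon = acts @ self.W_dec + self.b_dec                  # reconstruction of the input
        return acts, recon

    def per_head_weight_norms(self, feature_idx: int) -> torch.Tensor:
        # Slice one feature's encoder weights into per-head blocks; a larger norm means
        # more of that feature's input weight lives in that head.
        w = self.W_enc[:, feature_idx].view(self.n_heads, self.d_head)
        return w.norm(dim=-1)  # shape: [n_heads]
```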
Ali Shehper
  Nice work. But I have one comment. The feature activation is the output of ReLU applied to this dot product plus the encoder bias, and ReLU is a non-linear function. So it is not clear that we can find the contribution of each head to the feature activation. 

This could also be the reason behind the issue mentioned in footnote 5. 
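To make the point concrete (continuing the hypothetical sketch above, not the authors' code): the pre-ReLU value splits additively into per-head dot products, but the post-ReLU feature activation does not, so exact per-head attribution only holds before the nonlinearity.

```python
# Continuing the illustrative ConcatZSAE sketch above (hypothetical names and shapes).
z_heads = z_concat.view(-1, sae.n_heads, sae.d_head)           # [batch, n_heads, d_head]
W_enc_heads = sae.W_enc.view(sae.n_heads, sae.d_head, -1)      # [n_heads, d_head, d_sae]

per_head_pre = torch.einsum("bhd,hdf->bhf", z_heads, W_enc_heads)  # each head's dot product
pre_act = per_head_pre.sum(dim=1) + sae.b_enc  # exactly equals z_concat @ W_enc + b_enc
act = torch.relu(pre_act)                      # the ReLU breaks strict per-head attribution
```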

Emrik
Epic Lizka post is epic.

Also, I absolutely love the word "shard", but my brain refuses to use it because then it feels like we won't get credit for discovering these notions by ourselves. Well, also just because the words "domain", "context", "scope", "niche", "trigger", "preimage" (wrt a neural function/policy / "neureme") adequately serve the same purpose and are currently more semantically/semiotically granular in my head.

trigger/preimage ⊆ scope ⊆ domain

"niche" is a category in function space (including domain, operation, and codomain), while "domain" is a set. "scope" is great because of its programming connotations and can be used as a verb: "This neural function is scoped to these contexts."
Emrik
Aaron Bergman has a vid of himself typing new sentences in real-time, which I found really helpfwl.[1] I wish I could watch lots of people record themselves typing, so I could compare what I do.

Being slow at writing can be a sign of failure or of winning, depending on the exact reasons why you're slow. I'd worry about being "too good" at writing, since that'd be evidence that your brain is conforming your thoughts to the language, instead of conforming your language to your thoughts. English is just a really poor medium for thought (at least compared to e.g. visuals and pre-word intuitive representations), so it's potentially dangerous to care overmuch about it.

1. ^ Btw, Aaron is another person-recommendation. He's awesome. Has really strong self-insight, goodness-of-heart, creativity. (Twitter profile, blog+podcast, EAF, links.) I haven't personally learned a whole bunch from him yet,[2] but I expect if he continues being what he is, he'll produce lots of cool stuff which I'll learn from later.
2. ^ Edit: I now recall that I have learned from him: screwworms (important), and the ubiquity of left-handed chirality in nature (mildly important). He also caused me to look into the two-envelopes paradox, which was usefwl for me. Although I later learned about screwworms from Kevin Esvelt on the 80k podcast, so I would've learned it anyway. And I also later learned about left-handed chirality from Steve Mould on YT, but I may not have reflected on it as much.
quila
Record yourself typing?

Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.

However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could’ve seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.)

Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.

First, here are the examples that prompted me to take notice:

Predicting the course of the Covid pandemic:

  • I didn’t foresee the contribution from sociological factors (e.g., “people not wanting
...
mic

Hindsight is 20/20. I think you're underemphasizing how our current state of affairs is fairly contingent on social factors, like the actions of people concerned about AI safety.

For example, I think this world is actually quite plausible, not incongruent:

A world where AI capabilities progressed far enough to get us to something like chat-gpt, but somehow this didn’t cause a stir or wake-up moment for anyone who wasn’t already concerned about AI risk.

I can easily imagine a counterfactual world in which:

  • ChatGPT shows that AI is helpful, safe, and easy to ali
...

I expect it would be useful when developing an understanding of the language used on LW.

What do you mean by "cited"? Do you mean "articles referenced in other articles on LW" or "articles cited in academic journals" or some other definition?

It’s happening. The race is on.

Google and OpenAI both premiered the early versions of their fully multimodal, eventually fully integrated AI agents. Soon your phone experience will get more and more tightly integrated with AI. You will talk to your phone, or your computer, and it will talk back, and it will do all the things. It will hear your tone of voice and understand your facial expressions. It will remember the contents of your inbox and all of your quirky preferences.

It will plausibly be a version of Her, from the hit movie ‘Are we sure about building this Her thing, seems questionable?’

OpenAI won this round of hype going away, because it premiered, and for some modalities released, the new GPT-4o. GPT-4o is tearing up the Arena,...

half a billion gallons of fuel in 2023.

There was a correction: this should be half a million gallons.

AnthonyC
Ah yes, strong "Verizon can't do math" vibes here.
faul_sname
Are you using the same definition of "safe" in both places (i.e. "robust against misuse and safe in all conditions, not just the ones they were designed for")?
Askwho
Thanks! One coming up for the other Zvi AI post shortly!

Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist.

Reasons are unclear (as usual when safety people leave OpenAI).

The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway.

OpenAI announced Sutskever's departure in a blogpost.

Sutskever and Leike confirmed their departures in tweets.

Linch

I agree it's not a large commitment in some absolute sense. I think it'd still be instructive to see whether they're able to hit this (not very high) bar.

Pablo
This, and see also Gwern's comment here.
Jacob L
If a six-month notice period was the key driver of the timing, I would have very much expected to see the departure announced slightly more than six months after the notable events, rather than (very) slightly less than six months after them. Given Ilya was voting in the majority on November 17th, it seems unlikely he would have already resigned six months before the public announcement.
Arthur Malone
Pure speculation: The timing of these departures being the day after the big, attention-grabbing GPT-4o release makes me think that there was a fixed date for Ilya and Jan to leave, and OpenAI lined up the release and PR to drown out coverage. Especially in light of Ilya not (apparently) being very involved with GPT-4o.

cancer neoantigens

For cells to become cancerous, they must have mutations that cause uncontrolled replication and mutations that prevent that uncontrolled replication from causing apoptosis. Because cancer requires several mutations, it often begins with damage to mutation-preventing mechanisms. As such, cancers often have many mutations not required for their growth, which often cause changes to the structure of some surface proteins.

The modified surface proteins of cancer cells are called "neoantigens". An approach to cancer treatment that's currently being researched is to identify some specific neoantigens of a patient's cancer, and create a personalized vaccine to cause their immune system to recognize them. Such vaccines would use either mRNA or synthetic long peptides. The steps required are as follows:

  1. The cancer must develop neoantigens that are sufficiently distinct from human surface
...
jmh
Following up on the current market state discussion related to Moderna, any thoughts on the Amgen treatment that the FDA just approved today? It seems to be a much more targeted treatment, but the general approach of targeting specific mutations seems to suggest a "family of drugs" that targets a number of different mutations. If that can cover the mutations that cause 90% of cancers, it seems like it would be a huge win. (But I'm not sure if things work that way!)
bhauth

That new Amgen drug targets a human protein that's mostly only used during embryonic development. I think it's expressed by most cancer cells in maybe around 0.2% of cancer cases. In many of those cases, some of the cancer cells will stop producing it.

Most potential targets have worse side effects and/or are less common.

Contra this post from the Sequences

In Eliezer's sequence post, he makes the following (excellent) point:

I can’t find any theorem of probability theory which proves that I should appear ice-cold and expressionless.

This debunks the then-widely-held view that rationality is counter to emotions. He then goes on to claim that emotions have the same epistemic status as the beliefs they are based on.

For my part, I label an emotion as “not rational” if it rests on mistaken beliefs, or rather, on mistake-producing epistemic conduct. “If the iron approaches your face, and you believe it is hot, and it is cool, the Way opposes your fear. If the iron approaches your face, and you believe it is cool, and it is hot, the Way opposes your calm.”

I think Eliezer is...

(Off the top of my head; maybe I’ll change my mind if I think about it more or see a good point.) What can be destroyed by truth, shall be. Emotions and beliefs are entangled. If you don’t think about how high p(doom) actually is because in the back of your mind you don’t want to be sad, you end up working on things that don’t reduce p(doom).

As long as you know the truth, emotions are only important depending on your terminal values. But many feelings are related to what we end up believing, through motivated cognition, etc.

cubefox
It seems instrumental rationality is an even worse tool to classify "irrational" emotions. Instrumental rationality is about actions, or intentions and desires, but emotions are neither of those. We can decide what to do, but we can't decide what emotions to have.
Pi Rogers
Emotions can be treated as properties of the world, optimized with respect to constraints like anything else. We can't edit our emotions directly but we can influence them.
cubefox
We can "influence" them only insofar we can "influence" what we want or believe: to a very low degree.