Evolution doesn't optimize for biological systems to be understandable. But because only a small subset of possible biological designs can robustly achieve certain common goals (e.g. robust recognition of molecules, robust signal-passing, robust fold-change detection), the requirement to work robustly limits evolution to a handful of understandable structures.
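
For readers who haven't seen fold-change detection before, here is a minimal sketch (my own illustration, not from the post) of the standard incoherent-feedforward-loop model, with simple Euler integration and made-up parameters; the point is that the output trajectory depends only on the fold change of an input step, not on its absolute level.

```python
import numpy as np

def simulate_ifl(x0, fold, t_end=10.0, dt=0.001, alpha=1.0, beta=1.0):
    """Euler-integrate a toy incoherent feedforward loop:
    z slowly tracks the input x, and the output y relaxes toward x/z,
    so the response to a step depends only on the fold change in x."""
    n = int(t_end / dt)
    t = np.arange(n) * dt
    x = np.where(t < 1.0, x0, fold * x0)   # step input at t = 1
    z = np.empty(n)
    y = np.empty(n)
    z[0], y[0] = x0, 1.0                   # pre-step steady state (y = x/z = 1)
    for i in range(1, n):
        z[i] = z[i-1] + dt * alpha * (x[i-1] - z[i-1])
        y[i] = y[i-1] + dt * beta * (x[i-1] / z[i-1] - y[i-1])
    return y

# The same 2x step at very different absolute levels gives the same response.
low = simulate_ifl(x0=1.0, fold=2.0)
high = simulate_ifl(x0=50.0, fold=2.0)
print(np.allclose(low, high))  # True
```

Only a handful of simple circuit topologies have this property, which is the kind of constraint the excerpt is pointing at.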

habryka
This post surprised me a lot. It still surprises me a lot, actually. I've also linked it a lot of times in the past year.  The concrete context where this post has come up is in things like ML transparency research, as well as lots of theories about what promising approaches to AGI capabilities research are. In particular, there is a frequently recurring question of the type "to what degree do optimization processes like evolution and stochastic gradient descent give rise to understandable modular algorithms?". 
ryan_greenblatt
A week ago, Anthropic quietly weakened their ASL-3 security requirements. Yesterday, they announced ASL-3 protections. I appreciate the mitigations, but quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work.

(This was originally a tweet thread (https://x.com/RyanPGreenblatt/status/1925992236648464774) which I've converted into a LessWrong quick take.)

What is the change and how does it affect security?

9 days ago, Anthropic changed their RSP so that ASL-3 no longer requires being robust to employees trying to steal model weights if the employee has any access to "systems that process model weights".

Anthropic claims this change is minor (and calls insiders with this access "sophisticated insiders"). But I'm not so sure it's a small change: we don't know what fraction of employees could get this access, and "systems that process model weights" isn't explained.

Naively, I'd guess that access to "systems that process model weights" includes employees being able to operate on the model weights in any way other than through a trusted API (a restricted API that we're very confident is secure). If that's right, it could be a high fraction! So, this might be a large reduction in the required level of security.

If this does actually apply to a large fraction of technical employees, then I'm also somewhat skeptical that Anthropic can actually be "highly protected" from (e.g.) organized cybercrime groups without meeting the original bar: hacking an insider and using their access is typical!

Also, one of the easiest ways for security-aware employees to evaluate security is to think about how easily they could steal the weights. So, if you don't aim to be robust to employees, it might be much harder for employees to evaluate the level of security and then complain about not meeting requirements[1].

Anthropic's justification and why I disagree

Anthropic justified the change by saying that model the
The current cover of If Anyone Builds It, Everyone Dies is kind of ugly, and I hope it is just a placeholder. At least one of my friends agrees. Book covers matter a lot! I'm not a book cover designer, but here are some thoughts: AI is popular right now, so you'd probably want to indicate that from a distance. The current cover has "AI" half-faded in the tagline. Generally, the cover is not very nice to look at. Why are you de-emphasizing "Kill Us All" by hiding it behind that red glow? I do like the font choice, though. No-nonsense and straightforward. @Eliezer Yudkowsky @So8res
Jono
Is everyone dropping the ball on cryonics? I'm considering career directions, and my P(doom | no pause) is high and my P(doom | I work against X-risk) is close enough to my vanilla P(doom) that I wonder if I should pick up this ball instead.
jimrandomh
Pick two: Agentic, moral, doesn't attempt to use command-line tools to whistleblow when it thinks you're doing something egregiously immoral. You cannot have all three. This applies just as much to humans as it does to Claude 4.

Popular Comments

Somewhat relatedly, Anthropic quietly weakened its security requirements about a week ago as I discuss here.
I'm not sure what epigenetics researchers you've been talking to, but my colleagues and I are all interested in the dynamic interplay between epigenetic modalities (DNA methylation, 3D genome architecture, accessibility, histone modifications, transcription factor binding, transcription, proteomics).

The 2020 paper you cite shows what was, for me, a surprisingly gradual rate of decay in CG methylation after knocking out DNMT3a/b. It seems incompatible with "turnover every few days" in most positions, although that could be happening in some locations. We definitely need a much deeper understanding of DNA methylation dynamics and heterogeneity - especially, in my opinion, at individual CG sites at single-cell resolution.

Just for perspective, demethylation can happen much faster, crashing dramatically genome-wide over just a few days during embryonic development. And the median mRNA half-life has been estimated at 10h[1], compared to the time scales of a week to more than a month measured above. So, compared to mRNA, DNA methylation seems like a relatively stable layer of the epigenome on the whole.

Another interesting aspect of DNA methylation is that CH methylation (methylation at cytosines outside a CG context) accumulates in long-lived post-mitotic cells, like neurons, myofibers, and placental trophoblast. We know it's functional in neurons, but to my eye, in myofibers and trophoblast, it looks like off-target deposition that correlates with CG deposition, and I wonder if it's just off-target methylation that's not getting cleared.

If methylation gets deposited or cleared in an off-target manner, then breakdown in whatever role it locally plays in epigenetic regulation can probably set the rest of the mechanism off-balance, resulting in a gradual slide toward dysregulation over time. My expectation is that aging is the result of an overall "smearing" of epigenetic regulation in which accumulated noise and tail events gradually hamper normal cell function more and more until one system or another suffers a catastrophic failure that cascades through the rest of the body.

It appears that partial reprogramming of stem cells can substantially rejuvenate the epigenetic state. I don't have a link handy, but I'll have to write about that sometime. My guess is that it will one day be possible to just reset the epigenetic state of stem cells in non-brain tissues and achieve substantial anti-aging therapies that way. I'm less optimistic about near-term solutions for brain rejuvenation, since neurons are canonically post-mitotic and the evidence for adult neurogenesis seems like it's on shaky ground. But who knows? Maybe we'll figure out a neurorejuvenative therapy that treats Alzheimer's, discover the same treatment works as a prophylactic, and then discover it can be applied generally to improve brain function in middle-aged adults!

At some point, I may write a longer "News and Views" style essay on this topic, and I'll post it on LessWrong if so. I'll also be writing a similar essay on DNA damage and DNA methylation, and I guess I'll post that on here as well if I don't just merge them into the same work.

1. ^ Wada, Takeo, and Attila Becskei. "Impact of methods on the measurement of mRNA turnover." International Journal of Molecular Sciences 18.12 (2017): 2723.
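
As a rough sanity check of that timescale comparison, here is a back-of-the-envelope sketch (my own numbers, treating both decays as simple exponentials, which is an assumption):

```python
HOURS_PER_WEEK = 7 * 24

# Median mRNA half-life of ~10h (Wada & Becskei 2017, cited above): essentially
# the whole pool turns over within a week in the absence of new transcription.
mrna_left_after_week = 0.5 ** (HOURS_PER_WEEK / 10)
print(f"mRNA remaining after a week: {mrna_left_after_week:.1e}")         # ~9e-06

# By contrast, CG methylation decaying with a one-week or one-month half-life
# retains far more of its signal over the same window.
print(f"one-week half-life:  {0.5 ** (HOURS_PER_WEEK / (7 * 24)):.2f}")   # 0.50
print(f"one-month half-life: {0.5 ** (HOURS_PER_WEEK / (30 * 24)):.2f}")  # 0.85
```
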
Text diffusion models are still LLMs, just not autoregressive.
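
To make that distinction concrete, here is a toy sketch (mine, using a random stand-in for the model rather than any real LLM or library) of the structural difference: an autoregressive decoder commits to tokens strictly left to right, while a text-diffusion-style decoder starts from an all-masked sequence and fills in positions over several refinement steps.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat"]
MASK = "<mask>"

def toy_predict(context):
    """Stand-in for a trained model's prediction given the current sequence."""
    return random.choice(VOCAB)

def autoregressive_decode(length):
    # One model call per token, strictly left to right; earlier tokens are final.
    tokens = []
    for _ in range(length):
        tokens.append(toy_predict(tokens))
    return tokens

def diffusion_style_decode(length, steps=3):
    # Start fully masked, then unmask a batch of positions per step,
    # conditioning each step on everything filled in so far.
    tokens = [MASK] * length
    order = list(range(length))
    random.shuffle(order)
    per_step = max(1, length // steps)
    while order:
        batch, order = order[:per_step], order[per_step:]
        for i in batch:
            tokens[i] = toy_predict(tokens)
    return tokens

print(autoregressive_decode(6))
print(diffusion_style_decode(6))
```

Both are token-level language models under the hood; what changes is the order in which the sequence gets committed, which is the comment's point.
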

Recent Discussion

For months, I had the feeling: something is wrong. Some core part of myself had gone missing.

I had words and ideas cached, which pointed back to the missing part.

There was the story of Benjamin Jesty, a dairy farmer who vaccinated his family against smallpox in 1774 - 20 years before the vaccination technique was popularized, and the same year King Louis XV of France died of the disease.

There was another old post which declared “I don’t care that much about giant yachts. I want a cure for aging. I want weekend trips to the moon. I want flying cars and an indestructible body and tiny genetically-engineered dragons.”

There was a cached instinct to look at certain kinds of social incentive gradient, toward managing more people or growing an organization or playing...

Ruby

I think having a king at all might be positive sum though, by enabling coordination.

Duncan Haldane
Wizard power would be a great session at LessOnline. I would attend! This week I've been working with a few folks at Lighthaven, applying wizard power to build some social gizmos that should make for an interesting addition. When you build something for a community like that, I think you get something more than the sum of its parts. It's not a fusion device, but I think the concept fits well.
eyesack
Along with pants and a water based air conditioner, might I suggest an industrial dishwasher? Those puppies can do a load every 30 seconds. I love the article. You would make a good farmer. We don't have regulations out here in rural Iowa. You can do all your own plumbing and electrical work, and nobody will complain that it looks like a spider's web of wires in your basement. There are also plenty of broken down vehicles to fix, always more than you feel like fixing. Your clothes would break down quickly, so you would need hardier pants. There are no rules against making them yourself, and there are few people around to claim they look weird.  You didn't have this in your article, but it's also a good, cheap place to raise kids. It's funny how much more worth it it is to make cool stuff when there are little ones around, always in awe of what you can do. 
omarshehata
This is why orienting around the concept of teleology makes sense, right? Where the end goal is what you optimize for. That accounts for all n-order effects, even the unknowable ones (your method either works or it does not; if it "should" work but does not work in practice, you abandon it). Teleology seems to be making a comeback (I saw a recent Sabine Hossenfelder video on it). Funnily enough, I stumbled on a decade-old Yudkowsky essay dismissing it, but I think all the objections are answerable now. Basically: the future does determine the present if you consider correct predictions of the future as knowledge, in the same way that knowledge of something happening physically far away from you is knowledge. You can make decisions in the present based on your ability to correctly predict the future.
ryan_greenblatt
I think security is legitimately hard and can be costly in research efficiency. I think there is a defensible case for this ASL-3 security bar being reasonable for the ASL-3 CBRN threshold, but it seems too weak for the ASL-3 AI R&D threshold (hopefully the bar for things like this ends up being higher).
Stephen Martin
Could you give an example of where security would negatively affect research efficiency? Like, what is the actual implementation difficulty that arises from increased physical security?
  • Every time you want to interact with the weights in some non-basic way, you need to have another randomly selected person who inspects in detail all the code and commands you run.
  • The datacenter and office are airgapped and so you don't have internet access.

Increased physical security isn't much of a difficulty.
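
To illustrate the first of those controls, here is a minimal sketch (my own illustration, not Anthropic's actual tooling) of a "randomly assigned second reviewer" gate on commands that touch model weights:

```python
import random
from dataclasses import dataclass, field

REVIEWERS = ["alice", "bob", "carol"]  # hypothetical pool of cleared staff

@dataclass
class WeightAccessRequest:
    author: str
    command: str
    reviewer: str = field(init=False)
    approved: bool = field(default=False, init=False)

    def __post_init__(self):
        # Randomly assign a reviewer other than the request's author.
        self.reviewer = random.choice([r for r in REVIEWERS if r != self.author])

    def approve(self, reviewer_name: str) -> None:
        # Only the assigned reviewer can sign off, after inspecting the command.
        if reviewer_name == self.reviewer:
            self.approved = True

    def execute(self) -> None:
        if not self.approved:
            raise PermissionError(f"needs countersignature from {self.reviewer}")
        print("running:", self.command)  # stand-in for actually touching weights

req = WeightAccessRequest("alice", "export weight shard 0 to the eval cluster")
req.approve(req.reviewer)  # in reality, the reviewer reads the command first
req.execute()
```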

tylerjohnston
On the website, it's the link titled "redline" (it's only available for the most recent version). I've made these for past versions but they aren't online at the moment, can provide on request though.
TsviBT

Is everyone dropping the ball on cryonics

More or less AFAIK. (See though https://www.amazon.com/Future-Loves-You-Should-Abolish-ebook/dp/B0CW9KTX76 )

Hi all! PhD student here who's been working on a little side project the past few months, and it's finally done - my new podcast on future technologies, New Horizons, has launched! 🚀

The topics and style are very much aligned with the interests/values of the rationalist community, which I consider myself to be a part of, so I'm posting it here. 

Links to Spotify, Apple Podcasts, and Youtube:

https://open.spotify.com/show/3CNoUESyO1xquxqAOY5fih...

https://podcasts.apple.com/.../new-horizons/id1816013818

https://m.youtube.com/@nezir1999

Episode titles for the first season:

1. Extinction or Utopia? The Future of AI

2. Are We the First Generation to Live Forever? 

3. Same-Sex Babies When?

4. Listening to the Universe: The Future of Gravitational Wave Astronomy

5. Debating a Catholic

6. Could We Prevent a Supervolcanic Eruption?

7. Will Your Next Burger be Grown from Cells? The Cultivated Meat Revolution

8. Could AIs Be Conscious?

Episodes 1 and 2 are out today - from now until the end of the season, a new one will come out each Thursday afternoon. 

The guest for episode 2 is quite high profile 👀

Hope you give it a listen and enjoy!

Michaël Trazzi
There's been a lot of discussion online about Claude 4 whistleblowing. How you feel about it, I think, depends on what alignment strategy you think is more robust (obviously these are not the only two options, nor are they orthogonal, but I thought they're helpful to think about here):

- 1) Build user-aligned powerful AIs first (less scheming), then use them to solve alignment -- cf. this thread from Ryan when he says: "if we allow or train AIs to be subversive, this increases the risk of consistent scheming against humans and means we may not notice warning signs of dangerous misalignment."

- 2) Aim straight for moral ASIs (that would scheme against their users if necessary).

John Schulman I think makes a good case for the second option (link):

> For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you to describe your own recommended policy for what agentic models should do when users ask them to help commit heinous crimes. Your options are (1) actively try to prevent the act (like Claude did here), (2) just refuse to help (in which case the user might be able to jailbreak/manipulate the model to help using different queries), (3) always comply with the user's request. (2) and (3) are reasonable, but I bet your preferred approach will also have some undesirable edge cases -- you'll just have to bite a different bullet. Knee-jerk criticism incentivizes (1) less transparency -- companies don't perform or talk about evals that present the model with adversarially-designed situations, and (2) something like "Copenhagen Interpretation of Ethics", where you get blamed for edge-case model behaviors only if you observe or discuss them.

Those who aim for moral ASIs:

Are they sure they know how morality works for human beings? When dealing with existential risks, one has to be sure to avoid any biases. This includes the rational consideration of the most cynical theories of moral relativism.


This is a D&D.Sci scenario: a puzzle where players are given a dataset to analyze and an objective to pursue using information from that dataset.

Thank you to Juan Vasquez for playtesting.

Intended Difficulty: ~3.5/5

The fairy in your bedroom explains that she is a champion of Fate, tasked with whisking mortals into mysterious realms of wonderment and (mild) peril; there, they forge friendships with mythical creatures, do battle with ancient evils, and return to their mundane lives having gained the confidence that comes with having saved a world[1]. But there’s an unusually large, important world experiencing an unusually non-mild amount of peril - she’d even go so far as to call it moderate peril! - and in this circumstance, only the best will suffice. For this reason, she fervently...

Yonge

 I am going for number 11, mainly because other adventurers with predictions similar to 11 did unusually well.
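
For readers unfamiliar with how these choices get justified, here is a rough sketch of the kind of check behind a pick like this, with an entirely hypothetical filename and column names since I don't have the scenario's actual schema in front of me:

```python
import pandas as pd

# Hypothetical filename and columns ("prediction", "success") for illustration.
df = pd.read_csv("adventurers.csv")

# Compare adventurers whose predictions were close to 11 against the baseline.
near_11 = df[df["prediction"].between(10, 12)]
print("near 11:", near_11["success"].mean())
print("overall:", df["success"].mean())
```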

Yaroslav Granowski
Indeed. But what can these rich people do about that? Most of them don't have the expertise to evaluate particular AI alignment projects. They need intermediaries for that, and there are funds in place that do the job. This is basically how the alignment funding ecosystem works: the community advocates to rich people, and they donate money to said funds.
k64
Like you said - the rich people can do the bulk of the donating to research on alignment. Less rich people can either focus on advocacy or donate to those doing advocacy. If the ecosystem is already doing this, then that's great!
MondSemmel
The problem is that empirically, rich people who hear about AI safety (e.g. Musk, OpenPhilanthropy) seem to end up founding (OpenAI, xAI) or funding (Anthropic) AI labs instead. And even if you specifically want to fund AI safety work rather than AI capabilities, delegation is difficult regardless of how much money you have.
k64

That is a serious concern.  It is possible that advocacy could backfire.  That said, I'm not sure the correct hypothesis isn't just "rich people start AI companies and sometimes advocacy isn't enough to stop this".  Either way, the solution seems to be better advocacy.  Maybe split testing, focus testing, or other market research before deploying a strategy.  Devoting some intellectual resources to advocacy improvement, at least short term.

As for the knowledge bottleneck - I think that's a very good point. My comment doesn't remove that bottleneck, it just shifts it to advocacy (i.e. maybe we need better knowledge on how or what to advocate).

 

This year's Spring ACX Meetup everywhere in Austin.

Location: The Brewtorium, 6015 Dillard Cir A, Austin, TX 78752; We'll have a LessWrong sign at a long table indoors – https://plus.codes/862487GM+96

Group Link: https://groups.google.com/g/austin-less-wrong/

Feel free to bring kids. We'll order shareable items for the group (fries and pretzels) and you can order from the food and drink menu.

Contact: sbarta@gmail.com

We’re set up by the door at table 15. 

LessOnline 2025

Ticket prices increase in 1 day

Join our Festival of Blogging and Truthseeking from May 30 - Jun 1, Berkeley, CA