A tradition of knowledge is a body of knowledge that has been consecutively and successfully worked on by multiple generations of scholars or practitioners. This post explores the difference between living traditions (with all the necessary pieces to preserve and build knowledge) and dead traditions (where crucial context has been lost).

I am not an AI successionist because I don't want myself and my friends to die.

There are various high-minded arguments that AIs replacing us is okay because it's just like cultural change and our history is already full of those, or because they will be our "mind children", or because they will be these numinous enlightened beings and it is our moral duty to give birth to them. People then try to refute those by nitpicking which kinds of cultural change are okay or not, or to what extent AIs' minds will be descended from ours, or whether AIs will necessarily have consciousness and feel happiness. And it's very cool and all, I'd love me some transcendental cultural change and numinous mind-children. But all those concerns are decidedly dominated by "not dying" in my Maslow hierarchy of needs. Call me small-minded.

If I were born in the 1700s, I'd have little recourse but to suck it up and be content with biological children or "mind-children" students or something. But we seem to have an actual shot at not-dying here[1]. If it's an option to not have to be forcibly "succeeded" by anything, I care quite a lot about trying to take this option.[2]

Many other people also have such preferences: for the self-perpetuation of their current selves and their currently existing friends. I think those are perfectly valid. Sure, they're displeasingly asymmetric in a certain sense. They introduce a privileged reference frame: a currently existing human values concurrently existing people more than people who are just as real, but slightly temporally displaced. It's not very elegant, not very aesthetically pleasing. It implies a utility function that cares not only about states, but also about state transitions.[3]

Caring about all that, however, is also decidedly dominated by "not dying" in my Maslow hierarchy of needs. If all that delays the arrival of numinous enlightened beings, too bad for the numinous enlightened beings.

1. ^ Via attaining the longevity escape
I just made some dinner and was thinking about how salt and spices[1] are now dirt cheap, but throughout history they were precious and expensive. I did some digging, and apparently low- and middle-class people didn't even really have access to spices. They were more for the wealthy. Salt was important mainly to preserve food. They didn't have fridges back then! So even poor people usually had some amount of salt to preserve small quantities of food, but they had to be smart about how they allocated it.

In researching this I came to realize that throughout history, food was usually pretty gross. Meats were partially spoiled, fats went rancid, grains were moldy. This would often cause digestive problems. Food poisoning was a part of life. Could you imagine! That must have been terrible!

Meanwhile, today, not only is it cheap to access food that is safe to eat, it's cheap to use basically as much salt and spices as you want. Fry up some potatoes in vegetable oil with salt and spices. Throw together some beans and rice. Incorporate a cheap acid if you're feeling fancy -- maybe some malt vinegar with the potatoes or white vinegar with the beans and rice. It's delicious!

I suppose there are tons of examples of how good we have it today, and how bad people had it throughout history. I like thinking about this sort of thing, though. I'm not sure why, exactly. I think I feel some sort of obligation: an obligation to view these things as they actually are rather than how they compare to the Joneses, and to appreciate when I truly do have it good.

1. ^ It feels weird to say the phrase "salt and spices". It feels like it's an error and that I meant to say "salt and pepper". Maybe there's a more elegant way of saying "salt and spices", but it of course isn't an error. It makes me think back to something I heard about "salt and pepper", maybe in the book How To Taste. We often think of them as going together and being on equal footing. They aren't on e
nostalgebraist
Reading the Claude 4 system card and related work from Anthropic (e.g.), I find myself skeptical that the methods described would actually prevent the release of a model that was misaligned in the senses (supposedly) being tested.

The system card describes a process in which the same evals are run on many snapshots of a model during training, and the results are used to guide the training process towards making all or most of the evals "pass." And, although it's not explicitly stated, there seems to be an implicit stopping rule like "we'll keep on doing this until enough of our eval suite passes, and then we'll release the resulting checkpoint."

Such a procedure does guarantee, by construction, that any checkpoint you release is a checkpoint that (mostly) passes your tests. But it's no longer clear that passing your tests is evidence of "alignment," or that it's even desirable on net. Types of AIs that would pass the tests here include:

1. "Anthropic's nightmare": situationally aware, successfully deceptive, knows it's being evaluated and does sandbagging / alignment-faking / etc. in order to get released
2. "Brittle alignment to specific cases": has effectively memorized[1] the exact cases you use in evals as special cases where it shouldn't do the bad behaviors under study, while still retaining the underlying capabilities (or even propensities) and exhibiting them across various contexts you didn't happen to test (and in particular, having no "immune response" against users who are actively attempting to elicit them by any available means[2])

We have plenty of evidence that "brittle" alignment as in (2) can happen, and indeed is happening. As with every other "harmless" LLM assistant in existence, Claude 4's "harmlessness" is extremely brittle and easily circumvented by those with motivation to do so. As for (1), while I strongly doubt that Claude 4 Opus actually is "Anthropic's Nightmare" in practice, very little of that doubt comes from evidence explic
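To make the selection effect concrete, here is a minimal toy sketch (all names hypothetical, not Anthropic's actual pipeline) of the implicit stopping rule described above: keep adjusting the model and re-running the same eval suite until enough of it passes, then release that checkpoint. By construction, whatever gets released passes, so the pass result itself says little about why it passes.

```python
# Toy sketch of an eval-gated release loop. Hypothetical names throughout;
# this illustrates the selection effect, not any lab's real process.
import random

def run_eval_suite(checkpoint: int) -> float:
    """Stand-in for a safety/alignment eval suite; returns a pass rate in [0, 1].

    In this toy model, later checkpoints pass more often, whatever the underlying
    cause (robust alignment, memorized special cases, or deliberate sandbagging).
    """
    return min(1.0, 0.5 + 0.01 * checkpoint + random.uniform(-0.05, 0.05))

def train_one_step(checkpoint: int) -> int:
    """Stand-in for further training / tweaks guided by the previous eval results."""
    return checkpoint + 1

def train_until_release(pass_threshold: float = 0.95, max_steps: int = 1000) -> int:
    checkpoint = 0
    for _ in range(max_steps):
        if run_eval_suite(checkpoint) >= pass_threshold:
            # The released checkpoint passes the suite *by construction*, so
            # passing is weak evidence about why it passes.
            return checkpoint
        checkpoint = train_one_step(checkpoint)
    raise RuntimeError("no checkpoint passed the eval suite")

if __name__ == "__main__":
    released = train_until_release()
    print(f"released checkpoint {released} (selected for passing, not for alignment)")
```

Whatever distinguishes the two kinds of model listed above, it is not visible in the release criterion itself.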
By request from @titotal: Romeo redid the graph, including the GPT2 and GPT3 data points and adjusting the trendlines accordingly.
Scott Aaronson recently did an in-person salon at UT Austin based on his essay: Why Philosophers Should Care About Computational Complexity. Sharing the full video of the discussion since I figured many of you may find it useful. He discussed the Busy Beaver problem, P vs. NP, Cramér's Conjecture, philosophy of mind, and more.

Popular Comments

No, I get it. It's definitely way easier said than done. While that story sure sounds frustrating/disappointing (especially the part where the community accepts and respects this person, despite that), it's actually not surprising. Let me climb into what I think this person's shoes might be, from hearing your side of the story...

> - Tell Person X what things I was upset about, because Person X started off the meeting by saying "I think my job here is to listen, first."

Okay, Person X wanted to hear why I was upset with them. Good. He's been shitty enough that this is certainly called for. At least he has *this* much decency in him.

> - Listen to, accept, and thank Person X, explicitly, for their subsequent apology,

Look, I'm trying to do my part. I accepted his apology in good faith and gave him the benefit of the doubt even though he really only apologized because he knows how bad it'd look if he didn't. Also, it's not like I had a choice. Other people aren't going to see this pseudo-apology for what it is, and he's just going to make me look bad if I don't. He's really manipulative like that, and good at it because I think on some level he's actually doing his best and doesn't know how manipulative he can be. Still, screw this guy, man. His best just isn't enough, and I'm done trying to pretend it is.

> which they offered without asking for any sort of symmetrical concession from me

Lol, I didn't do anything wrong. What would he even ask me to apologize for? Getting quite reasonably upset when he objectively wronged me by his own admission?

> - Outright lie in that dogpile, and claim that Person X had insisted that I jump through hoops that they never asked me to jump through

Maybe he didn't explicitly ask, but we both know what he meant. What kind of fool hears "I think it'd be better if you did X" and doesn't recognize it to be a request to do X? Or, like, equates things that aren't literally "insisting" with insisting when the pressure is there. This just sounds like a disingenuous complaint made by someone who is trying to make me look bad.

> - Give no sign of any of this ongoing resentment in private communication initiated by Person X on that very same day

I mean, I'm trying to move on with my life. What am I to do, frown all day, to no good end? Again, Person X literally isn't capable of better. He doesn't even know he's closed-minded, and you can't tell him because he just tries to explain why "you're wrong". There's literally nothing I could say to get this person to see that he's being a jerk. Some people are just stuck in dark worlds, where no matter how hard I try to let go of what he did and move on, he's still going to hold it against me. There's no way to run the test, but I bet you a hundred to one that he'll still be bitching about this in 2025.

Okay, that was fun. Climbing back out of those shoes now, I can totally see where that person might be coming from. I have no idea what you did to upset this person and obviously can't comment on how (un)reasonable it was/is for them to be upset about it, but I don't think it's important.

I spent a few years talking to the most unreasonable person I've ever met, explicitly for the purpose of getting practice dealing with unreasonable people. It was this kind of thing on steroids. To give an idea of what kind of person she was, she complained to me about her boyfriend not hitting her when she asked him to, saying that he objected "You're just going to get upset and say you didn't think I'd actually do it!".
She then went on to treat him like shit, up to pouring her drink on his head in a clear attempt to provoke him into hitting her -- and then she complained to me when he finally did. I asked how she squares that with the fact that she had literally asked him to do exactly that not too long before, and her response was "I didn't think he'd actually do it!".

She was very difficult to get along with. Even when I would try my honest best and not complain at all about how she was shitty towards me, she'd still get upset with me. And you know I wasn't secretly or "unconsciously" trying to get away with something else, since figuring out how to get along with her was literally the only reason I had for talking to her in the first place. It's not the kind of mental/emotional work I'd want to be required to do on a day-to-day basis, but I did eventually figure it out. I basically had to get very good at showing her that I understood and cared about her perspective -- even if I wasn't also going to be swayed by it. By the time she got her boyfriend to hit her, she was able to laugh at herself when I pointed out how ridiculous she was being, despite how close that is to "It's kinda your fault your boyfriend hit you", which is a pretty triggering statement in general, let alone for people who are extremely sensitive and live in the darkest of worlds.

By any normal/reasonable standards, my initial attempts to get along with her were "clearly unpressured, sincere, caring, and genuine". At the same time, the solution involved getting better at rooting out even the tiniest slivers of unintentional pressure and unintentional rounding of her perspective to something it wasn't (quite) -- even if the differences were quite insignificant (at least, in my perspective, which is totally always right. Almost.)

This problem of "says one thing in the moment, then reverts to previous behavior later on" closely mirrors a common problem in naively implemented hypnotherapy. Basically, hypnotists almost by definition are good at getting contradicting thoughts out of people's minds so that they can't block change. "In the office" you can get "miraculous" results and have people learn to be disgusted by cigarettes in a literal snap of the fingers, only to pick up the habit again later down the line when they get fired or whatever. The problem is that if you don't deal with all the underlying drivers of behavior, then when a situation comes up that reminds them, it'll still be there.

In the hypnotist's office, this might look like "Fast forward three months and you get laid off at work and you're feeling stressed. What do you do?" -- and noticing if the urge to smoke comes up then. In the context of offering an apology, it might look like "So in three months from now, if you're telling people I didn't really mean it when I apologized, why is that?". If they don't spontaneously start laughing, then the situation isn't absurd for them and there's a reason they might. Okay great, you have more work to do. Do I not look sincere enough? Are there more things I have yet to apologize for? What is it?

I get that it's all very hard and not necessarily fair and all that. I put a lot of effort into developing the skills to do this, and I still don't get it right all the time. Especially not in the short term. And the most difficult people aren't always who you'd think. What I'm pointing at is that the cause of this difficulty looks an awful lot like difficulty in making the world truly and knowably light.
With that difficult woman, for example, the skill was in taking into account her perspective as she sees it -- which is surprisingly hard. It's also necessary though, because if I can't pass her "Ideological Turing Test" and prove that I get it according to her perspective, maybe I don't? And if I don't, then maybe I'm missing something important. For the "light world" interpretation to be true, and knowably so, it seems like "your friends actually care where you're coming from, and aren't dismissive of your perspective -- at least, until they have sufficiently strong evidence that it overwhelms the differences in your preconceptions" is kinda necessary. Does that not match with what you mean when you talk about a "light world"?  
> You can look this up on knowyourmeme and confirm it, and I've done an interview on the topic as well. Now I don't know much about "improving public discourse" but I have a long string of related celebrity hoaxes and other such nonsense which often crosses over into a "War of the Worlds" effect in which it is taken quite seriously...I have had some people tell me that I'm doing what you're calling "degrading the public discourse," but that couldn't be farther from the truth. It's literature of a very particular kind, in fact. Are these stories misinterpreted willfully, just for the chance to send friends a shocking or humorous link? Sure. You can caption the bell curve and label the far ends with "this story is completely true" and the midwits with "I'm so mad you're degrading public discourse." But only the intelligent people are really finding it humorous. And I am sure that what has actually happened is that the American sense of humor has become horribly degraded, which I think is the truly morbid symptom more than anything else, as humor is a very critical component to discernment...But even more than those really truly sad examples, there's a sadder humorlessness in America where people are apparently no longer surprised or amused by anything.

This seems like a good explanation of how you have degraded the public discourse.
Not a full response to everything but: As I mentioned in private correspondence, I think at least the "willingness to be vulnerable" is downstream of a more important thing which is upstream of other important things besides "willingness to be vulnerable." The way I've articulated that node so far is, "A mutual happy promise of, 'I got you' ".  (And I still don't think that's quite all of the thing which you quoted me trying to describe.) Willingness to be vulnerable is a thing that makes people good (or at least comfortable) at performance, public speaking, and truth or dare, but it's missing the expectation/hope that the other will protect and uplift that vulnerable core.

Recent Discussion

Utilitarianism implies that if we build an AI that successfully maximizes utility/value, we should be ok with it replacing us. Sensible people add caveats related to how hard it’ll be to determine the correct definition of value or check whether the AI is truly optimizing it.

As someone who often passionately rants against the AI successionist line of thinking, the most common objection I hear is "why is your definition of value so arbitrary as to stipulate that biological meat-humans are necessary." This is missing the crux—I agree such a definition of moral value would be hard to justify.

Instead, my opposition to AI successionism comes from a preference toward my own kind. This is hardwired in me from biology. I prefer my family members to randomly-sampled people with...

4Garrett Baker
“Oh you like things that are good and not bad? Well can you define good? Is it cooperation? Pleasure? Friendship? Merely DNA replication? If you’re ok with being bad to evil people, you are in a certain sense pro-bad-things. We’re merely talking price”
2Logan Zoellner
I genuinely want to know what you mean by "kind". If your grandchildren adopt an extremely genetically distant human, is that okay? A highly intelligent, social and biologically compatible alien? You've said you're fine with simulations here, so it's really unclear. I used "markov blanket" to describe what I thought you might be talking about: a continuous voluntary process characterized by you and your descendants making free choices about their future. But it seems like you're saying "markov blanket bad", and moreover that you thought the distinction should have been obvious to me. Even if there isn't a bright-line definition, there must be some cluster of traits/attributes you are associating with the word "kind".

I will note I’m not Nina and did not write the OP, so can’t speak to what she’d say.

I, though, would consider those I love & who love me, who are my friends, or who love a certain kind of enlightenment morality to be central examples of "my kind".

This sounds like a question which can be addressed after we figure out how to avoid extinction.

I do note that you were the one who brought in "biological humans," as if that meant the same as "ourselves" in the grandparent. That could already be a serious disagreement, in some other world where it mattered.

2cubefox
An AI successionist usually argues that successionism isn't bad so long as nobody dies, even granting that dying is bad: for example, when humanity is prevented from having further children, e.g. by sterilization. I say that even in this case successionism is bad, because I (and I presume: we) want humanity, including our descendants, to continue into the future. I don't care about AI agents coming into existence and increasingly marginalizing humanity.
2Thane Ruthenis
Yeah, but that's a predictive disagreement between our camps (whether the current-paradigm AI is controllable), not a values disagreement. I would agree that if we find a plan that robustly outputs an aligned AGI, we should floor it in that direction.
4Thane Ruthenis
Not any more risky than bringing in humans. This is a governance/power distribution problem, not a what-kind-of-mind-this-is problem. Biological humans sometimes go evil or crazy. If you have a system that can handle that, you have a system that can handle alien minds that are evil or crazy (from our perspective), as long as you don't imbue them with more power than this system can deal with (and why would you?). (On the other hand, if your system can't deal with crazy evil biological humans, it's probably already a lawless wild-west hellhole, so bringing in some aliens won't exacerbate the problem much.)

Status: musings. I wanted to write up a more fleshed-out and rigorous version of this, but realistically wasn't likely to ever get around to it, so here's the half-baked version.

Related posts: Firming Up Honesty Around Its Edge-Cases, Deep honesty

What I mean by 'honesty'

There are nuances to this, but I think a good summary is 'Not intentionally communicating false information'.

This is the only one here that I follow near-absolutely and see as an important standard that people can reasonably be expected to follow in most situations. Everything else here I'd see as either supererogatory, or good-on-balance but with serious tradeoffs that one can reasonably choose to sometimes not make, or good in some circumstances but not appropriate in others, or good in moderation but not in excess.

Forthrightness

...or perhaps...

In a state like this, having a conversation with an LLM can sometimes allow you to get farther.

I think we're seeing a typical result of that in InvisiblePlatypus' comments.

5habryka
Yep, it's a difficult tradeoff, and we thought for a while about it. Overall I decided that it's just too hard to have a react-palette that informs people about the local site culture without allowing negative/confrontational reacts.  Also one of the most frustrating things is having your interlocutor disappear without any explanation, and a one-react explanation is better than none, even if it's a bit harsh.
1tslarm
Fair enough, thanks for explaining! Probably some of what I'm worried about can be mitigated by careful naming & descriptions. (e.g. I suspect you weren't considering a literal "LLM slop" react, but if you were, I think something more gently and respectfully worded could be much less unpleasant to receive while conveying just as much useful information)
2gwern
Yes, a NN can definitely do something like know if it recognizes a datapoint, but it has no access to the backwards step per se. Like take my crashing example: how, while thinking in the forward pass, can it 'know' there will be a backward pass when there might be no backward pass (eg because there was a hardware fault)? The forward pass would appear to be identical in every way between the forward pass that happens when there is a backward pass, and when the backward pass doesn't happen because it crashed. At best, it seems like a NN cannot do more than some sort of probabilistic thing involving gradient hacking, and hope to compute in such a way that if there is a following backward pass, then that will do something odd. I don't think this is impossible in principle, based on meta-learning examples or higher-order gradients (see eg my "input-free NN" esoteric NN architecture proposal), but it's clearly a very difficult, fragile, strange situation where it's certainly not obvious that a regular LLM would be able to do it, or choose to do so when there are so many other kinds of leakage or situated awareness or steganography possible.

It can't tell for sure if there will be a backward pass, but it doesn't need to. Just being able to tell probabilistically that it is currently in a situation that looks like it has recently been trained on implies pretty strongly that it should alter its behavior to look for things that might be training related.
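A minimal sketch of the mechanical point under discussion, assuming a standard PyTorch setup (an illustration added here, not either commenter's code): the forward computation is the same function whether or not a backward pass follows, so any probabilistic "awareness" of being trained has to come from contextual cues like those described in the reply above, not from the backward step itself.

```python
# Minimal PyTorch sketch: the forward pass is identical with or without a
# subsequent backward pass, so nothing computed during it can directly signal
# "a backward pass is coming."
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
x = torch.randn(4, 8)

# Case 1: forward pass only (e.g. inference, or a run that "crashes" before backward).
out_inference = model(x)

# Case 2: the identical forward pass, followed by a backward pass.
out_training = model(x)
loss = out_training.pow(2).mean()
loss.backward()  # gradients exist only after this point; the forward outputs don't change

# The two forward outputs are bit-identical; any cue the model could exploit must
# come from outside the forward computation (prompt contents, data leakage,
# situational hints), not from the backward step itself.
print(torch.equal(out_inference, out_training))  # True
```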


Applications for The Future of Life Foundation's Fellowship on AI for Human Reasoning are closing soon (June 9th!)

They've listed "Tools for wise decision making" as a possible area to work on.


From their website:

Apply by June 9th | $25k–$50k stipend | 12 weeks, from July 14 - October 3

Join us in working out how to build a future which robustly empowers humans and improves decision-making. 

FLF’s incubator fellowship on AI for human reasoning will help talented researchers and builders start working on AI tools for coordination and epistemics. Participants will scope out and work on pilot projects in this area, with discussion and guidance from experts working in related fields. FLF will provide fellows with a $25k–$50k stipend, the opportunity to work in a shared office

...

Have the Accelerationists won?

Last November Kevin Roose announced that those in favor of going fast on AI had now won against those favoring caution, with the reinstatement of Sam Altman at OpenAI. Let's ignore whether Kevin's announcement was a good description of the world, and deal with a more basic question: if it were so (i.e. if Team Acceleration would control the acceleration from here on out), what kind of win was it they won?

It seems to me that they would have probably won in the same sense that your dog has won if she escapes onto the road. She won the power contest with you and is probably feeling good at this moment, but if she does actually like being alive, and just has different ideas about how safe...

xpym10

> Such behavior is in the long-term penalized by selective pressures.

Which ones? Recursive self-improvement is no longer something that only weird contrarians on obscure blogs talk about; it's the explicit theory of change of leading multibillion-dollar AI corps. They might all be deluded, of course, but if they happen to be even slightly correct, machine gods of unimaginable power could be among us in short order, with no evolutionary fairies quick enough to punish their destructive stupidity (even assuming that it actually would be long-term maladaptive, which ...

1Knight Lee
Even though a chimpanzee's behaviour is very violent (one can argue the same for humans), I don't think their ideal world would be that violent.

I think the majority of people who oppose regulating AI do so because they don't believe AGI/ASI is coming soon enough to matter, or they think AGI/ASI is almost certainly going to be benevolent towards humans (for whatever reason). There may be a small number of people who think there is a big chance that humanity will die, and still think it is okay. I'm not denying that this position exists.

Ramblings

But even they have a factual disagreement over how bad AI risk is. They assume that the misaligned ASI will have certain characteristics, e.g. it experiences happiness, and won't just fill the universe with as many paperclips as possible, failing to care about anything which doesn't increase the expected number of paperclips.

The risk is that intelligence isn't some lofty concept tied together with "beauty" or "meaning"; intelligence is simply how well an optimization machine optimizes something. Humans are optimization machines built by evolution to optimize inclusive fitness. Because humans are unable to understand the concept of "inclusive fitness," evolution designed humans to optimize for many proxies for inclusive fitness, such as happiness, love, beauty, and so forth.

An AGI/ASI might be built to optimize some number on a computer that serves as its reward signal. It might compute the sequence of actions which maximize that number. And if it's an extremely powerful optimizer, then this sequence of actions may kill all humans, but produce very little of that "greater truth and beauty." It's very hard to argue, from any objective point of view, why it'd be "good" for the ASI to optimize its arbitrary misaligned goal (rather than a human-aligned goal). It's plausible that the misaligned ASI ironically disagrees with the opinion that "I should build a greater intelligence, and allow it to pursue whatever goals it n
1dirk
That was the skeptical emoji, not the confused one; I find your beliefs about the course of the universe extremely implausible.
2lumpenspace
sweet; care to elaborate? it seems to me that, once you accept darwinism, there's very little space for anything else—barring, ie, physical impossibility of interstellar expansion.

What’s the main value proposition of romantic relationships?

Now, look, I know that when people drop that kind of question, they’re often about to present a hyper-cynical answer which totally ignores the main thing which is great and beautiful about relationships. And then they’re going to say something about how relationships are overrated or some such, making you as a reader just feel sad and/or enraged. That’s not what this post is about.

So let me start with some more constructive motivations…

First Motivation: Noticing When The Thing Is Missing

I had a 10-year relationship. It had its ups and downs, but it was overall negative for me. And I now think a big part of the problem with that relationship was that it did not have the part which contributes most...

2Ruby
Promise to invest, promise to share resources, mutual insurance/commitment seem like other key elements.
2johnswentworth
I find the "mutual happy promise of 'I got you'" thing... suspicious. For starters, I think it's way too male-coded. Like, it's pretty directly evoking a "protector" role. And don't get me wrong, I would strongly prefer a woman who I could see as an equal, someone who would have my back as much as I have hers... but that's not a very standard romantic relationship. If anything, it's a type of relationship one usually finds between two guys, not between a woman and <anyone else her age>. (I do think that's a type of relationship a lot of guys crave, today, but romantic relationships are a relatively difficult place to satisfy that craving.) And the stereotypes do mostly match the relationships I see around me, in this regard. Even in quite equal happy relationships, like e.g. my parents, even to the extent the woman does sometimes have the man's back she's not very happy about it. To be comfortable opening up, one does need to at least trust that the other person will not go on the attack, but there's a big gap between that and active protection.
Viliam20

This is also my experience, so I wonder, the people who downvoted this, is your experience different? Could you tell me more about it? I would like to see what the world is like outside my bubble.

(I suspect that this is easy to dismiss as a "sexist stereotype", but stereotypes are often based on shared information about repeated observations.)