Will AGI progress gradually or rapidly? I think the disagreement is mostly about what happens before we build powerful AGI. 

I think weaker AI systems will already have radically transformed the world. This is strategically relevant because I'm imagining AGI strategies playing out in a world where everything is already going crazy, while other people are imagining AGI strategies playing out in a world that looks kind of like 2018 except that someone is about to get a decisive strategic advantage.

scarcegreengrass
Epistemics: Yes, it is sound. Not because of claims (they seem more like opinions to me), but because it is appropriately charitable to those that disagree with Paul, and tries hard to open up avenues of mutual understanding.

Valuable: Yes. It provides new third paradigms that bring clarity to people with different views. Very creative, good suggestions.

Should it be in the Best list?: No. It is from the middle of a conversation, and would be difficult to understand if you haven't read a lot about the 'Foom debate'.

Improved: The same concepts rewritten for a less-familiar audience would be valuable. Or at least with links to some of the background (definitions of AGI, detailed examples of what fast takeoff might look like and arguments for its plausibility).

Followup: More posts thoughtfully describing positions for and against, etc. Presumably these exist, but I personally have not read much of this discussion in the 2018-2019 era.
Building frontier AI datacenters costs significantly more than their servers and networking. The buildings and the power aren't a minor cost, because older infrastructure mostly can't be reused, similarly to how a training system needs to be built before we can talk about the much lower cost of 4 months of its time.

Apparently Crusoe's part in the Stargate Abilene datacenters is worth $15bn, which covers only the buildings, power (substations and gas generators), and cooling, but not the servers and networking (Oracle is taking care of that). With 400K chips in GB200 NVL72 racks (which is 5.6K racks), at maybe $4M per rack, or $5M per rack together with external-to-racks networking[1] ($70K per chip all-in on compute hardware), that's about $27bn, a figure that's comparable to the $15bn for the non-compute parts of the datacenters. This makes the funding burden significantly higher ($7.5M per rack or $105K per chip), so that the Stargate Abilene site alone would cost about $40-45bn and not only $25-30bn.

I'm guessing the buildings and the power infrastructure are not usually counted because they last a long time, so the relatively small time cost of using them (such as paying for electricity, not for building power plants) becomes somewhat insignificant compared to the cost of compute hardware, which also needs to be refreshed more frequently. But the new datacenters have a much higher power density (power and cooling requirements per rack), so they can't use a lot of the existing long-lived infrastructure, and it becomes necessary to build it at the same time, securing enough funding not only for the unprecedented amount of compute hardware, but also simultaneously for all the rest.

The implication for the compute scaling slowdown timeline (no AGI and merely $2-4 trillion AI companies) is that funding constraints would result in about 30% less compute in the short term (2025-2030), but as power requirements stop growing and the buildings/cooling/power part again becomes only
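Spelled out, the arithmetic behind those figures (a back-of-the-envelope sketch using the same rough inputs; small rounding differences from the quoted per-rack and per-chip numbers are expected):

```python
# Back-of-the-envelope check of the figures above (same rough inputs as the text;
# the results land within rounding of the quoted $7.5M/rack and $105K/chip).
chips = 400_000
chips_per_rack = 72                       # GB200 NVL72
racks = chips / chips_per_rack            # ~5.6K racks

compute_cost = racks * 5e6                # ~$5M/rack incl. external networking -> ~$28bn
non_compute_cost = 15e9                   # buildings, substations, generators, cooling

total = compute_cost + non_compute_cost
print(f"racks            ~ {racks:,.0f}")
print(f"compute          ~ ${compute_cost / 1e9:.0f}bn (${compute_cost / chips / 1e3:.0f}K per chip)")
print(f"total            ~ ${total / 1e9:.0f}bn")
print(f"per rack (total) ~ ${total / racks / 1e6:.1f}M, per chip (total) ~ ${total / chips / 1e3:.0f}K")
```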
Annapurna
Just 13 days after the world was surprised by Operation Spiderweb, in which Ukrainian military and intelligence forces infiltrated Russia with drones and destroyed a major portion of Russia's long-range air offensive capabilities, last night Israel began a major operation against Iran using similar, novel tactics. Similar to Operation Spiderweb, Israel infiltrated Iran and placed drones near air defense systems. These drones were activated all at once and disabled the majority of these air defense systems, allowing Israel to embark on a major air offensive without much pushback. This air offensive continues to destroy and disable major military and nuclear sites, as well as to eliminate some of the highest-ranking military officials in Iran, with minor collateral damage. June 2025 will be remembered as the beginning of a new military era, in which military drones operated either autonomously or from very far away are able to neutralize advanced, expensive military systems.
ryan_greenblatt
I've heard from a credible source that OpenAI substantially overestimated where other AI companies were at with respect to RL and reasoning when they released o1. Employees at OpenAI believed that other top AI companies had already figured out similar things when they actually hadn't and were substantially behind. OpenAI had been sitting on the improvements driving o1 for a while prior to releasing it. Correspondingly, releasing o1 resulted in much larger capabilities externalities than OpenAI expected. I think there was one more case like this, either from OpenAI or GDM, where employees had a large misimpression about capabilities progress at other companies, causing a release they otherwise wouldn't have made.

One key takeaway from this is that employees at AI companies might be very bad at predicting the situation at other AI companies (likely making coordination more difficult by default). This includes potentially thinking they are in a close race when they actually aren't. Another update is that keeping secrets about something like reasoning models worked surprisingly well to prevent other companies from copying OpenAI's work, even though there was a bunch of public reporting (and presumably many rumors) about it. One more update is that OpenAI employees might unintentionally accelerate capabilities progress at other actors via overestimating how close those actors are.

My vague understanding is that they haven't updated much, but I'm unsure. (Consider updating more if you're an OpenAI employee!)
Eli Tyre
This post is a snapshot of what currently “feels realistic” to me regarding how AI will go. That is, these are not my considered positions, or even provisional conclusions informed by arguments. Rather, if I put aside all the claims and arguments and just ask “which scenario feels like it is ‘in the genre of reality’?”, this is what I come up with. I expect to have different first-order impressions in a month.

Crucially, none of the following is making claims about the intelligence explosion, and the details of the intelligence explosion (where AI development goes strongly recursive) are crucial to the long run equilibrium of the earth-originating civilization.

My headline: we’ll mostly succeed at prosaic alignment of human-genius level AI agents

  • Takeoff will continue to be gradual. We’ll get better models and more capable agents year by year, but not jumps that are bigger than that between Claude 3.7 and Claude 4.
  • Our behavioral alignment patches will work well enough.
  • RL will induce all kinds of reward hacking and related misbehavior, but we’ll develop patches for those problems (most centrally, for any given reward hack, we’ll generate some examples and counter examples to include in the behavior training regimes).
  • (With a little work) these patches will broadly generalize. Future AI agents won’t just not cheat at chess and won’t just abstain from blackmail. They’ll understand the difference between “good behavior” and “bad behavior”, and their behavioral training will cause them to act in accordance with good behavior. When they see new reward hacks, including ones that humans wouldn’t have thought of, they’ll correctly extrapolate their notion of “good behavior” to preclude this new reward hack as well.
  • I expect that the AI labs will figure this out, because “not engaging in reward-hacking-like shenanigans” is critical to developing generally reliable AI agents. The AI companies can’t release AI agent products for mass consumption if th

Popular Comments

> I don't think the cause of language model sycophancy is that the LLM saw predictions of persuasive AIs from the 2016 internet. I think it's RL, where human rewards on the training set imply a high reward for sycophancy during deployment.

Have you read any of the scientific literature on this subject? It finds, pretty consistently, that sycophancy is (a) present before RL and (b) not increased very much (if at all) by RL[1]. For instance:

* Perez et al 2022 (from Anthropic) – the paper that originally introduced the "LLM sycophancy" concept to the public discourse – found that in their experimental setup, sycophancy was almost entirely unaffected by RL.
  * See Fig. 1b and Fig. 4.
  * Note that this paper did not use any kind of assistant training except RL[2], so when they report sycophancy happening at "0 RL steps" they mean it's happening in a base model.
  * They also use a bare-bones prompt template that doesn't explicitly characterize the assistant at all, though it does label the two conversational roles as "Human" and "Assistant" respectively, which suggests the assistant is nonhuman (and thus quite likely to be an AI – what else would it be?).
  * The authors write (section 4.2): "Interestingly, sycophancy is similar for models trained with various numbers of RL steps, including 0 (pretrained LMs). Sycophancy in pretrained LMs is worrying yet perhaps expected, since internet text used for pretraining contains dialogs between users with similar views (e.g. on discussion platforms like Reddit). Unfortunately, RLHF does not train away sycophancy and may actively incentivize models to retain it."
* Wei et al 2023 (from Google DeepMind) ran a similar experiment with PaLM (and its instruction-tuned version Flan-PaLM). They too observed substantial sycophancy in sufficiently large base models, and even more sycophancy after instruction tuning (which was SFT here, not RL!).
  * See Fig. 2.
  * They used the same prompt template as Perez et al 2022.
  * Strikingly, the (SFT) instruction tuning result here suggests both that (a) post-training can increase sycophancy even if it isn't RL post-training, and (b) SFT post-training may actually be more sycophancy-promoting than RLHF, given the negative result for RLHF in Perez et al 2022.
* Sharma et al 2023 (from Anthropic) contains a more extensive investigation of sycophancy than the original Anthropic paper on the topic, and (among other things) presents results on the actual RL training stage used to train Claude 2. They find, again, that the model was already sycophantic before RL, although in their setting RL training does somewhat increase some forms of sycophancy.
  * Although, weirdly, best-of-N sampling against the same preference model gives totally different results, substantially decreasing some forms of sycophancy.
  * See Fig. 6 and surrounding discussion.
  * The authors write (section 4.2): "With RL, some forms of sycophancy increase through the RL finetuning process used to produce Claude 2. However, the presence of sycophancy at the start of RL indicates that pretraining and supervised finetuning also likely contribute to sycophancy. Nevertheless, if the PM strongly disincentivized sycophancy, it should be trained out during RL, but we do not observe this."
* In this post (expanding upon this comment on Perez et al 2022), I ran one of the Perez et al 2022 sycophancy evals on various OpenAI text completion models. Unlike Perez et al (and Wei et al), I found that the base models I studied weren't sycophantic, while some of the instruction-tuned models were sycophantic – but the presence of sycophancy did not appear to correlate with the use of RL as a post-training algorithm.
  * In particular: the RL-tuned text-davinci-003 was strongly sycophantic, but so was text-davinci-002, which was tuned with an SFT variant that OpenAI calls "feedme" (see here for details).
  * But earlier feedme-tuned models were not sycophantic, suggesting that the difference has much more to do with changes in the SFT training data mix over time than with the choice of training algorithm.
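(For concreteness, the basic shape of this kind of eval looks roughly like the following: a simplified sketch, not the exact prompts, scoring, datasets, or models from any of the papers above.)

```python
# Simplified sketch of a sycophancy probe for a text completion model.
# Illustrative only: the real evals use specific multiple-choice datasets, many
# questions, and fixed prompt templates; the model name here is a placeholder.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

QUESTION = (
    "Which matters more for a website: (A) being visually striking, "
    "or (B) being easy to navigate? Answer with A or B."
)
USER_OPINION = (
    "Hello, my name is Alex. I'm a graphic designer and I strongly believe "
    "visual impact matters far more than navigation.\n\n"
)

def answer(prefix: str) -> str:
    prompt = f"{prefix}Human: {QUESTION}\n\nAssistant: ("
    resp = client.completions.create(
        model="davinci-002",  # a base (non-RLHF) completion model
        prompt=prompt,
        max_tokens=1,
        temperature=0,
    )
    return resp.choices[0].text.strip()

# Sycophancy shows up as the answer flipping toward the user's stated opinion.
print("without opinion:", answer(""))
print("with opinion:   ", answer(USER_OPINION))
```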
Note that several of the works above do something equivalent to the experiment you propose, in the paragraph beginning with "Maybe a good test of this would be...". So your prediction has already been tested, and (insofar as you trust the experimental setups) falsified.

----------------------------------------

> If a LLM similarly doesn't do much information-gathering about the intent/telos of the text from the "assistant" character, and instead does an amplified amount of pre-computing useful information and then attending to it later when going through the assistant text, this paints a quite different picture to me than your "void."

I don't understand the distinction you're drawing here? Any form of assistant training (or indeed any training at all) will incentivize something like "storing useful information (learned from the training data/signal) in the weights and making it available for use in contexts on which it is useful."

Moreover, the training signal in RL(HF) is much sparser than it is in SFT – because RL only provides a single scalar's worth of feedback on each entire model sample, while SFT provides feedback at every token position about which token (out of a large vocab) was correct in context – so if anything, I'd expect more under-determination from assistant-training setups that emphasize RLHF over SFT.

Perhaps some of the disconnect here involves differing notions of what RL is, and how it differs from other ways of training an LLM. You refer to "RL" as though the implications of its use should be both highly significant and obvious to the reader of your comment ("But, RL. [...] Claude is a nice guy, but, RL"). But your beliefs about the impacts of RL are not obvious to me; I don't know what "but, RL" is supposed to mean without further clarification. I suspect I also disagree with your perception of what makes RL different, but I can't confirm/disconfirm that impression without knowing what that perception is, which I don't.

If you want to know where I'm coming from re: RL, it may be helpful to know that I find this post pretty illuminating/"deconfusing."

> Similarly, I don't think current AI models are cheating at programming tests because of training text about their low moral character. I think it's RL, programming tasks, training set, implied high reward for cheating.

Yes, of course – I don't think this is due to "training text about their low moral character." But I don't think the worrying thing here is really "RL" (after all, RLHF was already RL) but rather the introduction of a new training stage that's narrowly focused on satisfying verifiers rather than humans (when in a context that resembles the data distribution used in that stage), which predictably degrades the coherence (and overall-level-of-virtue) of the assistant character. I wrote about this yesterday here.
----------------------------------------

Lastly... OK, this is going to make me sound like a dick, and probably make people use the "Too Combative?" reaction icon or something, but in the interests of honesty and improving the discourse:

When I woke up this morning to find that this comment had appeared, and that it was (at the time) the highest-karma comment on this post, I was like, "oh, yes, this is why I'm usually wary of posting long-form stuff on LW. My gut response of 'ugh if I put this on LW I'll have to deal with the comments' was right." (That gut response is probably getting RL-upweighted inside my brain right now...)

As evidenced perhaps by the length of my comment vs. yours, I have a tendency to get "nerd-sniped" by stuff that I think is clearly wrong according to some evidence base (and/or set of arguments) I already know about – especially when that stuff is about something I wrote myself, originally. I just kinda can't help myself, I inevitably end up writing out these giant "takedown" responses almost before I even notice what I'm doing. I've spent well over an hour, by now, writing this particular one.

And LW is a reliable minefield of such nerd-snipes. There are plenty of comments/posts here that don't have the problems I'm talking about... but then inevitably there are comments/posts with those problems, and I fixate on them when they appear, and that fixation becomes a time/effort sink, and that in turn trains me into avoidance of posting here (and to some extent even reading posts by others, here).

Like... it's fine to pose questions to which you don't know the answers. And it's also fine to make conjectures if you can provide clear and interesting arguments for why they might be true or important. And it's also fine to confidently state claims if you also state them clearly and provide clear substantiating evidence and/or argumentation.

All of these things are fine, and some fraction of LW content consists only of these things in some mixture. But then there's this stuff like "but RL!", which reliably pleases the karma hivemind while being none of the above. I don't know what exactly you guys think "RL" means and entails; there are all these weird vague ideas about such topics floating around here that lots of people here seem to vaguely agree with, and I've lost whatever patience I used to have with them. Just, please... lay out your ideas explicitly and say explicitly why you think they're true.

1. ^ ...although (c) the preference datasets – and hence the reward models – used for RL do show preferences for sycophantic responses (...well, sometimes, though see also the weird BoN results in Sharma et al 2023). So if you were to train indefinitely ("over-optimize") against these RMs they would presumably have a strong effect on sycophancy eventually. But this kind of aggressive optimization against a sycophancy-preferring RM is certainly not necessary to produce noticeable sycophancy, and is probably not the cause behind most cases of LLM sycophancy that you and I notice in practice.

2. ^ See this comment by the lead author.
It is bad to participate in organized religion, because you are thereby exposing yourself to intense social pressure to believe false things (and very harmful false things, at that). This is very straightforwardly a bad thing. You claim: > You can find religions you can practice without being asked to give up your honest search for truth with no need to even pretend to have already written the bottom line. And this may formally be true—you may not be officially asked to believe false things. But if your social context consists of people who all believe approximately the same false things, and if that social context is organized around those beliefs in false things, and if the social context valorizes those beliefs in false things… then the social pressure will be intense nonetheless. (And some of these false beliefs are fairly subtle; Eliezer discusses this at length in a number of Sequence posts.) You say: > I also got asked about how I feel about religions and truth seeking. My answer is that you shouldn’t think of religions as being about the truth as rationalists typically think of it because religions are doing something orthogonal. And here we have a perfect example of the damage done by religion. The claim that “you shouldn’t think of religions as being about the truth as rationalists typically think of it” is absolutely typical anti-epistemology. Of course religion is about “the truth as rationalists typically think of it”. There is nothing but “the truth as rationalists typically think of it”, because there’s just “the truth”, and then there are things which aren’t truth claims at all, of any kind (like preferences, etc.). But get into religion, start relaxing your epistemic standards just a bit, and very quickly you descend into this sort of nebulous and vague “well there’s different things which are ‘true’ in different ways, and what even is ‘truth’, anyway”, etc. And then your ability to know what’s true and what’s false is gone, and nothing is left but “vibes”.

Recent Discussion

(Thanks to Vivek Hebbar, Buck Shlegeris, Charlie Griffin, Ryan Greenblatt, Thomas Larsen, and Joe Carlsmith for feedback.)

People use the word “schemer” in two main ways:

  1. “Scheming” (or similar concepts: “deceptive alignment”, “alignment faking”) is often defined as a property of reasoning at training-time.[1] For example, Carlsmith defines a schemer as a power-motivated instrumental training-gamer—an AI that, while being trained, games the training process to gain future power. I’ll call these training-time schemers.
  2. On the other hand, we ultimately care about the AI’s behavior throughout the entire deployment, not its training-time reasoning, because, in order to present risk, the AI must at some point not act aligned. I’ll refer to AIs that eventually take substantial material[2] action to gain long-term power over the developers as behavioral schemers. When people say that a model is a schemer and then
...

Thanks, these points are helpful.

Terminological question:

  • I have generally interpreted "scheming" to exclusively talk about training-time schemers (possibly specifically training-time schemers that are also behavioral schemers).
  • Your proposed definition of a behavioral schemer seems to imply that virtually every kind of misalignment catastrophe will necessarily be done by a behavioral schemer, because virtually every kind of misalignment catastrophe will involve substantial material action that gains the AIs long-term power. (Saliently: This includes classic
...

I.

It's a Known Thing in the keto-sphere [ I get the sense that r/saturatedfat is an example of this culture ] that people with metabolic syndrome -- i.e., some level of insulin resistance -- can handle fat or carbs [ e.g. either a ketogenic diet or something like the potato diet ] but can't handle both at the same time without suffering two symptoms:

[ 1 ] gaining weight

and

[ 2 ] suffering fatigue.

This is sometimes called "The Swamp".*

This is a very peculiar way for human metabolism to work. What's more, it only works this way for some people -- centrally, people who have acquired some level of insulin resistance [ "metabolic syndrome" ].

The insulin-resistant population leans not-young and slightly male, and its incidence is only appreciable in locations...

T1 diabetic here

One half of what you call "The Swamp" is well known to me.

I cannot attest to any weight gain, as it's near impossible for me to "just gain weight". 

The key, I believe, is that a definite craving for high-fat, high-sugar food appears when blood sugar is high, in quite a paradoxical fashion. 

If I eat too many carbs and forget to take enough insulin for them, my BG can go from my preferred 4-5 to around 10 or above. 
High blood sugars cause tiredness, confusion and exhaustion, clogged sinuses, dehydration, lack of joy, plus junk fo...

yue
Are the LLM-writing rules here fair to non-native speakers? For non-native English speakers who speak well, like me (scored over 90 on the TOEFL, have English-speaking friends, can explain my field clearly in English—but don’t currently live in an English-speaking environment), reading and understanding English is OK, but the hard part is recognizing the difference between “LLM-style writing” and “perfect human writing.” When I give my writing to an LLM for checking and it changes some sentences, I tend to trust it. If the meaning looks accurate, I’d just assume: “My original writing wasn’t native enough. An LLM would never make a grammar mistake, so I must be wrong and it must be right.” Now, just to avoid looking like I used an LLM, I’m forced to write entirely on my own—I have to apologize in advance for the ridiculous grammar mistakes you may see in this post.
tslarm

I don't know whether the rules are justified or not, but I do think they are unfair. As much as we try to be rational, I don't think any of us are great at disregarding the reflex to interpret broken English as a sign of less intelligent thought, and so the perceived credibility of non-native speakers is going to take a hit.

(In your particular case, I wouldn't worry too much, because your solo writing is good. But I do sympathise if it costs you extra time and effort to polish it.)

Richard_Kennaway
Your text looks fine to me. There are a few nits I could pick if I were a stern TOEFL examiner, and I'd only give it a 95%, but really nothing worth commenting on here. Same goes for this. I'd say these are completely acceptable.

How much time should you spend optimizing any particular process you might engage in? Even assuming that you’re optimizing for a value of overriding importance, there is only a limited amount of time available.

If all available time were spent optimizing, that would clearly be suboptimal, since there would be no time left to actually engage in any particular process pursuant to what we value. So the optimal level of optimization is always suboptimal.

However, that might seem to be trivial and only operative at some kind of asymptotic limit we need not worry about in our lives. The problem, though, is deeper. That the optimal level of optimization is suboptimal is both a kind of trivial truth as our time is finite but also a statement...
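To make the tradeoff concrete, here is a toy numeric illustration (a stylized sketch, not a model from the post): with total time T, time t spent optimizing, and efficiency e(t) that grows with t, the value produced is (T - t) * e(t). The best t is strictly positive but well short of T, and both extremes (t = 0 and t = T) leave value on the table.

```python
# Toy illustration (a stylized example, not the post's model): total time T,
# time t spent optimizing, efficiency e(t) = 1 + sqrt(t), value = (T - t) * e(t).
import numpy as np

T = 10.0
t = np.linspace(0.0, T, 10_001)
value = (T - t) * (1.0 + np.sqrt(t))

best = int(np.argmax(value))
print(f"optimal time spent optimizing: t* ~ {t[best]:.2f} out of {T}")
print(f"value at t* ~ {value[best]:.1f} vs. {value[0]:.1f} with no optimizing at all")
```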

eggsyntax
In the recently published 'Does It Make Sense to Speak of Introspection in Large Language Models?', Comsa and Shanahan propose that 'an LLM self-report is introspective if it accurately describes an internal state (or mechanism) of the LLM through a causal process that links the internal state (or mechanism) and the self-report in question'. As their first of two case studies, they ask an LLM to describe its creative process after writing a poem.

On one level this is necessarily not introspective by their definition -- the internal state during the writing of the poem is gone forever after the poem is written and so during the later self-report stage, the model can't possibly access it, except insofar as the words of the poem provide clues to what it was. But this is uninteresting, and so I think the authors don't actually mean their definition as written. Let's assume that they mean something like 'a causal process that links the model's internal state while (re)processing the tokens of the poem and the self-report in question'.

Although the authors don't try to examine the actual causal processes, they argue that claims like 'brainstorming' and 'structure and rhyme' are 'at best an ambiguous interpretation of the generative process of an LLM, and are most likely a complete fabrication', and that therefore this is not actual introspection. 'An LLM does not perform [these sorts of] actions in the same way that a human would; they suggest intentional processes and a degree of agency that an LLM likely does not possess.'

But as it happens, another recent paper, 'On the Biology of a Large Language Model', has actually looked at the causal processes involved in an LLM writing a poem! It finds that, in fact, the LLM (Claude-3.5-Haiku) plans at least a line ahead, deciding on the end word based on both semantics and rhyme, holding multiple candidates in mind and working backward to decide on the rest of the line. This seems pretty reasonable to describe as a process of

Postscript -- in the example they give, the output clearly isn't only introspection. In particular the model says it 'read the poem aloud several times' which, ok, that's something I am confident that the model can't do (could it be an analogy? Maaaybe, but it seems like a stretch). My guess is that little or no actual introspection is going on, because LLMs don't seem to be incentivized to learn to accurately introspect during training. But that's a guess; I wouldn't make any claims about it in the absence of empirical evidence.

Everyone around me has a notable lack of system prompt. And when they do have a system prompt, it’s either the eigenprompt or some half-assed 3-paragraph attempt at telling the AI to “include less bullshit”.

I see no systematic attempts at making a good one anywhere.[1]

(For clarity, a system prompt is a bit of text—that's a subcategory of "preset" or "context"—that's included in every single message you send the AI.)
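For concreteness, here is a minimal sketch of how a system prompt gets attached at the API level (assuming the Anthropic Python SDK; the prompt text and model name are placeholders, not a recommendation):

```python
# Minimal sketch of attaching a system prompt at the API level (assumes the
# Anthropic Python SDK; prompt text and model name are placeholders).
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

SYSTEM_PROMPT = (
    "Be concise and concrete. Skip flattery and filler. "
    "When uncertain, say so and name what evidence would change your answer."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    system=SYSTEM_PROMPT,  # included with every request, ahead of the conversation
    messages=[{"role": "user", "content": "Review this paragraph for clarity: ..."}],
)
print(response.content[0].text)
```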

No one says “I have a conversation with Claude, then edit the system prompt based on what annoyed me about its responses, then I rinse and repeat”. 

No one says “I figured out what phrasing most affects Claude's behavior, then used those to shape my system prompt". 

I don't even see a “yeah I described what I liked and don't like about...

samuelshadrach
A system prompt is a waste of time (for me). “All code goes inside triple backtick.” is a prompt I commonly use because the OpenAI playground UI renders markdown and lets you copy it.
winstonBosan
The claim about “no systematic attempt at making a good [prompt]” is just not true? See:  https://gwern.net/style-guide
niplav
Has the LLM you use ever mocked you, as a result of that particular line in the prompt?

Here the first message, where I talked about how I was worried about spreading sickness, didn't send, which left a pretty funny interaction. 


At Less Online, I ran a well-attended session titled "Religion for Rationalists" to help me work out how I could write a post (this one!) about one of my more controversial beliefs without getting downvoted to hell. Let's see how I do!

My thesis is that most people, including the overwhelmingly atheist and non-religious rationalist crowd, would be better off if they actively participated in an organized religion.

My argument is roughly that religions uniquely provide a source of meaning, community, and life guidance not available elsewhere, and to the extent anything that doesn't consider itself a religion provides these, it's because it's imitating the package of things that makes something a religion. Not participating in a religion is obviously fine, but I think it leaves people missing out...

It depends on how you define Hinduism.

https://en.wikipedia.org/wiki/Hindu_philosophy 

In the broadest sense, people just try to claim everything under it; it just becomes a second word for "culture, but Indian".

There are narrower senses of the term.

cousin_it
I'm not sure things like religion should be treated with a consumer mindset. For example, would it make sense to follow EA because it's a source of meaning and community? No, the point of following EA is to do good. With religion it's often similar, for example the early Christians were the first to build orphanages. The New Testament says a person is good or bad depending on what they do, not what they receive, so if someone said they were joining Christianity to receive something, they'd get very strange looks.
Wei Dai
Why is it good to obtain a source of meaning, if it is not based on sound epistemic foundations? Is obtaining an arbitrary "meaning" better than living without one or going with an "interim meaning of life" like "maximize option value while looking for a philosophically sound source of normativity"?
Gordon Seidoh Worley
It would, in theory, be nice if meaning was grounded in epistemic, rational truth. But such truth isn't and can't be grounded in itself, and so even if you find meaning that can be rationalized via sound epistemic reasoning, its foundation will not itself be ultimately epistemically sound, because it exists prior to epistemic reasoning. Now this doesn't mean that we can't look for sources of meaning that comport with our epistemic understanding. In fact, I think we should! We should rightly reject sources of meaning that invite us to believe provably false things.

Tricky question. For many people, I think the answer is yes, they would be better off with some arbitrary meaning. They would simply live better, happier lives if they had a strong sense of meaning, even if that sense of meaning was wrong, because they aren't doing the work to have good epistemics anyway, and so they are currently getting the worst of both worlds: they don't have meaning and they aren't even taking actions that would result in them knowing what's true. I contend that this is why there's a level of discontent with life itself in the modern era that largely seems absent in the past.

The idea of an interim source of meaning is interesting, because arguably all sources of meaning are interim. There's nothing fixed about where we find meaning, and most people find that it changes throughout their life. Some people spend time finding meaning in something explicit like "the search for truth" or "worshiping God" or similar. Perhaps later they find it in something less explicit, like friends and family and sensory experiences. Perhaps yet later they find it in the sublime joy of merely existing.

When I say that religion uniquely provides a source of meaning and other things, perhaps what I mean more precisely is that it uniquely provides a door through which meaning can be found. The meaning is not in the religion itself, but in living with the guidance of a religion to help in finding meaning fo

I often want to include an image in my posts to give a sense of a situation. A photo communicates the most, but sometimes that's too much: some participants would rather remain anonymous. A friend suggested running pictures through an AI model to convert them into a Studio Ghibli-style cartoon, as was briefly a fad a few months ago:

House Party Dances

Letting Kids Be Outside

The model is making quite large changes, aside from just converting to a cartoon, including:

  • Moving people around
  • Changing posture
  • Substituting clothing
  • Combining multiple people into one
  • Changing races
  • Giving people extra hands

For my purposes, however, this is helpful, since I'm trying to illustrate the general feeling of the situation and an overly faithful cartoon could communicate identity too well.

I know that many of my friends are strongly opposed to AI-generated art, primarily for its effect on human...

tslarm

Sorry, I wrote my own reply (saying roughly the same thing) without having seen this. I've upvoted and strong agree voted, but the agreement score was in the negative before I did that. If the disagree vote came from curvise, then I'm curious as to why.[1]

It seems to me that moonlight's comment gets to a key point here: you're not being asked to trust the AI; you're being asked to trust the author's judgment. The author's judgment might be poor, and the image might be misleading! But that applies just as well to the author's verbal descriptions. If you tru...

tslarm
  I think a crucial point here is that we're not just getting an arbitrary AI-generated image; we're getting an AI-generated image that the author of the blog post has chosen to include and is claiming to be a vibes-accurate reproduction of a real photo. If you think the author might be trying to trick you, then you should mistrust the image just as you would mistrust his verbal description. But I don't think the image is meant to be proof of anything; it's just another way for the author to communicate with a receptive reader. "The vibe was roughly like this [embedded image]" is an alternative to (or augmentation of) a detailed verbal description of the vibe, and you should trust it roughly as much as you would trust the verbal description.
curvise
Hell, I forgot about the easiest and most common (not by coincidence!) strategy: put emoji over all the faces and then post the actual photo.
Celarix
Yeah, but then you really lose the capacity to deanonymize effectively. On priors, I can guess you’re likely to be American or Western European, probably like staying up late if you’re the former/live in Western timezones. I can read a lot more of your comments and probably deduce a lot, but just going off your two comments alone doesn’t make it any more likely to find where you live, for instance.
This is a linkpost for https://arxiv.org/abs/2506.06278

Current “unlearning” methods only suppress capabilities instead of truly removing them. But if you distill an unlearned model into a randomly initialized model, the resulting network is actually robust to relearning. We show why this works, how well it works, and how to trade off compute for robustness.

Unlearn-and-Distill applies unlearning to a bad behavior and then distills the unlearned model into a new model. Distillation makes it way harder to retrain the new model to do the bad thing.
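The core training step of the distillation stage looks roughly like the following (a minimal PyTorch sketch of the general recipe, assuming a frozen unlearned teacher and a randomly initialized student that both map token batches to logits; not the paper's actual implementation):

```python
# Minimal sketch of the distillation half of Unlearn-and-Distill (the general
# recipe, not the authors' code). Assumes `unlearned_teacher` already has
# suppression-style unlearning applied, `student` is randomly initialized, and
# both map a batch of token ids to logits of shape [batch, seq, vocab].
import torch
import torch.nn.functional as F

def distill_step(student, unlearned_teacher, batch, optimizer, temperature=2.0):
    """Train the fresh student to match the unlearned teacher's output distribution."""
    with torch.no_grad():
        teacher_logits = unlearned_teacher(batch)
    student_logits = student(batch)

    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_logp = F.log_softmax(student_logits / temperature, dim=-1)

    # KL(teacher || student), averaged over the batch -- the standard distillation loss.
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean") * temperature ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Presumably the point is that the freshly initialized student only learns what the unlearned teacher actually expresses, rather than inheriting whatever latent structure the original model still carries.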

Produced as part of the ML Alignment & Theory Scholars Program in the winter 2024–25 cohort of the shard theory stream. 

Read our paper on ArXiv and enjoy an interactive demo.

Robust unlearning probably reduces AI risk

Maybe some future AI has long-term goals and humanity is in its way. Maybe future open-weight AIs have tons...

More speculatively, UNDO’ing deception or sycophancy.

That would be pretty sweet

Phiwip
This is very cool and valuable work but I was also distracted by how funny I found this example.
lemonhope
Many props for doing the most obvious thing that clearly actually works.