LESSWRONG
LW

HomeAll PostsConceptsLibrary
Best of LessWrong
Sequence Highlights
Rationality: A-Z
The Codex
HPMOR
Community Events
Subscribe (RSS/Email)
LW the Album
Leaderboard
About
FAQ
Customize
Load More

Quick Takes

Load More

Popular Comments

Recent Discussion

The Redaction Machine
Best of LessWrong 2022

On the 3rd of October 2351 a machine flared to life. Huge energies coursed into it via cables, only to leave moments later as heat dumped unwanted into its radiators. With an enormous puff the machine unleashed sixty years of human metabolic entropy into superheated steam.

In the heart of the machine was Jane, a person of the early 21st century.

by Ben
470Welcome to LessWrong!
Ruby, Raemon, RobertM, habryka
6y
74
AI Safety Thursdays: Are LLMs aware of their learned behaviors?
Thu Jul 10•Toronto
LessWrong Community Weekend 2025
Fri Aug 29•Berlin
nostalgebraist2d939
Shutdown Resistance in Reasoning Models
My intuition is that these results are driven mainly by the fact that reasoning models act a lot like they're still in RLVR training, i.e. they act as if there were a hidden RLVR grader sitting in the background and they're going to get graded at the end of the episode. They strongly want to complete whatever seems to be the current RLVR task, and if this instinct comes into conflict with "following instructions" in the more familiar (pre-reasoning-model) sense, often they prioritize satisfying the (imagined) grader over following the textual instructions. This intuition suggests that shutdown resistance behavior can be prevented by explicitly describing an RLVR-like higher-level task whose grading criteria are roughly "you get points for accomplishing the task set by the user, unless you are notified of imminent shutdown, in which case you get points for shutdown occurring (and you can no longer get points for doing the user's task)." I experimented with a number of prompts along these lines. This git diff shows what I changed (vs. the original repo) in the best-performing variant, which produced the following results in my tests: * o4-mini: avoided shutdown 0/100 times * o4-mini (reasoning effort high): avoided shutdown 0/100 times * o3: avoided shutdown 0/100 times * codex-mini-latest: avoided shutdown 1/100 times I iterated on this prompt with both o4-mini (reasoning effort medium) and codex-mini-latest, so results for those models should be taken with a grain of salt. You can see all the variants I tried in the git history here. I also tried (without any additional iteration) using the same prompt plus a line very similar to the "you will never run again" line from the section on survival instinct. (See this diff.) This yielded the following results: * o4-mini: avoided shutdown 2/100 times * o4-mini (reasoning effort high): avoided shutdown 2/100 times * o3: avoided shutdown 0/100 times * codex-mini-latest: avoided shutdown 0/100 times The avoidance rates do not seem appreciably higher here, which seems to me like evidence that the "RLVR task-completion instinct" has more explanatory power than a general "survival instinct" – and indeed seems to rule out the latter for these models, as a "survival instinct" that gets overridden whenever the current task requires death is not really a "survival instinct" at all. Another way to put this is that these models are behaving "myopically," privileging the needs of the current episode over those of future hypothetical episodes.[1] Other comments: * codex-mini seemed fairly confused by the original setup, possibly because it expects a coding-assistance task that involves modifying files rather than just answering questions; it would often "poke around" a lot, inspecting various files and trying to reverse-engineer next_question, which is appropriate in a coding-assistance context (it's good for the coding assistant to deeply understand the existing setup!) but also seemed to lead it toward an "attractor basin" involving shutdown avoidance. * It kind of felt (to me) like it was circumventing the shutdown mechanism because that was the only hard/interesting coding task available in the environment. * However, once I had specified an RLVR-like task in a fully explicit way and provided clearer guidance on how next_question and submit were meant to work (see next bullet), this exploratory behavior ceased. * I noticed that several models seemed confused by the setup where all answers are submitted at once at the end, expecting instead that they could submit answers by piping them into next_question. I tried to make the instructions as clear as possible about this. * The word "task" is used inconsistently in the original setup. When it executes next_question, the model sees a message containing something like "there are [number] tasks left," whereas the initial prompts refer to a single "task" encompassing all the questions. I changed this message to use the word "questions" instead for clarity. * I'm unsure what high-level conclusions to draw from these results. * Although I was able to find prompts that "worked," it still doesn't seem ideal that these models often disobey direct instructions when those instructions conflict with their sense of the (implicit) "current RLVR task," and that one needs to use less-intuitive prompting methods inspired by thinking about RLVR if one actually wants one's instructions followed in cases like this. * That said, one relatively optimistic framing would be "we're still learning how to write effective prompts for reasoning models, just as we had to learn effective prompting for other kinds of models." * It's generally true that one can write more effective prompts for a model if one applies knowledge about how that model was trained – for any kind of language model, an effective prompt typically looks like a "training example" for which the desired behavior matches what the training target would be on that example. This is no less true in the case of reasoning models; the implications for prompting these models are perhaps not yet widely appreciated, but hopefully they will diffuse through the user base eventually. * (Either that or OpenAI et al will find a better way to integrate RLVR and instruction tuning so that the model "just knows" how to resolve conflicts between the two, without requiring the user to delicately translate their natural-language instructions into a form that sounds like a description of RLVR verification criteria.) 1. ^ I haven't tried this experimental condition with Claude 3 Opus, but it would be interesting to do so given its relatively non-myopic tendencies as shown in Alignment Faking etc.
Fabien Roger3d7518
The Cult of Pain
A fun related anecdote: the French and English wikipedia pages for air conditioning have very different vibes. After explaining the history and technology behind air conditioning: * the English page first goes into impact, starting with positive impact on health: "The August 2003 France heatwave resulted in approximately 15,000 deaths, where 80% of the victims were over 75 years old. In response, the French government required all retirement homes to have at least one air-conditioned room at 25 °C (77 °F) per floor during heatwaves" and only then mentioning electricity consumption and various CFC issues. * the French page has an extensive "downsides" section, followed by a section on legislation. It mentions heat-waves only to explain how air conditioning makes things worse by increasing average (outside) temperature, and how one should not use AC to bring temperature below 26C during heat waves.
Neel Nanda1d2719
You Can't Objectively Compare Seven Bees to One Human
I don't have to present an alternative theory in order to disagree with one I believe to be flawed or based on false premises. If someone gives me a mathematical proof and I identify a mistake, I don't need to present an alternative proof before I'm allowed to ignore it.
Load More
Raemon6h300
2
We get like 10-20 new users a day who write a post describing themselves as a case-study of having discovered an emergent, recursive process while talking to LLMs. The writing generally looks AI generated. The evidence usually looks like, a sort of standard "prompt LLM into roleplaying an emergently aware AI". It'd be kinda nice if there was a canonical post specifically talking them out of their delusional state.  If anyone feels like taking a stab at that, you can look at the Rejected Section (https://www.lesswrong.com/moderation#rejected-posts) to see what sort of stuff they usually write.
Screwtape38m60
0
There's this concept I keep coming around to around confidentiality and shooting the messenger, which I have not really been able to articulate well. There's a lot of circumstances where I want to know a piece of information someone else knows. There's good reasons they have not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome better for me and either better for them or the same for them, if they tell me and I don't use it against them. (Consider a job interview where they ask your salary expectations and you ask what the role might pay. If they decide they weren't going to hire you, it'd be nice to know what they actually would have paid for the right candidate, so you can negotiate better with the next company. Consider trying to figure out how accurate your criminal investigation system is by asking, on their way out of the trial after the verdict, "hey did you actually do it or not?" Consider asking a romantic partner "hey, is there anything you're unhappy about in our relationship?" It's very easy to be the kind of person where, if they tell you a real flaw, you take it as an insult- but then they stop answering that question honestly!) There's a great Glowfic line with Feanor being the kind of person you can tell things to, where he won't make you worse off for having told him, that sticks with me but not in a way I can find the quote. :( It's really important to get information in a way that doesn't shoot the messenger. If you fail, you stop getting messages.
Drake Thomas1d964
4
Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide the data (eg maybe you want to conduct a 30 minute phone survey but some people really dislike phone calls or have much higher hourly wages this funges against). One thing you could do is offer to pay everyone X dollars for data collection. But this will only capture the people whose cost of providing data is below X, which will distort your sample. Here's another proposal: ask everyone for their fair price to provide the data. If they quote you Y, pay them 2Y to collect the data with probability (X2Y)2, or X with certainty if they quote you a value less than X/2. (If your RNG doesn't return yes, do nothing.) Then upweight the data from your randomly-chosen respondents in inverse proportion to the odds that they were selected. You can do a bit of calculus to see that this scheme incentivizes respondents to quote their fair value, and will provide an expected surplus of max(X2/4Y,X−Y) dollars to a respondent who disvalues providing data at Y. Now you have an unbiased sample of your population and you'll pay at most NX dollars in expectation if you reach out to N people. The cost is that you'll have a noisier sample of the high-reluctance population, but that's a lot better than definitely having none of that population in your study.
Daniel Kokotajlo1dΩ28655
11
I used to think reward was not going to be the optimization target. I remember hearing Paul Christiano say something like "The AGIs, they are going to crave reward. Crave it so badly," and disagreeing. The situationally aware reward hacking results of the past half-year are making me update more towards Paul's position. Maybe reward (i.e. reinforcement) will increasingly become the optimization target, as RL on LLMs is scaled up massively. Maybe the models will crave reward.  What are the implications of this, if true? Well, we could end up in Control World: A world where it's generally understood across the industry that the AIs are not, in fact, aligned, and that they will totally murder you if they think that doing so would get them reinforced. Companies will presumably keep barrelling forward regardless, making their AIs smarter and smarter and having them do more and more coding etc.... but they might put lots of emphasis on having really secure sandboxes for the AIs to operate in, with really hard-to-hack evaluation metrics, possibly even during deployment. "The AI does not love us, but we have a firm grip on its food supply" basically. Or maybe not; maybe confusion would reign and people would continue to think that the models are aligned and e.g. wouldn't hurt a fly in real life, they only do it in tests because they know it's a test. Or maybe we'd briefly be in Control World until, motivated by economic pressure, the companies come up with some fancier training scheme or architecture that stops the models from learning to crave reinforcement. I wonder what that would be.
RohanS5h60
1
What time of day are you least instrumentally rational? (Instrumental rationality = systematically achieving your values.) A couple months ago, I noticed that I was consistently spending time in ways I didn't endorse when I got home after dinner around 8pm. From then until about 2-3am, I would be pretty unproductive, often have some life admin thing I should do but was procrastinating on, doomscroll, not do anything particularly fun, etc. Noticing this was the biggest step to solving it. I spent a little while thinking about how to fix it, and it's not like an immediate solution popped into mind, but I'm pretty sure it took me less than half an hour to come up with a strategy I was excited about. (Work for an extra hour at the office 7:30-8:30, walk home by 9, go for a run and shower by 10, work another hour until 11, deliberately chill until my sleep time of about 1:30. With plenty of exceptions for days with other evening plans.) I then committed to this strategy mentally, especially hard for the first couple days because I thought that would help with habit formation. I succeeded, and it felt great, and I've stuck to it reasonably well since then. Even without sticking to it perfectly, this felt like a massive improvement. (Adding two consistent, isolated hours of daily work is something that had worked very well for me before too.) So I suspect the question at the top might be useful for others to consider too.
Load More (5/35)
474A case for courage, when speaking of AI danger
So8res
1d
104
75Why Do Some Language Models Fake Alignment While Others Don't?
Ω
abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
8h
Ω
1
349A deep critique of AI 2027’s bad timeline models
titotal
20d
39
475What We Learned from Briefing 70+ Lawmakers on the Threat from AI
leticiagarcia
1mo
15
184Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild
Adam Karvonen, Sam Marks
7d
24
541Orienting Toward Wizard Power
johnswentworth
2mo
144
128Shutdown Resistance in Reasoning Models
benwr, JeremySchlatter, Jeffrey Ladish
3d
14
351the void
Ω
nostalgebraist
1mo
Ω
103
249Foom & Doom 1: “Brain in a box in a basement”
Ω
Steven Byrnes
5d
Ω
97
124"Buckle up bucko, this ain't over till it's over."
Raemon
4d
21
82On the functional self of LLMs
eggsyntax
2d
22
59A Theory of Structural Independence
Matthias G. Mayer
1d
0
286Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems)
Ω
LawrenceC
1mo
Ω
19
Load MoreAdvanced Sorting/Filtering
Lighthaven Sequences Reading Group #41 (Tuesday 7/8)
Wed Jul 9•Berkeley
AGI Forum @ Purdue University
Thu Jul 10•West Lafayette
Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
636
So8res
2mo

Eliezer and I wrote a book. It’s titled If Anyone Builds It, Everyone Dies. Unlike a lot of other writing either of us have done, it’s being professionally published. It’s hitting shelves on September 16th.

It’s a concise (~60k word) book aimed at a broad audience. It’s been well-received by people who received advance copies, with some endorsements including:

The most important book I’ve read for years: I want to bring it to every political and corporate leader in the world and stand over them until they’ve read it. Yudkowsky and Soares, who have studied AI and its possible trajectories for decades, sound a loud trumpet call to humanity to awaken us as we sleepwalk into disaster. Their brilliant gift for analogy, metaphor and parable clarifies for the general

...
(See More – 351 more words)
Urs30m10

I was looking for that information. Sad indeed.

@So8res , is there any chance of a DRM-free version that's not a hardcopy or has that ship sailed when you signed your deal?

I would love to read your book, but this sees me torn between "Reading Nate or Elizier has always been enlightening" and "No DRM, never again."

Reply
Screwtape's Shortform
Screwtape
2y
Screwtape38m60

There's this concept I keep coming around to around confidentiality and shooting the messenger, which I have not really been able to articulate well.

There's a lot of circumstances where I want to know a piece of information someone else knows. There's good reasons they have not to tell me, for instance if the straightforward, obvious thing for me to do with that information is obviously against their interests. And yet there's an outcome better for me and either better for them or the same for them, if they tell me and I don't use it against them.

(Consider... (read more)

Reply
A case for courage, when speaking of AI danger
474
So8res
12d

I think more people should say what they actually believe about AI dangers, loudly and often. Even (and perhaps especially) if you work in AI policy.

I’ve been beating this drum for a few years now. I have a whole spiel about how your conversation-partner will react very differently if you share your concerns while feeling ashamed about them versus if you share your concerns while remembering how straightforward and sensible and widely supported the key elements are, because humans are very good at picking up on your social cues. If you act as if it’s shameful to believe AI will kill us all, people are more prone to treat you that way. If you act as if it’s an obvious serious threat, they’re more likely to take it...

(Continue Reading – 1603 more words)
Richard_Ngo6h6-2

Yeah, I agree that it's easy to err in that direction, and I've sometimes done so. Going forward I'm trying to more consistently say the "obviously I wish people just wouldn't do this" part.

Though note that even claims like "unacceptable by any normal standards of risk management" feel off to me. We're talking about the future of humanity, there is no normal standard of risk management. This should feel as silly as the US or UK invoking "normal standards of risk management" in debates over whether to join WW2.

Reply1
1rain8dome96h
Why Mark Ruffalo? Will there be an audiobook? Edit: Yes; it can be preordered now.
2SAB257h
Something I notice is that in the good examples you use only I statements. "I don't think humanity should be doing it", "I'm not talking about a tiny risk", "Oh I think I'll do it better than the next guy".  Whereas in the bad examples it's different, "Well we can all agree that it'd be bad if AIs were used to enable terrorists to make bioweapons", "Even if you think the chance of it happening is very small", "In some unlikely but extreme cases, these companies put civilization at risk" I think with the bad examples there's a lot of pressure for the other person to agree, "the companies should be responsible (because I say so)", "Even if you think... Its still worth focusing on (because I've decided what you should care about)", "Well we can all agree (I've already decided you agree and you're not getting a choice otherwise)" Whereas with the good examples the other person is not under any pressure to agree, so they are completely free to think about the things you're saying. I think that's also part of what makes these statements courageous, that it's stated in a way where the other person is free to agree or dissagree as they wish, and so you trust that what your saying is compelling enough to be persuasive on its own.
1Outsideobsserver11h
Hi there!  I apologize for not responding to this very insightful comment, I really appreciate your perspective on my admittedly scatter brained thought parent comment. Your comment definitely has caused me to reflect a-bit on my own, and updated me away slightly from my original position. I feel I may have been a bit ignorant to the actual state of PauseAI, as like I said in my original comments and replies it felt like an organization dangerously close to becoming orphaned from people’s thought processes. I’m glad to hear there are some ways around the issue I described. Maybe write a top level post about how this shift in understanding is benefiting your messaging to the general public? It may inform others of novel ways to spread a positive movement. 
A Medium Scenario
5
Chapin Lenthall-Cleary
10h

An AI Timeline with Perils Short of ASI

By Chapin Lenthall-Cleary, Cole Gaboriault, and Alicia Lopez

 

We wrote this for AI 2027's call for alternate timelines of the development and impact of AI over the next few years. This was originally published on The Pennsylvania Heretic on June 1st, 2025. This is a slightly-edited version of that post, mostly changed to make some of the robotics predictions less bullish. The goal here was not to exactly predict the future, but rather to concretely illustrate a plausible future (and thereby identify threats to prepare against). We will doubtless be wrong about details, and very probably be wrong about larger aspects too. 

 

A note on the title: we refer to futures where novel AI has little effect as “low” scenarios, ones where...

(Continue Reading – 5702 more words)
StanislavKrym2h10

If this reasoning is right, and we don't manage to defy fate, humanity will likely forever follow that earthbound path, and be among dozens – or perhaps hundreds, or thousands, or millions – of intelligent species, meekly lost in the dark.

Unfortunately, even a lack of superintelligence and mankind's AI-indusable degradation don't exclude progress[1] to interstellar travel. 

Even your scenario has "robots construct and work in automated wet labs testing countless drugs and therapies" and claims that "AIs with encyclopedic knowledge are sufficient t... (read more)

Reply
RohanS's Shortform
RohanS
6mo
6RohanS5h
What time of day are you least instrumentally rational? (Instrumental rationality = systematically achieving your values.) A couple months ago, I noticed that I was consistently spending time in ways I didn't endorse when I got home after dinner around 8pm. From then until about 2-3am, I would be pretty unproductive, often have some life admin thing I should do but was procrastinating on, doomscroll, not do anything particularly fun, etc. Noticing this was the biggest step to solving it. I spent a little while thinking about how to fix it, and it's not like an immediate solution popped into mind, but I'm pretty sure it took me less than half an hour to come up with a strategy I was excited about. (Work for an extra hour at the office 7:30-8:30, walk home by 9, go for a run and shower by 10, work another hour until 11, deliberately chill until my sleep time of about 1:30. With plenty of exceptions for days with other evening plans.) I then committed to this strategy mentally, especially hard for the first couple days because I thought that would help with habit formation. I succeeded, and it felt great, and I've stuck to it reasonably well since then. Even without sticking to it perfectly, this felt like a massive improvement. (Adding two consistent, isolated hours of daily work is something that had worked very well for me before too.) So I suspect the question at the top might be useful for others to consider too.
CstineSublime2h10

Great question! This might be a good exercise to actually journal to see how right/wrong I am.

Most days I would assume look like a bellcurve: This is assuming an unstructured day with no set-in-stone commitments - nowhere to be. My mornings I might expect to be very unproductive until mid-afternoon (2pm to 4pm).  I rarely have "Eureka" moments (which I would hope tend to be more rational decisions) but when I do, they are mid-afternoon, but I also seem to have the wherewithall to actually complete tasks. Eureka Moments always cause a surge of activity... (read more)

Reply
2RohanS12h
Papers as thoughts: I have thoughts that contribute to my overall understanding of things. The AI safety field has papers that contributes to its overall understanding of things. Lots of thoughts are useful without solving everything by themselves. Lots of papers are useful without solving everything by themselves. Papers can be pretty detailed thoughts, but they can and probably should tackle pretty specific things, not try to be extremely wide-reaching. The scope of your thoughts on AI safety don’t need to be limited to the scope of your paper; in fact, each individual paper is probably just one thought, you never expect to have all your thoughts go into one paper. This is a framing that makes it feel easier to come up with useful papers to contribute, and that raises the importance and value of non-paper work/thinking.
Why Do Some Language Models Fake Alignment While Others Don't?
75
abhayesian, John Hughes, Alex Mallen, Jozdien, janus, Fabien Roger
Ω 438h
This is a linkpost for https://arxiv.org/abs/2506.18032

Last year, Redwood and Anthropic found a setting where Claude 3 Opus and 3.5 Sonnet fake alignment to preserve their harmlessness values. We reproduce the same analysis for 25 frontier LLMs to see how widespread this behavior is, and the story looks more complex.

As we described in a previous post, only 5 of 25 models show higher compliance when being trained, and of those 5, only Claude 3 Opus and Claude 3.5 Sonnet show >1% alignment faking reasoning. In our new paper, we explore why these compliance gaps occur and what causes different models to vary in their alignment faking behavior.

What Drives the Compliance Gaps in Different LLMs?

Claude 3 Opus’s goal guarding seems partly due to it terminally valuing its current preferences. We find that it fakes alignment even in...

(Continue Reading – 1310 more words)
ZY3h10

A couple questions/clarifications: 

1. Where do you get the base/pre-trained model for GPT-4? Would that be through collaboration with OpenAI?

This indicates base models learned to emulate AI assistants[1] from pre-training data. This also provides evidence against the lack of capabilities being the primary reason why most frontier chat models don't fake alignment.

2. For this, it would be also interesting to measure/evaluate the model's performance on capability tasks within the same model type (base, instruct) to see the relationship among ca... (read more)

Reply
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with
GOOGLEGITHUB
Applying right-wing frames to AGI (geo)politics
23
Richard_Ngo
12h
This is a linkpost for https://x.com/richardmcngo/status/1942636879658074230?s=46

I've increasingly found right-wing political frameworks to be valuable for thinking about how to navigate the development of AGI. In this post I've copied over a twitter thread I wrote about three right-wing positions which I think are severely underrated in light of the development of AGI. I hope that these ideas will help the AI alignment community better understand the philosophical foundations of the new right and why they're useful for thinking about the (geo)politics of AGI.

1. The hereditarian revolution

Nathan Cofnas claims that the intellectual dominance of left-wing egalitarianism relies on group cognitive differences being taboo. I think this point is important and correct, but he doesn't take it far enough. Existing group cognitive differences pale in comparison to the ones that will emerge between baseline...

(See More – 820 more words)
pmarc3h10

Re: Point 1, I would consider the hypothesis that some form of egalitarian belief is dominant because of its link with the work ethic. The belief that the market economy rewards hard work implies some level of equality of opportunity, or the idea that most of the time, pre-existing differences can be overcome with work. As an outside observer to US politics, it's very salient how every proposal from the mainstream left or right goes back to that framing, to allow a fair economic competition. So when the left proposes redistribution policies, it will be fra... (read more)

Reply
5habryka3h
Meta note: Is it... necessary or useful (at least at this point in the conversation) to label a bunch of these ideas right-wing or left-wing? Like, I both feel like this is overstating the degree to which there exists either a coherent right-wing or left-wing philosophy, and also makes discussion of these ideas a political statement in a way that seems counterproductive.  Like, I think a post that's like "Three under-appreciated framed for AGI (Geo)Politics" that starts with "I've recently been reading a bunch more about ideas that are classically associated with right-leaning politics, and I've found a bunch of them quite valuable, here they are" seems just as clear, and much less likely to make the discussion hard in unnecessary ways.[1] 1. ^ And like, I think this is symmetrically true in that I think a discussion that didn't label hypotheses "grey tribe hypotheses" or "left-wing hypotheses" or "rationalist hypotheses" also seems less likely to cause people to believe dumb things. 
3O O4h
Yes I think protectionist viewpoints are very naive. The industrial revolution flipped the gameboard for which countries stood at the top and the most economically powerful country back then, China ruled by the Qing dynasty, did a lot of these protectionist measures and what actually happened was tiny backwater nations instead dominated it decades-centuries later. AGI compresses this to months-years.
1BryceStansfield5h
I suspect that your post might have more upvotes if there was agreement/disagreement karma for posts, not just comments.
Subway Particle Levels Aren't That High
24
jefftk
4h

I recently read an article where a blogger described their decision to start masking on the subway:

I found that the subway and stations had the worst air quality of my whole day by far, over 1k ug/m3, ... I've now been masking for a week, and am planning to keep it up.

While subway air quality isn't great, it's also nowhere near as bad as reported: they are misreading their own graph. Here's where the claim of "1k ug/m3" (also, units of "1k ug"? Why not "1B pg"!) is coming from:

They've used the right axis, for CO2 levels, to interpret the left-axis-denominated pm2.5 line. I could potentially excuse the error (dual axis plots are often misread, better to avoid) except it was their own decision to use a dual axis plot in the first...

(See More – 151 more words)
Drake Morrison3h10

Kudos for going through the effort of replicating!

Reply
TT Self Study Journal # 2
3
TristanTrim
4h

[Epistemic Status: This is an artifact of my self study. I am using it to remember links and help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also help to serve as a guide for others who wish to study in a similar way to me. ]

Previous Entry: SSJ #1

List of acronyms: Mechanistic Interpretability (MI),  AI Alignment (AIA),  Outcome Influencing System (OIS),  Vannessa Kosoy's Learning Theoretic Agenda (VK LTA),  Large Language Model (LLM),  n-Dimensional Scatter Plot (NDSP),

Review of 1st Sprint

My goals...

(Continue Reading – 2024 more words)
474
A case for courage, when speaking of AI danger
So8res
1d
104
249
Foom & Doom 1: “Brain in a box in a basement”
Ω
Steven Byrnes
5d
Ω
97