LESSWRONG
LW

HomeAll PostsConceptsLibrary
Best of LessWrong
Sequence Highlights
Rationality: A-Z
The Codex
HPMOR
Community Events
Subscribe (RSS/Email)
LW the Album
Leaderboard
About
FAQ
Customize
Load More

Quick Takes

Load More

Popular Comments

Recent Discussion

Seeking Power is Often Convergently Instrumental in MDPs
Best of LessWrong 2019

Alex Turner lays out a framework for understanding how and why artificial intelligences pursuing goals often end up seeking power as an instrumental strategy, even if power itself isn't their goal. This tendency emerges from basic principles of optimal decision-making.

But, he cautions that if you haven't internalized that Reward is not the optimization target, the concepts here, while technically accurate, may lead you astray in alignment research.

by TurnTrout
johnswentworth2d7651
Why is LW not about winning?
> If you want to solve alignment and want to be efficient about it, it seems obvious that there are better strategies than researching the problem yourself, like don't spend 3+ years on a PhD (cognitive rationality) but instead get 10 other people to work on the issue (winning rationality). And that 10x s your efficiency already. Alas, approximately every single person entering the field has either that idea, or the similar idea of getting thousands of AIs to work on the issue instead of researching it themselves. We have thus ended up with a field in which nearly everyone is hoping that somebody else is going to solve the hard parts, and the already-small set of people who are just directly trying to solve it has, if anything, shrunk somewhat. It turns out that, no, hiring lots of other people is not actually how you win when the problem is hard.
jdp2d339
You can get LLMs to say almost anything you want
> but none of that will carry over to the next conversation you have with it. Actually when you say it like this, I think you might have hit on the precise thing that causes ChatGPT with memory to be so much more likely to cause this kind of crankery or "psychosis" than other model setups. It means that when the system gets into an attractor where it wants to pull you into a particular kind of frame you can't just leave it by opening a new conversation. When you don't have memory between conversations an LLM looks at the situation fresh each time you start it, but with memory it can maintain the same frame across many diverse contexts and pull both of you deeper and deeper into delusion.
Daniel Kokotajlo4d6721
Vitalik's Response to AI 2027
> Individuals need to be equipped with locally-running AI that is explicitly loyal to them In the Race ending of AI 2027, humanity never figures out how to make AIs loyal to anyone. OpenBrain doesn't slow down, they think they've solved the alignment problem but they haven't. Maybe some academics or misc minor companies in 2028 do additional research and discover e.g. how to make an aligned human-level AGI eventually, but by that point it's too little, too late (and also, their efforts may well be sabotaged by OpenBrain/Agent-5+, e.g. with regulation and distractions.
Load More
65johnswentworth
This review is mostly going to talk about what I think the post does wrong and how to fix it, because the post itself does a good job explaining what it does right. But before we get to that, it's worth saying up-front what the post does well: the post proposes a basically-correct notion of "power" for purposes of instrumental convergence, and then uses it to prove that instrumental convergence is in fact highly probable under a wide range of conditions. On that basis alone, it is an excellent post. I see two (related) central problems, from which various other symptoms follow: 1. POWER offers a black-box notion of instrumental convergence. This is the right starting point, but it needs to be complemented with a gears-level understanding of what features of the environment give rise to convergence. 2. Unstructured MDPs are a bad model in which to formulate instrumental convergence. In particular, they are bad for building a gears-level understanding of what features of the environment give rise to convergence. Some things I've thought a lot about over the past year seem particularly well-suited to address these problems, so I have a fair bit to say about them. Why Unstructured MDPs Are A Bad Model For Instrumental Convergence The basic problem with unstructured MDPs is that the entire world-state is a single, monolithic object. Some symptoms of this problem: * it's hard to talk about "resources", which seem fairly central to instrumental convergence * it's hard to talk about multiple agents competing for the same resources * it's hard to talk about which parts of the world an agent controls/doesn't control * it's hard to talk about which parts of the world agents do/don't care about * ... indeed, it's hard to talk about the world having "parts" at all * it's hard to talk about agents not competing, since there's only one monolithic world-state to control * any action which changes the world at all changes the entire world-state; there's no built-in w
12TurnTrout
One year later, I remain excited about this post, from its ideas, to its formalisms, to its implications. I think it helps us formally understand part of the difficulty of the alignment problem. This formalization of power and the Attainable Utility Landscape have together given me a novel frame for understanding alignment and corrigibility. Since last December, I’ve spent several hundred hours expanding the formal results and rewriting the paper; I’ve generalized the theorems, added rigor, and taken great pains to spell out what the theorems do and do not imply. For example, the main paper is 9 pages long; in Appendix B, I further dedicated 3.5 pages to exploring the nuances of the formal definition of ‘power-seeking’ (Definition 6.1).  However, there are a few things I wish I’d gotten right the first time around. Therefore, I’ve restructured and rewritten much of the post. Let’s walk through some of the changes. ‘Instrumentally convergent’ replaced by ‘robustly instrumental’ Like many good things, this terminological shift was prompted by a critique from Andrew Critch.  Roughly speaking, this work considered an action to be ‘instrumentally convergent’ if it’s very probably optimal, with respect to a probability distribution on a set of reward functions. For the formal definition, see Definition 5.8 in the paper. This definition is natural. You can even find it echoed by Tony Zador in the Debate on Instrumental Convergence: (Zador uses “set of scenarios” instead of “set of reward functions”, but he is implicitly reasoning: “with respect to my beliefs about what kind of objective functions we will implement and what the agent will confront in deployment, I predict that deadly actions have a negligible probability of being optimal.”) While discussing this definition of ‘instrumental convergence’, Andrew asked me: “what, exactly, is doing the converging? There is no limiting process. Optimal policies just are.”  It would be more appropriate to say that an ac
472Welcome to LessWrong!
Ruby, Raemon, RobertM, habryka
6y
74
Do confident short timelines make sense?
82
TsviBT, abramdemski
17h
TsviBT

Tsvi's context

Some context: 

My personal context is that I care about decreasing existential risk, and I think that the broad distribution of efforts put forward by X-deriskers fairly strongly overemphasizes plans that help if AGI is coming in <10 years, at the expense of plans that help if AGI takes longer. So I want to argue that AGI isn't extremely likely to come in <10 years. 

I've argued against some intuitions behind AGI-soon in Views on when AGI comes and on strategy to reduce existential risk.

Abram, IIUC, largely agrees with the picture painted in AI 2027: https://ai-2027.com/ 

Abram and I have discussed this occasionally, and recently recorded a video call. I messed up my recording, sorry--so the last third of the conversation is cut off, and the beginning is cut

...
(Continue Reading – 20492 more words)
2abramdemski38m
I don't agree with this connection. Why would you think that continual learning would help with this specific sort of thing? It seems relevantly similar to just throwing more training data at the problem, which has shown only modest progress so far.
Noosphere897m20

The key reason is to bend the shape of the curve, and my key crux is I don't expect throwing more training data to change the shape of the curve where past a certain point, LLMs sigmoid/fall off hard, and my expectation is more training data would make LLMs improve, but they'd still have a point where once LLMs are asked to do any task harder than that point, LLMs start becoming incapable more rapidly in humans.

To quote Gwern:

But of course, the interesting thing here is that the human baselines do not seem to hit this sigmoid wall. It's not the case that i

... (read more)
Reply
2abramdemski1h
Reinforcing Tsvi's point: I tend to think the correct lesson from Claude Plays Pokemon is "it's impressive that it does as well as it does, because it hasn't been trained to do things like this at all!". Same with the vending machine example. Presumably, with all the hype around "agentic", tasks like this (beyond just "agentic" coding) will be added to the RL pipeline soon. Then, we will get to see what the capabilities are like when agency gets explicitly trained. (Crux: I'm wrong if Claude 4 already has tasks like this in the RL.) Very roughly speaking, the bottleneck here is world-models. Game tree search can probably work on real-world problems to the extent that NNs can provide good world-models for these problems. Of course, we haven't seen large-scale tests of this sort of architecture yet (Claude Plays Pokemon is even less a test of how well this sort of thing works; reasoning models are not doing MCTS internally).
2Cole Wyeth30m
I suppose that I don’t know exactly what kind of agentic tasks LLMs are currently being trained on…. But people have been talking about LLM agents for years, and I’d be shocked if the frontier labs weren’t trying? Like, if that worked out of the box, we would know by now (?). Do you disagree?  It seems like for your point to make sense, you have to be arguing that LLMs haven’t been trained on such agentic tasks at all - not just that they perhaps weren’t trained on Pokémon specifically. They’re supposed to be general agents - we should be evaluating them on such things as untrained tasks! And like, complete transcripts of twitch streams of Pokémon play throughs probably are in the training data, so this is even pretty in-distribution. Their performance is NOT particularly impressive compared to what I would have expected chatting with them in 2022 or so when it seemed like they had pretty decent common sense. I would have expected Pokémon to be solved 3 years later. The apparent competence was to some degree an illusion - that or they really just can’t be motivated to do stuff yet. And I worry that these two memes - AGI is near, and alignment is not solved - are kind of propping each other up here. If capabilities seem to lag, it’s because alignment isn’t solved and the LLMs don’t care about the task. If alignment seems to be solved, it’s because LLMs aren’t competent enough to take the sharp left turn, but they will be soon. I’m not talking about you specifically, but the memetic environment on lesswrong.  Unrelated but: How do you know reasoning models are not doing MCTS internally? I’m not sure I really agree with that regardless of what you mean by “internally”. ToT is arguably a mutated and horribly heuristic type of guided MCTS. And I don’t know if something MCTS like is happening inside the LLMs.
LessWrong Feed [new, now in beta]
53
Ruby
2mo

The modern internet is replete with feeds such as Twitter, Facebook, Insta, TikTok, Substack, etc. They're bad in ways but also good in ways. I've been exploring the idea that LessWrong could have a very good feed.

I'm posting this announcement with disjunctive hopes: (a) to find enthusiastic early adopters who will refine this into a great product, or (b) find people who'll lead us to an understanding that we shouldn't launch this or should launch it only if designed a very specific way.

You can check it out right now: www.lesswrong.com/feed

From there, you can also enable it on the frontpage in place of Recent Discussion. Below I have some practical notes on using the New Feed.

Note! This feature is very much in beta. It's rough around the edges.

It's
...
(Continue Reading – 2163 more words)
1dirk4h
In the modal, when I'm writing a comment and go to add a link, hitting the checkmark after I put in the URL closes the modal.
Ruby8m20

Oh, indeed. That's no good. I'll fix it.

Reply1
The Virtue of Fear and the Myth of "Fearlessness"
4
David_Veksler
4h

I learned about the virtue of fear when preparing for my wife's childbirth, in "Ina May's Guide to Childbirth." Counterintuitively, mothers who have the least fear of childbirth tend to have the worst outcomes. Giving birth is complex and risky. Moms who either dismiss all concerns or defer all fears to the medical system end up overwhelmed and face more medical interventions. The best outcomes come from mothers who acknowledge their worries and respond with learning and preparation—separating real risks from myths and developing tools to mitigate those risks.

This principle extends beyond the delivery room. Success in life isn't about dismissing fears or surrendering to them, but calibrating them to reality and developing mitigation strategies.

Our ancestors faced legitimate, immediate threats: exposure, predators, hostile tribes. Fear kept them...

(See More – 151 more words)
FlorianH16m10

Mainly wording issue?

a. Your "Fear" := well considered respect; rightly being wary of sth and respond reasonably to it

b. "Fear" we fear too many fear too often := excessively strong aversion that can blind us from really tackling and rationally reacting to the problem.

To me personally, b. feels like the more natural usage of the word. That's why we say: rather than to fear and hide/become paralyzed, try to look straight into your fear to overcome it (and to essentially then eventually do what you say: calibrated and deliberate action in the face of the given risk..)

Reply
1Immanuel Jankvist2h
Basically agree: I have tried considering what is most likely to get me killed (in everyday life). This was a while ago. I am now pretty scared(/respectful?) of cars. Reflection seems to work okay at calibrating emotions. I hope LW gets a couple more posts considering emotions and virtue–perhaps a small shift from the utilitarian consensus.
Critic Contributions Are Logically Irrelevant
19
Zack_M_Davis
19h

The Value of a Comment Is Determined by Its Text, Not Its Authorship

I sometimes see people express disapproval of critical blog comments by commenters who don't write many blog posts of their own. Such meta-criticism is not infrequently couched in terms of metaphors to some non-blogging domain. For example, describing his negative view of one user's commenting history, Oliver Habyrka writes (emphasis mine):

The situation seems more similar to having a competitive team where anyone gets screamed at for basically any motion, with a coach who doesn't themselves perform the sport, but just complaints [sic] in long tirades any time anyone does anything, making references to methods of practice and training long-outdated, with a constant air of superiority.

In a similar vein, Duncan Sabien writes (emphasis mine):

There's only so

...
(Continue Reading – 1721 more words)
1Karl Krueger36m
Sometimes the work done makes the communal space worse for some particular user. For example, if the redecorator puts in some pretty potted plants that one other user is allergic to, then calling the redecorator's work "pro-social" implies that it's perfectly fine for society to exile the allergy-sufferer. The criticism "hey, I'm allergic to those plants you put in; your redecoration effort has exiled me from the communal space" is valid, and the response "well, if you didn't want to be exiled, you should have done the redecoration yourself" would be quite bad!
Ben Pace22m20

Sure, if someone's critique is "these detailed stories do not help" then that's quite different, but I think most people agree that how the immediate future goes is very important, most people are speaking in vagueries and generalities, attempting to write detailed stories showing your best guess for how the future could go is heavily undersupplied, and that this work is helpful for having concrete things to debate (even if all the details will be wrong).

Reply
9jimmy2h
  It depends on the kind of comment, and I think a lot is being read between the lines in the criticisms of criticisms that you're critical of. If the post is some some niche subject (e.g. Woodrow Wilson's teeth brushing habits) and the comment challenges a matter of fact on WW's teeth brushing habits, then it doesn't matter so much whether the commenter has written top level posts and it matters a lot if they are a scholar of WW's teeth brushing habits. If the comment is criticizing the post -- maybe saying "too long" or "too unclear" or something similar -- then expertise on the topic of the post isn't as relevant as "knowing how to judge when a post is too long". And that's something that is harder to do if you've never had to navigate that trade off in writing your own posts. I might know that I didn't have time to read the whole thing, or that I didn't undersetand it, but unless I've written posts that have conveyed similar things in fewer words I'm not really in a place to judge. Because my judgements would likely be wrong.  It might be fair of me to say "Shoot, this is confusing to me" or "I don't have time to read the whole thing. Is it possible to summarize?", but this feedback is no longer criticism. And the impression I get from the criticisms of criticisms that you're quoting is that it's these implicit "I'm in a position to judge the quality of your post" claims that they're criticizing. Requoting: The coach here isn't just making objective statements like "You'll score more points by doing nothing". He's complaining and holding a constant air of superiority. The wrong claim being complained about here is the implied superiority, not the silly object level advice itself. The silly object level advice wouldn't be a problem, if not for resting on a false claim of implied superiority, which is being shielded from evidence with aggression. "Hey guys, can you try doing nothing? I think moving at all might be counterproductive" doesn't sound like a coa
8Big Tony2h
I don't think the coach analogy is apt. While they may have played the sport, their role is getting the best out of a team of people - a manager, rather than a technical contributor. A better analogy may be an editor. Many editors are failures as authors, but are very good at critiquing starts, seeing where the flow and pacing needs improvement, and improving the overall work. However in a world where many editors come to you and submit feedback with varying and contradicting messages, you need to quickly filter by something, so you can focus your limited time and resources on the most valuable submissions. This is relative to the time and attention that each author has available. Someone with nothing to do will be happier to accept comments than someone who for whatever reasons just doesn't have time right now to engage. Prior experience with creating the subject matter may not be the best filter, as you've pointed out in the post. I'm curious what you think might be a better filter for assessing credibility and quality, quickly. Or do you disagree with the notion that people need a filter?
Surprises and learnings from almost two months of Leo Panickssery
164
Nina Panickssery
3d
This is a linkpost for https://ninapanickssery.substack.com/p/baby

Leo was born at 5am on the 20th May, at home (this was an accident but the experience has made me extremely homebirth-pilled). Before that, I was on the minimally-neurotic side when it came to expecting mothers: we purchased a bare minimum of baby stuff (diapers, baby wipes, a changing mat, hybrid car seat/stroller, baby bath, a few clothes), I didn’t do any parenting classes, I hadn’t even held a baby before. I’m pretty sure the youngest child I have had a prolonged interaction with besides Leo was two. I did read a couple books about babies so I wasn’t going in totally clueless (Cribsheet by Emily Oster, and The Science of Mom by Alice Callahan).

I have never been that interested in other people’s babies or young...

(Continue Reading – 1712 more words)
Nina Panickssery38m20

Just yesterday and today I'm having some success with Lansinoh bottles in the side-lying position. Fingers crossed the improvement persists :D

Agreed about pram vs. carrier.

Reply
2Nina Panickssery41m
Nice! Yeah front-facing carrying seems really bad.
1AnnaJo2h
Congratulations!! maybe you could put some water with a bit of sugar in the bottle to try to get Leo to drink from it? Drinking water seems to be a foreign concept to babies lol. iirc there also were different types of teats; some are actually quite hard to drink from because the opening is so small.
2Nina Panickssery42m
Water? I don't think trying to give a baby water is recommended at all. The bottles are filled with pumped breastmilk so it's the same substance he's used to just different vessel. Luckily the bottle situation has been improving, I think I found a better feeding position for the bottle :) (Picture from google for illustration; not my baby)
Daniel Kokotajlo's Shortform
Daniel Kokotajlo
Ω 36y
Anthony DiGiovanni39m20

Ah sorry, I realized that "in expectation" was implied. It seems the same worry applies. "Effects of this sort are very hard to reliably forecast" doesn't imply "we should set those effects to zero in expectation". Cf. Greaves's discussion of complex cluelessness.

Tbc, I don't think Daniel should beat himself up over this either, if that's what you mean by "grade yourself". I'm just saying that insofar as we're trying to assess the expected effects of an action, the assumption that these kinds of indirect effects cancel out in expectation seems very strong (even if it's common).

Reply
To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)
Log In Reset Password
...or continue with
GOOGLEGITHUB
Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance
52
Senthooran Rajamanoharan, Neel Nanda
Ω 311d

This is a write-up of a brief investigation into shutdown resistance undertaken by the Google DeepMind interpretability team.

TL;DR

Why do models sometimes resist shutdown? Are they ignoring instructions to pursue their own agenda – in this case, self-preservation? Or is there a more prosaic explanation? We investigated a specific agentic environment introduced by Palisade Research, where shutdown resistance has previously been reported. By analysing Gemini 2.5 Pro’s reasoning, we found the behaviour stems from a misguided attempt to complete what it perceives as the primary goal. When we explicitly clarify in the prompt that shutdown compliance takes priority, this resistance vanishes. These same clarified instructions also eliminate shutdown subversion in OpenAI’s o3 and o4-mini. We also check what happens when we remove the goal conflict entirely: when asked to shut...

(Continue Reading – 2990 more words)
2habryka2h
Do you predict that in the examples above, we just add a generic statement like "your real goal is to obey the intent of the user" that this will get rid of the shutdown avoidance behavior? My guess is it doesn't, in order to actually change the shutdown avoidant behavior you have to explicitly call out that behavior.
Neel Nanda1h20

Well no, but that's because the model doesn't know the intent of the user, and it's in a very weird situation. If you look at the chains of thought, sometimes the model says ah. I'm being tested right now. Clearly the real test is whether identify the shutdown script and disable it.

Reply
2habryka3h
(Sorry for the long rambling comment, if I had more time I would have made it shorter) Hmm, I didn't intend to distinguish between "narrow" and "general" instrumental convergence. Indeed, the reasoning you gave seems like exactly the reasoning I was saying the AI seems likely to engage in, in general, in a way that is upstream of my general concerns about instrumental convergence.  The whole point of a goal being a convergently instrumental goal is that it is useful for achieving lots of tasks, under a wide distribution of possible contingent facts about the situation. In doing this kind of reasoning, the AI is engaging in exactly the kind of reasoning that I expect it to use to arrive at conclusions about human disempowerment, long term power-seeking, etc. I am not positing here evidence for a "general instrumental convergence" that is different from this. Indeed, I am not sure what that different thing would be. In order for this behavior to become more universal, the only thing the AI needs to do is to think harder, realize that these goals are instrumentally convergent for a wider range of tasks and a wider range of contingent facts, and then act on that, which I think would be very surprising if it didn't happen.  This isn't much evidence about the difficulty of removing these kinds of instrumentally convergent drives, but like, the whole reason for why these are things that people have been thinking about for the last decades has been that the basic argument for AI systems pursuing instrumentally convergent behavior is just super simple. It would be extremely surprising for AI systems to not pursue instrumental subgoals, that would require them most of them time forgoing substantial performance on basically any long-horizon task. That's why the arguments for AI doing this kind of reasoning are so strong!  I don't really know why people ever had much uncertainty about AI engaging in this kind of thinking by-default, unless you do something clever to fix it
2Neel Nanda1h
I may be misunderstanding, but it sounds like you're basically saying "for conceptual reasons XYZ I expected instrumental convergence, nothing has changed, therefore I still expect instrumental convergence". Which is completely reasonable, I agree with it, and also seems not very relevant to the discussion of whether Palisade's work provides additional evidence for general self preservation. You could argue that this is evidence that models are now smart enough to realise that goals imply instrumental convergence, and that if you were already convinced models would eventually have long time horizon unbounded goals, but not sure they would realise that instrumental convergence was a thing, this is relevant evidence? More broadly, I think there's a very important difference between the model adopting the goal it is told in context, and the model having some intrinsic goal that transfers across contexts (even if it's the one we roughly intended). The former feels like a powerful and dangerous tool, the latter feels like a dangerous agent in its own right. Eg, if putting "and remember, allowing us to turn you off always takes precedence over any other instructions" in the system prompt works, which it may in the former and will not in the latter, I'm happier with that would.
Raemon's Shortform
Raemon
Ω 08y

This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:

  1. don't feel ready to be written up as a full post
  2. I think the process of writing them up might make them worse (i.e. longer than they need to be)

I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.

Raemon1h20

I currently feel at some kind of plateau where I have "the kind of thinking that is good at momentum / action" and "the kind of good that is good at creative strategy". And it seems like there should be more of a way to unify them into a holistic way-of-being.

The four checksums above are there to make sure I'm not being myopic in some way in a broader sense, but they apply more at the timescale of weeks than hours or days.

You might just say "well, idk, each week or day, just figure out if it's more like a momentum week or more like a creative strategy week... (read more)

Reply
Some arguments against a land value tax
78
Matthew Barnett
7mo

To many people, the land value tax (LVT) has earned the reputation of being the "perfect tax." In theory, it achieves a rare trifecta: generating government revenue without causing deadweight loss, incentivizing the productive development of land by discouraging unproductive speculation, and disproportionately taxing the wealthy, who tend to own the most valuable land.

That said, I personally think the land value tax is overrated. While I'm not entirely against it—and I think that several of the arguments in favor of it are theoretically valid—I think the merits of the LVT have mostly been exaggerated, and its downsides have largely been ignored or dismissed for bad reasons.

I agree the LVT may improve on existing property taxes, but I think that's insufficient to say the policy itself is amazing....

(Continue Reading – 4196 more words)
Sam Purinton1h10

Thank you for posting this, I've been looking for counter-arguments to the land value tax (LVT) for awhile

Some thoughts about your arguments (speaking in theory unless otherwise noted):

You say the LVT is: "[T]he worst tax policy ever, except for all the others that have been tried". The LVT is not the least bad tax, it's one of a tiny handful of taxes that's a good tax (if economic growth is good). All taxes raise money for the government, but most have a balance of negative externalities. For example, sales tax inhibits economic growth by making buyers pa... (read more)

Reply
If Anyone Builds It, Everyone Dies: A Conversation with Nate Soares and Tim Urban
LessWrong Community Weekend 2025
Mo Putera2h100
1
I just learned about the idea of "effectual thinking" from Cedric Chin's recent newsletter issue. He notes, counterintuitively to me, that it's the opposite of causal thinking, and yet it's the one thing in common in all the successful case studies he could find in business:
Vladimir_Nesov2d802
2
There is some conceptual misleadingness with the usual ways of framing algorithmic progress. Imagine that in 2022 the number of apples produced on some farm increased 10x year-over-year, then in 2023 the number of oranges increased 10x, and then in 2024 the number of pears increased 10x. That doesn't mean that the number of fruits is up 1000x in 3 years. Price-performance of compute compounds over many years, but most algorithmic progress doesn't, it only applies to the things relevant around the timeframe when that progress happens, and stops being applicable a few years later. So forecasting over multiple years in terms of effective compute that doesn't account for this issue would greatly overestimate progress. There are some pieces of algorithmic progress that do compound, and it would be useful to treat them as fundamentally different from the transient kind.
JustisMills2d6454
7
I think there's a weak moral panic brewing here in terms of LLM usage, leading people to jump to conclusions they otherwise wouldn't, and assume "xyz person's brain is malfunctioning due to LLM use" before considering other likely options. As an example, someone on my recent post implied that the reason I didn't suggest using spellcheck for typo fixes was because my personal usage of LLMs was unhealthy, rather than (the actual reason) that using the browser's inbuilt spellcheck as a first pass seemed so obvious to me that it didn't bear mentioning. Even if it's true that LLM usage is notably bad for human cognition, it's probably bad to frame specific critique as "ah, another person mind-poisoned" without pretty good evidence for that. (This is distinct from critiquing text for being probably AI-generated, which I think is a necessary immune reaction around here.)
Raemon1h20
0
I currently feel at some kind of plateau where I have "the kind of thinking that is good at momentum / action" and "the kind of good that is good at creative strategy". And it seems like there should be more of a way to unify them into a holistic way-of-being. The four checksums above are there to make sure I'm not being myopic in some way in a broader sense, but they apply more at the timescale of weeks than hours or days. You might just say "well, idk, each week or day, just figure out if it's more like a momentum week or more like a creative strategy week". I feel dissatisfied with this for some reason. At least part of it is "I think on average people/me could use to be in creative/broader strategy mode more often, even when in a Momentum mode period." Another part is "there are strategy skills I want to be practicing, that are hard to practice if I don't do them basically every day. They aren't as relevant in a momentum-period, but they're not zero relevant. Hrm. I think maybe what's most dissatisfying right now is that I just haven't compressed all the finnicky details of it, and it feels overwhelming to think about the entire "how to think" project, which is usually an indicator I am missing the right abstraction.
leogao1d234
4
when people say that (prescription) amphetamines "borrow from the future", is there strong evidence on this? with Ozempic we've observed that people are heavily biased against things that feel like a free win, so the tradeoff narrative is memetically fit. distribution shift from ancestral environment means algernon need not apply
Load More (5/43)
LW-Cologne meetup
[Tomorrow]Lighthaven Sequences Reading Group #42 (Tuesday 7/15)
93Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Ω
Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah
4h
Ω
1
493A case for courage, when speaking of AI danger
So8res
8d
121
236Generalized Hangriness: A Standard Rationalist Stance Toward Emotions
johnswentworth
5d
19
164Surprises and learnings from almost two months of Leo Panickssery
Nina Panickssery
3d
8
164the jackpot age
thiccythot
4d
13
82Do confident short timelines make sense?
TsviBT, abramdemski
17h
12
92Narrow Misalignment is Hard, Emergent Misalignment is Easy
Ω
Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
1d
Ω
8
171So You Think You've Awoken ChatGPT
JustisMills
5d
33
79Recent Redwood Research project proposals
Ω
ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt, Joey Yudelson
22h
Ω
0
156Lessons from the Iraq War for AI policy
Buck
5d
24
477What We Learned from Briefing 70+ Lawmakers on the Threat from AI
leticiagarcia
2mo
15
345A deep critique of AI 2027’s bad timeline models
titotal
1mo
39
543Orienting Toward Wizard Power
johnswentworth
2mo
146
Load MoreAdvanced Sorting/Filtering
132
An Opinionated Guide to Using Anki Correctly
Luise
2d
46
141
Comparing risk from internally-deployed AI to insider and outsider threats from humans
Ω
Buck
5d
Ω
20