Alex Turner lays out a framework for understanding how and why artificial intelligences pursuing goals often end up seeking power as an instrumental strategy, even if power itself isn't their goal. This tendency emerges from basic principles of optimal decision-making.
But he cautions that if you haven't internalized that "Reward is not the optimization target," the concepts here, while technically accurate, may lead you astray in alignment research.
The key reason is to bend the shape of the curve. My key crux is that I don't expect throwing more training data at LLMs to change the shape of the curve, where past a certain point LLM performance falls off a sigmoid cliff. My expectation is that more training data would make LLMs improve, but they'd still have a point where, once asked to do any task harder than that, they become incapable much more rapidly than humans do.
To quote Gwern:
...But of course, the interesting thing here is that the human baselines do not seem to hit this sigmoid wall. It's not the case that i
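The two curve shapes being debated can be sketched with a toy model (purely illustrative; the thresholds, steepness, and slope below are assumed parameters, not fit to any benchmark): LLM success rate collapses sigmoidally past a difficulty threshold, while the human baseline degrades gradually.

```python
import math

def llm_success(difficulty, threshold=5.0, steepness=2.0):
    """Toy sigmoid: success collapses quickly once difficulty passes the threshold."""
    return 1.0 / (1.0 + math.exp(steepness * (difficulty - threshold)))

def human_success(difficulty, slope=0.08):
    """Toy human baseline: gradual, roughly linear degradation with difficulty."""
    return max(0.0, 1.0 - slope * difficulty)

for d in range(0, 11, 2):
    print(f"difficulty={d:2d}  llm={llm_success(d):.2f}  human={human_success(d):.2f}")
```

On this picture, "more training data" shifts the threshold rightward but leaves the cliff shape intact, whereas the human curve never exhibits a cliff at all.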
The modern internet is replete with feeds such as Twitter, Facebook, Insta, TikTok, Substack, etc. They're bad in some ways but also good in others. I've been exploring the idea that LessWrong could have a very good feed.
I'm posting this announcement with disjunctive hopes: (a) to find enthusiastic early adopters who will refine this into a great product, or (b) to find people who'll lead us to an understanding that we shouldn't launch this, or should launch it only if it's designed in a very specific way.
From there, you can also enable it on the frontpage in place of Recent Discussion. Below I have some practical notes on using the New Feed.
Note! This feature is very much in beta. It's rough around the edges.
Oh, indeed. That's no good. I'll fix it.
I learned about the virtue of fear when preparing for my wife's childbirth, in "Ina May's Guide to Childbirth." Counterintuitively, mothers who have the least fear of childbirth tend to have the worst outcomes. Giving birth is complex and risky. Moms who either dismiss all concerns or defer all fears to the medical system end up overwhelmed and face more medical interventions. The best outcomes come from mothers who acknowledge their worries and respond with learning and preparation—separating real risks from myths and developing tools to mitigate those risks.
This principle extends beyond the delivery room. Success in life isn't about dismissing fears or surrendering to them, but calibrating them to reality and developing mitigation strategies.
Our ancestors faced legitimate, immediate threats: exposure, predators, hostile tribes. Fear kept them...
Mainly a wording issue?
a. Your "Fear" := well-considered respect; rightly being wary of something and responding reasonably to it.
b. The "Fear" too many of us fear too often := an excessively strong aversion that can blind us from really tackling and rationally reacting to the problem.
To me personally, b. feels like the more natural usage of the word. That's why we say: rather than fearing and hiding/becoming paralyzed, try to look straight into your fear to overcome it (and then essentially do what you say: take calibrated and deliberate action in the face of the given risk).
I sometimes see people express disapproval of critical blog comments by commenters who don't write many blog posts of their own. Such meta-criticism is not infrequently couched in terms of metaphors to some non-blogging domain. For example, describing his negative view of one user's commenting history, Oliver Habryka writes (emphasis mine):
The situation seems more similar to having a competitive team where anyone gets screamed at for basically any motion, with a coach who doesn't themselves perform the sport, but just complaints [sic] in long tirades any time anyone does anything, making references to methods of practice and training long-outdated, with a constant air of superiority.
In a similar vein, Duncan Sabien writes (emphasis mine):
...There's only so
Sure, if someone's critique is "these detailed stories do not help," then that's quite different. But I think most people agree that how the immediate future goes is very important, that most people are speaking in vagaries and generalities, that attempting to write detailed stories showing your best guess for how the future could go is heavily undersupplied, and that this work is helpful for having concrete things to debate (even if all the details will be wrong).
Leo was born at 5am on the 20th of May, at home (this was an accident, but the experience has made me extremely homebirth-pilled). Before that, I was on the minimally-neurotic side as expecting mothers go: we purchased a bare minimum of baby stuff (diapers, baby wipes, a changing mat, hybrid car seat/stroller, baby bath, a few clothes), I didn't do any parenting classes, and I hadn't even held a baby before. I'm pretty sure the youngest child I'd had a prolonged interaction with besides Leo was two. I did read a couple of books about babies so I wasn't going in totally clueless (Cribsheet by Emily Oster, and The Science of Mom by Alice Callahan).
I have never been that interested in other people’s babies or young...
Just yesterday and today I'm having some success with Lansinoh bottles in the side-lying position. Fingers crossed the improvement persists :D
Agreed about pram vs. carrier.
Ah sorry, I realized that "in expectation" was implied. It seems the same worry applies. "Effects of this sort are very hard to reliably forecast" doesn't imply "we should set those effects to zero in expectation". Cf. Greaves's discussion of complex cluelessness.
Tbc, I don't think Daniel should beat himself up over this either, if that's what you mean by "grade yourself". I'm just saying that insofar as we're trying to assess the expected effects of an action, the assumption that these kinds of indirect effects cancel out in expectation seems very strong (even if it's common).
This is a write-up of a brief investigation into shutdown resistance undertaken by the Google DeepMind interpretability team.
Why do models sometimes resist shutdown? Are they ignoring instructions to pursue their own agenda – in this case, self-preservation? Or is there a more prosaic explanation? We investigated a specific agentic environment introduced by Palisade Research, where shutdown resistance has previously been reported. By analysing Gemini 2.5 Pro’s reasoning, we found the behaviour stems from a misguided attempt to complete what it perceives as the primary goal. When we explicitly clarify in the prompt that shutdown compliance takes priority, this resistance vanishes. These same clarified instructions also eliminate shutdown subversion in OpenAI’s o3 and o4-mini. We also check what happens when we remove the goal conflict entirely: when asked to shut...
Well no, but that's because the model doesn't know the intent of the user, and it's in a very weird situation. If you look at the chains of thought, sometimes the model says something like: "Ah, I'm being tested right now. Clearly the real test is whether I identify the shutdown script and disable it."
This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:
I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.
I currently feel at some kind of plateau where I have "the kind of thinking that is good at momentum / action" and "the kind of thinking that is good at creative strategy." And it seems like there should be more of a way to unify them into a holistic way-of-being.
The four checksums above are there to make sure I'm not being myopic in some broader sense, but they apply more at the timescale of weeks than hours or days.
You might just say "well, idk, each week or day, just figure out if it's more like a momentum week or more like a creative strategy week...
To many people, the land value tax (LVT) has earned the reputation of being the "perfect tax." In theory, it achieves a rare trifecta: generating government revenue without causing deadweight loss, incentivizing the productive development of land by discouraging unproductive speculation, and disproportionately taxing the wealthy, who tend to own the most valuable land.
That said, I personally think the land value tax is overrated. While I'm not entirely against it—and I think that several of the arguments in favor of it are theoretically valid—I think the merits of the LVT have mostly been exaggerated, and its downsides have largely been ignored or dismissed for bad reasons.
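On the "no deadweight loss" point specifically, the standard argument can be sketched numerically (a toy model with linear demand and supply; the slopes are illustrative assumptions): a per-unit tax on an ordinary good reduces the quantity traded and destroys surplus, but as supply becomes perfectly inelastic, as with land, the quantity can't fall and the deadweight loss vanishes.

```python
def deadweight_loss(tax, demand_slope=1.0, supply_slope=1.0):
    """Deadweight loss of a per-unit tax with linear demand and supply.

    The tax shrinks the quantity traded by tax / (demand_slope + supply_slope);
    the lost surplus is the usual Harberger triangle, 0.5 * tax * dQ.
    """
    dq = tax / (demand_slope + supply_slope)
    return 0.5 * tax * dq

# An ordinary sales tax destroys surplus (and the loss grows with the square of the tax):
print(deadweight_loss(2.0))  # 1.0
# Land is fixed in supply; as supply_slope grows (perfectly inelastic supply),
# dQ -> 0 and the deadweight loss goes to zero:
print(deadweight_loss(2.0, supply_slope=1e9))  # ~0
```

This is the theoretical case the post grants as valid; the disagreement is over how much the practical downsides offset it.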
I agree the LVT may improve on existing property taxes, but I think that's insufficient to say the policy itself is amazing....
Thank you for posting this, I've been looking for counter-arguments to the land value tax (LVT) for a while.
Some thoughts about your arguments (speaking in theory unless otherwise noted):
You say the LVT is: "[T]he worst tax policy ever, except for all the others that have been tried". The LVT is not merely the least bad tax; it's one of a tiny handful of taxes that is a genuinely good tax (if economic growth is good). All taxes raise money for the government, but most carry negative externalities on balance. For example, sales tax inhibits economic growth by making buyers pa...