It’s easy to criticize the type of “fake” planning that systematically avoids/defuses criticism rather than aiming for success, i.e. appearing blameless for a loss rather than actually trying to win. But I think there’s a lot of traction to be gained from refusing to lose in any silly way; it means you have to dovetail a lot of reasonable strategies that you could be criticized for missing, e.g. “why didn’t you just try…”. Trying EVERY post-facto-obvious thing is harder than it sounds and quite powerful (time permitting). Universal search/learning algorithms includ...
AI agents might not need to develop long-term drives of their own to perform well at long-term tasks, if they can use humans for that purpose. As a corollary, AI agents that do not have long-term drives of their own can still act in long-term coherent ways, if they inhabit a world with abundant, easy-to-hire human labor. If it's cheaper to hire a human to get a scaffolded AI system to stay on track towards achieving some goal than it is to automate that monitoring, then it probably makes sense to just use the human.
Concretely, consider an AI system tasked ...
I find myself in agreement with basically everything in this comment, and yet I observe that most of the benefits of AI so far have looked like moving rarer and rarer classes of tasks from "the long tail" to "approximately solved". I suppose it's worth clarifying exactly what we mean by "the long tail" though.
I observe now that for most common low-level tasks that scaffolded LLMs can do at all (e.g. write a patch and apply it to a file), they can, given multiple attempts but no human feedback, do the task quite reliably. Likewise for most tasks which are simple c...
I dislike "The Curse of X" names (Winner's Curse, Unilateralist's Curse, Curse of Knowledge, etc.) because they don't really tell me what the curse mechanism is; they just vaguely remind me that "something bad happens here". And in some cases it's unclear to me whether the Unilateralist thinks they are cursed, or thinks everything is going fine and it's everyone else who observes the curse.
Scott's old post Concept-Shaped Holes Can Be Impossible To Notice says that concept-shaped holes can be impossible to notice, in oneself as well as in others. Things that seem obvious and straightforward to you may actually be quite unobvious and non-straightforward to other people, and you can be very off-base when estimating how often that's the case. See also: the curse of knowledge.
I learned that writing something up or starting a conversation about a thing that seemed [obvious and therefore not worth talking about] can reveal that this thing is not as [obvious and therefor...
LLMs at their current level are already phenomenal. Enough to usher in a new industrial revolution even without further progress. It's also still remarkable how untethered or nonsensical their reasoning can be, even with Opus 4.6 or similar.
Ex1. I was working on a parking brake issue with my car, comparing the clamping force I was getting at the wheel with the observation that the car had wanted to roll down the hill. I told it I was getting enough clamping to be unable to turn the wheel by hand.
...That said, 4 clicks with hubs-only holding firm is still probably fine in practic
We're also bad OOD and many of our supposed advantages over them boil down to our distribution differences (embodiment and first-person-first data).
Kind of and yeah?
I agree we're much better OOD than them but not so much that I think there's no comparison.
I wouldn't say "there's no comparison"[1], but I do think it looks like a "qualitative" difference. What exactly it is would require a more involved explication of the concept, which might be infohazardous.
Not really my way of speaking about this sort of stuff / I'm not sure what you mean by this.
I also don't see an option to publish to the Alignment Forum (from a LW draft). There is a "Move to Alignment" button in the menu for a post on the list of Drafts (the menu is under the three-vertical-dots button that appears on mouse-over over the draft post's line, but otherwise doesn't show). I guess a post could be published there this way, but there's nothing there about the EA Forum. Though I'm not sure if my account is considered registered on the EA Forum (if this happens on its own by default); not being registered there might be the reason the option does...
theory: a big difference between people who hate corporations and people who don't is the extent to which they like interacting with human-shaped things. some people like human shaped things and the sort of amoral profit maximization of companies feels alien and sociopathic. other people like the predictable API that companies provide.
if companies were people, they would be uncaring sociopathic humans. but if lawnmowers were people, they would also be uncaring and sociopathic
Transformer LLMs have no middle ground. Without extensive customization you can expect either the fully sycophantic "helpful" (engagement/reward-optimized/hacked) response, or you can ask for the tenth-man opinion, the devil's advocate, the contrarian view. I have yet to see continual, reliably balanced middle-ground responses without extensive prompting and context loading. Even then it's extremely hit or miss. Training biases and RLHF make LLMs pretty inept in groundbreaking fields. The sycophancy self-check ("You've been agreeing for several turns now, you're fucking with me, aren't you?") can get exhausting after a while.
When the response to a statement begins "But surely—!" the rest of it will not be worth listening to. Or "But don't you think—", or even just "But—!"
The expostulating tone of voice is the tell. "But—!" is onomatopoeic: it is the sound of a mind spitting out an idea it has bitten on but cannot chew.
Most of the showers I have used in the US have a single dial / degree of freedom that goes from cold & low pressure to cold & high pressure to warm & high pressure. Whereas most European showers I've used have two degrees of freedom, either in a single handle like your image or as separate dials.
I'm very confused why purchasing power varies so dramatically internationally. like why are there countries where everyone has very low wages but everything is also really cheap so it balances out? prima facie, huge disparities like this should get evened out by arbitrage.
the simple explanation is that some labor can only be performed locally, labor mobility is limited (immigration laws, people don't like moving, etc), and transportation costs for goods exist (shipping and tariffs).
however, global shipping is ridiculously cheap. and the economy increasingl...
Do you hire them directly, or through some "body shop"? Because sometimes the intermediaries add an insane markup. Though it is probably less dramatic these days.
Claude says that the salary ratio is like this:
| Level | Lower salary | Higher salary | Ratio |
|---|---|---|---|
| Entry (0–2 yrs) | €18–22k | €50–55k | 2.5× |
| Junior (2–4 yrs) | €24–28k | €58–63k | 2.3× |
| Mid (4–8 yrs) | €33–40k | €65–70k | 1.9× |
| Senior (8+ yrs) | €45–55k | €80–100k | 1.8× |
I guess I should ask for a raise.
Meditations on Meditation:
I’ve always noticed something about meditation that I’ve never thought to articulate aloud before. (Note- I’m using mantra meditation as an example.) A beginning meditation practitioner- which is all I have ever been- is told to focus their awareness on the mantra, notice when their mind wanders without judgement, and then redirect their thoughts to the mantra. Similar instructions are given when the focus is the breath, etc. However, my experience of meditation has never been that simple; my attention comes in layers. The first a...
I agree, but "finite" is not necessarily a specific number. Like, people can only juggle a finite number of balls, but that doesn't mean that 4 (or any other specific number) is the maximum.
Here are the 2025 AI safety papers and posts I like the most.
The list is very biased by my taste, by my views, by the people who had time to argue that their work is important to me, and by the papers that were salient to me when I wrote this list. I am highlighting the parts of papers I like, which is also very subjective. (This is similar to the 2024 edition here.)
Bangers on multiple dimensions
★★★ You can measure time horizon, and it grows predictably-ish (Measuring AI Ability to Complete Long Tasks)
★★★ Black-box techniques work better than you think + y...
Hmm, interesting. I think my standards for something to warrant the name "asymptotic alignment" would have been lower than yours, to my surprise: I'd consider a technique stack to x%-qualify if that stack is a series of local alignment techniques which can be expected with x% confidence to end up landing us in the long-term basin of successful-asymptotic-alignment-by-the-year-2200-or-so. I think I'd rather update my understanding of the term than yours, but I'll have to keep it in mind for what language to use, I suppose.
I think most of the places I expe...
Aligned to the leviathan or the citizen?
There's a thing people in AI safety leave unspoken: if we do align AI successfully (far from a given), we still have the problem of who it's aligned to.
After nature, governments have been responsible for the largest death counts in human history through war and famine:
The thing that has historically restra...
> There's a thing people in AI safety leave unspoken: if we do align AI successfully (far from a given), we still have the problem of who it's aligned to.
My ideas about alignment derive from the dark ages when we talked about "Friendly AI", and I do not keep up with today's "AI safety" literature in any systematic way.
But may I point out that today's literature makes a basic distinction between "intent alignment" and "values alignment". An "intent-aligned" AI is really good at discerning what its user wants and fulfilling that; whereas a "values-aligned" A...
Small donors should not worldview-diversify.
Occasionally I encounter small donors (e.g. 10% pledgers earning <$200K) with highly specialised skills and knowledge (e.g. working on a sub-sub-topic of an EA cause area) who donate primarily to GiveWell top charities. These people do incredible amounts of good, and are highly commendable.
That said, I think they would probably do more good by donating according to their inside view and special knowledge. Worldview diversification makes sense for large funders like Coefficient Giving, but their reasons don't apply ...
I think for small donors, donating to the best unregistered charity is >>2x times the best registered charity, for the reasons OP outlines: registered charities are much better covered by large institutions, and lots of people are overanchored on registration so the unregistered are neglected by comparison.
The counterargument is that bednets/givedirectly are just pretty good and it's unlikely any particular new thing beats them. Which is a fine approach, but not what we're talking about here.
Why I’m unconvinced by Tegmark’s argument for the mathematical universe hypothesis
The basic argument seems to be:
I don’t see why we should buy (2).
As far as I know, there are two arguments for (2)....
Freedom from human independent concepts is supposed to support the idea of a mathematical universe, not a relational one
Tegmark says (p. 10 here) "the only intrinsic properties of a mathematical structure are its relations". So I think I am representing his actual argument for MUH in the OP. I think "the claim that some, but only some, maths exists materially is unnecessary baggage" is meant to derive the Level IV multiverse from MUH.
...It isn't obvious that maths isn't a human invention. It isn't obvious that an external world has to be indep
Something I'm thinking about today: frontier LLMs have a pretty unusual capabilities profile. This means one of two things: either I should think of LLMs as leveraging massive amounts of necessary compute and the problems they can solve as much more compute-vulnerable than I thought they were (i.e. this is Deep Blue and everything is kind of chess) or multiple intelligences models are simply true, in that cognition has multiple parts that don't necessarily have anything to do with each other. The latter predicts that step changes in capabilities are availa...
I agree that this is the relevant consideration. I think that if cognition has many parts, we should actually expect some parts that humans use to be completely missing in LLMs (and vice versa). It's not clear to me whether I should expect scaling the architecture to actually produce more parts in this way; I have some intuitions that say (for combinatorial reasons) that within a given architecture, training dynamics will eventually stop favoring the formation of circuits past a certain size, regardless of how many layers you stack, but I am not that conf...
I'm cautiously optimistic about my new Claude Coach GitHub repo. I want to work out more but hate trying to decide what to do and tracking things, especially when I'm not working with a full gym. Now I just open Claude Code and ask it what to do (specifying the gym), do the work out, then update it with what I did and how it felt. It creates a PR to track the session and update the plan.
I still hate working out, but at least I don't have to go anywhere, deal with any people, or think about it all.
Perhaps my favourite relation in physics is
t/T = (l/L)^{1-k/2}.
This says that for a bunch of particles in a potential V = a x^k, if you let the system evolve over time T, forming a path which has size L in some sense, then there is another path which is a re-scaled version of the original such that, if it has size l, the time taken to traverse it is t.
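Where the exponent comes from, in brief (this is the standard mechanical-similarity argument, not spelled out in the original): rescale x → λx and t → μt in Newton's equation for this potential and demand that the equation stay invariant:

```latex
m\ddot{x} = -\frac{dV}{dx} = -ak\,x^{k-1}
\quad\xrightarrow{\;x\to\lambda x,\;\; t\to\mu t\;}\quad
\frac{\lambda}{\mu^{2}}\,m\ddot{x} = -ak\,\lambda^{k-1}x^{k-1}
\;\;\Rightarrow\;\; \frac{\lambda}{\mu^{2}} = \lambda^{k-1}
\;\;\Rightarrow\;\; \mu = \lambda^{1-k/2}.
```

With μ = t/T and λ = l/L this is exactly t/T = (l/L)^{1-k/2}.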
We can use this trick to create a bunch of “scaling laws” for simple physical systems. For example:
1) Let V = a x ^{-1} i.e. a gravitational potential. Then we have k=-1, so
t/T = (l/L)^{3/2}
(t/T)^{2} ...
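The k = -1 case can be sanity-checked numerically: under x → λx, v → λ^{-1/2}v, t → λ^{3/2}t, a gravitational trajectory should map onto another valid trajectory. A minimal sketch (unit mass and coupling, and the specific initial conditions, are illustrative choices, not from the post):

```python
import numpy as np

def accel(x):
    # V(r) = -1/r  =>  acceleration = -x / r^3 (unit mass and coupling)
    r = np.linalg.norm(x)
    return -x / r**3

def integrate(x, v, t_total, n_steps):
    # leapfrog (velocity-Verlet) integrator
    dt = t_total / n_steps
    v = v + 0.5 * dt * accel(x)
    for _ in range(n_steps):
        x = x + dt * v
        v = v + dt * accel(x)
    return x, v - 0.5 * dt * accel(x)  # re-synchronize v with x

lam = 4.0                        # spatial rescaling factor
x0 = np.array([1.0, 0.0])
v0 = np.array([0.0, 1.1])        # slightly non-circular bound orbit
T = 3.0

xT, vT = integrate(x0, v0, T, 100_000)
# rescaled initial state, evolved for the rescaled time lam^{3/2} T
xs, vs = integrate(lam * x0, v0 / np.sqrt(lam), lam**1.5 * T, 100_000)

print(np.allclose(xs, lam * xT, rtol=1e-3, atol=1e-4))           # expect True
print(np.allclose(vs, vT / np.sqrt(lam), rtol=1e-3, atol=1e-4))  # expect True
```

If the similarity law holds, the final state of the rescaled run is just the rescaled final state of the original run, which is what the two checks confirm.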
It'll one-shot easy cases, yeah. And if you want to convert to HTML/Unicode for places where you don't have direct LaTeX support, you can also have an LLM do that, albeit there are a lot of edge cases, and I don't think LLMs will usually use more exotic Unicode like FRACTION SLASH for things like '3/2' etc, so I have a big script for that: https://gwern.net/static/build/latex2unicode.py (Github).
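To illustrate the flavor of conversion being described, here is a toy sketch of the two easiest cases (simple fractions via U+2044 FRACTION SLASH, and digit exponents via superscript characters). The regexes are my own illustration, fail on nested braces, and cover a tiny fraction of what a real script like latex2unicode.py has to handle:

```python
import re

# map ASCII digits to Unicode superscript digits
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def latex_to_unicode(s):
    # \frac{a}{b} -> a⁄b, using U+2044 FRACTION SLASH (won't handle nesting)
    s = re.sub(r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r"\1⁄\2", s)
    # x^{n} or x^n with an all-digit exponent -> superscript digits
    s = re.sub(r"\^\{?(\d+)\}?",
               lambda m: m.group(1).translate(SUPERSCRIPTS), s)
    return s

print(latex_to_unicode(r"\frac{3}{2} x^2"))  # prints: 3⁄2 x²
```

Even these two rules show where the edge cases live: exponents containing slashes, nested braces, and non-digit superscripts all fall through untouched.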
Do you know a person who believes that ASI will be created in <50 years who ISN'T in the LW/rationalists circle?
My parents don't believe that a superintelligent AI will be created within this century, or ever for that matter, or that AI will ever take jobs. My relatives laugh at the idea of AI solving a high school math problem and think state-of-the-art AI is on the level of GPT-2 (I mean that the capabilities they have in mind are on the level of GPT-2, not that they know what GPT-2 is). My friend who is an organic chemist laughs at the idea of AI doi...
I've been in the mostly-academic AI circles in the Boston area for decades. Lots of people in these circles think ASI is plausibly close. I think it's difficult to pay close technical attention to the field and not think that AI is currently par-human and improving every year. Many of them disagree with the LessWrong consensus that it will be fatal, of course. Or simply haven't thought it through.