If you know nothing about a thing, the first example or sample gives you a disproportionate amount of information, often more than any subsequent sample. It lets you locate the idea in conceptspace, get a sense of what domain/scale/magnitude you're dealing with, and provides an anchor for further thinking.
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM signal message (+1 510 944 3235), or leave a comment, if you want to support what we do. We are a registered 501(c)3, have big plans for the next year, and due to a shifting funding landscape need support from a broader community more than in any previous year. [1]
I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality and AI safety communities. "Infrastructure" is a big fuzzy word, but in our case, it concretely means:
I am working on it! What country would you want it for? Not all countries have charity tax-deductability, IIRC.
The ‘model organisms of misalignment’ line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he’s worked on at Anthropic under this agenda: “Sleeper Agents” and “Sycophancy to Subterfuge”.
Topics we discuss:
I'm at ~50-50 for large amounts of machine-translated being present in the dataset.
Having worked in Chinese academia myself, "use Google Translate on the dataset" just seems like something we're extremely likely to do. It's a hard-to-explain gut feeling. I'll try poking around in the tokenizer to see if "uncommon Chinese phrases that would only appear in machine-translated COT" are present as tokens. (I think this is unlikely to be true even if they did do it, however)
I've done a cursory internet search, and it seems that there aren't many native Chi...
TLDR: In this post, I argue that if you are a junior AI safety researcher, you should consider applying to PhD programs in ML soon, especially if you have recently participated in an AI safety upskilling or research program like MATS or ARENA and might be interested in working on AI safety long term, but don't have immediate career plans. It is relatively cheap to apply and provides good future option value. I don’t argue that you should necessarily do a PhD, but some other posts do. I also point out that starting a PhD does not lock you in if better opportunities arise. PhD application deadlines for Fall 2025 start are coming up soon; many application deadlines are December 15th, though some are as early as next week. For the uninitiated, I provide a...
I'll have to push back on this. I think if there's one specific program that you'd like to go to, especially if there's an advisor you have in mind, it's good to tailor your application to that program. However, this might not apply to the typical reader of this post.
I followed a k strategy with my PhD statements of purpose (and recommendations) rather than an r strategy. I tailored my applications to the specific schools, and it seemed to work pretty decently well. I know of more qualified people who were rejected from a much higher proportion of schools who spent much less time on each application.
(Disclaimer: this is all anecdotal. Also, I was applying for chemistry programs, not AI)
This is the second in a sequence of posts scrutinizing computational functionalism (CF). In my last post, I defined a concrete claim that computational functionalists tend to make:
Practical CF: A simulation of a human brain on a classical computer, capturing the dynamics of the brain on some coarse-grained level of abstraction, that can run on a computer small and light enough to fit on the surface of Earth, with the simulation running at the same speed as base reality[1], would cause the same conscious experience as that brain.
I contrasted this with “theoretical CF”, the claim that an arbitrarily high-fidelity simulation of a brain would be conscious. In this post, I’ll scrutinize the practical CF claim.
To evaluate practical CF, I’m going to meet functionalists where they (usually) stand...
Practical CF, more explicitly: A simulation of a human brain on a classical computer, capturing the dynamics of the brain on some coarse-grained level of abstraction, that can run on a computer small and light enough to fit on the surface of Earth, with the simulation running at the same speed as base reality, would cause the same conscious experience as that brain, in the specific sense of thinking literally the exact same sequence of thoughts in the exact same order, in perpetuity.
I... don't think this is necessarily what @EuanMcLean meant? At the risk o...
Suppose you’ve trained a really clever AI model, and you’re planning to deploy it in an agent scaffold that allows it to run code or take other actions. You’re worried that this model is scheming, and you’re worried that it might only need to take a small number of actions to get to a dangerous and hard-to-reverse situation like exfiltrating its own weights. Problems like these that the AI can cause with small numbers of actions are particularly hard to address with standard safety techniques, because by default you are unlikely to notice the problem until the hard-to-reverse outcome has already occurred.
One of the main classes of techniques that you could apply in the hope of preventing these catastrophic actions is what we call untrusted monitoring (introduced...
One aspect of untrusted monitoring which this post neglects, but which seems very important to me, is that we'll be uncertain about which model should be untrusted and the extent to which these models are untrusted.
This suggests that ensembling over untrusted monitors could be very promising (and much of the benefits of ensembling can be obtained without preventing collusion). Further, I expect that various types of ensembling make collusion harder even if all relevant models in the ensemble are scheming.
Ensembling untrusted monitors increases variance som...
Summary: Comfort with really big and really small quantities is very useful for understanding the world and is perfectly doable with practice. Therefore, you should (1) try to familiarize yourself with the sizes of various things using e.g. spaced repetition and (2) comment some of your favorite numerical facts about the world that give a sense of scale so others can learn from them. I have a list of links at the bottom that I'd love for you to add to, in a similar spirit to The Best Textbooks on Every Subject.
One of the greatest lies that science fiction and popular science told me was that there were features of the universe that are incomprehensibly small or big. Sure, femtoseconds, angstroms, parsecs and electron-volts are foreign to...
I've been wanting to write a nice post for a few months, but should probably just write one sooner instead. This is a top-level post not because it's a long post, but because it's an important one.
Anyways. Cryonics is pretty much money-free now (ie. subsidized technically)—one of the most affordable ways to dispose of your body post-mortem.
Don't die because you didn't double-check whether the argument you came up against cryonics in 5 seconds checks out.
In the west coast in the USA, from Oregon Brain Preservation, as of around May 2024 I think:
...Our research program is open to individuals in Washington, Oregon, and Northern California. This is the same brain preservation procedure, with the goal of future revival if this ever becomes feasible and humane. The difference is
Oh wow! Damn ☹️ Well, I'm super grateful for the time it was active. If you don't mind sending me a copy of your exchanges, I'd be interested.