In Defence of Jargon
People used to say (maybe still do? I'm not sure) that we should use less jargon to make writing on LW more accessible, i.e. easier for outsiders to read.
I think this is mostly a confused take. The underlying problem is inferential distance. Getting rid of the jargon is actually unhelpful, since it hides the fact that there is an inferential distance.
When I want to explain physics to someone and I don't know what they already know, I start by listing relevant physics jargon and asking them which words they know. This i...
A couple of terms that I've commented on here recently —
If anyone here happens to be an expert in the combinatorics of graphs, I'd love to have a call to get help on some problems we're trying to work out. The problems aren't quite trivial, but I suspect an expert would pretty straightforwardly know what techniques to apply.
Questions I have include:
There's an upside of conventional education which no-one on any side of any debate ever seems to bring up, but which was a major benefit (possibly the major benefit) of my post-primary studies. Namely: it lets students discover what they have a natural aptitude for (or lack thereof) relative to a representative peer group. The most valuable things I learned in my Engineering courses at university were:
- I'm pretty mediocre at Engineering, especially sub-subjects which aren't strictly Structural and/or Mechanical.
- In particular, I'm significantly worse than ...
Lower switching costs when you're in the middle of a degree, maybe? You can just take courses in a closely related domain, or work as an assistant in a different lab, in a much more fluid and straightforward manner, versus having to apply to a different job and get through the interviews and pay a significant upfront cost before you even get to the nuts and bolts of stuff.
Slightly hot take: Longtermist capacity/community building is pretty underdone at current margins, and retreats (whether focused on AI safety, longtermism, or EA) are also generally underinvested in at the moment. By "longtermist community building", I mean community building focused on longtermism rather than on AI safety. I'm also sympathetic to thinking that general undergrad and high school capacity building (AI safety, longtermist, or EA) is underdone, but this seems less clear-cut.
I think this underinvestment is due to a mix of mistakes on the part of Open Philanth...
I was fired from my first job out of college, and in retrospect that was a gift. It taught me that new jobs were easy to get (as a programmer in the late 00s) and took away my fear of job hunting, which otherwise would have been enormous. I watched so many programmer friends stay in miserable jobs when they had a plethora of options, because job hunting was too scary. Being fired early rescued me from that.
(this is based on / expanded from a response I wrote to a tweet that was talking about how autistic people struggle in the world because the world follows unwritten rules that are more important than the written ones.)
I think most autistic people should invest more in understanding the unwritten rules. The system of unwritten rules can be cruel and unfair, but it's important to know how to interact with it, and it's actually a really interesting system to map out, with its own rhyme and reason.
It's entirely understandable that people feel burned by bad past experiences, and to have ...
Like, imagine if "pter" were a single character: words like "helicopter" and "pterodactyl" both contain "pter", but you'd probably think of "helicopter" as an atomic unit with its own unique identity.
I often do chunk them, but if you've picked up a bit of taxonomic Greek, you'll know pter means 'wing', so we have helico-pter 'spiral/rotating wing' and ptero-dactyl 'wing fingers' - both cases where breaking down the name tells you something about what the things are!
Many people agree that 'artificial intelligence' is a poor term that is vague and has existing connotations. People use it to refer to a whole range of different technologies.
However, I struggle to come up with any better terminology. If not 'artificial intelligence', what term would be ideal for describing the capabilities of multi-modal tools like Claude, Gemini, and ChatGPT?
I also agree "AI" is overloaded and has existing connotations (ranging from algorithms to applications as well)! I would think generative models, or generative AI works better (and one can specify multimodal generative models if one wants to be super clear), but also curious to see what other people would propose.
Thoughts on how to onboard volunteers/collaborators
Volunteers are super flaky, i.e. they often abandon projects with no warning. I think the main reason for this is the planning fallacy: people promise more than they actually have time for. In the best case, they will just be honest about this when they notice. But more typically, the person will feel some mix of stress, shame, and overwhelm that prevents them from communicating clearly. From the outside, this looks like the person promised to do a lot of things and then just ghosted the project. (This is a common pattern and not ...
Can we train models to be honest without any examples of dishonesty?
The intuition: We may not actually need to know ground truth labels about whether or not a model is lying in order to reduce the tendency to lie. Maybe it's enough to know the relative tendencies between two similar samples?
Outline of the approach: For any given Chain of Thought, we don't know if the model was lying or not, but maybe we can construct two variants A and B where one is more likely to be lying than the other. We then reward the one that is more honest relative to the one that...
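As a purely illustrative sketch of the "reward the relatively more honest variant" step, here is what a Bradley-Terry / DPO-style pairwise objective could look like. The function name, the numbers, and the assumption that we can score each variant with the policy's log-probability are all mine, not part of the original proposal:

```python
import math

def pairwise_honesty_loss(logp_more_honest: float, logp_less_honest: float,
                          beta: float = 0.1) -> float:
    """Pairwise (Bradley-Terry / DPO-style) loss: only the *relative* honesty
    judgement between the two variants is needed, no absolute ground truth."""
    margin = beta * (logp_more_honest - logp_less_honest)
    # -log sigmoid(margin): small when the more-honest variant is already favoured
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy usage: for the same prompt, variant A is judged relatively more honest than B.
logp_a = -42.0   # hypothetical policy log-prob of variant A
logp_b = -40.5   # hypothetical policy log-prob of variant B
print(pairwise_honesty_loss(logp_a, logp_b))  # exceeds log(2) because B is currently favoured
```

The point of the pairwise form is that the label "A is more honest than B" can come from how the two variants were constructed, without ever labelling either one as honest or dishonest in absolute terms.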
I listen to podcasts while doing chores, and often feel like I'm learning something but end up unable to remember anything. So, experiment: I'm going to try writing brief summaries after the fact. I'm going to skip anything where that doesn't feel appropriate, e.g. fiction. By default, nothing here is fact checked, either against reality or against the episode itself.
This is a 99% Invisible episode on UBI.
UBI is an idea supported by some on both left and right. Finland is currently trying an exp...
If the singularity occurs over two years, as opposed to two weeks, then I expect most people will be bored throughout much of it, including me. This is because I don't think one can feel excited for more than a couple weeks. Maybe this is chemical.
Nonetheless, these would be the two most important years in human history. If you ordered all the days in human history by importance/'craziness', the days at the top of the ranking would mostly fall within these two years.
So there will be a disconnect between the objective reality and how much excitement I feel.
Well, an aligned Singularity would probably be relatively pleasant, since the entities fueling it would consider causing this sort of vast distress a negative and try to avoid it. Indeed, if you trust them not to drown you, there would be no need for this sort of frantic grasping-at-straws.
An unaligned Singularity would probably also be more pleasant, since the entities fueling it would likely try to make it look aligned, with the span of time between the treacherous turn and everyone dying likely being short.
This scenario covers a sort of "neutral-alignme...
Long have I searched for an intuitive name for motte & bailey that I wouldn't have to explain too much in conversation. I might have finally found it: the "I was merely saying" fallacy. Verb: merelysay. Noun: merelysayism. Example: "You said you could cure cancer, and now you're merelysaying you help the body fight colon cancer only."
There are a lot of similar terms, but motte and bailey is a uniquely apt metaphor for describing a specific rhetorical strategy. I think the reason it often feels unhelpful in practice is that it's usually unnecessary to be so precise when our goal is just to call out bullshit. I personally like "motte and bailey" quite a bit, but as a tool for my own private thinking rather than as a piece of rhetoric to persuade others with.
I think that I've historically underrated learning about historical events that happened in the last 30 years, compared to reading about more distant history.
For example, I recently spent time learning about the Bush presidency, and found learning about the Iraq war quite thought-provoking. I found it really easy to learn about things like the foreign policy differences among factions in the Bush admin, because e.g. I already knew the names of most of the actors and their stances are pretty intuitive/easy to understand. But I still found it interesting to ...
I don't really have a better suggestion than reading the obvious books. For the Bush presidency, I read/listened to both "Days of Fire", a book by Peter Baker (a well-regarded journalist), and "Decision Points" by Bush. And I watched/listened to a bunch of interviews with various people involved with the admin.
Is interp easier in worlds where scheming is a problem?
The key conceptual argument for scheming is that, insofar as future AI systems are decomposable into [goals] + [search], there are many more misaligned goals compatible with low training loss than aligned goals. But if an AI were really so cleanly factorable, we would expect interp / steering to be easier / more effective than on current models (this is the motivation for "retarget the search").
While I don't expect the factorization to be this clean, I do think we should expect interp to be easi...
It is possible that state tracking could be the next reasoning-tier breakthrough in frontier model capabilities. I believe there is strong evidence for this.
State space models already power the fastest available voice models, such as Cartesia's Sonic (time-to-first-audio advertised as under 40ms). There are examples of SSMs such as Mamba, RWKV, and Titans outperforming transformers in research settings.
Flagship LLMs are also bad at state tracking, even with RL for summarization. Forcing an explicit...
You will always oversample from the most annoying members of a class.
This is inspired by recent arguments on twitter about how vegans and poly people "always" bring up those facts. I contend that it's simultaneously true that most vegans and poly people are not judgmental, but that it doesn't matter, because that's not who people remember. Omnivores don't notice the 9 vegans who quietly ordered an unsatisfying salad, only the vegan who brought up factory farming conditions at the table. Vegans who just want to abstain from animal products remember the omn...
Everyone is a few hops away from everyone else. This applies in both directions: when I meet random people they always have some weak connection to other people I know, but also when I think of a collection of people as a cluster, most specific pairs of people within that cluster barely know each other except through other people in the cluster.
My current view is that alignment theory, if it's the good stuff, should work on deep learning as soon as it comes out; if it doesn't, it's not likely to be useful later unless it helps produce stuff that does work on deep learning. Wentworth, Ngo, and Causal Incentives are the main threads that already seem to have achieved this somewhat. SLT and DEC seem potentially relevant.
I'll think about your argument for mechinterp. If it's true that the ratio isn't as catastrophic as I expect it to turn out to be, I do agree that making microscope AI work would be incredible in allowing for empiricism to finally properly inform rich and specific theory.
Multiple times have I seen an argument like this:
Imagine a fully materialistic universe strictly following some laws, which are such that no agent from inside the universe is able to fully comprehend them...
I wonder if that is possible? For computables, it is always possible to construct a quine (standing for the agent) with arbitrary embedded contents (for the rest of the universe/laws/etc), and it wouldn't even be that large - it only needs to...
All you need is a bounded universe with laws having complexity greater than can be embedded within that bound, and that premise holds.
You can even have a universe containing agents whose complexity is unbounded over time, with laws of infinite complexity, so long as the universe only permits agents of finite complexity at any given time.
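To make the quine construction from a couple of comments up concrete, here is a minimal Python sketch (my own illustration, assuming the embedded "rest of the universe / its laws" can be treated as an arbitrary string). The self-reference machinery stays tiny no matter how large the payload gets, which is the "it wouldn't even be that large" point:

```python
# Minimal quine-with-payload sketch (illustrative only).
# The payload stands in for the embedded "laws / rest of the universe";
# the three code lines below print an exact copy of themselves
# (comments and blank lines aside).

payload = 'arbitrary embedded contents: laws, initial conditions, etc.'
s = 'payload = {p!r}\ns = {s!r}\nprint(s.format(p=payload, s=s))'
print(s.format(p=payload, s=s))
```

The payload string can be made as large as you like; only the two short lines that quote and reprint the source are needed for the self-reference.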