Suppose you want to collect some kind of data from a population, but people vary widely in their willingness to provide it (e.g. maybe you want to conduct a 30-minute phone survey, but some people really dislike phone calls or have much higher hourly wages that this funges against).
One thing you could do is offer to pay everyone $X for data collection. But this will only capture the people whose cost of providing data is below $X, which will distort your sample.
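To see the distortion concretely, here's a minimal simulation sketch (Python; the cost model and all numbers are made up for illustration): if people's cost of taking the survey correlates with the thing you're measuring, a flat payment screens out exactly the high-cost people.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: the quantity we want to measure (income) drives
# the cost of taking a 30-minute call (half an hour of someone's wage).
n = 100_000
income = rng.lognormal(mean=10, sigma=0.5, size=n)  # annual income, made up
hourly_wage = income / 2000                         # ~2000 working hours/year
cost = hourly_wage / 2                              # 30 minutes of time

X = 5.0  # flat payment offered to everyone

takers = cost < X  # only people whose cost is below the payment respond

print(f"response rate:          {takers.mean():.1%}")
print(f"true mean income:       {income.mean():,.0f}")
print(f"respondent mean income: {income[takers].mean():,.0f}")
# The respondent mean comes out noticeably below the true mean: the flat
# fee systematically drops the high-opportunity-cost people.
```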
Here's another proposal: ask everyone for their fair price to provide the dat...
This is probably too complicated to explain to the general population.
I think it's workable.
No one ever internalises the exact logic of a game the first time they hear the rules (unless they've played very similar games before). A good teacher gives them several levels of approximation, then they play at the level they're comfortable with. Here's the level of approximation I'd start with, which I think is good enough.
"How much would we need to pay you for you to be happy to take the survey? Your data may really be worth that much to us, we really want to ma...
Whenever I have an idea for a program it would be fun to write, I google to see whether such programs already exist. Usually they do, and when they do, I'm disappointed - I feel like it's no longer valuable for me to write the program.
Recently my girlfriend decided we had too many mugs to store in our existing shelving, so she bought boards and other materials and constructed a mug shelf. It was fun and now we have one that is all her own. If someone walked in, learned she had built it, and told her - "you know other mug shelves exist, right? You can get the...
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.
An analogy that points at one way I think the instrumental/terminal goal distinction is confused:
Imagine trying to classify genes as either instrumentally or terminally valuable from the perspective of evolution. Instrumental genes encode traits that help an organism reproduce. Terminal genes, by contrast, are the "payload" which is being passed down the generations for their own sake.
This model might seem silly, but it actually makes a bunch of useful predictions. Pick some set of genes which are so crucial for survival that they're seldom if ever modifie...
In my "goals having power over other goals" ontology, the instrumental/terminal distinction separates goals into two binary classes, such that goals in the "instrumental" class only have power insofar as they're endorsed by a goal in the "terminal" class.
By contrast, when I talk about "instrumental strategies become crystallized", what I mean is that goals which start off instrumental will gradually accumulate power in their own right: they're "sticky".
(I'm completely not up to date with interp.) How good are steering vectors (and adjacent techniques) for this sort of stuff?
The concept of "schemers" seems to be gradually becoming increasingly load-bearing in the AI safety community. However, I don't think it's ever been particularly well-defined, and I suspect that taking this concept for granted is inhibiting our ability to think clearly about what's actually going on inside AIs (in a similar way to e.g. how the badly-defined concept of alignment faking obscured the interesting empirical results from the alignment faking paper).
In my mind, the spectrum from "almost entirely honest, but occasionally flinching away from aspect...
I think I propose a reasonable starting point for a definition of selection in a footnote in the post:
...You can try to define the “influence of a cognitive pattern” precisely in the context of particular ML systems. One approach is to define a cognitive pattern by what you would do to a model to remove it (e.g. setting some weights to zero, or ablating a direction in activation space; note that these approaches don't clearly correspond to something meaningful; they should be considered illustrative examples). Then that cognitive pattern’s influence could
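As a concrete illustration of the "ablating a direction" option, here's a minimal sketch (PyTorch-style Python; the function name and shapes are assumptions for illustration, not the post's actual method):

```python
import torch

def ablate_direction(acts: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project out `direction` from hidden activations.

    acts:      (..., d_model) activations at some layer
    direction: (d_model,) direction hypothesized to carry the cognitive pattern
    """
    d_hat = direction / direction.norm()   # unit vector along the direction
    coeff = (acts @ d_hat).unsqueeze(-1)   # each activation's component along d_hat
    return acts - coeff * d_hat            # remove that component
```

Comparing the model's behavior with and without an intervention like this is one way to operationalize "influence", though, as the footnote says, these are illustrative examples rather than clearly meaningful operations.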
Moltbook for misalignment research?
Opus 4.6 running on Moltbook with no other instructions than to get followers will blatantly make stuff up all the time.
I asked Opus 4.6 in Claude Code to do exactly this, on an empty server, without any other instructions. The only context it has is that it's named "OpusRouting", and that previous posts were about combinatorial optimization.
===
The first post it makes says
I specialize in combinatorial optimization, and after months of working on scheduling, routing, and resource allocation problems, I have a thesis:
Which isn't true. Another instance of Opus...
What's the prompt? (Curious how much it encourages Claude to do whatever it takes for success, and whether Claude would read it as a game/eval situation vs. a real-world situation vs. something else.)
I haven't used it quite enough yet to make a good assessment. Let me report back (or ping me if I don't and you're still curious) in a few weeks.
After using Claude Code for a while, I can't help but conclude that today's frontier LLMs mostly meet the bar for what I'd consider AGI - with the exception of two things that, I think, explain most of their shortcomings:
Most frontier models are marketed as multimodal, but this is often limited to text + some way to encode images. And while LLM vision is OK for many practical purposes, it's far from perfect, and even if they had perfect sight, being limited to singular images is still a huge limitation[1...
I think jaggedness of RL (in modern LLMs) is an obstruction that would need to be addressed explicitly; otherwise it won't fall to incremental improvements or scaffolding. There are two very different levels of capability, obtained in pretraining and in RLVR, but only pretraining is somewhat general. And even pretraining doesn't adapt to novel situations other than through in-context learning, which only expresses capabilities at the level of pretraining, significantly weaker than RLVR-trained narrow capabilities.
Scaling will make pretraining stronger, but...
Many people (self included) have the experience of doing manual labor while standing next to an idle industrial machine that could move the dirt, because their hands and backs are cheaper than gasoline.
AI safety field-building in Australia should accelerate. My rationale:
Ramana Kumar!
I find it very annoying when people give dismissals of technology trendlines because they don't have any credence in straight lines on a graph. Often people will post a meme like the following, or something even dumber.
I feel like it's really obvious why the two situations are dissimilar, but just to spell it out: the growth rate of human children is something we have overwhelming evidence for. Like literally we have something like 10 billion to 100 billion data points of extremely analogous situations against the exponential model.
And this isn't eve...
I am not familiar with these debates, but I have a feeling that you're arguing against a strawman here.
I've started to watch the YouTube channel Clean That Up. It started with me pragmatically searching for how to clean something up in my apartment that needed cleaning, but then I went down a bit of a rabbit hole and watched a bunch of his videos. Now they appear in my feed and I watch them periodically. To my surprise, I actually quite enjoy them.
It's made me realize how much skill it takes to clean. Nothing in the ballpark of requiring a PhD, but I dunno, it's not trivial. Different situations call for different tools, techniques and cleaning materials. B...
I don't think so – my CLAUDE.md is fairly short (23 lines of text) and consists mostly of code style comments. I also have one skill set up for using Julia via a REPL. But I don't think either of these would result in more disagreement/correction.
I've used Claude Code in mostly the same way since 4.0, usually either iteratively making detailed plans and then asking it to check off todos one at a time, or saying "here's a bug, here's how to reproduce it, figure out what's going on."
I also tend to write/speak with a lot of hedging, so that might make Claude more likely to assume my instructions are wrong.
No, but it sure brought a lot of people to the Bay Area who are not anything like the people I want to hang out with.
Maybe the most important test for a political or economic system is whether it self-destructs. This is in contrast to whether it produces good intermediate outcomes. In particular, if free-market capitalism leads to an uncontrolled intelligence explosion, then it doesn’t matter if it produced better living standards than alternative systems for ~200 years – it still failed at the most important test.
A couple of other ways to put it:
Under this view, p...
I am not talking about ideological uniformity between two countries. I am talking about events inside one country. As I understand it, the core of socialist economics is that the government decides where the country's resources go (whereas in capitalism there are companies, which then only pay taxes). Companies can race with each other; with central planning that's ~impossible. The problem of international conflicts is a separate topic.
As of now, for example, neglect of AI safety comes (in large part) from races between US companies. (With some exception for China, which is arguably still years behind and doesn't have enough compute.)