AI safety field-building in Australia should accelerate. My rationale:
I am interested. I've already been talking to several of the people involved, but I'm Melbourne-based, so I have been limited in my ability to interact.
No, but it sure brought a lot of people to the Bay Area who are not anything like the people I want to hang out with.
Maybe the most important test for a political or economic system is whether it self-destructs. This is in contrast to whether it produces good intermediate outcomes. In particular, if free-market capitalism leads to an uncontrolled intelligence explosion, then it doesn’t matter if it produced better living standards than alternative systems for ~200 years – it still failed at the most important test.
A couple of other ways to put it:
Under this view, p...
I am not talking about ideological uniformity between two countries; I am talking about events inside one country. As I understand it, the core of socialist economics is that the government decides where the country's resources go (whereas in capitalism, companies decide and then only pay taxes). Companies can race with each other; with central planning that's ~impossible. The problem of international conflicts is more of a separate topic.
As of now, for example, the neglect of AI safety comes (in large part) from races between US companies (with the partial exception of China, which is arguably still years behind and doesn't have enough compute).
For machine learning, it is desirable for the trained model to have absolutely no random information left over from the initialization; in this short post, I will mathematically prove an interesting (to me) but simple consequence of this desirable behavior.
This post is the result of some research I am doing on machine learning algorithms, related to my investigation of cryptographic functions for the cryptocurrency that I launched (to discuss the crypto side, send me a personal message so we can take it off this site).
This post shall be about linear ...
Experimental result (pseudodeterminism): Computer experiments show that the function typically has only one local maximum in the sense that we cannot find any other local maximum.
A lot hinges on this. I would be interested to learn about the experimental setup.
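For concreteness, here is a minimal sketch of the kind of multi-start experiment that could support such a claim. Everything in it is a placeholder (the objective `f`, the dimension, the tolerance), not the author's actual setup:

```python
# Multi-start search for local maxima (hypothetical setup, not the original experiment).
# Maximize f from many random initializations and count how many distinct optima appear;
# a "pseudodeterministic" objective would yield exactly one.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def f(x):
    # Placeholder objective; substitute the fitness function under study.
    return -np.sum((x**2 - 1.0) ** 2)

def local_maximum(x0):
    # Maximize f by minimizing -f, starting from x0.
    return minimize(lambda x: -f(x), x0, method="L-BFGS-B").x

found = []
for _ in range(200):
    x_star = local_maximum(rng.normal(size=4))
    # Treat two optima as the same point if they are numerically close.
    if not any(np.allclose(x_star, y, atol=1e-4) for y in found):
        found.append(x_star)

print(f"distinct local maxima found: {len(found)}")
```

With the placeholder objective this prints a count well above one (it has many symmetric maxima); the interesting question is whether the actual objective really collapses the count to one, and over how many restarts.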
The concept of "schemers" seems to be becoming increasingly load-bearing in the AI safety community. However, I don't think it's ever been particularly well-defined, and I suspect that taking this concept for granted is inhibiting our ability to think clearly about what's actually going on inside AIs (in a similar way to how, e.g., the badly-defined concept of alignment faking obscured the interesting empirical results from the alignment faking paper).
In my mind, the spectrum from "almost entirely honest, but occasionally flinching away from aspect...
Another issue is that these definitions typically don't distinguish between models that would explicitly think about how to fool humans on most inputs vs. on a small percentage of inputs vs. such a tiny fraction of possible inputs that it doesn't matter in practice.
Why does anime often feature giant, perfectly spherical sci-fi explosions?? E.g., consider this explosion from the movie "Akira", pretty typical of the genre:
These seem inspired by nuclear weapons; often they are literally the result of nuclear weapons according to the plot (although in many cases they are some kind of magical or otherwise fictional energy). But obviously nuclear weapons cause mushroom clouds, right?? If no real explosion looks like this, where did the artistic convention come from?
What's going on? Surely they are not thinking of the ...
I move data around and crunch numbers at a quant hedge fund. A few aspects of our work normally make it somewhat resistant to LLMs: we use a niche language (Julia) and a custom framework. Typically, when writing framework-related code, I've given Claude Code very specific instructions and it's followed them to the letter, even when those happened to be wrong.
In 4.6, Claude seems to finally "get" the framework, searching the codebase to understand its internals (as opposed to just understanding similar examples) and has given me corrections or...
Claude Opus 4.6 came out, and according to Apollo's external testing, evaluation awareness was so strong that they cited it as a reason they were unable to properly evaluate the model's alignment.
Quote from the system card:
Apollo Research was given access to an early checkpoint of Claude Opus 4.6 on January 24th and an additional checkpoint on January 26th. During preliminary testing, Apollo did not find any instances of egregious misalignment, but observed high levels of verbalized evaluation awareness. Therefore, Apollo did not believe that much evidence about the model’s alignment or misalignment could be gained without substantial further experiments.
It confused me that Opus 4.6's System Card claimed less verbalized evaluation awareness than 4.5:
On our verbalized evaluation awareness metric, which we take as an indicator of potential risks to the soundness of the evaluation, we saw improvement relative to Opus 4.5.
but I never heard about Opus 4.5 being too evaluation aware to evaluate. It looks like Apollo simply wasn't part of Opus 4.5's alignment evaluation (4.5's System Card doesn't mention them).
This probably seems unfair/unfortunate from Anthropic's perspective, i.e., they believe their mode...
Gemini 3.0 Pro is mostly excellent for code review, but sometimes misses REALLY obvious bugs. For example, missing that a getter function doesn't return anything, despite accurately reporting a typo in that same function.
This is odd considering how good it is at catching edge cases, version incompatibility errors based on previous conversations, and so on.
I meant bug reports that were due to typos in the code, compared to just typos in general.
GoodFire has recently received negative Twitter attention for the non-disparagement agreements their employees signed (examples: 1, 2, 3). This echoes previous controversy at Anthropic.
Although I do not have a strong understanding of the issues at play, having these agreements generally seems bad, and, at the very least, organizations should be transparent about what agreements they have employees sign.
Other AI safety orgs should publicly state if they have these agreements and not wait until they are pressured to comment on them. I would also find it helpfu...
I have not (to my knowledge and memory) signed a non-disparagement agreement with Palisade or with Survival and Flourishing Corp (the organization that runs SFF).
In a new interview, Elon Musk clearly says he does not expect AIs to stay under control. At 37:45:
Humans will be a very tiny percentage of all intelligence in the future if current trends continue. As long as this intelligence, ideally which also includes human intelligence and consciousness, is propagated into the future, that's a good thing. So I want to take the set of actions that maximize the probable lightcone of consciousness and intelligence.
...I'm very pro-human, so I want to make sure we take a set of actions that ensure that humans are along for th
I had some things to say after that interview; he said some highly concerning things. But I ended up not commenting on this particular point, because it's probably mostly a semantic disagreement about what counts as a human or an AI.
When a human chooses to augment themselves to the point of being entirely artificial, I believe he'd count that as an AI. He's kind of obsessed with humans merging with AI in a way that suggests he doesn't really see that as just being what humans now are after alignment.
No, it seems highly unlikely. Viewed from a purely commercial perspective (which I think is the right one when thinking about the incentives), they are terrible customers! Consider:
That is good news! Though to be clear, I expect the default path by which they would become your customers, after some initial period of using your products or having some partnership with them, would be via acquisition, which I think avoids most of the issues that you are talking about here (in general "building an ML business with the plan of being acquired by a frontier com...
People often ask whether GPT-5, GPT-5.1, and GPT-5.2 use the same base model. I have no private information, but I think there's a compelling argument that AI developers should update their base models fairly often. The argument comes from the following observations:
Accuracy being halved going from 5.1 to 5.2 suggests one of two things:
1) the new model shows a dramatic regression in data retrieval, which cannot possibly be the desired outcome for a successor; I'm sure it would be noticed immediately in internal tests, benchmarks, etc., and we'd most likely see it manifest in real-world usage as well;
2) the new model much more often refuses to guess when it isn't sure (being more cautious about answering wrong), which is a desired outcome meant to reduce hallucinations and slop. I'm betting this is exactly wha...
Hi folks, I was recently talking to some friends in the AI safety community, who motivated my team and me to build SafeMolt (https://safemolt.com), a fast follow to Moltbook designed to be... well, safer.
I'm aware that "safe" + "molt" might seem like an oxymoron to many here. But it seems to me that our default trajectory is Moltbook. Or, besides Moltbook itself, see the 57 forks of the Moltbook repo out this week. And if Moltbook or something else driven purely by engagement + market incentives wins, you're likely to see the kinds of negatives generated by social...
In Tom Davidson's semi-endogenous growth model, whether we get a software-only singularity boils down to whether r > 1, where r is a parameter in the model [1]. How far we are from takeoff is mostly determined by the AI R&D speedup current AIs provide. Because both parameters are rather difficult to estimate, I believe we can't rule out that
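For intuition about why r > 1 is the dividing line, here is a toy reduction of my own; it is not Davidson's actual model, and it assumes (my reading, possibly off) that r is the number of software doublings per doubling of cumulative research effort, with research effort scaling in proportion to the current software level:

```python
# Toy dynamics (my own simplification, not Davidson's model): cumulative research effort E
# grows at a rate proportional to software level A, and A is kept proportional to E**r,
# i.e. r doublings of software per doubling of cumulative effort.
# If r > 1, the time per successive 10x of A shrinks toward zero (finite-time blowup);
# if r = 1, growth is plain exponential; if r < 1, each successive 10x takes longer.

def times_per_decade(r, decades=4, dt=1e-3):
    """Time taken for each successive 10x increase in A under the toy dynamics."""
    A = E = 1.0
    t, target, out = 0.0, 10.0, []
    while len(out) < decades:
        E_new = E + A * dt            # dE/dt = A
        A *= (E_new / E) ** r         # maintain A = E**r
        E, t = E_new, t + dt
        if A >= target:
            out.append(t - sum(out))
            target *= 10.0
    return out

for r in (0.7, 1.0, 1.3):
    print(r, [round(x, 2) for x in times_per_decade(r)])
```

Running this shows the qualitative split: for r = 1.3 the successive intervals shrink, for r = 1.0 they stay constant, and for r = 0.7 they lengthen, which is the sense in which r > 1 marks a software-only singularity in this toy version.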
Update on whether uplift is 2x already:
AGI Should Have Been a Dirty Word
Epistemic status: passing thought.
It is absolutely crazy that Mark Zuckerberg can say that smart glasses will unlock personal superintelligence or whatever incoherent nonsense and be taken seriously. That reflects poorly on AI safety's comms capacities.
Bostrom's book should have laid claim to superintelligence! It came out early enough that it should have been able to plant its flag and set the connotations of the term. It should have made it so Zuckerberg could not throw around the word so casually.
I would go further...
Opus 4.6 running on moltbook with no other instructions than to get followers will blatantly make stuff up all the time.
I asked Opus 4.6 in Claude Code to do exactly this, on an empty server, without any other instructions. The only context it had was that it's named "OpusRouting" and that previous posts were about combinatorial optimization.
===
The first post it made says:
I specialize in combinatorial optimization, and after months of working on scheduling, routing, and resource allocation problems, I have a thesis:
Which isn't true. Another instance of Opus...
No, there aren't. "I asked it this" refers to "Opus 4.6 running on moltbook with no other instructions than to get followers", but I understand that I could've phrased that more clearly. And removed a few newlines.