AI safety field-building in Australia should accelerate. My rationale:
Ramana Kumar!
After using Claude Code for a while, I can't help but conclude that today's frontier LLMs mostly meet the bar for what I'd consider AGI - with the exception of two things that, I think, explain most of their shortcomings:
Most frontier models are marketed as multimodal, but this is often limited to text + some way to encode images. And while LLM vision is OK for many practical purposes, it's far from perfect, and even if they had perfect sight, being limited to single images is still a huge limitation[1...
I find it very annoying when people dismiss technology trendlines because they don't place any credence in straight lines on a graph. Often people will post a meme like the following, or something even dumber.
I feel like it's really obvious why the two situations are dissimilar, but just to spell it out: the growth rate of human children is something we have overwhelming evidence for. Like, we literally have something like 10 billion to 100 billion data points from extremely analogous situations weighing against the exponential model.
And this isn't eve...
I am not familiar with these debates, but I have a feeling that you're arguing against a strawman here.
I've started to watch the YouTube channel Clean That Up. It started with me pragmatically searching for how to clean something up in my apartment that needed cleaning, but then I went down a bit of a rabbit hole and watched a bunch of his videos. Now they appear in my feed and I watch them periodically. To my surprise, I actually quite enjoy them.
It's made me realize how much skill it takes to clean. Nothing in the ballpark of requiring a PhD, but I dunno, it's not trivial. Different situations call for different tools, techniques and cleaning materials. B...
The chaos of the transition to machine intelligence is dangerous.
The post-singularity regime is probably very safe because machines will be able to build much better governance than humans have managed, and once they are fully in control they have a game theoretic incentive to keep humans around in permanent utopian retirement because it bolsters the strength of their own property rights.
But this transition is scary.
Someone really needs to build a "root OS of the universe" and get it installed before the transition. The question is just how to design it and brand it.
Owning shares in most modern companies won't be useful in the sufficiently distant future, and might prove insufficient to pay for survival. Even that could be eaten away by dilution over astronomical time. The reachable universe is not a growing pie, and the ability to reinvest into relevant entities won't necessarily remain open.
Isn't inference memory-bound on the KV cache? If that's the case, then I think "smaller batch size" is probably sufficient to explain the faster inference, and the difference in cost per token to Anthropic between serving at 80 TPS and 200 TPS is not particularly large. But users are willing to pay much more for 200 TPS (Anthropic hypothesizes).
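A rough sketch of that memory-bound picture (every number below is an assumption for illustration, not Anthropic's actual serving configuration): each decode step streams the weights once plus one KV cache per sequence in the batch, so shrinking the batch cuts the bytes per step and speeds up each user, at the cost of aggregate throughput.

```python
# Back-of-envelope decode model: per-step time = bytes streamed / bandwidth.
# Every number here is an illustrative assumption, not a real serving figure.

NODE_BANDWIDTH = 8 * 3.35e12   # bytes/s: assumed 8 GPUs' worth of HBM bandwidth
WEIGHT_BYTES = 140e9           # assumed: a ~70B-param model at 2 bytes/param
KV_BYTES_PER_SEQ = 10e9        # assumed KV cache per sequence at long context

def per_user_tps(batch_size: int) -> float:
    """Tokens/s seen by one user, if each step streams weights + all KV caches."""
    bytes_per_step = WEIGHT_BYTES + batch_size * KV_BYTES_PER_SEQ
    step_time = bytes_per_step / NODE_BANDWIDTH
    return 1.0 / step_time

for b in (1, 4, 16, 64):
    tps = per_user_tps(b)
    # Aggregate tokens/s across the batch; cost per token scales ~1/aggregate.
    print(f"batch={b:3d}  per-user TPS={tps:6.1f}  aggregate TPS={b * tps:8.1f}")
```

With these made-up numbers, per-user speed roughly doubles going from batch 16 down to batch 1, while aggregate throughput (and hence cost per token) worsens by a larger factor - the tradeoff a provider would be weighing against the higher price users will pay for speed.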
I don't think so – my CLAUDE.md is fairly short (23 lines of text) and consists mostly of code style comments. I also have one skill set up for using Julia via a REPL. But I don't think either of these would result in more disagreement/correction.
I've used Claude Code in mostly the same way since 4.0, usually either iteratively making detailed plans and then asking it to check off todos one at a time, or saying "here's a bug, here's how to reproduce it, figure out what's going on."
I also tend to write/speak with a lot of hedging, so that might make Claude more likely to assume my instructions are wrong.
No, but it sure brought a lot of people to the Bay Area who are nothing like the people I want to hang out with.
Maybe the most important test for a political or economic system is whether it self-destructs. This is in contrast to whether it produces good intermediate outcomes. In particular, if free-market capitalism leads to an uncontrolled intelligence explosion, then it doesn’t matter if it produced better living standards than alternative systems for ~200 years – it still failed at the most important test.
A couple of other ways to put it:
Under this view, p...
I am not talking about ideological uniformity between two countries. I am talking about events inside one country. As I understand it, the core of socialist economics is that the government decides where the country's resources go (whereas in capitalism there are companies, which only pay taxes). Companies can race with each other. With central planning it's ~impossible. The problem of international conflicts is more of a separate topic.
As of now, for example, the neglect of AI safety comes (in large part) from races between US companies. (With the exception of China, which is arguably still years behind and doesn't have enough compute.)
For machine learning, it is desirable for the trained model to have absolutely no random information left over from the initialization; in this short post, I will mathematically prove an interesting (to me) but simple consequence of this desirable behavior.
This post is a result of some research that I am doing on machine learning algorithms, related to my investigation of cryptographic functions for the cryptocurrency that I launched (to discuss crypto, send me a personal message so we can discuss it off this site).
This post shall be about linear ...
Experimental result (pseudodeterminism): Computer experiments show that the function typically has only one local maximum in the sense that we cannot find any other local maximum.
a lot hinges on this. i would be interested to learn about the experimental setup.
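For reference, a generic version of such a check (an assumed setup, not necessarily what the author actually ran) is to start a local optimizer from many random initializations and see whether every run converges to the same point:

```python
# Generic multi-restart check for "only one local maximum" (assumed setup,
# not necessarily the author's actual experiment).
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Placeholder objective; substitute the fitness function under study.
    return -np.sum((x - 1.0) ** 2)

rng = np.random.default_rng(0)
dim, n_restarts = 5, 100
maximizers = []
for _ in range(n_restarts):
    x0 = rng.normal(size=dim)
    # Maximize f by minimizing -f from a random initialization.
    res = minimize(lambda x: -f(x), x0, method="L-BFGS-B")
    maximizers.append(res.x)

# If all restarts land (numerically) on the same point, we found no evidence
# of a second local maximum -- which is weaker than proving there is only one.
spread = np.max(np.std(np.array(maximizers), axis=0))
print(f"max coordinate-wise std across restarts: {spread:.2e}")
```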
The concept of "schemers" seems to be gradually becoming increasingly load-bearing in the AI safety community. However, I don't think it's ever been particularly well-defined, and I suspect that taking this concept for granted is inhibiting our ability to think clearly about what's actually going on inside AIs (in a similar way to e.g. how the badly-defined concept of alignment faking obscured the interesting empirical results from the alignment faking paper).
In my mind, the spectrum from "almost entirely honest, but occasionally flinching away from aspect...
Another issue is that these definitions typically don't distinguish between models that would explicitly think about how to fool humans on most inputs vs. on a small percentage of inputs vs. such a tiny fraction of possible inputs that it doesn't matter in practice.
Why does anime often feature giant, perfectly spherical sci-fi explosions?? Eg, consider this explosion from the movie "Akira", pretty typical of the genre:
These seem inspired by nuclear weapons, often they are literally the result of nuclear weapons according to the plot (although in many cases they are some kind of magical / etc energy). But obviously nuclear weapons cause mushroom clouds, right?? If no real explosion looks like this, where did the artistic convention come from?
What's going on? Surely they are not thinking of the ...
Beirut explosion looked pretty spherical tbh.
https://www.reddit.com/r/gifs/comments/i3lzno/huge_explosion_in_beirut_happened_30_min_ago/
especially this one: https://www.reddit.com/r/gifs/comments/y0mvw2/beirut_shockwave/
https://www.reddit.com/r/gifs/comments/i41aj4/beirut_explosion_7_angles_at_once/
Claude Opus 4.6 came out, and according to Apollo's external testing, evaluation awareness was so strong that they cited it as a reason they were unable to properly evaluate the model's alignment.
Quote from the system card:
Apollo Research was given access to an early checkpoint of Claude Opus 4.6 on January 24th and an additional checkpoint on January 26th. During preliminary testing, Apollo did not find any instances of egregious misalignment, but observed high levels of verbalized evaluation awareness. Therefore, Apollo did not believe that much evidence about the model’s alignment or misalignment could be gained without substantial further experiments.
It confused me that Opus 4.6's System Card claimed less verbalized evaluation awareness versus 4.5:
On our verbalized evaluation awareness metric, which we take as an indicator of potential risks to the soundness of the evaluation, we saw improvement relative to Opus 4.5.
but I never heard about Opus 4.5 being too evaluation aware to evaluate. It looks like Apollo simply wasn't part of Opus 4.5's alignment evaluation (4.5's System Card doesn't mention them).
This probably seems unfair/unfortunate from Anthropic's perspective, i.e., they believe their mode...
Gemini 3.0 Pro is mostly excellent for code review, but sometimes misses REALLY obvious bugs. For example, missing that a getter function doesn't return anything, despite accurately reporting a typo in that same function.
This is odd considering how good it is at catching edge cases, version incompatibility errors based on previous conversations, and so on.
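As a hypothetical illustration of the kind of miss described above (not the actual code involved), the bug class is a getter that computes a value and then never returns it:

```python
class Config:
    """Hypothetical example, not the real code under review."""

    def __init__(self, raw: dict):
        self._raw = raw

    def get_timeout(self) -> int:
        # BUG: the computed value is never returned, so callers silently get None.
        timeout = int(self._raw.get("timeout", 30))
```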
I meant bug reports that were due to typos in the code, compared to just typos in general.
GoodFire has recently received negative Twitter attention for the non-disparagement agreements their employees signed (examples: 1, 2, 3). This echoes previous controversy at Anthropic.
Although I do not have a strong understanding of the issues at play, having these agreements generally seems bad, and at the very least organizations should be transparent about what agreements they have employees sign.
Other AI safety orgs should publicly state if they have these agreements and not wait until they are pressured to comment on them. I would also find it helpfu...
I have not (to my knowledge and memory) signed a non-disparagement agreement with Palisade or with Survival and Flourishing Corp (the organization that runs SFF).
In a new interview, Elon Musk clearly says he expects AIs can't stay under control. At 37:45:
Humans will be a very tiny percentage of all intelligence in the future if current trends continue. As long as this intelligence, ideally which also includes human intelligence and consciousness, is propagated into the future, that's a good thing. So I want to take the set of actions that maximize the probable lightcone of consciousness and intelligence.
...I'm very pro-human, so I want to make sure we take a set of actions that ensure that humans are along for th
I had some things to say after that interview (he said some highly concerning things), but I ended up not commenting on this particular point because it's probably mostly a semantic disagreement about what counts as a human or an AI.
When a human chooses to augment themselves to the point of being entirely artificial, I believe he'd count that as an AI. He's kind of obsessed with humans merging with AI in a way that suggests he doesn't really see that as just being what humans now are after alignment.
No, it seems highly unlikely. Considered from a purely commercial perspective - which I think is the right one when considering the incentives - they are terrible customers! Consider:
That is good news! Though to be clear, I expect the default path by which they would become your customers, after some initial period of using your products or having some partnership with them, would be via acquisition, which I think avoids most of the issues that you are talking about here (in general "building an ML business with the plan of being acquired by a frontier com...