Well that was timely
Amazon recently bought a 960 MW nuclear-powered datacenter.
I think this doesn't contradict your claim that "The largest seems to consume 150 MW", because the 960 MW datacenter hasn't been built yet (or, if a datacenter already exists there, it doesn't consume that much energy for now)?
Domain: Mathematics
Link: vEnhance
Person: Evan Chen
Background: math PhD student, math olympiad coach
Why: Livestreams himself thinking about olympiad problems
Domain: Mathematics
Link: Thinking about math problems in real time
Person: Tim Gowers
Background: Fields medallist
Why: Livestreams himself thinking about math problems
From the Rough Notes section of Ajeya's shared scenario:
Meta and Microsoft ordered 150K GPUs each, big H100 backlog. According to Lennart's BOTECs, 50,000 H100s would train a model the size of Gemini in around a month (assuming 50% utilization)
Just to check my understanding, here's my BOTEC of the number of FLOPs for 50k H100s during a month: 5e4 H100s * 1e15 bf16 FLOPs/second * 0.5 utilization * (3600 * 24 * 30) seconds/month = 6.48e25 FLOPs.
This is indeed close enough to Epoch's median estimate of 7.7e25 FLOPs for Gemini Ultra 1.0 (this doc cites a...
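For anyone who wants to reproduce it, here is the same BOTEC as code (the 1e15 bf16 FLOP/s peak figure is the rough number I used above, not a precise spec):

```python
# BOTEC: total FLOPs from 50k H100s over one month at 50% utilization
n_gpus = 5e4
peak_flops = 1e15              # rough H100 bf16 FLOP/s
utilization = 0.5
seconds_per_month = 3600 * 24 * 30

total = n_gpus * peak_flops * utilization * seconds_per_month
print(f"{total:.2e}")          # 6.48e+25
```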
Thanks! I think this is a useful post; I also use these heuristics.
I recommend Andrew Gelman’s blog as a source of other heuristics. For example, the Piranha problem and some of the entries in his handy statistical lexicon.
Mostly I care about this because if there's a small number of instances that are trying to take over, but a lot of equally powerful instances that are trying to help you, this makes a big difference. My best guess is that we'll be in roughly this situation for "near-human-level" systems.
I don't think I've seen any research about cross-instance similarity.
I think mode-collapse (update) is sort of an example.
...How would you say humanity does on this distinction? When we talk about planning and goals, how often are we talking about "all humans", vs "repres
I'm not even sure what it would mean for a non-instantiated model without input to do anything.
For goal-directedness, I'd interpret it as "all instances are goal-directed and share the same goal".
As an example, I wish Without specific countermeasures had made the distinction more explicit.
More generally, when discussing whether a model is scheming, I think it's useful to keep in mind worlds where some instances of the model scheme while others don't.
When talking about AI risk from LLM-like models and using the word "AI", please make it clear whether you are referring to:
For example, there's a big difference between claiming that a model is goal-directed and claiming that a particular instance of a model given a prompt is goal-directed.
I think this distinction is obvious and important but too rarely made explicit.
Here are the Latest Posts I see on my front page and how I feel about them (if I read them: what I remember, what I liked or disliked; if I didn't: my expectations and prejudices).
According to SemiAnalysis in July:
OpenAI regularly hits a batch size of 4k+ on their inference clusters, which means even with optimal load balancing between experts, the experts only have batch sizes of ~500. This requires very large amounts of usage to achieve.
...Our understanding is that OpenAI runs inference on a cluster of 128 GPUs. They have multiple of these clusters in multiple datacenters and geographies. The inference is done at 8-way tensor parallelism and 16-way pipeline parallelism. Each node of 8 GPUs has only ~130B parameters, or less tha
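A quick sanity check on those numbers (note that the ~8 active experts figure is my inference from 4k / 500, not something SemiAnalysis states):

```python
# Back-of-the-envelope checks on the SemiAnalysis inference-cluster numbers
batch_size = 4096           # "batch size of 4k+"
per_expert_batch = 500      # "experts only have batch sizes of ~500"
implied_experts = batch_size / per_expert_batch
print(round(implied_experts))   # ~8 active experts (my inference)

gpus_per_cluster = 128
tensor_parallel = 8         # 8-way tensor parallelism = one node of 8 GPUs
pipeline_stages = gpus_per_cluster // tensor_parallel
print(pipeline_stages)      # 16, matching "16-way pipeline parallelism"
```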
I'm grateful for this post: it gives simple concrete advice that I intend to follow, and that I hadn't thought of. Thanks.
For onlookers, I strongly recommend Gabriel Peyré and Marco Cuturi's online book Computational Optimal Transport. I also think this is a case where considering discrete distributions helps build intuition.
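To illustrate the discrete case: optimal transport between two finite distributions is just a small linear program over the coupling matrix. A sketch (the points and weights are made up for illustration, and scipy's generic `linprog` stands in for the specialized solvers the book covers):

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(a, b, C):
    """Optimal transport between discrete distributions a, b with cost matrix C,
    as a linear program over the coupling P: minimize <C, P> s.t. P1 = a, P^T 1 = b, P >= 0."""
    n, m = C.shape
    A_eq = []
    for i in range(n):                      # row-sum constraints: sum_j P[i, j] = a[i]
        row = np.zeros((n, m)); row[i, :] = 1.0
        A_eq.append(row.ravel())
    for j in range(m):                      # column-sum constraints: sum_i P[i, j] = b[j]
        col = np.zeros((n, m)); col[:, j] = 1.0
        A_eq.append(col.ravel())
    res = linprog(C.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([a, b]), bounds=(0, None))
    return res.fun, res.x.reshape(n, m)

# Mass 1/2 at each of {0, 1}, transported to mass 1/2 at each of {1, 2}, cost |x - y|
x, ypts = np.array([0.0, 1.0]), np.array([1.0, 2.0])
C = np.abs(x[:, None] - ypts[None, :])
cost, plan = discrete_ot([0.5, 0.5], [0.5, 0.5], C)
print(cost)  # 1.0: every unit of mass travels total distance 1
```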
As previously discussed a couple of times on this website
For context, Daniel wrote Is this a good way to bet on short timelines? (which I didn't know about when writing this comment) 3 years ago.
HT Alex Lawsen for the link.
@Daniel Kokotajlo what odds would you give me for global energy consumption growing 100x by the end of 2028? I'd be happy to bet low hundreds of USD on the "no" side.
ETA: to be more concrete, I'd put $100 on the "no" side at 10:1 odds, but I'm interested if you have a more aggressive offer.
If they are right then this protocol boils down to “evaluate, then open source.” I think there are advantages to having a policy which specializes to what AI safety folks want if AI safety folks are correct about the future and specializes to what open source folks want if open source folks are correct about the future.
In practice, arguing that your evaluations show open-sourcing is safe may involve a bunch of paperwork and maybe lawyer fees. If so, this would be a big barrier for small teams, so I expect open-source advocates not to be happy with such a trajectory.
I'd be curious about how much more costly this attack is on LMs Pretrained with Human Preferences (including when that method is only applied to "a small fraction of pretraining tokens" as in PaLM 2).
I don't have the energy to contribute actual thoughts, but here are a few links that may be relevant to this conversation:
Quinton’s
His name is Quintin (EDIT: now fixed)
You may want to check out Benchmarks for Detecting Measurement Tampering:
...Detecting measurement tampering can be thought of as a specific case of Eliciting Latent Knowledge (ELK): When AIs successfully tamper with measurements that are used for computing rewards, they possess important information that the overseer doesn't have (namely, that the measurements have been tampered with). Conversely, if we can robustly elicit an AI's knowledge of whether the measurements have been tampered with, then we could train the AI to avoid measurement tampering. In
Thank you Ruby. Two other posts I like that I think fit this category are A Brief Introduction to Container Logistics and What it's like to dissect a cadaver.
How did you end up doing this work? Did you deliberately seek it out?
I went to a French engineering school which is also a military school. During the first year (which corresponds to junior year in US undergrad), each student typically spends around six months in an armed forces regiment after basic training.
Students get some amount of choice of where to spend these six months among a list of options, and there are also some opportunities outside of the military: these include working as a teaching assistant in some high schools, working for s...
How is it logistically possible for the guards to go on strike?
Who was doing all the routine work of operating cell doors, cameras, and other security facilities?
This is a good question. In France, prison guards are not allowed to strike (like most police, military, and judges). At the time, the penitentiary administration asked for sanctions against guards who took part in the strike, but I think most were not applied because there was a shortage of guards.
In practice, guards were replaced by gendarmes, and work was reduced to the basics...
I was a little bit confused about Egalitarianism not requiring (1). As an egalitarian, you may not need a full distribution over who you could be, but you do need the support of this distribution to know what you are minimizing over?
Thanks for this. I’ve been thinking about what to do, as well as where and with whom to live over the next few years. This post highlights important things missing from default plans.
It makes me more excited about having independence, space to think, and a close circle of trusted friends (vs being managed / managing, anxious about urgent todos, and part of a scene).
I’ve spent more time thinking about math completely unrelated to my work after reading this post.
The theoretical justifications are more subtle, and seem closer to true, than previous justificat...
Thanks for this comment, I found it useful.
What did you want to write at the end of the penultimate paragraph?
Thanks for this post! Relatedly, Simon DeDeo had a thread on different ways the KL-divergence pops up in many fields:
...Kullback-Leibler divergence has an enormous number of interpretations and uses: psychological, epistemic, thermodynamic, statistical, computational, geometrical... I am pretty sure I could teach an entire graduate seminar on it.
Psychological: an excellent predictor of where attention is directed. http://ilab.usc.edu/surprise/
Epistemic: a normative measure of where you ought to direct your experimental efforts (maximize expected model-breakin
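For the statistical reading, the discrete KL formula is a one-liner, and a toy computation makes the asymmetry concrete (distributions chosen arbitrarily):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions with full support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p, q = [0.5, 0.5], [0.9, 0.1]
print(kl(p, q), kl(q, p))  # nonnegative, and note the asymmetry
```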
A few ways that StyleGAN is interesting for alignment and interpretability work:
I've been thinking about these two quotes from AXRP a lot lately:
From Richard Ngo's interview:
...Richard Ngo: Probably the main answer is just the thing I was saying before about how we want to be clear about where the work is being done in a specific alignment proposal. And it seems important to think about having something that doesn’t just shuffle the optimization pressure around, but really gives us some deeper reason to think that the problem is being solved. One example is when it comes to Paul Christiano’s work on amplification, I think one core insi
Your link redirects back to this page. The quote is from one of Eliezer's comments in Reply to Holden on Tool AI.
It's an example first written about by Paul Christiano here (at the beginning of Part III).
The idea is this: suppose we want to ensure that our model has acceptable behavior even in worst-case situations. One idea would be to do adversarial training: at every step during training, train an adversary model to find inputs on which the model behaves unacceptably, and penalize the model accordingly.
If the adversary is able to uncover all the worst-case inputs, this penalization ensures we end up with a model that behaves acceptably on all inputs.
RSA-2048...
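A toy sketch of the adversarial-training loop described above (a logistic-regression "model" with an FGSM-style adversary; the data, attack, and hyperparameters are all illustrative, not anyone's actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated Gaussian blobs, labels 0 and 1
n = 200
X = np.vstack([rng.normal(-1.5, 0.5, size=(n, 2)),
               rng.normal(1.5, 0.5, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

w, b = np.zeros(2), 0.0
eps, lr = 0.3, 0.1  # attack radius and learning rate (arbitrary)

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for _ in range(500):
    # Adversary step: perturb each input in the loss-increasing direction (FGSM)
    p = predict(X, w, b)
    grad_x = (p - y)[:, None] * w[None, :]   # d(cross-entropy)/dx
    X_adv = X + eps * np.sign(grad_x)

    # Training step: penalize the model by training on the adversarial inputs
    p_adv = predict(X_adv, w, b)
    w -= lr * X_adv.T @ (p_adv - y) / len(y)
    b -= lr * np.mean(p_adv - y)

acc = np.mean((predict(X, w, b) > 0.5) == (y == 1))
print(f"clean accuracy: {acc:.2f}")
```

Of course, the whole point of the RSA-2048 example is that a real adversary can't always construct the worst-case inputs, which this toy gradient-based adversary glosses over.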
Software: streamlit.io
Need: making small webapps to display or visualize results
Other programs I've tried: R shiny, ipywidgets
I find streamlit extremely simple to use; it interoperates well with other libraries (e.g. pandas or matplotlib), and the webapps render well and are easy to share, either temporarily through ngrok or with https://share.streamlit.io/.
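A minimal sketch of the kind of app I mean (the data is made up; save as `app.py` and run with `streamlit run app.py`):

```python
# app.py -- minimal streamlit sketch for displaying results
import pandas as pd
import streamlit as st

st.title("Results viewer")
df = pd.DataFrame({"step": range(10), "loss": [1 / (i + 1) for i in range(10)]})
threshold = st.slider("Max loss", 0.0, 1.0, 0.5)
st.line_chart(df.set_index("step"))
st.dataframe(df[df["loss"] <= threshold])
```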
Another way adversarial training might be useful, related to (1), is that it may make interpretability easier. Given that it weeds out some non-robust features, the features that remain (and the associated feature visualizations) tend to be clearer; see e.g. Adversarial Robustness as a Prior for Learned Representations. One example of people using this is Leveraging Sparse Linear Layers for Debuggable Deep Networks (blog post, AN summary).
The above examples are from vision networks - I'd be curious about similar phenomena when adversarially training ...
Relevant related work: NNs are surprisingly modular
I believe Richard linked to Clusterability in Neural Networks, which has superseded Pruned Neural Networks are Surprisingly Modular.
The same authors also recently published Detecting Modularity in Deep Neural Networks.
On one hand, Olah et al.’s (2020) investigations find circuits which implement human-comprehensible functions.
At a higher level, they also find that different branches (when the modularity is enforced already by the architecture) tend to contain different features.
Another meaning could be: I want to raise the salience of the issue ‘Red vs Not Red’, I want to convey that ‘Red vs Not Red’ is an underrated axis. I think this is also an example of level 4?
This distinction reminds me of Evading Black-box Classifiers Without Breaking Eggs, in the black box adversarial examples setting.