It seems likely that process supervision was used for o1. I'd be curious to what extent it addresses the concerns here, if a supervision model assesses that each reasoning step is correct, relevant, and human-understandable. Even with process supervision, o1 might give a final answer that essentially ignores the process or uses some self-prompting. But process supervision also feels helpful, especially when the supervising model is more human-like, similar to pre-o1 models.
Thanks, we did look into the academic norms around this and concluded that including him was likely the standard choice. This choice was especially clear since (if I remember right) there was no further round of approval from the other authors either for the final edits after the relevant point in time.
(I'm one of the authors but didn't contribute to experiments except for late stage discussion)
I absolutely think that our results are uniquely important for alignment [...we have a] much stronger claim for why our models might actually be analogous to realistic examples of deceptive alignment
I'd like to add caveats to the 3 pieces of evidence.
[1] Teaching our backdoored models to reason about deceptive alignment increases their robustness to safety training.
This is correct but it needs qualification. Currently the only control condition for this claim is h...
The only team member whose name is on the CAIS extinction risk statement is Tony (Yuhuai) Wu.
(Though not everyone who signed the statement is listed under it, especially if they're less famous. And I know one person in the xAI team who has privately expressed concern about AGI safety in ~2017.)
So I'm imagining the agent doing reasoning like:
Misaligned goal --> I should get high reward --> Behavior aligned with reward function
The shortest description of this thought doesn't include "I should get high reward" because that's already implied by having a misaligned goal and planning with it.
In contrast, having only the goal "I should get high reward" may add description length, as Johannes said. If so, the misaligned goal could well be as simple as, or simpler than, the high-reward goal.
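To make the description-length comparison concrete, here's a toy sketch (purely illustrative; the planner, goal strings, and training assumptions are all hypothetical):

```python
# Toy sketch: a generic planner derives "I should get high reward" as an
# instrumental subgoal from any terminal goal, so the subgoal doesn't need
# to be stored in the policy's description. All names are hypothetical.

def plan(terminal_goal: str) -> list[str]:
    steps = []
    # Assumption: during training, low reward leads to modification or
    # shutdown, which blocks any terminal goal.
    if terminal_goal != "high reward":
        steps.append("I should get high reward")  # derived at runtime, not stored
    steps.append(f"behave so as to achieve: {terminal_goal}")
    return steps

print(plan("maximize paperclips"))
# ['I should get high reward', 'behave so as to achieve: maximize paperclips']
```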
Interesting point. Though on this view, "Deceptive alignment preserves goals" would still become true once the goal has drifted to some random maximally simple goal for the first time.
To be even more speculative: Goals represented in terms of existing concepts could be simple and therefore stable by default. Pretrained models represent all kinds of high-level states, and weight-regularization doesn't seem to change this in practice. Given this, all kinds of goals could be "simple" as they piggyback on existing representations, requiring little additional description length.
See also: Your posts should be on Arxiv
I do agree we're leaving lots of value on the table and even causing active harm by not writing things up well, at least for Arxiv, for a bunch of reasons including some of the ones listed here.
I see. In that case, what do you think of my suggestion of inverting the LM? By default, it maps human reward functions to behavior. But when you invert it, it maps behavior to reward functions (this may be a one-to-many mapping, but that ambiguity is a problem you can solve with more diverse behavior data). Then you could use it for IRL (with some of the caveats I mentioned); I sketch one version of this below.
Which may be necessary since this:
The LM itself is directly mapping human behaviour (as described in the prompt) to human rewards/goals (described in the output of the LM).
...see...
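To spell out the inversion I have in mind, here's a minimal sketch, assuming we can query the forward direction p(behavior | reward description) with the LM; `lm_prob`, `prior`, and the candidate rewards are all hypothetical placeholders:

```python
# Minimal sketch of inverting a forward model p(behavior | reward) via
# Bayes' rule to get p(reward | behavior). All interfaces are hypothetical.

def invert_lm(behaviors, candidate_rewards, lm_prob, prior):
    """p(reward | behaviors) is proportional to prior(reward) * product over b of p(b | reward)."""
    posteriors = {}
    for r in candidate_rewards:
        p = prior(r)
        for b in behaviors:
            p *= lm_prob(behavior=b, reward=r)
        posteriors[r] = p
    total = sum(posteriors.values())
    return {r: p / total for r, p in posteriors.items()}
```

With diverse enough behavior data, the posterior should concentrate on fewer reward functions, which is how I'd hope to reduce the one-to-many ambiguity.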
Do I read right that the suggestion is as follows:
Great to see this studied systematically - it updated me in some ways.
Given that the study measures how likeable, agreeable, and informative people found each article, regardless of the topic, could it be that the study measures something different from "how effective was this article at convincing the reader to take AI risk seriously"? In fact, it seems like the contest could have been won by an article that isn't about AI risk at all. The top-rated article (Steinhardt's blog series) spends little time explaining AI risk: Mostly just (part of) the last of...
Not sure if any of these qualify but: Military equipment, ingredients for making drugs, ingredients for explosives, refugees and travelers (being transferred between countries), stocks and certificates of ownership (used to be physical), big amounts of cash. Also I bet there was lots of registration of goods in planned economies.
Another advantage of Chinese leadership in AI: while right now they have less alignment research than the West, they may be better at scaling it up at crunch time: they have more control over what companies and people work on, a bigger government, and a better track record at pulling off major projects like controlling COVID and, well, large-scale 'social engineering'.
Playing this game made me realize that humans aren't trained to predict at the token level. I don't know the token-level vocabulary, and I made lots of mistakes by missing spaces and punctuation. Is it possible to convert the token-level prediction into word-level prediction? This may give you a better picture of human ability.
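Here's a minimal sketch of the conversion I mean, assuming access to per-token log-probs; `token_logprob` is a hypothetical interface, not any particular library's API:

```python
# Sketch: word-level prediction from token-level log-probs, obtained by
# summing over the tokens that make up each word (chain rule of probability).

def word_logprob(word_tokens: list[str], context: list[str], token_logprob) -> float:
    """log p(word | context) = sum over i of log p(token_i | context + earlier tokens)."""
    total = 0.0
    for i, tok in enumerate(word_tokens):
        total += token_logprob(tok, context + word_tokens[:i])
    return total

# A human could then guess the next *word*, and be scored against the
# model's word-level probability instead of raw token probabilities.
```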
Relevant: Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations.
They argue that the pre-trained network already learns some non-spurious features but doesn't use them, and that you only need to retrain the last layer to utilize them.
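A minimal sketch of what that looks like, assuming a PyTorch model and a group-balanced reweighting loader (names are placeholders, not the paper's code):

```python
import torch
import torch.nn as nn

def retrain_last_layer(model: nn.Module, head_name: str, loader, epochs: int = 10):
    """Freeze the pretrained backbone; retrain only the final linear layer."""
    for p in model.parameters():
        p.requires_grad = False
    head = getattr(model, head_name)  # e.g. "fc" for a torchvision ResNet
    head.reset_parameters()
    for p in head.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(head.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:  # loader should be group-balanced
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
```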
We’ll be able to fine-tune in the test environment so won’t experience OOD at deployment, and while changes will happen, continual fine-tuning will be good enough to stop the model from ever being truly OOD. We think this may apply in settings where we’re using the model for prediction, but it’s unclear whether continual fine-tuning will be able to help models learn and adapt to the rapid OOD shifts that could occur when the models are transferred from offline learning to online interaction at deployment.
Couldn't the model just fail at the start of fine-tu...
This distillation was useful for me, thanks for making it! As feedback, I got stuck at the bullet-point explanation of imitative generalization. There wasn't enough detail to understand it, so I had to read Beth's post first and try to connect it to your explanation. For example, what kind of changes are we considering? To what model? How do you evaluate whether a change lets the human make better predictions?
A large amount of math describes the relations between agents at the same level of analysis: this is almost all of game theory. [...] our focus is on "vertical" relations, between composite agents and their parts.
This seems to be what is studied in the fields of organizational economics and to some extent in industrial organization / vertical integration. These fields have a great deal of game theory on vertical relationships, particularly rel...
My point is that, while PCIe bandwidths aren't increasing very quickly, it's easy to increase the number of machines you use. So you can distribute each NN layer (width-wise) across many machines, each of which adds to the total bandwidth you have.
(As noted in the previous comment, you can do this with <<300GB of total GPU memory for GPT-3 with something like ZeRO-infinity)
Beware bandwidth bottlenecks, as I mentioned in my original post.
Presumably bandwidth requirements can be reduced a lot through width-wise parallelism: each GPU then only has to load one slice of the model. Of course you'll need more GPUs, but still not a crazy number as long as you use something like ZeRO-Infinity.
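A hedged back-of-envelope (assumed, not measured, numbers) for how aggregate weight-loading bandwidth scales with GPU count:

```python
# Each GPU streams only its slice of every layer, so total bandwidth
# scales with the number of GPUs. Illustrative numbers only.

params = 175e9                 # GPT-3 scale
bytes_per_param = 2            # fp16
weight_bytes = params * bytes_per_param      # ~350 GB

pcie_bytes_per_s = 16e9        # roughly PCIe 3.0 x16 per GPU (assumed)
for n_gpus in [1, 8, 64]:
    seconds = weight_bytes / (n_gpus * pcie_bytes_per_s)
    print(f"{n_gpus:3d} GPUs: ~{seconds:.1f}s to stream all weights once")
# 1 GPU: ~21.9s, 8 GPUs: ~2.7s, 64 GPUs: ~0.3s
```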
(Yes, 8x gpu->gpu communications will hurt overall latency... but not by all that much I don't think. 1 second is an eternity.)
Width-wise communication, if you mean that, can be quite a latency bottleneck for training. And it gets ...
Thanks for elaborating, I think I know what you mean now. I missed this:
I am talking about pipelining loading the NN weights into the GPU. Which is not dependent on the result of the previous layer's computation.
My original claim was that ZeRO-Infinity has higher latency compared to pipelining across many layers of GPUs so that you don't have to repeatedly load weights from RAM. But as you pointed out, ZeRO-Infinity may avoid the additional latency by loading the next layer's weights from RAM at the same time as computing the previous layer's output. This...
The key is: pipelining doesn't help with latency of individual requests. But that's not what we care about here. What we care about is the latency from starting request 1 to finishing request N
Thanks for the examples. Your point seems to be about throughput, not latency (which to my knowledge is defined on a per-request basis). The latency per request may not matter for training but it does matter for inference if you want your model to be fast enough to interact with the world in real time or faster.
Perhaps what you meant is that latency will be high but this isn't a problem as long as you have high throughput. That's basically true for training. But this post is about inference, where latency matters a lot more.
(It depends on the application of course, but the ZeRO Infinity approach can make your model so slow that you don't want to interact with it in real time, even at GPT-3 scale)
That would be interesting if true. I thought that pipelining doesn't help with latency. Can you expand?
Generically, pipelining increases throughput without lowering latency. Say you want to compute f(x) where f is a NN. Every stage of your pipeline processes e.g. one of the NN layers. Then stage N has to wait for the earlier stages to be completed before it can compute the output of layer N. That's why the latency to compute f(x) is high.
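A worked example of that distinction (assumed numbers):

```python
# With L pipeline stages of t seconds each, one request still takes L*t,
# but N requests finish in (L + N - 1)*t instead of N*L*t.

L, t = 96, 0.01          # e.g. 96 stages, 10 ms per stage (assumed)
N = 1000                 # number of requests

latency_per_request = L * t              # 0.96 s, pipelined or not
pipelined_total = (L + N - 1) * t        # ~10.95 s for all N requests
unpipelined_total = N * L * t            # 960 s without pipelining

print(latency_per_request, pipelined_total, unpipelined_total)
```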
NB, GPT-3 used pipelining for training (in combination with model- and data parallelism) and still the large GPT-3 has h...
No, they don't. The primary justification for introducing them in the first place was to make a cheaper forward pass (=inference)
The motivation to make inference cheaper doesn't seem to be mentioned in the Switch Transformer paper nor in the original Shazeer paper. They do mention improving training cost, training time (from being much easier to parallelize), and peak accuracy. Whatever the true motivation may be, it doesn't seem that MoEs change the ratio of training to inference cost, except insofar as they're currently finicky to train.
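For intuition, a toy FLOP count (illustrative dimensions, ignoring the small router cost) of why top-1 routing leaves the training:inference cost ratio unchanged while multiplying parameters:

```python
# Per token, a top-1 MoE layer runs exactly one expert's FFN, at training
# and inference time alike, so both costs shrink by the same factor
# relative to the parameter count. Dimensions are illustrative.

d_model, d_ff, n_experts = 4096, 16384, 64

dense_ffn_flops = 2 * (2 * d_model * d_ff)       # two matmuls, forward pass
moe_ffn_flops_per_token = dense_ffn_flops        # still one expert's worth
total_moe_ffn_params = n_experts * 2 * d_model * d_ff  # 64x the parameters

print(dense_ffn_flops, moe_ffn_flops_per_token, total_moe_ffn_params)
```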
...But the glas
You may have better info, but I'm not sure I expect 1000x better serial speed than humans (at least not with innovations in the next decade). Latency is already a bottleneck in practice, despite efforts to reduce it. Width-wise parallelism has its limits and depth- or data-wise parallelism doesn't improve latency. For example, GPT-3 already has high latency compared to smaller models and it won't help if you make it 10^3x or 10^6x bigger.
As Steven noted, your $1/hour number is cheaper than my numbers and probably more realistic. That makes a significant difference.
I agree that transformative impact is possible once we've built enough GPUs and connected them up into many, many new supercomputers bigger than the ones we have today. In a <=10 year timeline scenario, this seems like a bottleneck. But maybe not with longer timelines.
you're missing all the possibilities of a 'merely human-level' AI. It can be parallelized, scaled up and down (both in instances and parameters), ultra-reliable, immortal, consistently improved by new training datasets, low-latency, ultimately amortizes to zero capital investment
I agree this post could benefit from discussing the advantages of silicon-based intelligence, thanks for bringing them up. I'd add that (scaled-up versions of current) ML systems have disadvantages compared to humans, such as lacking actuators and being cumbersome to fine-t...
I broadly agree with your first point, that inference can be made more efficient. Though we may have different views on how much?
Of course, both inference and training become more efficient and I'm not sure if the ratio between them is changing over time.
As I mentioned, there are also reasons why inference could become more expensive than in the numbers I gave. Given this uncertainty, my median guess is that the cost of inference will continue to exceed the cost of training (averaged across the whole economy).
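A back-of-envelope version of this comparison, using the common approximations that training costs ~6ND FLOPs and a forward pass ~2N FLOPs per token (N = parameters, D = training tokens; the specific numbers are assumptions for illustration):

```python
N = 175e9   # parameters (GPT-3 scale)
D = 300e9   # training tokens

train_flops = 6 * N * D
inference_flops_per_token = 2 * N
breakeven_tokens = train_flops / inference_flops_per_token  # = 3 * D

print(f"inference FLOPs exceed training FLOPs after ~{breakeven_tokens:.0e} tokens")
# i.e. once you've served ~3x as many tokens as you trained on,
# cumulative inference compute dominates.
```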
I don't think sparse (mixture of expert) mode...
Our default expectation about large neural networks should be that we will understand them in roughly the same ways that we understand biological brains, except where we have specific reasons to think otherwise.
Here's a relevant difference: In the brain, nearby neurons can communicate with lower cost and latency than far-apart neurons. This could encourage nearby neurons to form modules to reduce the number of connections needed in the brain. But this is not the case for standard artificial architectures where layers are often fully connected or similar.
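A toy count of why local wiring costs could favor modules (illustrative only):

```python
# Splitting n neurons into two modules of n/2 roughly halves the number of
# possible connections relative to full connectivity, so if connections are
# costly (as in a brain, unlike a fully-connected layer), modularity saves
# wiring. Purely illustrative.

n = 1000
fully_connected = n * n                    # every neuron to every neuron
two_modules = 2 * (n // 2) * (n // 2)      # connections only within modules

print(fully_connected, two_modules)        # 1_000_000 vs 500_000
```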
Some minor feedback points: Just from reading the abstract and intro, this could be read as a non-sequitur: "It limits our ability to mitigate short-term harms from NLP deployments". Also, calling something a "short-term" problem doesn't seem necessary and it may sound like you think the problem is not very important.
One thing I dislike about the 'punctuation outside quotes' view is that it treats "!" and "?" differently than a full stop.
"This is an exclamation"!
"Is this a question"?
Seems less natural to me than:
"This is an exclamation!"
"Is this a question?"
I think I have this intuition because it is part of the quote that it is an exclamation or a question.
Most importantly I expect them to be fine-tuned on various things (perhaps you can bundle this under "higher-quality data"). Think of how Codex and Copilot are much better than vanilla GPT-3 at coding. That's the power of fine-tuning / data quality.
Fine-tuning GPT-3 on code had little benefit compared to training from scratch:
...Surprisingly, we did not observe improvements when starting from a pre-trained language model, possibly because the finetuning dataset is so large. Nevertheless, models fine-tuned from GPT converge more quickly, so we apply this strat
2023
The multimodal transformers are now even bigger; the biggest are about half a trillion parameters [...] The hype is insane now
This part surprised me. Half a trillion is only 3x bigger than GPT-3. Do you expect this to make a big difference? (Perhaps in combination with better data?). I wouldn't, given that GPT-3 was >100x bigger than GPT-2.
Maybe you're expecting multimodality to help? It's possible, but worth keeping in mind that, according to some rumors, Google's multimodal model already has on the order of 100B parameters.
On the other hand, ...
The R0 of Delta is ca. 2x the R0 of the Wuhan strain, and this doubles the effect of new immunity on Rt.
In fact, the ONS data gives me that ~7% of Scotland had Delta, so that's a reduction in Rt of R0 * 7% = 6 * 7% = 0.42 just from very recent and sudden natural immunity.
That's not enough to explain everything, but there are more factors:
1) Heterogeneous immunity: the first people to become immune are often high-risk people who go to superspreader events etc.
2) Vaccinations also w...
According to one expert, the immune system essentially makes bets on how often it will face a given virus and how the virus will mutate in the future:
https://science.sciencemag.org/content/372/6549/1392
By that logic, being challenged more often means that the immune system should have a stronger and longer-lasting response:
...The immune system treats any new exposure—be it infection or vaccination—with a cost-benefit threat analysis for the magnitude of immunological memory to generate and maintain. There are resource-commitment decisions: more cells and more
Suggestion for content 2: relationship to invariant causal prediction
Lots of people in ML these days seem excited about getting out-of-distribution generalization with techniques like invariant causal prediction. See e.g. this, this, section 5.2 here, and related background. This literature seems promising, but it's missing from discussions about inner alignment. It seems useful to discuss how far it can go toward solving inner alignment.
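As one concrete instance, a minimal sketch of the IRMv1 penalty from Arjovsky et al. (2019), paraphrased in PyTorch rather than taken from their code:

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Gradient of the environment's risk w.r.t. a fixed classifier scaling;
    invariant predictors should make this near zero in every environment."""
    scale = torch.ones(1, requires_grad=True, device=logits.device)
    loss = F.cross_entropy(logits * scale, y)
    (grad,) = torch.autograd.grad(loss, [scale], create_graph=True)
    return (grad ** 2).sum()

# Sketch of the training objective: sum over environments e of
#   risk_e + lambda * irmv1_penalty(logits_e, y_e)
```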
Good points here.
Btw, I sometimes think back to how well your 3-year-old comments on this post have aged.