AI has probably increased valuations for Big Tech (particularly Nvidia) by at least a few trillion dollars over the past two years. So part of this is that investors think OpenAI/Anthropic will only capture around 10% of total AI profits.
65T tokens doesn't get you to 1e26 FLOP with 100B active params? You'd need well over 100T tokens: 6 * 100 billion * 65 trillion is 3.9e25 FLOP.
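A quick check of the arithmetic above, as a minimal sketch. It assumes the usual 6 * N * D rule of thumb for training compute (N = active params, D = tokens); the function names are just for illustration.

```python
def training_flop(active_params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb."""
    return 6 * active_params * tokens


def tokens_needed(target_flop: float, active_params: float) -> float:
    """Tokens required to reach target_flop under the same rule of thumb."""
    return target_flop / (6 * active_params)


# 100B active params trained on 65T tokens
flop = training_flop(100e9, 65e12)
print(f"{flop:.2e} FLOP")  # 3.90e+25 FLOP

# Tokens needed to reach 1e26 FLOP with 100B active params
tokens = tokens_needed(1e26, 100e9)
print(f"{tokens / 1e12:.0f}T tokens")  # 167T tokens
```

So under this approximation you'd need roughly 167T tokens, which is where "well over 100T" comes from.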
GPT-4.5 being trained on fewer tokens than GPT-4o doesn't really make sense. GPT-4.5 only having 5x more active params than GPT-4o doesn't quite make sense either, though I'm not as confident that's wrong.
1e26 FLOP would have had a significant opportunity cost. Remember that OpenAI was and is very GPU constrained and may have valued GPU hours in a large-scale cluster a lot more than $2/hour. It would be worth it to...
I don't think GPT-4o was trained on 1e26 FLOP or particularly close to it. Overtraining is common but GPT-4o being overtrained by 10x for 1e26 FLOP is kind of a strong and surprising claim (some models like Llama 3 8b are extremely overtrained but they're small so this overtraining is cheap). I think a more natural explanation is that it improves on GPT-4 because of superior post-training and other innovations.
The high cost and slow speed of GPT-4.5 seem like a sign OpenAI is facing data constraints, though we don't actually know the parameters and OpenAI might be charging a bigger margin than usual (it's a "research preview", not a flagship commercial product). If data were more abundant, wouldn't GPT-4.5 be more overtrained and have fewer parameters?
edit: FWIW Artificial Analysis measures GPT-4.5 at a not-that-bad 50 tokens per second whereas I've been experiencing a painfully slow 10-20 tokens/second in the chat app. So it may just be growing pains until t...
if OpenAI follows the usual naming convention of roughly 100x in raw compute.
I doubt this is a real convention. I think OpenAI wanted to call Orion GPT-5 if they thought it was good enough to deserve the name.
...In Holden Karnofsky's "AI Could Defeat All Of Us Combined" a plausible existential risk threat model is described, in which a swarm of human-level AIs outmanoeuvre humans due to AI's faster cognitive speeds and improved coordination, rather than qualitative superintelligence capabilities. This scenario is predicated on the belief that "once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each." If the first AGIs are as expe
By several reports (e.g. here and here), OpenAI is throwing enormous amounts of training compute at o-series models. And if the new RL paradigm involves more decentralized training compute than the pretraining paradigm, that could lead to more consolidation into a few players, not less, because pretraining* is bottlenecked by the size of the largest cluster. E.g. OpenAI's biggest single compute cluster is similar in size to xAI's, even though OpenAI has access to much more compute overall. But if it's just about who has the most compute then the biggest pl...
AI systems can presumably be given at least as much access to company data as human employees at that company. So if rapidly scaling up the number and quality of human workers at a given company would be transformative, AI agents with >=human-level intelligence can also be transformative.
I think a little more explanation is required on why there isn't already a model with 5-10x* more compute than GPT-4 (which would be "4.5 level" given that GPT version numbers have historically gone up by 1 for every two OOMs, though I think the model literally called GPT-5 will only be a roughly 10x scale-up).
You'd need around 100,000 H100s (or maybe somewhat fewer; Llama 3.1 was 2x GPT-4 and trained using 16,000 H100s) to train a model at 10x GPT-4. This has been available to the biggest hyperscalers since sometime last year. Naively it might...
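To make the version-number arithmetic concrete, here is a rough sketch. It assumes the "one GPT version per two OOMs of compute" convention mentioned above, which is an informal pattern rather than anything official; the function name is hypothetical.

```python
import math


def version_increment(scale_up: float, ooms_per_version: float = 2.0) -> float:
    """Implied bump in GPT version number for a given compute scale-up.

    Assumes (informally) that the version number rises by 1 for every
    `ooms_per_version` orders of magnitude of training compute.
    """
    return math.log10(scale_up) / ooms_per_version


for scale_up in (5, 10, 100):
    bump = version_increment(scale_up)
    print(f"{scale_up}x compute -> +{bump:.2f} versions")
```

Under that assumption a 10x scale-up over GPT-4 lands at "4.5", a 5x scale-up at about 4.35, and a full 100x at 5.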
GPT-4 (Mar 2023 version) is rumored to have been trained on 25K A100s for 2e25 FLOPs, and Gemini 1.0 Ultra on TPUv4s (this detail is in the report) for 1e26 FLOPs. In BF16, A100s give 300 teraFLOP/s, TPUv4s 270 teraFLOP/s, H100s 1000 teraFLOP/s (marketing materials say 2000 teraFLOP/s, but that's for sparse computation that isn't relevant for training). So H100s have roughly a 3x advantage over the hardware that trained GPT-4 and Gemini 1.0 Ultra. Llama-3-405b was trained on 16K H100s for about 2 months, getting 4e25 BF16 FLOPs at 40% compute utilization.
With 100K H100s...
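The Llama-3-405b figure above can be sanity-checked with a back-of-the-envelope sketch. The per-chip throughput numbers are the ones quoted above; treating "about 2 months" as exactly 60 days is my assumption, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400

# Dense BF16 peak throughput per chip, in FLOP/s (figures quoted above)
A100_BF16 = 300e12
H100_BF16 = 1000e12


def cluster_training_flop(num_gpus: int, peak_flops: float,
                          utilization: float, days: float) -> float:
    """Total training FLOP = chips * peak FLOP/s * utilization * seconds."""
    return num_gpus * peak_flops * utilization * days * SECONDS_PER_DAY


# Llama-3-405b: 16K H100s, 40% utilization, assumed 60 days
llama = cluster_training_flop(16_000, H100_BF16, 0.40, 60)
print(f"Llama-3-405b estimate: {llama:.1e} FLOP")

# Per-chip advantage of H100 over A100 in dense BF16
print(f"H100 / A100: {H100_BF16 / A100_BF16:.1f}x")
```

With exactly 60 days this comes out a bit over 3e25 FLOP, in the same ballpark as the 4e25 quoted above; a slightly longer run or higher utilization closes the gap.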
Llama 405B was trained on a bunch of synthetic data in post-training for coding, long-context prompts, and tool use (see section 4.3 of the paper).
AI that can rewrite CUDA is a ways off. It's possible that it won't be that far away in calendar time, but it is far away in terms of AI market growth and hype cycles. If GPT-5 does well, Nvidia will reap the gains more than AMD or Google.
The US is currently donating doses to other countries in large quantities. Domestically, it has around 54m doses distributed but not used right now (https://covid.cdc.gov/covid-data-tracker/#vaccinations). Some but certainly not all of those are at risk of expiration. If US authorities recommended booster shots for the general population then that would easily use up the currently unused supply and reduce vaccine exports.
I did it, I did it, I did it, yay!
A compromise that I find appealing and might implement for myself is giving a fixed percentage over a fixed amount, with that fixed percentage being relatively high (well above ten percent). You could also have multiple "donation brackets" with an increased marginal donation rate as your income increases.
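As a rough illustration of the bracket idea, here is a minimal sketch. All thresholds and rates below are made-up numbers for the example, not a recommendation, and the function name is hypothetical.

```python
def bracketed_donation(income: float, brackets: list[tuple[float, float]]) -> float:
    """Donation owed under marginal "donation brackets".

    brackets: (lower_threshold, marginal_rate) pairs sorted by threshold.
    Income below the first threshold is exempt; each rate applies only to
    the slice of income between its threshold and the next one.
    """
    total = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            total += (min(income, upper) - lower) * rate
    return total


# Hypothetical: a single fixed percentage over a fixed amount
single = [(40_000, 0.30)]
# Hypothetical: two brackets with an increasing marginal rate
multi = [(40_000, 0.20), (80_000, 0.40)]

print(bracketed_donation(100_000, single))
print(bracketed_donation(100_000, multi))
```

The single-bracket case is just the "fixed percentage over a fixed amount" scheme; adding more brackets gives the increasing marginal rate.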
I doubt an IQ test would be useful at all. One has to be quite intelligent to be a real candidate for presidency.
Probably shouldn't say someone "probably" has an IQ between 145 and 160 unless you have pretty good evidence.
I think it makes a big difference if the preferred theory is gender/racial equality as opposed to fundamentalist Christianity, and whether the opposition to those perceived challenges results from emotional sensitivity as opposed to blind faith. At the very least, the blog post doesn't indicate that the author would be irrational about issues other than marginalization.
I don't see how the fact that the permissiveness principle is only based on one (two, actually, including the third one) of the six foundations would imply that it's not a widely-held intuition.
How risk-averse are you? But even if you aren't, I suspect that right now bitcoins aren't a great investment strictly in expected-value terms due to the high risk that they will decline in value by a lot. No one really knows what will happen, though.
Another possible critique is that the philosophical arguments for ethical egoism are (I think) at least fairly plausible. The extent to which this is a critique of EA is debatable (since people within the movement state that it's compatible with non-utilitarian ethical theories and that it appeals to people who want to donate for self-interested reasons) but it's something which merits consideration.
Ehh, I think that's pretty much what rule util means, though I'm not that familiar with the nuances of the definition so take my opinion with a grain of salt. Rule util posits that we follow those rules with the intent of promoting the good; that's why it's called rule utilitarianism.
That would be a form of deontology, yes. I'm not sure which action neo-Kantians would actually endorse in that situation, though.
I think that's accurate, though I'm not certain, because the programming jargon is unnecessarily obfuscating. The basic point is that following the rule is good in and of itself. You shouldn't kill people because there is a value in not killing that is independent of the outcome of that choice.
Your description of deontological ethics sounds closer to rule consequentialism, which is a different concept. Deontology means that following certain rules is good in and of itself, not because they lead to better decisionmaking (in terms of promoting some other good) in situations of uncertainty.
Survey taken. Defected since I'm neutral as to whether the money goes to Yvain or a random survey-taker, but would prefer the money going to me over either of those two.
I think there are two models you measured time horizon for, Claude 3 Opus and GPT-4 Turbo, that didn't make it onto the main figure. Is that right? There are 13 models in Figure 5, which shows the time horizon curves for a bunch of models across the full test suite, but only 11 dots on Figure 1.