My emotional state right now: https://twitter.com/emojimashupbot/status/1409934745895583750?s=46
I will accept the early resolution, but I'd like to reserve the option to reverse the decision and the payment should the world turn out unexpectedly in our favor.
Also, I'd like to state that I commit to using the money to buy more equipment for my AI safety research. [Edit: Matthew paid up!] [Edit 2: So did Tamay!]
I suspect the MMLU and MATH milestones are the easiest to achieve, and that they will probably be reached once a GPT-4-level model is specialized to perform well in mathematics, as Minerva was.
I'm curious about this too. The retrospective covers weaknesses in each milestone, but a collection of weak milestones doesn't necessarily aggregate to a guaranteed loss, since performance ought to be correlated (due to an underlying general factor of AI progress).
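To see how much the shared-factor point can matter, here is a minimal Monte Carlo sketch. The numbers are made up (five milestones, each 50% likely on its own, with a strong loading on a single latent "AI progress" factor); it only illustrates that correlated milestones leave "none of them happen" far more likely than an independence calculation would suggest.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000
n_milestones = 5        # hypothetical number of bet milestones
p_each = 0.5            # hypothetical marginal probability of each milestone
rho = 0.8               # hypothetical latent correlation from a shared factor
threshold = 0.0         # standard-normal quantile corresponding to p_each = 0.5

# Independent case: each milestone is its own coin flip.
indep_hits = rng.random((n_trials, n_milestones)) < p_each

# Correlated case: a one-factor Gaussian (copula-style) model with the
# same marginals but a shared "general AI progress" component.
factor = rng.standard_normal((n_trials, 1))
noise = rng.standard_normal((n_trials, n_milestones))
latent = np.sqrt(rho) * factor + np.sqrt(1 - rho) * noise
corr_hits = latent > threshold

print("P(at least one milestone) if independent:", indep_hits.any(axis=1).mean())
print("P(at least one milestone) if correlated: ", corr_hits.any(axis=1).mean())
```

Under independence the chance that at least one of the five is hit comes out around 97%; with the strong shared factor it drops to roughly 70%, which is the sense in which individually weak milestones need not add up to a near-certain loss.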
Hmm? The $10 billion funding increase to OpenAI and the arms race with Google have pretty much guaranteed that the 10^30 FLOP / $1 billion USD training-machine condition will be satisfied. So we can mark that one as "almost certainly" satisfied by EOY 2023. The only way it isn't is a shortage of GPUs/TPUs.
GPT-4 likely satisfies the MMLU condition. So that's two "almost certain" conditions met, and even if by some fluke they aren't met by 2026, there are still several other ways Matt can lose the bet.
I think you're overconfident here. I'm quite skeptical that GPT-4 already got above 80% on every single task in the MMLU since there are 57 tasks and it got 86.4% on average. I'm also skeptical that OpenAI will very soon spend >$1 billion to train a single model, but I definitely don't think that's implausible. "Almost certain" for either of those seems wrong.
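For intuition about why the average alone doesn't settle the per-task condition, here is a toy sketch with invented per-task scores (not GPT-4's actual task-level results): an 86.4% average over 57 tasks is perfectly consistent with a number of tasks falling under 80%.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-task accuracies for 57 tasks, invented purely for
# illustration -- not GPT-4's reported task-level results.
scores = rng.normal(loc=86.4, scale=7.0, size=57)
scores += 86.4 - scores.mean()   # re-center so the average is exactly 86.4

print(f"average over 57 tasks: {scores.mean():.1f}%")
print(f"tasks below the 80% threshold: {int((scores < 80).sum())}")
```

Unless the task-level spread is unusually tight, a mid-80s average leaves plenty of room for a handful of tasks below 80%.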
There's GPT-5 though, or a GPT-4 math fine-tune. You saw the Minerva results. You know there will be a significant gain from a fine-tune, likely enough to satisfy 2-3 of your conditions.
As I said, it's ridiculous to think that no one in either the Google or OpenAI camp will have more than $1 billion USD of training hardware in service for a single model (training many instances in parallel).
Think about what that means. One A100 is about $25k. The cluster Meta uses is 2,048 of them, so about $50 million.
Why would you not go for the most powerful model possible as soon as you can? Either the world's largest tech giant is about to lose it all, or they are going to put the proportional effort in.
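Spelling out the hardware arithmetic above (the $25k per A100 and the 2,048-GPU cluster size are the comment's rough figures, not quoted prices):

```python
# Back-of-envelope cluster cost, using the comment's rough numbers.
A100_PRICE_USD = 25_000        # assumed per-GPU price
META_CLUSTER_GPUS = 2_048      # cluster size cited in the comment

cluster_cost = A100_PRICE_USD * META_CLUSTER_GPUS
print(f"2,048-GPU cluster: ~${cluster_cost / 1e6:.0f}M")    # ~$51M

# GPUs needed for $1B of training hardware alone, at that price.
gpus_for_1b = 1_000_000_000 // A100_PRICE_USD
print(f"A100s needed to reach $1B: {gpus_for_1b:,}")        # 40,000
```

So the $1 billion condition corresponds to roughly twenty clusters of that size devoted to one model, at those prices.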
I think you're reading this condition incorrectly. The $1 billion would need to be spent for a single model. If OpenAI buys a $2 billion supercomputer but they train 10 models with it, that won't necessarily qualify.
Then why did you add the term? I assume you meant that the entire supercomputer is working on instances of the same model at once. Obviously training is massively parallel.
Once the model is done obviously the supercomputer will be used for other things.
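To make that accounting point concrete, here is a toy sketch of the $2 billion supercomputer hypothetical above, using an even-split attribution of machine cost across the models it trains (the split rule is an assumption, just one natural way to do the accounting):

```python
# Hypothetical from the thread: a $2B supercomputer used to train 10 models.
SUPERCOMPUTER_COST_USD = 2_000_000_000
MODELS_TRAINED = 10
THRESHOLD_USD = 1_000_000_000   # the bet's single-model spending threshold

per_model_cost = SUPERCOMPUTER_COST_USD / MODELS_TRAINED
print(f"cost attributed per model: ${per_model_cost / 1e6:.0f}M")   # $200M
print("crosses $1B threshold?", per_model_cost > THRESHOLD_USD)     # False
```

Under that reading, total hardware spend can far exceed $1 billion without any single model satisfying the condition.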
I congratulate Nathan Helm-Burger and Tomás B. for taking the other side of the bet.
Just for the record, I also took your bet. ;)
Congratulations. However, unless I'm mistaken, you simply said you'd be open to taking the bet. We didn't actually take it with you, did we?
Yeah, I guess I was a little unclear on whether your post constituted a bet offer that people could accept simply by replying, as I did, or whether you were doing specific follow-up to finalize the bet agreements. I see you did do that with Nathan and Tomás, so it makes sense that you didn't view our bet as on. It's OK; I was more interested in the epistemic/forecasting points than the $1,000 anyway. ;)
I commend you for following up and for your great retrospective analysis of the benchmark criteria. Even though I offered to take your bet, I didn't realize just how problematic the benchmark criteria were for your side of the bet.
Most importantly, it's disquieting and bad news that long timelines are looking increasingly implausible. I would have felt less worried about a world where you were right about that.
Wild that this bet lasted less than a year.
If you were interested in rebetting, maybe you could make the threshold 3 or 4 items.
Last year I bet some people about short AI timelines. While I don't think I've lost the bet yet, I think it's clear at this point that I will lose with high probability. I've outlined the reasons why I think that in a retrospective here. Even if I end up winning, I think it will likely be the result of a technicality, and that wouldn't be very interesting.
Because I would prefer to settle this matter without delay, I have decided to concede the bet now. Note, however, that I am not asking Tamay to do the same. I have messaged the relevant parties and asked them to send me details on how to pay them.
I congratulate Nathan Helm-Burger and Tomás B. for taking the other side of the bet.