I've built an ensemble model that uses techniques such as multi-step reasoning to show what I think should count as the real current state of the art in LLMs. It scores well above the highest-scoring individual models on the benchmarks below and subjectively feels smarter:
MMLU-Pro 0-shot CoT: 78.2 vs 75.6 for GPT-4o
NYT Connections, 436 questions: 34.9 vs 26.5 for GPT-4o
GPQA 0-shot CoT: 56.0 vs 52.5 for Claude 3.5 Sonnet
I might make it publicly accessible if there's enough interest. Of course, there are expected tradeoffs: it's slower and more expensive to run.
I'm a fan of prediction markets, but they're limited to pre-set bets and not ideal for long-shot, longer-term predictions. The main reason is that betting against such a prediction ties up money for the duration of the bet, and the small expected gain usually loses out to what the same money would earn in risk-free bonds. Therefore, I'd like to fund a 2024 Long-Shot Prediction Contest offering up to three $500 prizes. However, I need volunteers to act as judges and to help publicize it.
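To make the opportunity-cost point about prediction markets concrete, here's a rough Python sketch. Everything in it is an assumption for illustration: a binary contract that pays $1, no fees, a one-year holding period, a 5% risk-free rate, and made-up prices and probabilities. The function names are hypothetical too.

```python
def no_bet_expected_return(yes_price: float, true_prob: float) -> float:
    """Expected return from buying NO on a binary contract that pays $1."""
    cost = 1.0 - yes_price             # price of one NO share
    expected_payout = 1.0 - true_prob  # NO pays $1 if the event does not happen
    return expected_payout / cost - 1.0


def max_uncorrected_yes_price(true_prob: float, risk_free: float) -> float:
    """Highest YES price at which buying NO still does not beat bonds."""
    return 1.0 - (1.0 - true_prob) / (1.0 + risk_free)


# Illustrative numbers only (assumptions, not market data).
ret = no_bet_expected_return(yes_price=0.02, true_prob=0.01)
ceiling = max_uncorrected_yes_price(true_prob=0.01, risk_free=0.05)
print(f"Expected return on NO: {ret:.2%}")
print(f"YES can sit as high as {ceiling:.2%} before NO beats bonds")
```

With these made-up numbers, betting against a long shot priced at 2% that really has a 1% chance returns roughly 1% in expectation, well under the assumed 5% from bonds. So nobody has a reason to correct the price, and it could drift to almost 6% before betting NO becomes worthwhile.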
Entrants will submit one prediction for 2024 on any topic or event
Volunteer judges and I will vote on the likelihood of each prediction and how "interesting" it is, forming a ranked list
In January 2025, judges will determine which predictions came true, and winners will get their prizes
To start with a $500 prize, I need at least two people to volunteer as judges and a minimum of 10 predictions (judges cannot enter). If this receives, let's say, 50+ predictions, there will be two prizes. For 200+ predictions, three prizes.
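The prize thresholds above can be written down as a small Python sketch. The function name is hypothetical, and the 50 and 200 cutoffs are the tentative figures from this post, not final rules.

```python
def prize_count(num_predictions: int, num_judges: int) -> int:
    """Number of $500 prizes under the tentative thresholds in this post."""
    # The contest only runs with at least two judges and 10 predictions.
    if num_judges < 2 or num_predictions < 10:
        return 0
    if num_predictions >= 200:
        return 3
    if num_predictions >= 50:
        return 2
    return 1


for n in (5, 10, 50, 200):
    print(n, "predictions ->", prize_count(n, num_judges=2), "prize(s)")
```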
Interested in judging or have any suggestions? Let me know.
suggestions:
Will do.
Entering an extremely unlikely prediction as a strategy to maximize EV only makes sense if there's a huge number of entrants, which seems improbable unless this contest goes viral. The inclusion of an "interesting" factor in the ranking criteria should deter spamming with low-quality entries.