This is a special post for quick takes by Lech Mazur. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
I've created an ensemble model that employs techniques like multi-step reasoning to establish what should be considered the real current state-of-the-art in LLMs. It substantially exceeds the highest-scoring individual models and subjectively feels smarter:

MMLU-Pro 0-shot CoT: 78.2 vs 75.6 for GPT-4o

NYT Connections, 436 questions: 34.9 vs 26.5 for GPT-4o

GPQA 0-shot CoT: 56.0 vs 52.5 for Claude 3.5 Sonnet

I might make it publicly accessible if there's enough interest. Of course, there are expected tradeoffs: it's slower and more expensive to run.

I'm a fan of prediction markets, but they're limited to pre-set bets and poorly suited to long-shot, longer-term predictions: anyone betting against such a prediction has money tied up until resolution and earns less than they would from risk-free bonds. Therefore, I'd like to fund a 2024 Long-Shot Prediction Contest offering up to three $500 prizes. However, I need volunteers to act as judges and to help publicize it.

  • Entrants will submit one prediction for 2024 on any topic or event

  • Volunteer judges and I will vote on the likelihood of each prediction and how "interesting" it is, forming a ranked list

  • In January 2025, judges will determine which predictions came true, and winners will get their prizes

To start with a $500 prize, I need at least two people to volunteer as judges and a minimum of 10 predictions (judges cannot enter). If this receives, let's say, 50+ predictions, there will be two prizes. For 200+ predictions, three prizes.
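
The thresholds above can be written as a small lookup. This is only a sketch: the function name `prize_count` and its parameters are hypothetical, and the 50 and 200 cutoffs are the tentative ("let's say") figures from the paragraph above, not final rules.

```python
PRIZE_USD = 500  # value of each prize, as stated in the post


def prize_count(num_predictions: int, num_judges: int) -> int:
    """Return how many $500 prizes would be awarded (hypothetical helper).

    Assumes the tentative thresholds from the post: at least 2 judges and
    10 predictions for one prize, 50+ for two, 200+ for three.
    """
    if num_judges < 2 or num_predictions < 10:
        return 0  # contest does not start
    if num_predictions >= 200:
        return 3
    if num_predictions >= 50:
        return 2
    return 1


if __name__ == "__main__":
    for n in (5, 10, 50, 200):
        prizes = prize_count(n, num_judges=2)
        print(f"{n} predictions -> {prizes} prize(s), ${prizes * PRIZE_USD} total")
```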

Interested in judging or have any suggestions? Let me know.

Suggestions:

  1. Duplicate this to the open thread to increase visibility
  2. I don't know your exact implementation for forming the ranked list, but I worry that if you (for example) simply sort from low likelihood to high likelihood, it encourages people to submit only very low-probability predictions.

  1. Will do.

  2. Entering an extremely unlikely prediction as a strategy to maximize EV only makes sense if there's a huge number of entrants, which seems improbable unless this contest goes viral. The inclusion of an "interesting" factor in the ranking criteria should deter spamming with low-quality entries.
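
A toy model can illustrate why the number of entrants matters here. Everything in it is an assumption made for illustration, not the contest's actual ranking method: there is a single prize, entries are ranked purely from least to most likely (the simple sort the suggestion worries about), the prize goes to the highest-ranked prediction that comes true, outcomes are independent, and every rival submits a prediction with the same probability `rival_p`. The function name `win_probability` is hypothetical.

```python
def win_probability(p: float, rival_p: float, num_rivals: int) -> float:
    """Chance of winning the single prize in the toy model (all assumptions).

    p          -- probability that our own prediction comes true
    rival_p    -- probability that each rival's prediction comes true
    num_rivals -- number of rival entrants
    """
    if p < rival_p:
        # Our entry is ranked first, so we win whenever it comes true.
        return p
    # Otherwise we are ranked behind every rival (ties count against us),
    # so we win only if our prediction comes true and no rival's does.
    return p * (1.0 - rival_p) ** num_rivals


if __name__ == "__main__":
    for rivals in (5, 20, 200):
        long_shot = win_probability(0.05, rival_p=0.10, num_rivals=rivals)
        moderate = win_probability(0.30, rival_p=0.10, num_rivals=rivals)
        print(f"{rivals} rivals: long shot {long_shot:.4f}, moderate {moderate:.4f}")
```

Under these assumptions, with 5 rivals the moderate prediction wins more often than the long shot (about 0.18 versus 0.05), while with 200 rivals the long shot is far ahead. The model ignores the "interesting" factor entirely, which would shift the ranking further away from pure unlikelihood.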