You can write/paste your own lyrics directly (Custom Mode). And v3 came out fairly recently and is generally better, in case you haven't tried it in a while.
They seem to be created with https://app.suno.ai/. And yes, it is really easy to create songs: you can either have it generate the lyrics for you based on a prompt (the default), or write/paste the lyrics yourself (Custom Mode). I think songs can be up to ~2 minutes long.
Yeah, this seems to be a big part of it. If you instead switch it to show the probability at market midpoint, Manifold is basically perfectly calibrated, and Kalshi is, if anything, overconfident (Metaculus still looks underconfident overall).
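For concreteness, here's a minimal sketch of the kind of calibration check being discussed, assuming you have resolved binary markets with one probability snapshot each (the data shape and function name are my own, not Manifold's or Kalshi's actual API):

```python
from collections import defaultdict

def calibration_table(markets, n_buckets=10):
    """markets: iterable of (prob, resolved_yes) pairs, prob in [0, 1]."""
    buckets = defaultdict(list)
    for prob, resolved_yes in markets:
        # Bucket by predicted probability (prob = 1.0 goes in the top bucket).
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append((prob, resolved_yes))
    table = []
    for idx in sorted(buckets):
        pairs = buckets[idx]
        avg_prob = sum(p for p, _ in pairs) / len(pairs)
        freq_yes = sum(y for _, y in pairs) / len(pairs)
        table.append((avg_prob, freq_yes, len(pairs)))
    # Perfect calibration <=> avg_prob ~= freq_yes in every bucket.
    return table
```

Which snapshot you feed in as `prob` (e.g. the closing probability vs. the probability at the market's midpoint) is exactly the switch being described, and it can move a platform from looking underconfident to looking well calibrated.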
No, the letter has not been falsified.
Just to clarify: ~700 out of ~770 OpenAI employees have signed the letter (~90%).
Out of the 10 authors of the autointerpretability paper, only 5 have signed the letter. One of the 10 is no longer at OpenAI and so couldn't have signed it, which makes 5/9 a fairer count than 5/10. Either way, the rate is well below the ~90% average.
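As a rough sanity check on how surprising 5/9 is (my own back-of-the-envelope, not from the original thread): if each of the 9 current-employee authors independently signed at the company-wide base rate of ~700/770, the chance of seeing 5 or fewer signatures is:

```python
from scipy.stats import binom

base_rate = 700 / 770                        # ~91% company-wide signing rate
p_five_or_fewer = binom.cdf(5, 9, base_rate)  # P(at most 5 of 9 sign)
print(f"5/9 = {5/9:.0%} vs base rate {base_rate:.0%}")
print(f"P(<=5 of 9 sign) = {p_five_or_fewer:.4f}")
```

Under that (admittedly crude) independence assumption, the probability comes out well under 1%, so the gap isn't just small-sample noise.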
Ah, nice catch, I'll update my comment.
There is an updated list of 702 who have signed the letter (as of the time I'm writing this) here: https://www.nytimes.com/interactive/2023/11/20/technology/letter-to-the-open-ai-board.html (direct link to pdf: https://static01.nyt.com/newsgraphics/documenttools/f31ff522a5b1ad7a/9cf7eda3-full.pdf)
Nick Cammarata left OpenAI ~8 weeks ago, so he couldn't have signed the letter.
Out of the remaining 6 core research contributors:
Out of the non-core research contributors:
That being said, it looks like Jan Leike has tweeted that he thinks the board should resign: https://twitter.com/janleike/status/1726600432750125146
And that tweet was liked by Leo Gao: https://twitter.com/nabla_theta/likes
Still, it is interesting that this group is clearly underrepresented among people who have actually signed the letter.
Edit: Updated to note that Nick Cammarata is no longer at OpenAI, so he couldn't have signed the letter. For what it's worth, he has liked at least one tweet that called for the board to resign: https://twitter.com/nickcammarata/likes
It seems like a strategy by investors, or even large tech companies, to create a self-fulfilling prophecy: forming a coalition of OpenAI employees where there previously was none.
How is this more likely than the alternative, which is simply that this is an already-existing coalition that supports Sam Altman as CEO? Considering that he was CEO until he was suddenly removed yesterday, it would be surprising if most employees and investors didn't support him. Unless I'm misunderstanding what you're claiming here?
If you follow the link, under the section "Free Market Seen as Best, Despite Inequality", Vietnam is by far the country with the highest agreement with the statement "Most people are better off in a free market economy, even though some people are rich and some are poor" (95%!).
That being said, while it is the most pro-capitalism country, it is clearly not the most capitalist country (although it's not that bad, 72nd out of 176 countries ranked: https://www.heritage.org/index/ranking), and it would likely be more capitalist today if South Vietnam had won.
Small typo/correction: Waymo and Cruise each claim 10k rides per week, not riders.
I agree that there is a good chance that this solution is not actually SOTA, and that it is important to distinguish the three sets.
There's a further distinction between 3 guesses per problem (which is allowed according to the original specification, as Ryan notes) and 2 guesses per problem (which is what the leaderboard currently tracks [rules]).
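To make the scoring distinction concrete, here's a hedged sketch (function name and grid representation are my own, not the official evaluation harness): a problem counts as solved if any of the first k attempts exactly matches the target output grid.

```python
def arc_score(attempts_per_problem, targets, k):
    """Fraction of problems solved, allowing k guesses per problem.

    attempts_per_problem: one list of candidate output grids per problem,
    ordered by confidence; targets: the correct output grids.
    """
    solved = sum(
        any(attempt == target for attempt in attempts[:k])
        for attempts, target in zip(attempts_per_problem, targets)
    )
    return solved / len(targets)
```

The same ranked attempts can legitimately score higher with k=3 (the original specification) than with k=2 (the current leaderboard rule), so the two numbers aren't directly comparable.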
Some additional comments / minor corrections:
AFAICT, the current SOTA-on-the-private-test-set with 3 submissions per problem is 37%, and that solution scores 54% on the public eval set.
The SOTA-on-the-public-eval-set is at least 60% (see thread).
I think this is a typo and you mean the opposite.
From looking into this a bit, it seems pretty clear that the public eval set and the private test set are not IID. They're "intended" to be the "same" difficulty, but AFAICT this essentially just means that they both consist of problems that are feasible for humans to solve.
It's not the case that a fixed set of eval/test problems was created and then randomly split between the public eval set and the private test set. At your link, Chollet says "the [private] test set was created last" and the problems in it are "more unique and more diverse" than those in the public eval set. He confirms that here:
Bottom line: I would expect Ryan's solution to score significantly lower than 50% on the private test set.
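One crude way to quantify that expectation (my own extrapolation from the numbers above, not a measured result):

```python
# Prior SOTA drops from 54% (public eval) to 37% (private test).
# Applying the same ratio to Ryan's ~60% public-eval score:
prior_public, prior_private = 0.54, 0.37
ryan_public = 0.60
print(f"{ryan_public * prior_private / prior_public:.0%}")  # ~41%
```

That naive ratio lands around 41%, consistent with "significantly lower than 50%", though since the two sets aren't IID, the true drop could be larger or smaller.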