All of dschwarz's Comments + Replies

9 years since the last comment - I'm interested in how this argument interacts with GPT-4-class LLMs and "scale is all you need".

Sure, LLMs are not evolved in the same way as biological systems, so the path towards smarter LLMs isn't fragile in the way brains are described in this article, where maybe the first augmentation works, but the second leads to psychosis.

But LLMs are trained on writing produced by biological systems whose intelligence evolved under those constraints.

So what does this say about the ability to scale up training on this human data in an attempt to reach superhuman intelligence?

Thank you for the careful look into data leakage in the other thread! Some of your findings were subtle, and these are very important details.

Instead of writing a long comment, we wrote a separate post that, like @habryka and Daniel Halawi, looks into this carefully. We re-read all four papers making these misleading claims this year and report our findings on how they fall short.

https://www.lesswrong.com/posts/uGkRcHqatmPkvpGLq/contra-papers-claiming-superhuman-ai-forecasting 

Good point. For this public report, we manually checked all the data points that were included here. FutureSearch threw out many other unreliable data points it couldn't corroborate; that's a core part of what it does.

The sources linked here are low-quality data brokers due to a bug: there is a higher-quality data source corroborating the figure, but FutureSearch doesn't cite it.

We're working on fixing this, and identifying all primary vs. secondary sources.

RobertM
Cool, thanks!

All of the research was done by FutureSearch, i.e. by AI, with a few exceptions, such as https://app.futuresearch.ai/reports/3Li1?nodeId=MIw9, where it couldn't infer good team/enterprise ratios from analogous products whose numbers were reliable. Estimating ChatGPT Teams subscribers was the hardest part and required the most judgment.

Most of the final words in the report were written or revised by humans. We set a high quality bar for publishing this publicly, and did more human intervention than usual.
 

(Responded to the version of this on the EA Forum post.)

Great post!

> Manifold markets that were resolved after GPT-4’s current knowledge cutoff of Jan 1, 2022

Were you able to verify that newer knowledge didn't bleed in? Anecdotally, GPT-4 reports different cutoff dates depending on the API, and there are reports that GPT-4-0314 occasionally knows about major world events after its training window, presumably from RLHF.

This could explain the better scores on politics than science.

dynomight
Sadly, no - we had no way to verify that. I guess one way you might try to confirm/refute the idea of data leakage would be to look at the decomposition of Brier scores: GPT-4 is much better calibrated on politics than on science, but only very slightly better on politics in terms of refinement/resolution. Intuitively, I'd expect data leakage to manifest as better refinement/resolution rather than better calibration.
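For concreteness, here is a minimal sketch of the kind of binned Murphy decomposition such a check could use; the function name, equal-width binning, and input format are illustrative assumptions, not anything from the post:

```python
import numpy as np

def brier_decomposition(probs, outcomes, n_bins=10):
    """Approximate Murphy decomposition of the Brier score.

    Brier ~= reliability - resolution + uncertainty. Lower reliability
    (better calibration) and higher resolution are better; the identity
    is exact when forecasts are replaced by their bin means.
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    base_rate = outcomes.mean()

    # Assign each forecast probability to one of n_bins equal-width bins.
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)

    reliability = 0.0
    resolution = 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        weight = mask.mean()              # fraction of forecasts in this bin
        avg_prob = probs[mask].mean()     # mean forecast within the bin
        hit_rate = outcomes[mask].mean()  # observed frequency within the bin
        reliability += weight * (avg_prob - hit_rate) ** 2
        resolution += weight * (hit_rate - base_rate) ** 2

    uncertainty = base_rate * (1.0 - base_rate)
    brier = np.mean((probs - outcomes) ** 2)
    return {"brier": brier, "reliability": reliability,
            "resolution": resolution, "uncertainty": uncertainty}
```

Running this separately on the politics and science questions would show whether the gap sits mostly in the reliability (calibration) term or in the resolution term.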

Nice post! I'll throw another signal boost for the Metaculus hackathon that OP links, since this is the first time Metaculus is sharing their whole database of 1M individual forecasts (not just the database of questions & resolutions, which is already available). You have to apply to get access, though. I'll link it again even though OP already did: https://metaculus.medium.com/announcing-metaculuss-million-predictions-hackathon-91c2dfa3f39

There are nice cash prizes too.

As the OP writes, I think most of the ideas here would be valid entries in the hackathon, though the... (read more)

I have to respectfully disagree with your position. Kant's point, and the point of similar people who make the sweeping universalizations that you dislike, is that it is only in such idealized circumstances that we can make rational decisions. What makes a decision good or bad is whether it would be the decision rational people would endorse in a perfect society.

The trouble is not moving from our flawed world to an ideal world. The trouble is taking the lesson we've learned from considering the ideal world and applying it to the flawed world. Kant's pr... (read more)

[anonymous]
I actually happen to think that human morality is a fit topic for empirical inquiry, same as human language. This is a wildly different approach from either the Kantian or the Rawlsian approach. To study English, we look at the actual practices and we (possibly) develop hypotheses about the development of English and of language in general. What we do not do - in an empirical study of English - is ask ourselves what grammar, what pronunciation, what meanings we would prefer in a perfect society. Such questions are what the creators of Esperanto asked themselves (I presume). Kant and Rawls are trying to do the moral equivalent of inventing Esperanto.

I, in contrast, think that morality is something that, like English and French, already exists in the world, possibly varying a bit from place to place. I realize that Kant and Rawls seek to critique our actual practices. It may seem puzzling for me to say so since I just explained my preferred approach as empirical, but so do I. But I do so from a different direction. Just as linguists will distinguish between natural language as it arises spontaneously among speakers, and the pedantic rules endorsed by language mavens, so do I distinguish between morality as it would arise spontaneously among people, and the laws raised over us by legislatures.