elifland

https://www.elilifland.com/. You can give me anonymous feedback here. I often change my mind and don't necessarily endorse past writings.

Comments


I found some prior relevant work and tagged it in https://www.lesswrong.com/tag/successor-alignment. I found the top few comments on https://www.lesswrong.com/posts/axKWaxjc2CHH5gGyN/ai-will-not-want-to-self-improve#comments and https://www.lesswrong.com/posts/wZAa9fHZfR6zxtdNx/agi-systems-and-humans-will-both-need-to-solve-the-alignment#comments helpful.

edit: another effect to keep in mind is that capabilities research may be harder to sandbag on because of clearer metrics.

Wanted to write a more thoughtful reply to this, but basically yes, my best guess is that the benefits of informing the world are in expectation bigger than the negatives from acceleration. A potentially important background view is that I think takeoff speeds matter more than timelines, and it's unclear to me how having FrontierMath affects takeoff speeds.

I wasn't thinking much about the optics, but I'd guess that's not a large effect. I agree that Epoch made a mistake here, though, and that this is a negative.

I could imagine changing my mind somewhat easily.

I feel like I might be missing something, but conditional on scheming, isn't it differentially useful for safety because by default scheming AIs would be more likely to sandbag on safety research than on capabilities research?

elifland

Yes, that answer matches my understanding of the concern. If the vast majority of the dataset were private to Epoch, OpenAI could occasionally submit their solutions (probably via API) to Epoch to grade, but wouldn't be able to use the dataset as a high-frequency evaluation in many experiments.


This is assuming that companies won’t fish out the data from API logs anyway, which the OP asserts but I think is unclear.


Also if they have access to the mathematicians' reasoning in addition to final answers, this could potentially be valuable without directly training on it (e.g. maybe they could use it to evaluate process-based grading approaches).


(FWIW I'm explaining the negatives, but I disagree with the comment I'm expanding on regarding the sign of FrontierMath; it seems positive EV to me despite the concerns.)

Superforecasters can beat domain experts, as shown in Phil Tetlock's work comparing superforecasters to intelligence analysts.

This isn't accurate; see this post (https://docs.google.com/document/d/1ZEEaVP_HVSwyz8VApYJij5RjEiw3mI7d-j6vWAKaGQ8/edit?tab=t.0#heading=h.mma60cenrfmh), especially (3a), (3b), and Goldstein et al (2015).

elifland

Do you think that cyber professionals would take multiple hours to do the tasks with 20-40 min first-solve times? I'm intuitively skeptical.

Yes, that would be my guess, medium confidence.

One component of my skepticism is that someone told me that the participants in these competitions are less capable than actual cyber professionals, because the actual professionals have better things to do than enter competitions. I have no idea how big that selection effect is, but it at least provides some countervailing force against the selection effect you're describing.

I'm skeptical of your skepticism. I know basically nothing about the CTF scene, but using the competitive programming scene as an example, I think the median competitor is much more capable than the median software engineering professional, not less. People like competing at things they're good at.

elifland

I believe Cybench first-solve times are based on the fastest top professional teams, rather than typical individual CTF competitors or cyber employees, for whom the time to complete would probably be much higher (especially for the latter).

This is clarifying for me, appreciate it. If I believed (a) that we needed a paradigm shift like the one to LLMs in order to get AI systems that produce a substantial AI R&D speedup, and (b) that trend extrapolation from benchmark data would not be informative for predicting these paradigm shifts, then I would agree that the benchmarks + gaps method is not particularly informative.

Do you think that's a fair summary of (this particular set of) necessary conditions?

(edit: didn't see @Daniel Kokotajlo's new comment before mine. I agree with him regarding disagreeing with both sub-claims but I think I have a sense of where you're coming from.)
