Good post.
The craziest thing for me is that the results of different evals, like ProtocolQA and my BioLP-bench, which are supposed to evaluate similar things, are highly inconsistent. For example, two models can have similar scores on ProtocolQA, yet one answers twice as many questions correctly on BioLP-bench as the other. This means we might not be measuring what we think we're measuring. And no one knows what causes this difference in results.
And I'm not sure the expert baselines are comparable, to be frank. Due to financial limitations, I used graduate students for BioLP-bench, while the authors of LAB-bench used PhD-level scientists.