All of Igor Ivanov's Comments + Replies

And I'm unsure that experts are comparable, to be frank. Due to financial limitations, I used graduate students in BioLP, while the authors of LAB-bench used PhD-level scientists.

I didn't have o1 in mind; those particular results do seem consistent. Here's the example I had in mind:

Claude 3.5 Sonnet (old) scores 48% on ProtocolQA, and 7.1% on BioLP-bench
GPT-4o scores 53% on ProtocolQA and 17% on BioLP-bench

Good post. 

The craziest thing for me is that the results of different evals, like ProtocolQA and my BioLP-bench, which are supposed to measure similar things, are highly inconsistent. For example, two models can have similar scores on ProtocolQA, while one gets twice as many answers right on BioLP-bench as the other. It means that we might not be measuring the things we think we measure. And no one knows what causes this difference in the results.

2LucaRighetti
Do you feel that's still an issue when you compare to the human expert baseline?
* Human experts scored 79% on ProtocolQA, and o1-preview scored 81%
* Human experts scored 38% on BioLP-bench, and o1-preview scored 36%
The fact that both human experts and o1-preview scored twice as high on ProtocolQA as on BioLP-bench doesn't feel that inconsistent to me. It seems your BioLP-bench questions are just "twice as hard". I'd find it more inconsistent if o1-preview matched human expert performance on one test but not the other. (There are other open questions about whether the human experts in both studies are comparable and how much time people had.)

This is an amazing overview of the field.  Even if it won't collect tons of upvotes, it is super important, and saved me many hours. Thank you.

I tried to use exact quotes when describing the things they sent me, because it's easy for me to misrepresent their actions, and I don't want that to be the case.

Totally agree. But in other cases, when the agent was discouraged from deceiving, it still did it.

2Chris_Leong
You mean where they said that it was unlikely to succeed?

Thanks for your feedback. It's always a pleasure to see that my work is helpful for people. I hope you will write articles that are way better than mine!

Thanks for your thoughtful answer. It's interesting how I just describe my observations, and people draw conclusions from them that I hadn't thought of.

Answer by Igor Ivanov

For me, it was a medication for my bipolar disorder: quetiapine.

Thanks. I got a bit clickbaity in the title.

Thanks for sharing your experience. I hope you stay strong.

The meaninglessness comes from an idea akin to "why bother with anything if AGI will destroy everything?"

Read Feynman's quote at the beginning. It describes his feelings about the atomic bomb, which are relevant to some people's thoughts about AGI.

Your comment is somewhat along the lines of Stoic philosophy.

Hi

In this post, you asked readers to leave the names of therapists familiar with alignment.

I am such a therapist. I live in the UK. That's my website.

I recently wrote a post about my experience as a therapist with clients working on AI safety. It might serve as indirect proof that I really have such clients. 

1DivineMango
Thanks for your comment! I'm updating the post this week and will include you in the new version.

This is tricky. Might it exacerbate your problems?

Anyway, if there's a chance I can be helpful to you, let me know.

These problems are not unique to AI safety, but they come up far more often with my clients working on AI safety than with my other clients.

1TeaTieAndHat
Yeah, I’d have guessed as much. Maybe it’s a sign I should get into AI safety, then /j

Thanks. I am not a native English speaker, and I use GPT-4 to help me catch mistakes, but it seems like it's not perfect :)

Thanks for sharing your experience. My experience is that talking with non-AI-safety people is similar to conversations about global warming. If someone tells me about it, I say that it is an important issue, but I honestly don't invest much effort in fighting it.

This is my experience, and yours might be different.

I totally agree that it might be good to have such a fire alarm as soon as possible, and looking at how fast people make GPT-4 more and more powerful makes me think that this is only a matter of time.

I believe we need a fire alarm.

People had been scared of nuclear weapons since 1945, but no one restricted the arms race until the Cuban Missile Crisis in 1962.

We know for sure that the crisis really scared both the Soviet and US high commands, and the first document to restrict nukes was signed the next year, in 1963.

What kind of fire alarm might it be? That is the question.

4Seth Herd
I think we'll get some more scares from systems like autoGPT. Watching an AI think to itself, in English, is going to be powerful. And when someone hooks one up to an untuned model and asks it to think about whether and how to take over the world, I think we'll get another media event. For good reasons. I think actually making such systems, while the core LLM is still too dumb to actually succeed at taking over the world, might be important.

I think an important part of getting people convinced of the importance of AI safety is to find proper "gateway drug" ideas that already bother a person, so that they are likely to accept the idea and, through it, get interested in AI safety.

For example, if a person is concerned about the rights of minorities, you might tell them about how we don't know how LLMs work, how this causes bias and discrimination, or how it will increase inequality.

If a person cares about privacy and is afraid of government surveillance, then you might tell them about how AI might make all these problems much worse.

Eh. It's sad if this problem is really so complex.

Thank you. At this point, I feel like I have to stick to some way of aligning AGI, even if it doesn't have a big chance of succeeding, because it looks like there are not that many options.

3Nathan Helm-Burger
Well, there is the possibility that some wealthy entities (individuals, governments, corporations) will become convinced that they are truly at risk as AGI enters the Overton window. In which case, they might be willing to drop a billion of funding on the project, just in case. The lure of developing uploading as a path to immortality and superpowers may help convince some billionaires. Also, as AGI becomes more believable and the risk becomes more clear, top neuroscientists and programmers may be willing to drop their current projects and switch to working on uploading. If both those things happen, I think there's a good chance it would work out. If not, I am doubtful.

Thanks for your elaborate response!

But why do you think that this project will take so much time? Why can't it be implemented faster?

3Nathan Helm-Burger
Well, because a lot of scientists have been working on this for quite a while, and the brain is quite complex. On the plus side, there's a lot of existing work. On the negative side, there's not a lot of overlap between the group of people who know enough about programming and machine learning and large scale computing vs the group of people who know a lot about neuroscience and the existing partial emulations of the brain and existing detailed explanations of the circuits of the brain. I mean, it does seem like the sort of project which could be tackled if a large well-funded determined set of experts with clear metrics worked on in parallel. I think I more despair of the idea of organizing such an effort successfully without it drowning in bureaucracy and being dragged down by the heel-dragging culture of current academia. Basically, I estimate that Conjecture has a handful of smart determined people and maybe a few million dollars to work with, and I estimate this project being accomplished in a reasonable timeframe (like 2-3 years) as an effort that would cost hundreds of millions or billions of dollars and involve hundreds or thousands of people. Maybe my estimates are too pessimistic. I'm a lot less confident about my estimates of the cost of this project than I am in my estimates of how much time we have available to work with before strong AGI capable of recursive self-improvement gets built. I'm less confident about how long we will have between dangerous AGI is built and it actually gets out of control and causes a catastrophe. Another 2 years? 4? 5? I dunno. I doubt very much that it'll be 10 years. Before then, some kind of action to reduce the threat needs to be taken. Plans which don't seem to take this into account seem to me to be unhelpfully missing the point.

Do you have any plans for inter-lab communications based on your evals?

I think your evals might be a good place for AGI labs to standardize protocols for safety measures.

I think this Wizard of Oz problem is in large part about being mindful and honest with oneself.

Wishful thinking is somewhat the default state for people. It's hard to be critical of one's own ideas and wishes, especially when things like money or career advancement are at stake.

Thank you! The idea of inter-temporal coordination looks interesting

Can you elaborate on your comment? 

It seems so intriguing to me, and I would love to learn more about why it's a bad strategy if our AGI timeline is 5 years or less.

3Nathan Helm-Burger
Thanks for your interest Igor. Let me try better to explain my position. Basically, I am in agreement that 'brain-like AGI' or CogEms is the best fastest path towards a safe-enough AGI to at least help us make faster progress towards a more complete alignment solution. I am worried that this project will take about 10 -15 years, and that mainstream ML is going to become catastrophically dangerous within about 5 years. So, to bridge this gap I think we need to manufacture a delay. We need to stretch the time we have between inventing dangerously capable AGI systems and when that invention leads to catastrophe. We also need to be pursuing alignment (in many ways, including via developing brain-like AGI), or the delay will be squandered. My frustration with Conjecture's post here is that they talk about pursuing brain-like AGI without at least mentioning that time might be too short for that and that in order for them to be successful they need someone to be working on buying them time. My focus over the past few months has been on how we might manufacture this delay. My current best answer is that we will have to make do with something like a combination of social and governmental forces and better monitoring tools (compute governance), better safety evaluations (e.g. along the lines of ARC's safety evals, but even more diverse and thorough), and use of narrow AI tools to monitor and police the internet, using cyberweapons and perhaps official State police force or military might (in the case of international dispute) to stomp out rogue AGI before it can recursively self-improve to catastrophically strong intelligence. This is a tricky subject, potentially quite politically charged and frightening, an unpleasant scenario to talk about. Nevertheless, I think this is where we are and we must face that reality. I believe that there will be several years before we have any kind of alignment solution, but where we have the ability to build rapidly recursively self-impro

Why do you think that it will not be competitive with other approaches?

For example, it took 10 years to sequence the first human genome. After nearly 7 years of work, another competitor started an alternative human genome project using a completely different technology, and both projects finished at approximately the same time.

I think we are entering black swan territory, and it's hard to predict anything.

I absolutely agree with the conclusion. Everything is moving so fast.

I hope these advances will cause massive interest in the alignment problem from all sorts of actors. Even if OpenAI is talking about safety (and recently they have started talking about it quite often) in large part for PR reasons, it still means they think society is concerned about the progress, which is a good sign.

What are examples of “knowledge of building systems that are broadly beneficial and safe while operating in the human capabilities regime?”

I assume the mentioned systems are institutions like courts, governments, corporations, or universities.

Charlotte thinks that humans and advanced AIs are universal Turing machines, so predicting capabilities is not about whether a capability is present at all, but whether it is feasible in finite time with a low enough error rate.

I have a similar thought. If AI has human-level capabilities, and a part of its job is ... (read more)

Thanks for your view on doomerism and your thoughts on the framing of hope.

One thing helping me to preserve hope is the fact that there are so many unknown variables about AGI and how humanity will respond to it, that I don't think that any current-day prediction is worth a lot.

Although I must admit, doomers like Connor Leahy and Eliezer Yudkowsky can be extremely persuasive, but they also don't know many important things about the future, and they are also full of cognitive biases. All of this makes me tell myself the mantra "There is still hope that we might win".

I am not sure whether this is the best way to think about these risks, but I feel like if I give it up, it is a straightforward path to existential anxiety and misery, so I try not to question it too much.
 

[This comment is no longer endorsed by its author]
7Paul Crowley
This is explicitly the discussion the OP asked to avoid.

I agree. We have problems with emotional attachment to humans all the time, but humans are more or less predictable, not too powerful, and usually not that great at manipulation.

2Dagon
Fair enough.  I think we disagree on how manipulative and effective humans have become at emotional manipulation, especially via media, but we probably agree that AI makes the problem worse. I'm not sure whether we agree that the problem is AI assisting humans with bad motives in such manipulations, or whether it's attachment TO the AI which is problematic.  I mean, some of each, but the former scares me a lot more.

Thank you for your comment and everything you mentioned in it. I am a psychologist entering the field of AI policy-making, and I am starving for content like this

It does, and it causes a lot of problems, so I would prefer to avoid such problems with AIs 

Also, I believe that an advanced AI will be much more capable in terms of deception and manipulation than an average human

I 100% agree with you. 

I am entering the field right now, and I know several people in a position similar to mine, and there are just no positions for people like me, even though I think I am very proactive and have valuable experience.

1Severin T. Seehrich
Yep, the field is sort of underfunded, especially after the FTX crash. That's why I suggested grantwriting as a potential career path. In general, for newcomers to the field, I very strongly recommend booking a career coaching call with AI Safety Support. They have a policy of not turning anyone down, and quite a bit of experience in funneling newcomers at any stage of their career into the field. https://80000hours.org/ are also a worthwhile address, though they can't make the time to talk with everyone.

Good post, but there is a big imbalance in the human-ant relationship.

If people could communicate with ants, nothing would stop humans from making ants suffer if it made the deal better for humans, because of the power imbalance.

For example, domesticated chickens live in very crowded and stinky conditions, and their average lifespan is about a month, after which they are killed. Not particularly good living conditions.

People just care about profitability and do it simply because they can.

Good post

I have similar thoughts. I believe that at some moment, fears about TAI will spread like wildfire, and the field will get a giant stream of people, money, and policies, and that is hard to imagine from today's perspective.

First, your article is very insightful and well-structured, and I totally like it.

But there is one thing that bugs me.

I am new to the AI alignment field, and recently I realized (maybe mistakenly) that it is very hard to find a long-term, financially stable, full-time job in AI field-building.

For me, it basically means that only a tiny number of people consider AI alignment important enough to pay money to decrease P(doom). And at the same time, here we are talking about the possibility of doom within the next 10 or 20 years. For me it is all a bi... (read more)

3simeon_c
Thanks for your comment!  I see your point on fear spreading causing governments to regulate. I basically agree that if it's what happens, it's good to be in a position to shape the regulation in a positive way or at least try to. I still think that I'm more optimistic about corporate governance which seems more tractable than policy governance to me. 

ChatGPT was recently launched, and it is so powerful that it made me think about the problem of misuse of a powerful AI. It's a very powerful tool. No one really knows how to use it yet, but I am sure we will soon see it used as a tool for unpleasant things.

But I also see, more and more, the perception of AI as a living entity with agency. People have conversations with ChatGPT as they would with a human.

I agree that fearmongering is thin ice: it can easily backfire, and it must be done carefully and ethically. But is it worse than the alternative, in which people are unaware of AGI-related risks? I don't think anybody can say with certainty.

1Dave Lindbergh
Agreed. We sail between Scylla and Charybdis - too much or too little fear are both dangerous and it is difficult to tell how much is too much. I had an earlier pro-fearmongering comment which, on further thought, I replaced with a repeat of my first comment (since there seems to be no "delete comment"). I want the people working on AI to be fearful, and careful. I don't think I want the general public, or especially regulators, to be fearful. Because ignorant meddling seems far more likely to do harm than good - if we survive this at all, it'll likely be because of (a) the (fear-driven) care of AI researchers and (b) the watchfulness and criticism of knowledgeable skeptics who fear a runaway breakout. Corrective (b) is likely to disappear or become ineffective if the research is driven underground even a tiny bit. Given that (b) is the only check on researchers who are insufficiently careful and working underground, I don't want anything done to reduce the effectiveness of (b). Even modest regulatory suppression of research, or demands for fully "safe" AI development (probably an impossibility) seem likely to make those funding and performing the research more secretive, less open, and less likely to be stopped or redirected in time by (b). I think there is no safe path forward. Only differing types and degrees of risk. We must steer between the rocks the best we can.

The reactor meltdown on a Soviet submarine did not pose an existential threat. In the worst case, it would have been a small version of Chernobyl. We might compare it to an AI that causes some serious problems, like a stock market crash, but not existential ones. And the movie is not a threat at all.

"The question is how plausible it is to generate situations that are scary enough to be useful, but under enough control to be safe."
That is a great summary of what I wanted to say!

I agree

In my opinion, this methodology will be a great way for a model to learn how to persuade humans and exploit their biases, because this way the model might learn these biases not just from the data it has collected, but also fine-tune its understanding by testing its own hypotheses.