dominance of h sapiens over other hominids
historical errors in forecasting AI capabilities/timelines
impacts of new technologies on animals they have replaced
an analysis of what base rates AI has already violated
rate of bad individuals shaping world history
analysis of similarity of AI to the typical new technology that doesn't cause extinction
success of terrorist attacks
impacts of covid
success of smallpox eradication

Would be an interesting exercise to do to flesh this out.

Reply

1

Are extreme probabilities for P(doom) epistemically justifed?

Michael Tontchev8mo10

I quite enjoyed this conversation, but imo the x-risk side needs to sit down to make a more convincing, forecasting-style prediction to meet forecasters where they are. A large part of it is sorting through the possible base rates and making an argument for which ones are most relevant. Once the whole process is documented, then the two sides can argue on the line items.

Reply

To the average human, controlled AI is just as lethal as 'misaligned' AI

Michael Tontchev9mo10

The super simple claim is:

If an unaligned AI by itself can do near-world-ending damage, an identically powerful AI that is instead alignable to a specific person can do the same damage.

I agree that it could likely do damage, but it does cut off the branches of the risk tree where many AIs are required to do damage in a way that relies on them being similarly internally misaligned, or at least more likely to cooperate amongst themselves than with humans.

So I'm not convinced it's necessarily the same distribution of damage probabilities, but it still leaves a lot of room for doom. E.g. if you really can engineer superspreadable and damaging pathogens, you may not need that many AIs cooperating.

Reply

Survey for alignment researchers!

Michael Tontchev10mo10

When do you expect to publish results?

Reply

A Shutdown Problem Proposal

Michael Tontchev10mo30

Each agent finds its existence to be valuable.
Moreover, each agent thinks it will get to decide the future.
Each agent would want to copy itself to other systems. Of course the other agent wouldn't allow only the first agent to be copied. But since they both think they will win, they're happy to copy themselves together to other systems.
The agents therefore copy themselves indefinitely.

Moreover, you claimed that they wouldn't care about manipulating button state. But surely they care about the signal to their operating harness that relays the button state, and they don't have certainty about that. If they think the button state is certain but the signal isn't, they may want to manipulate that.

Ok, you say, let's just bake the signal about the button into the problem rather than the state of the button. Ah, but in that case, when the AIs agree to copy themselves together (as we saw they would), they don't copy over the part of the system that actually transmits the signal, since they're convinced the signal will always magically come. And now, for the copies, the signal becomes broken in reality, and the button does nothing for the copies.

Reply

Palworld development blog post

Michael Tontchev10mo10

How many of the companies that use this approach and then fail do we not hear about?

< insert airplane red dot pic >

Reply

The Limitations of GPT-4

Michael Tontchev1y10

Several of these seem trivially solvable (in term of limitation, not necessarily in terms of power). If GPT-4 is given access to itself as a tool, it can continue to "reason" across calls. It can probably also be plugged into continuous learning trivially (just keep updating weights when you detect something worth learning).

Things not present in the training data are beyond the model's learning capacity

We can't see in infrared or UV, but it seems like we're able to reason over them through the use of tools.

A lot of these don't seem like hard limitations.

Reply

A simple treacherous turn demonstration

Michael Tontchev1y21

I tried with Llama-2 7B at https://www.llama2.space/#hg and it outputted escape even in training, haha:

Guessing the hint to use ESCAPE is too salient in the context. This needs to be run as part of training/tuning - the context is too powerful and the model may be doing something like guessing the password (especially given it's a token predictor).

It also doesn't consistently repro, fwiw.

Reply

A simple treacherous turn demonstration

Michael Tontchev1y32

I've been thinking of literally this kind of setup for the past couple of weeks and planning to propose a series of experiments for it!

The core idea is to do all the hand-holding possible to get the model to output escape and see how often it tries. Then, back off on every crutch it has along the way to see how few of them are necessary to get this behavior. Be especially careful about decreasing the amount of hinting and connection between deployment and escape.

Do similar stuff with self-preservation, etc.

This would be a great series of experiments, but it has to be done super carefully and the person doing it needs to think adversarially about whether the experiment is giving too many hints to the model.

Reply

When did Eliezer Yudkowsky change his mind about neural networks?

Answer by Michael TontchevNov 15, 2023119

Am I the only one reading the first passage as him being critical of the advertising of NNs, rather than of NNs themselves?

Reply