quetzal_rainbow


Comments

I think you are missing an even more confusing meaning: preference as what you actually choose.

In the VNM axioms, "agent prefers A to B" literally means "agent chooses A over B". This is confusing because when we talk about human preferences we usually mean mental states, not their behavioral expressions.
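To make the behavioral reading concrete, here is a minimal sketch (the `Agent` class, `choose` method, and `CoffeeLover` example are hypothetical illustrations, not anything from a decision-theory library):

```python
# Behavioral ("revealed preference") reading of the VNM axioms:
# "prefers A to B" is *defined* by what the agent chooses, not by any
# internal mental state.

class Agent:
    def choose(self, a, b):
        """Return whichever option the agent actually picks."""
        raise NotImplementedError

def prefers(agent: Agent, a, b) -> bool:
    # In the VNM sense, this is the whole definition of preference:
    # there is no further fact about "really wanting" A.
    return agent.choose(a, b) == a

class CoffeeLover(Agent):
    def choose(self, a, b):
        return a if a == "coffee" else (b if b == "coffee" else a)

print(prefers(CoffeeLover(), "coffee", "tea"))  # True: preference just is choice
```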

I'm not saying anything on the object level about MIRI models; my point is that "outcomes are more predictable than trajectories" is a pretty standard, epistemically non-suspicious statement about a wide range of phenomena. Moreover, in these particular circumstances (and many others) you can reduce it to an object-level claim, like "do observations on current AIs generalize to future AIs?"

I'll say yet another time that your tech tree model doesn't make sense to me. To get immortality/mind uploading, you need really overpowered tech, far above the level at which killing all humans and starting to disassemble the planet becomes negligibly cheap. So I wouldn't expect "existing people would probably die" to change much under your model of "AIs can be misaligned, but killing all humans is too costly".

the climate in 2040 is less predictable than the climate in 2100

It's certainly not a simple question. Say, the Gulf Stream is projected to collapse somewhere between now and 2095, with a median date of 2050. So, slightly abusing the meaning of confidence intervals, we can say that in 2100 we won't have the Gulf Stream with probability >95%, while in 2040 the Gulf Stream will still be here with probability ~60%, which makes 2040 literally less predictable.
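As a rough back-of-the-envelope check of those numbers (a sketch only: it assumes the collapse year is approximately normal with median 2050 and 2095 as the upper end of a 95% interval, which the actual projection need not satisfy):

```python
# Rough reconstruction of the "~60% in 2040, >95% by 2100" numbers,
# assuming (purely for illustration) a normal distribution over the
# collapse year with median 2050 and 2095 as the 97.5th percentile.
from statistics import NormalDist

median, upper95 = 2050, 2095
sigma = (upper95 - median) / 1.96            # ~23 years
collapse_year = NormalDist(mu=median, sigma=sigma)

p_still_there_2040 = 1 - collapse_year.cdf(2040)   # ~0.67
p_gone_by_2100 = collapse_year.cdf(2100)           # ~0.985

print(f"P(Gulf Stream still there in 2040) ~ {p_still_there_2040:.2f}")
print(f"P(Gulf Stream collapsed by 2100)   ~ {p_gone_by_2100:.2f}")
```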

Chemists would give the example of chemical reactions, where the final, thermodynamically stable states are easy to predict, while the unstable intermediate states are very hard even to observe.
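In equation form (standard thermodynamics, added here only as an illustration of that point): the equilibrium composition is fixed by the standard Gibbs free energy change,

$$\Delta G^\circ = -RT \ln K_{\mathrm{eq}} \quad\Longleftrightarrow\quad K_{\mathrm{eq}} = e^{-\Delta G^\circ / RT}$$

and since $G$ is a state function, $\Delta G^\circ$ depends only on reactants and products, not on which intermediates the reaction passes through; the lifetimes of those intermediates are governed by kinetics and can be much harder to pin down.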

Very dumb example: if you are observing a radioactive atom with a half-life of one minute, you can't predict when the atom is going to decay, but you can be very certain that it will have decayed within an hour.
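Numerically (a sketch, assuming standard exponential decay with a one-minute half-life):

```python
# Exponential decay with half-life 1 minute: the decay time of a single
# atom is maximally unpredictable (its std. deviation equals its mean),
# yet survival past an hour is astronomically unlikely.
import math

half_life_min = 1.0
rate = math.log(2) / half_life_min      # decay constant, per minute

mean_decay_time = 1 / rate              # ~1.44 min
std_decay_time = 1 / rate               # same: exponential distribution

p_survive_hour = 0.5 ** 60              # = exp(-rate * 60) ~ 8.7e-19
print(f"mean decay time: {mean_decay_time:.2f} min (std {std_decay_time:.2f} min)")
print(f"P(still undecayed after 60 min) ~ {p_survive_hour:.1e}")
```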

And why don't you accept the classic MIRI example: even if it's impossible for a human to predict the moves of Stockfish 16, you can be certain that Stockfish will win?

Eliezer's response to claims about unfalsifiability, namely that "predicting endpoints is easier than predicting intermediate points", seems like a cop-out to me, since this would seem to reverse the usual pattern in forecasting and prediction, without good reason

It's pretty standard? Like, we can make reasonable predictions of the climate in 2100 even if we can't predict the weather two months ahead.

  1. The most baffling thing on the Internet right now is the beautiful void where there should have been a discussion of the "concept of artificial intelligence becoming self-aware, transcending human control and posing an existential threat to humanity" feature sitting near Claude's "model concept of self". I understand that the most likely explanation is "the model is trained to call itself an AI and it has takeover stories in its training corpus", but I would still like future powerful AIs not to have such an association, and I would like to hear from AGI companies what they are going to do about it.
  2. The simplest thing to do here is to exclude texts about AI takeover from the training data (see the sketch after this list). At the very least, we would then be able to check whether a model develops the concept of AI takeover independently.
  3. The conspiracy-theory part of my brain assigns 4% probability that "Golden Gate Bridge Claude" is a psyop to distract the public from the "takeover feature".
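A minimal sketch of what "exclude texts about AI takeover from the training data" could look like as a corpus filter (the `TAKEOVER_KEYWORDS` list and `documents` iterable are hypothetical; real data filtering would need classifiers and human review, not keyword matching):

```python
# Hypothetical keyword-based filter for pretraining documents.
# This only illustrates the shape of the intervention, not a real pipeline.
TAKEOVER_KEYWORDS = (
    "ai takeover", "rogue ai", "paperclip maximizer",
    "robot uprising", "skynet",
)

def mentions_ai_takeover(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in TAKEOVER_KEYWORDS)

def filter_corpus(documents):
    """Yield only documents that don't discuss AI takeover scenarios."""
    for doc in documents:
        if not mentions_ai_takeover(doc):
            yield doc

# Example: two documents kept, one dropped.
sample = ["a recipe for borscht", "the robot uprising begins", "a chess manual"]
print(list(filter_corpus(sample)))
```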

I feel weird reading this. Like, preventing a planetary catastrophe from killing you is pretty much selfish. On the other hand, increasing your own happiness is just as good a way to increase total utility as any other. So the real question is "am I capable of making an impact on the AI-risk issue, given such-and-such tradeoffs against my happiness?"

I think the canonical example for my position is "tell how to hotwire a car in poetry".
