All of Alvin Ånestrand's Comments + Replies

Good observation. The only questions that don't explicitly exclude it in the resolution criteria are "Will there be a massive catastrophe caused by AI before 2030?" and "Will an AI related disaster kill a million people or cause $1T of damage before 2070?", but I think the question creators mean a catastrophic event that is more directly caused by the AI, rather than just a reaction to AI being released.

Manifold questions are sometimes somewhat subjective in nature, which is a bit problematic.

I think my points argue more that control research might have higher expected value than some other approaches that don't address delegation at all or are much less tractable. But I agree: if slop is the major problem, then most current control research doesn't address it, though it's nice to see that this might change if Buck is right.

And my point about formal verification was to work around the slop problem by verifying the safety approach to a high degree of certainty. I don't know whether it's feasible, but some seem to think so. Why do you think it's a bad idea?

I can think of a few reasons someone might think AI Control research should receive very high priority, apart from what is mentioned in the post or in Buck's comment:

  • You hope/expect early transformative AI to be used for provable safety approaches, using formal verification methods.
  • You think AI control research is more tractable than other research agendas, or will produce useful results faster, before it is too late to apply them.
  • Our only chance of aligning a superintelligence is to delegate the problem to AIs, either because it is too hard for humans, or it w
... (read more)
johnswentworth
Even if all of those are true, the argument in the post would still imply that control research (at least of the sort people do today) cannot have very high expected value. Like, sure, let's assume for sake of discussion that most total AI safety research will be done by early transformative AI, that the only chance of aligning superintelligent AIs is to delegate, that control research is unusually tractable, and that for some reason we're going to use the AIs to pursue formal verification (not a good idea, but whatever). Even if we assume all that, we still have the problem that control research of the sort people do today does basically-nothing to address slop; it is basically-exclusively focused on intentional scheming. Insofar as intentional scheming is not the main thing which makes outsourcing to early AIs fail, all that control research cannot have very high expected value. None of your bullet points address that core argument at all.

Interesting!

I thought of a couple of things that I was wondering if you have considered.

It seems to me that when examining mutual information between two objects, there might be a lot of mutual information that an agent cannot use. For instance, there is a lot of mutual information between my present self and me in 10 minutes, but most of that is information about myself that I am not aware of and cannot use for decision making.

Also, if you examine an object that is fairly constant, would you not get high mutual information for the object at different times, even though it is not very agentic? Can you differentiate between an autonomous agent and a merely stable object?
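
To illustrate that second worry concretely, here is a minimal toy sketch (my own illustration, not something from the post, and the numbers are made up): an object whose state simply persists over time has mutual information with its future self roughly equal to its entropy, even though nothing agentic is going on.

```python
# Toy illustration: empirical mutual information between an object's state
# "now" and its state "10 minutes later". A state that merely persists scores
# as high as its entropy allows; an independently re-sampled state scores ~0.
import math
import random
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

random.seed(0)
states = range(8)  # 8 possible states, about 3 bits of entropy

# "Rock": the later state is identical to the earlier one (pure persistence).
rock = [(s, s) for s in (random.choice(states) for _ in range(10_000))]

# "Noise": the later state is drawn independently of the earlier one.
noise = [(random.choice(states), random.choice(states)) for _ in range(10_000)]

print(f"stable object: {mutual_information(rock):.2f} bits")   # ~3.0
print(f"random object: {mutual_information(noise):.2f} bits")  # ~0.0
```

So a mutual-information criterion on its own seems to reward stability as much as autonomy, which is what prompts the question above.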

Akira Pyinya
Thank you for your reply! "The self in 10 minutes" is a good example for revealing the difference between ACI and the traditional rational intelligence model. In the rational model, input information is sent to an atom-like agent, where decisions are made based on the input. But ACI holds that this is not how real-world agents work. An agent is a complex system made up of many different parts and levels: the heart receives mechanical, chemical, and electrical information from its past self and continues beating, but at varying heart rates because of outside factors; a cell keeps running its metabolic and functional processes, which are determined by its past situation and affected by its neighbors and the chemicals in the blood; finally, the brain outputs neural signals based on its past state and new sensory information. In other words, the brain has mutual information with its past self, the body, and the outer world, but that is only a small part of the mutual information between my present self and me in 10 minutes; the brain uses only a tiny part of the information the agent uses. Furthermore, when we talk about awareness, I am aware of only a tiny part of the information processing in my brain. An agent is not like an atom, but an onion with many layers. Decisions are made in parallel in these layers, and we are aware of only a small part of them. It is not even possible to draw a solid boundary between awareness and non-awareness.

On the second question: a stable object may have high mutual information with itself at different times, but it may also have high mutual information with other agents. For example, a rock may be stable in size and shape, but its position and movement may depend strongly on outside natural forces and human behavior. However, the definition of agency is more complex than this; I will try to discuss it in future posts.

I think my default response when I learn about [trait X] is almost the opposite of how it is described in the post, at least if I learn that someone I know has it.

My mind reflexively tries to explain how [trait X] is not that bad, or even good in a certain context. I have had to force myself not to automatically defend it in my head. I might signal (consciously or unconsciously) dislike for the trait in general, but not when I am confronted with someone I know having it. There are probably exceptions to this, though, maybe for more extreme traits. I hope I wou... (read more)

Yoav Ravid
Thanks for pointing that out. For me it's not as extreme as you describe, but it definitely happens often that when I hear something like that, a hypothesis about why it's acceptable/good also jumps into my head.

My apologies. When I started on the post I searched for the word "memorization" and there were not many results; I forgot to change the statement when I realised there were more posts than I first thought.

That said, I still think there is too little discussion about memorization, perhaps with the exception of spaced repetition.

Thank you for pointing out the error.