Research coordinator of the Stop/Pause area at AI Safety Camp.
See explainer on why AGI could not be controlled enough to stay safe:
lesswrong.com/posts/xp6n2MG5vQkPpFEBH/the-control-problem-unsolved-or-unsolvable
Lucius, the text exchanges I remember us having during AISC6 were about the question of whether 'ASI' could comprehensively control for the evolutionary pressures it would be subjected to. You and I were commenting on a GDoc with Forrest. I took your counterarguments against his arguments seriously, continuing to investigate them after you had bowed out.
You held the notion that ASI would be so powerful that it could control for any of its downstream effects that evolution could select for. This is a common opinion in the community. But I've looked into this opinion, and people's justifications for it, enough to consider it unsound.[1]
I respect you as a thinker, and generally think you're a nice person. It's disappointing that you wrote me off as a crank in one sentence. I'd expect more care, including questioning your own assumptions.
A shortcut way of thinking about this:
The more you increase 'intelligence' (as a capacity for transforming patterns in data), the more you have to increase the number of underlying information-processing components. But the degrees of freedom those components have, in their interactions with each other and with their larger surroundings, grow faster than the component count.
This results in a strict inequality between:
1. the system's capacity to detect, model, and correct its own downstream effects, and
2. the degrees of freedom through which those effects can vary and be selected for.
The hashiness model is a toy model for demonstrating this inequality (including how the mismatch between 1. and 2. grows over time). Anders Sandberg and two mathematicians are working on formalising that model at AISC.
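As a rough intuition pump (my own sketch here, not the hashiness model itself): even if you only count pairwise interactions between components as a crude proxy for interaction degrees of freedom, those already grow quadratically while the component count grows linearly, so the gap keeps widening.

```python
from math import comb

# Toy illustration: component count n grows linearly, but the
# number of possible pairwise interactions, comb(n, 2) = n*(n-1)/2,
# grows quadratically. The ratio of interactions to components
# therefore keeps increasing as the system scales up.
for n in (10, 100, 1000):
    interactions = comb(n, 2)
    print(f"components={n:5d}  pairwise={interactions:7d}  ratio={interactions / n:.1f}")
```

Real interaction spaces (higher-order interactions, interactions with the surroundings) grow far faster than this pairwise count, which only strengthens the point.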
There's more that can be discussed in terms of why and how this fully autonomous machinery is subjected to evolutionary pressures. But that's a longer discussion, and often the researchers I talked with lacked the bandwidth.
I agree that Remmelt seems kind of like he has gone off the deep end
Could you be specific here?
You are sharing a negative impression ("gone off the deep end") but not what it is based on. This puts me and others in a position of not knowing whether you are, e.g., reacting with a quick broad-strokes impression, and/or pointing to specific instances of dialogue that I handled poorly and could improve on, and/or revealing a fundamental disagreement between us.
For example, is it because on Twitter I spoke up against generative AI models that harm communities, and this seems somehow strategically bad? Do you not like the intensity of my messaging? Or do you intuitively disagree with my arguments about AGI being insufficiently controllable?
As is, this is dissatisfying. On this forum, I'd hope[1] there is a willingness to discuss differences in views first, before moving to broadcasting subjective judgements[2] about someone.
Even though that would be my hope, it's no longer my expectation. There's an unhealthy dynamic on this forum, where at least three times I've noticed people moving to sideline someone with unpopular ideas, without much care.
To give a clear example, someone else listed vaguely dismissive claims about research I support. Their comment lacked factual grounding but still got upvotes. When I replied to point out things they were missing, my reply got downvoted into the negative.
I guess this is a normal social response on most forums. It is naive of me to hope that on LessWrong it would be different.
Broadcasting such judgements particularly needs to be done with care if the judgement comes from someone seen as having authority (because others will take it at face value), and if the judgement guards default notions held in the community (because that supports an ideological filter bubble).
For example, it might be the case that, for some reason, alignment could only have been solved if Abraham Lincoln hadn't been assassinated in 1865. That would mean that humans in 2024 in our world (where Lincoln was assassinated in 1865) cannot solve alignment, despite it being solvable in principle.
With this example, you might still take "possible worlds" to mean world states reachable through physics from past states of the world. Ie. you could still assert that alignment's possibility is path-dependent on historical world states.
But you seem to mean something broader with "possible worlds". Something like "in theory, there is a physically possible arrangement of atoms/energy states that would result in an 'aligned' AGI, even if that arrangement of states might not be reachable from our current or even a past world".
–> Am I interpreting you correctly?
Alignment is a broad word, and I don't really have the authority to interpret strangers' words in a specific way without accidentally misrepresenting them.
Your saying this shows the ambiguity involved in trying to understand what different people mean. One researcher can make a technical claim about the possibility/tractability of "alignment" that is worded similarly to a technical claim made by others, yet their meaning of "alignment" could be quite different.
It's hard then to have a well-argued discussion, because you don't know whether people are equivocating (ie. switching between different meanings of the term).
one article managed to find six distinct interpretations of the word:
That's a good summary list! I like the inclusion of "long-term outcomes" in P6. In contrast, P4 could just entail short-term problems that were specified by a designer or user who did not give much thought to long-term repercussions.
The way I deal with the wildly varying uses of the term "alignment" is to use a minimum definition that most of those six interpretations are consistent with, one where (almost) everyone would agree that an AGI not meeting that definition would be clearly unaligned.
Thanks!
By ‘possible worlds’, do you mean ‘possible to be reached from our current world state’?
And what do you mean by ‘alignment’? I know that can sound like an unnecessary question. But if it’s not specified, how can people soundly assess whether it is technically solvable?
Thanks! When you say “in the space of possible mathematical things”, do you mean “hypothetically possible in physics” or “possible in the physical world we live in”?
Here's how I specify terms in the claim:
Good to know. I also quoted your more detailed remark on AI Standards Lab at the top of this post.
I have made so many connections that have been instrumental to my research.
I didn't know this yet, and glad to hear! Thank you for the kind words, Nell.
Good to know that this is why you think AI Safety Camp is not worth funding.
Once a core part of the AGI non-safety argument has been put into maths, so that it's comprehensible to people in your circle, it'd be interesting to see how you respond then.