All of Remmelt's Comments + Replies

It's because you keep making incomprehensible arguments that don't make any sense

Good to know that this is why you think AI Safety Camp is not worth funding. 

Once a core part of the AGI non-safety argument is put into maths to be comprehensible for people in your circle, it’d be interesting to see how you respond then.

Lucius, the text exchanges I remember us having during AISC6 were about whether 'ASI' could comprehensively control for the evolutionary pressures it would be subjected to. You and I were commenting on a GDoc with Forrest. I was taking your counterarguments against his arguments seriously – continuing to investigate those counterarguments after you had bowed out.

You held the notion that ASI would be so powerful that it could control for any of its downstream effects that evolution could select for. This is a common opinion held in the community. Bu... (read more)

4Lucius Bushnaq
I think it is very fair that you are disappointed. But I don't think I can take it back. I probably wouldn’t have introduced the word crank myself here. But I do think there’s a sense in which Oliver’s use of it was accurate, if maybe needlessly harsh. It does vaguely point at the right sort of cluster in thing-space. It is true that we discussed this and you engaged with a lot of energy and in good faith. But I did not think Forrest’s arguments were convincing at all, and I couldn’t seem to manage to communicate to you why I thought that. Eventually, I felt like I wasn’t getting through to you, Quintin Pope also wasn’t getting through to you, and continuing started to feel draining and pointless to me. I emerged from this still liking you and respecting you, but thinking that you are wrong about this particular technical matter in a way that does seem like the kind of thing people imagine when they hear ‘crank’.

I agree that Remmelt seems kind of like he has gone off the deep end


Could you be specific here?  

You are sharing a negative impression ("gone off the deep end"), but not what it is based on. This puts me and others in a position of not knowing whether you are e.g. reacting with a quick broad strokes impression, and/or pointing to specific instances of dialogue that I handled poorly and could improve on, and/or revealing a fundamental disagreement between us.

For example, is it because on Twitter I spoke up against generative AI models that harm communi... (read more)

I think many people have given you feedback. It is definitely not because of "strategic messaging". It's because you keep making incomprehensible arguments that don't make any sense and then get triggered when anyone tries to explain why they don't make sense, while making statements that are wrong with great confidence.

As is, this is dissatisfying. On this forum, I'd hope[1] there is a willingness to discuss differences in views first, before moving to broadcasting subjective judgements[2] about someone.

People have already spent many hours givin... (read more)

For example, it might be the case that, for some reason, alignment would have been solved if and only if Abraham Lincoln wasn't assassinated in 1865. That means that humans in 2024 in our world (where Lincoln was assassinated in 1865) will not be able to solve alignment, despite it being solvable in principle.


With this example, you might still assert that "possible worlds" are world states reachable through physics from past states of the world. Ie. you could still assert that alignment possibility is path-dependent from historical world states.

But you... (read more)

1Satron
Yup, that's roughly what I meant. However, one caveat would be that I would change "physically possible" to "metaphysically/logically possible" because I don't know if worlds with different physics could exist, whereas I am pretty sure that worlds with different metaphysical/logical laws couldn't exist. By that, I mean stuff like the law of non-contradiction and "if a = b, then b = a." I think the main antidote against this is to ask the person you are speaking with to define the term if they are making claims in which equivocation is especially likely. Yeah, that's reasonable.

Thanks!

With ‘possible worlds’, do you mean ‘possible to be reached from our current world state’?

And what do you mean with ‘alignment’? I know that can sound like an unnecessary question. But if it’s not specified, how can people soundly assess whether it is technically solvable?

4Satron
By "possible worlds," I mean all worlds that are consistent with laws of logic, such as the law of non-contradiction. For example, it might be the case that, for some reason, alignment would have been solved if and only if Abraham Lincoln wasn't assassinated in 1865. That means that humans in 2024 in our world (where Lincoln was assassinated in 1865) will not be able to solve alignment, despite it being solvable in principle. My answer is kind of similar to @quila's. I think that he means roughly the same thing by "space of possible mathematical things." I don't think that my definition of alignment is particularly important here because I was mostly clarifying how I would interpret the sentence if a stranger said it. Alignment is a broad word, and I don't really have the authority to interpret a stranger's words in a specific way without accidentally misrepresenting them. For example, one article managed to find six distinct interpretations of the word:

Thanks, when you say “in the space of possible mathematical things”, do you mean “hypothetically possible in physics” or “possible in the physical world we live in”?

2[anonymous]
Possible to be run on a computer in the actual physical world

Here's how I specify terms in the claim:

  • AGI is a set of artificial components, connected physically and/or by information signals over time, to in aggregate sense and act autonomously over many domains.
    • 'artificial' as configured out of a (hard) substrate that can be standardised to process inputs into outputs consistently (vs. what our organic parts can do).
    • 'autonomously' as continuing to operate without needing humans (or any other species that share a common ancestor with humans).
  • Alignment is at the minimum the control of the AGI's components (as modifie
... (read more)

Good to know. I also quoted your more detailed remark on AI Standards Lab at the top of this post.

I have made so many connections that have been instrumental to my research. 


I didn't know this yet, and glad to hear!  Thank you for the kind words, Nell.

Fair question. You can assume it is AoE.

Research leads are not going to be too picky in terms of what hour you send the application in,

There is no need to worry about the exact deadline. Even if you send in your application on the next day, that probably won't significantly impact your chances of getting picked up by your desired project(s).

Sooner is better, since many research leads will begin composing their teams after the 17th, but there is no hard cut-off point.

Thanks!  These are thoughtful points. See some clarifications below:
 

AGI could be very catastrophic even when it stops existing a year later.

You're right. I'm not even covering all the other bad stuff that could happen in the short-term, that we might still be able to prevent, like AGI triggering global nuclear war.

What I'm referring to is unpreventable convergence on extinction.
 

If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless.

Agreed, that could be a good outcome if it were attainable.

In prac... (read more)

Update: reverting my forecast back to an 80% chance for these reasons.

I'm also feeling less "optimistic" about an AI crash given:

  1. The election result involving a bunch of tech investors and execs pushing for influence through Trump's campaign (with a stated intention to deregulate tech).
  2. A military veteran saying that the military could be holding up the AI industry like "Atlas holding the globe", and an AI PhD saying that hyperscaled data centers, deep learning, etc, could be super useful for war.

I will revise my previous forecast back to 80%+ chance.

Yes, I agree formalisation is needed. See comment by flandry39 in this thread on how one might go about doing so. 

Worth considering is that there are actually two aspects that make it hard to define the term ‘alignment’ so as to allow for sufficiently rigorous reasoning:

  1. It must allow for logically valid reasoning (therefore requiring formalisation).
  2. It must allow for empirically sound reasoning (ie. the premises correspond with how the world works). 

In my reply above, I did not help you much with (1.). Though even while still using the English lang... (read more)

4harfe
This is maybe not the central point, but I note that your definition of "alignment" doesn't precisely capture what I understand "alignment" or a good outcome from AI to be: AGI could be very catastrophic even when it stops existing a year later. If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless. I don't know whether that covers "humans can survive on mars with a space-suit", but even then, if humans evolve/change to handle situations that they currently do not survive under, that could be part of an acceptable outcome.

For an overview of why such a guarantee would turn out impossible, suggest taking a look at Will Petillo's post Lenses of Control.

Defining alignment (sufficiently rigorous so that a formal proof of (im)possibility of alignment is conceivable) is a hard thing!

It's less hard than you think, if you use a minimal-threshold definition of alignment: 

That "AGI" continuing to exist, in some modified form, does not eventually result in changes to world conditions/contexts that fall outside the ranges that existing humans could survive under. 
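A minimal sketch of the shape such a threshold condition could take (my own notation, assuming discrete time steps and a fixed set of survivable conditions). This is only the shape, not yet a formal definition – the objection below still applies – but it shows where mathematical objects would need to be supplied:

$$
\forall t > t_0:\; c(s_t) \in H
$$

where $s_t$ is the state of the world at time $t$ (with the "AGI", in whatever modified form, still present), $c(s_t)$ extracts the environmental conditions relevant to human survival, and $H$ is the set of condition ranges that existing humans could survive under.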

1harfe
This is not a formal definition. Your English sentence has no apparent connection to mathematical objects, which would be necessary for a rigorous and formal definition.

Yes, I think there is a more general proof available. This proof form would combine limits to predictability and so on, with a lethal dynamic that falls outside those limits.

The question is more if it can ever be truly proved at all, or if it doesn't turn out to be an undecidable problem.

Control limits can show that it is an undecidable problem. 

A limited scope of control can in turn be used to prove that a dynamic convergent on human-lethality is uncontrollable. That would be a basis for an impossibility proof by contradiction (cannot control AGI effects to stay in line with human safety).
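A rough skeleton of that proof-by-contradiction structure, as I read it (my own compression of the preceding paragraphs, not the formal version):

$$
\begin{aligned}
&\text{(1) Assume: the AGI's control process keeps all of its downstream effects within survivable ranges } H.\\
&\text{(2) Keeping effects within } H \text{ requires detecting, predicting, and correcting any dynamic that would leave } H.\\
&\text{(3) Control limits: some class of dynamics falls outside what any internal control process can predict or correct.}\\
&\text{(4) That class is convergent on conditions outside } H \text{ (human-lethal) by default.}\\
&\text{(5) Contradiction with (1); the assumed control is impossible.}
\end{aligned}
$$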

Awesome directions. I want to bump this up.
 

This might include AGI predicting its own future behaviour, which is kind of essential for it to stick to a reliably aligned course of action.

There is a simple way of representing this problem that already shows the limitations. 

Assume that AGI continues to learn new code from observations (inputs from the world) – since learning is what allows the AGI to stay autonomous and adaptable in acting across changing domains of the world.

Then in order for AGI code to be run to make predictions about relev... (read more)
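To illustrate the self-referential prediction problem (a toy construction of my own, not taken from the original argument): any internal predictor of the system's future behaviour can be undermined when the prediction itself feeds back into what the system learns to do next.

```python
# Toy illustration of the self-prediction limit (schematic, hypothetical code).

def predict_own_action(system_state: dict) -> str:
    """Stand-in for the AGI's internal model of its own future behaviour.
    Assume it deterministically returns the action it expects to take."""
    return system_state.get("expected_action", "A")

def act(system_state: dict) -> str:
    """The system consults its self-prediction, but newly learned code
    (here: 'do the opposite') makes the prediction wrong by construction."""
    predicted = predict_own_action(system_state)
    return "B" if predicted == "A" else "A"

state = {"expected_action": "A"}
print(predict_own_action(state), act(state))  # prediction: A, actual behaviour: B
```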

Just found your insightful comment. I've been thinking about this for three years. Some thoughts expanding on your ideas:
 

my idea is more about whether alignment could require that the AGI is able to predict its own results and effects on the world (or the results and effects of other AGIs like it, as well as humans)...

In other words, alignment requires sufficient control. Specifically, it requires AGI to have a control system with enough capacity to detect, model, simulate, evaluate, and correct outside effects propagated by the AGI's own components.... (read more)

No actually, assuming the machinery has a hard substrate and is self-maintaining is enough. 

we could create aligned ASI by simulating the most intelligent and moral people

This is not an existence proof, because it does not take into account the difference in physical substrates.

Artificial General Intelligence would be artificial, by definition. What allows for the standardisation of hardware components is that the (silicon) substrate stays hard at human living temperatures and pressures. That allows configurations to stay compartmentalised and stable.

Human “wetware” has a very different substrate. It’s a soup of bouncing org... (read more)

Just found a podcast on OpenAI’s bad financial situation.

It’s hosted by someone in AI Safety (Jacob Haimes) and an AI post-doc (Igor Krawzcuk).

https://kairos.fm/posts/muckraiker-episodes/muckraiker-episode-004/

Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.

For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html

The claims made will feel unfamiliar and the reasoning paths too. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that. 

BTW if anyone does want to get into the argument, Will Petillo’s Lenses of Control post is a good entry point. 

It’s concise and correct – a difficult combination to achieve here. 

Resonating with you here!  Yes, I think autonomous corporations (and other organisations) would result in society-wide extraction, destabilisation and totalitarianism.

2[anonymous]
Thanks! I should have been more clear that the trajectory toward level 5 (with all human virtue/trust being hackable for instrumental gains) itself is concerning, not just the eventual leap when it gets there.

Sam Altman demonstrating what kind of actions you can get away with in front of everyone's eyes seems problematic.


Very much agreeing with this.

Appreciating your inquisitive question!

One way to think about it:

For OpenAI to scale more toward “AGI”, the corporation needs more data, more automatable work, more profitable uses for working machines, and more hardware to run those machines. 

If you look at how OpenAI has been increasing those four variables, you can notice that there are harms associated with each. Scaling any of them therefore tends to scale the associated harms.

One obvious example:  if they increase hardware, this also increases pollution (from mining, producing, installing, and running the hardware)... (read more)

Let me rephrase that sentence to ‘industry expenditures in deep learning’. 

what signals you send to OAI execs seems not relevant.

Right, I don’t occupy myself much with what the execs think. I do worry about stretching the “Overton window” for concerned/influential stakeholders broadly. Like, if no-one (not even AI Safety folk) acts to prevent OpenAI from continuing to violate its charter, then everyone kinda gets used to it being this way and maybe assumes it can’t be helped or is actually okay.

i don't see why this would lead them to downsize, if "the gap between industry investment in deep learning and actual revenue has balloon

... (read more)
3Remmelt
Let me rephrase that sentence to ‘industry expenditures in deep learning’. 

Donation opportunities for restricting AI companies 

... (read more)

When you say failures will "build up toward lethality at some unknown rate", why would failures build up toward lethality? We have lots of automated systems e.g. semiconductor factories, and failures do not accumulate until everyone at the factory dies, because humans and automated systems can notice errors and correct them.

Let's take your example of semiconductor factories.

There are several ways to think about failures here. For one, we can talk about local failures in the production of the semiconductor chips. These especially will get corrected for.

A le... (read more)

1Remmelt
Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument. For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html The claims made will feel unfamiliar and the reasoning paths too. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that. 

I agree that with superficial observations, I can't conclusively demonstrate that something is devoid of intellectual value.

Thanks for recognising this, and for taking some time now to consider the argument. 

 

However, the nonstandard use of words like "proof" is a strong negative signal on someone's work.

Yes, this made us move away from using the term “proof”, and instead write “formal reasoning”. 

Most proofs nowadays are done using mathematical notation. So it is understandable that when people read “proof”,  they automatically think “... (read more)

1Remmelt
Let's take your example of semiconductor factories. There are several ways to think about failures here. For one, we can talk about local failures in the production of the semiconductor chips. These especially will get corrected for. A less common way to talk about factory failures is when workers working in the factories die or are physically incapacitated as a result, eg. because of chemical leaks or some robot hitting them. Usually when this happens, the factories can keep operating and existing. Just replace the expendable workers with new workers. Of course, if too many workers die, other workers will decide to not work at those factories. Running the factories has to not be too damaging to the health of the internal human workers, in any of the many (indirect) ways that operations could turn out to be damaging.  The same goes for humans contributing to the surrounding infrastructure needed to maintain the existence of these sophisticated factories – all the building construction, all the machine parts, all the raw materials, all the needed energy supplies, and so on. If you try overseeing the relevant upstream and downstream transactions, it turns out that a non-tiny portion of the entire human economy is supporting the existence of these semiconductor factories one way or another. It took a modern industrial cross-continental economy to even make eg. TSMC's factories viable. The human economy acts as a forcing function constraining what semiconductor factories can be. There are many, many ways to incapacitate complex multi-celled cooperative organisms like us. So the semiconductor factories that humans are maintaining today ended up being constrained to those that for the most part do not trigger those pathways downstream. Some of that is because humans went through the effort of noticing errors explicitly and then correcting them, or designing automated systems to do likewise. But the invisible hand of the market considered broadly – as constituting of

How about I assume there is some epsilon such that the probability of an agent going off the rails

Got it. So we are both assuming that there would be some accumulative failure rate [per point 3.].
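To spell out the accumulation (a minimal sketch using Jeremy's framing of an independent per-period failure chance of at least $\epsilon$, not the full argument):

$$
P(\text{no lethal failure after } n \text{ periods}) \le (1-\epsilon)^n \xrightarrow{\,n\to\infty\,} 0
$$

so under that assumption the failure probability accumulates toward certainty over a long enough horizon. The question raised next is whether splitting into many ~uncorrelated agents can keep the effective $\epsilon$ for unrecoverable failures low enough.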
 

Why can't the agent split into multiple ~uncorrelated agents and have them each control some fraction of resources (maybe space) such that one off-the-rails agent can easily be fought and controlled by the others?

I tried to adopt this ~uncorrelated agents framing, and then argue from within that. But I ran up against some problems with this framing: 

  • It
... (read more)
1Jeremy Gillen
I appreciate that you tried. If words are failing us to this extent, I'm going to give up.

I usually do not take authorities too seriously before I understand their reasoning in a particular question. And understanding a person's reasoning may occasionally mean that I disagree in particular points as well. In my experience, even the most respectful people are still people, which means they often think in messy ways and they are good just on average

Right – this comes back to actually examining people’s reasoning. 

Relying on the authority status of an insider (who dismissed the argument) or on your ‘crank vibe’ of the outsider (who made the a... (read more)

efficiently filters for people who are inclined to join the activist movement--especially on the hard-core "front lines"--whereas passive "supporters" can be more trouble than they're worth.

I had not considered how our messaging is filtering out non-committed supporters. Interesting!

No worries. We won't be using ChatGPT or any other model to generate our texts.

As I understand the issue, the case for barricading AI rests on:  

Great list! Basically agreeing with the claims under 1. and the structure of what needs to be covered under 2.
  

Meanwhile, the value of disruptive protest is left to the reader to determine.

You're right. Usually when people hear about a new organisation on the forum, they expect some long write-up of the theory of change and the considerations around what to prioritise. 

I don't think I have time right now for writing a neat public write-up. This is just me being realistic... (read more)

So it's the AI being incompetent?

Yes, but in the sense that there are limits to the AGI's capacity to sense, model, simulate, evaluate, and correct the effects of its own components propagating through the larger environment.
 


You don't have to simulate something to reason about it.

If you can't simulate (and therefore predict) a failure mode that is likely to happen by default, then you cannot counterfactually act to prevent that failure mode.

 

You could walk me through how one of these theorems is relevant to capping self-improvement of reliabil

... (read more)
8Jeremy Gillen
How about I assume there is some epsilon such that the probability of an agent going off the rails is greater than epsilon in any given year. Why can't the agent split into multiple ~uncorrelated agents and have them each control some fraction of resources (maybe space) such that one off-the-rails agent can easily be fought and controlled by the others? This should reduce the risk to some fraction of epsilon, right? (I'm gonna try and stay focused on a single point, specifically the argument that leads up to >99%, because that part seems wrong for quite simple reasons).

claiming to have a full mathematical proof that safe AI is impossible,

I have never claimed that there is a mathematical proof. I have claimed that the researcher I work with has done their own reasoning in formal analytical notation (just not maths). Also, that based on his argument – which I probed and have explained here as carefully as I can – AGI cannot be controlled enough to stay safe, and actually converges on extinction.

That researcher is now collaborating with Anders Sandberg to formalise an elegant model of AGI uncontainability in mathematical no... (read more)

  • I agree that with superficial observations, I can't conclusively demonstrate that something is devoid of intellectual value. However, the nonstandard use of words like "proof" is a strong negative signal on someone's work.
  • If someone wants to demonstrate a scientific fact, the burden of proof is on them to communicate this in some clear and standard way, because a basic strategy of anyone practicing pseudoscience is to spend lots of time writing something inscrutable that ends in some conclusion, then claim that no one can disprove it and anyone who thinks
... (read more)

Let me recheck the AI Impacts paper.

I definitely made a mistake in quickly checking that number shared by colleague.

The 2023 AI Impacts survey shows a mean risk of 14.4% for the question “What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species within the next 100 years?”.

Whereas the other smaller sample survey gives a median estimate of 30%  

I already thought using those two figures as a range did not make sense, but putting a mean and a median in the same range i... (read more)

Thanks, as far as I can tell this is a mix of critiques of strategic approach (fair enough), of communication style (fair enough), and partial misunderstandings of the technical arguments.

 

instead of a succession of events which need to go your way, I think you should aim for incremental marginal gains. There is no cost-effectiveness analysis…

I agree that we should not get hung up on a succession of events to go a certain way. IMO, we need to get good at simultaneously broadcasting our concerns in a way that’s relatable to other concerned communities, a... (read more)

1Remmelt
I definitely made a mistake in quickly checking that number shared by colleague. The 2023 AI Impacts survey shows a mean risk of 14.4% for the question “What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species within the next 100 years?”. Whereas the other smaller sample survey gives a median estimate of 30%   I already thought using those two figures as a range did not make sense, but putting a mean and a median in the same range is even more wrong. Thanks for pointing this out! Let me add a correcting comment above. 

An obvious consideration for any reflective agent is to find ways to reduce the risk of goal-related failure.

by "goal-related systems" I just meant whatever is keeping track of the outcomes being optimized for.


So the argument for 3. is that just by AGI continuing to operate and maintain its components as adapted to a changing environment, the machinery can accidentally end up causing destabilising effects that were untracked or otherwise insufficiently corrected for. 

You could call this a failure of the AGI’s goal-related systems if you mean with tha... (read more)

3Jeremy Gillen
So it's the AI being incompetent? Yeah I think that would be a good response to my argument against premise 2). I've had a quick look at the list of theorems in the paper, I don't know most of them, but the ones I do know don't seem to support the point you're making. So I don't buy it. You could walk me through how one of these theorems is relevant to capping self-improvement of reliability? You don't have to simulate something to reason about it. Garrabrant induction shows one way of doing self-referential reasoning. As an analogy: Use something more like democracy than like dictatorship, such that any one person going crazy can't destroy the world/country, as a crazy dictator would.

Appreciating your openness. 

(Just making dinner – will get back to this when I’m behind my laptop in around an hour). 

0Remmelt
So the argument for 3. is that just by AGI continuing to operate and maintain its components as adapted to a changing environment, the machinery can accidentally end up causing destabilising effects that were untracked or otherwise insufficiently corrected for.  You could call this a failure of the AGI’s goal-related systems if you mean with that that the machinery failed to control its external effects in line with internally represented goals.  But this would be a problem with the control process itself.   Unfortunately, there are fundamental limits that cap the extent to which the machinery can improve its own control process.  Any of the machinery’s external downstream effects that its internal control process cannot track (ie. detect, model, simulate, and identify as a “goal-related failure”), that process cannot correct for.   For further explanation, please see links under point 4.   The problem here is that (a) we are talking about not just a complicated machine product but self-modifying machinery and (b) at the scale this machinery would be operating at it cannot account for most of the potential human-lethal failures that could result.  For (a), notice how easily feedback processes can become unsimulatable for such unfixed open-ended architectures.  * E.g. How can AGI code predict how its future code learned from unknown inputs will function in processing subsequent unknown inputs? What if future inputs are changing as a result of effects propagated across the larger environment from previous AGI outputs? And those outputs were changing as a result of previous new code that was processing signals in connection with other code running across the machinery? And so on.   For (b), engineering decentralised redundancy can help especially at the microscale.  * E.g. correcting for bit errors. * But what does it mean to correct for failures at the level of local software (bugs, viruses, etc)? What does it mean to correct for failures across som

There is some risk of its goal-related systems breaking

Ah, that’s actually not the argument.

Could you try reading points 1-5 again?

3Jeremy Gillen
I've reread and my understanding of point 3 remains the same. I wasn't trying to summarize points 1-5, to be clear. And by "goal-related systems" I just meant whatever is keeping track of the outcomes being optimized for. Perhaps you could point me to my misunderstanding?

Even if you know a certain market is a bubble, it's not exactly trivial to exploit if you don't know when it's going to burst, which prices will be affected, and to what degree. "The market can remain irrational longer than you can remain solvent" and all that.

Yes, all of this. I didn’t know how to time this, and good point that operationalising it – which AI stocks to target, at what strike price – could be tricky too. 

If I could get the timing right, this makes sense. But I don’t have much of an edge in judging when the bubble would burst. And put options are expensive. 
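A rough sketch of why the timing matters so much (hypothetical numbers, purely illustrative, not market data):

```python
# If you keep rolling puts while waiting for the crash, the premiums add up.
# All numbers below are assumptions for illustration only.

quarterly_premium = 0.05   # assumed cost per quarter of holding protective puts
quarters_too_early = 6     # how long the bubble outlasts your timing

premiums_paid = quarterly_premium * quarters_too_early
print(f"Premiums burned while waiting: {premiums_paid:.0%}")  # -> 30%
```

Being a year and a half early can eat a large chunk of whatever the eventual drop would pay out, which is part of why a simple 1:1 bet sidesteps the timing problem.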

If someone here wants to make a 1:1 bet over the next three years, I’m happy to take them up on the offer. 

If there's less demand from cloud users to rent GPUs, Google/Microsoft/Amazon would likely use the GPUs in their datacenters for their own projects (or projects like Anthropic/OpenAI).

 

That’s a good point. Those big tech companies are probably prepared to pay for the energy use if they have the hardware lying around anyway. 

To clarify for future reference, I do think it’s likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding market crash in AI company stocks, etc, and that both will last for at least three months.

 

Update: I now think this is 90%+ likely to happen (from original prediction date).

1Remmelt
Update: reverting my forecast back to an 80% chance for these reasons.
3Alexander Gietelink Oldenziel
How many put options have you bought? You can make a killing if you are right.    Bet or Update. 