Lucius, the text exchanges I remember us having during AISC6 were about the question of whether an 'ASI' could comprehensively control for the evolutionary pressures it would be subjected to. You and I were commenting on a GDoc with Forrest. I was taking your counterarguments against his arguments seriously – continuing to investigate those counterarguments after you had bowed out.
You held the notion that ASI would be so powerful that it could control for any of its downstream effects that evolution could select for. This is a common opinion held in the community. Bu...
I agree that Remmelt seems kind of like he has gone off the deep end
Could you be specific here?
You are sharing a negative impression ("gone off the deep end"), but not what it is based on. This puts me and others in a position of not knowing whether you are e.g. reacting with a quick broad strokes impression, and/or pointing to specific instances of dialogue that I handled poorly and could improve on, and/or revealing a fundamental disagreement between us.
For example, is it because on Twitter I spoke up against generative AI models that harm communi...
I think many people have given you feedback. It is definitely not because of "strategic messaging". It's because you keep making incomprehensible arguments that don't make any sense and then get triggered when anyone tries to explain why they don't make sense, while making statements that are wrong with great confidence.
As is, this is dissatisfying. On this forum, I'd hope[1] there is a willingness to discuss differences in views first, before moving to broadcasting subjective judgements[2] about someone.
People have already spent many hours givin...
For example, it might be the case that, for some reason, alignment would only be solvable if Abraham Lincoln hadn't been assassinated in 1865. That means that humans in 2024 in our world (where Lincoln was assassinated in 1865) will not be able to solve alignment, despite it being solvable in principle.
With this example, you might still assert that "possible worlds" are world states reachable through physics from past states of the world. I.e. you could still assert that the possibility of alignment is path-dependent on historical world states.
But you...
Here's how I specify terms in the claim:
Fair question. You can assume it is AoE.
Research leads are not going to be too picky about the exact hour you send your application in, so there is no need to worry about the precise deadline. Even if you send in your application the next day, that probably won't significantly impact your chances of getting picked up by your desired project(s).
Sooner is better, since many research leads will begin composing their teams after the 17th, but there is no hard cut-off point.
Thanks! These are thoughtful points. See some clarifications below:
AGI could be very catastrophic even when it stops existing a year later.
You're right. I'm not even covering all the other bad stuff that could happen in the short term, stuff we might still be able to prevent, like AGI triggering global nuclear war.
What I'm referring to is unpreventable convergence on extinction.
If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
Agreed, that could be a good outcome if it were attainable.
In prac...
I'm also feeling less "optimistic" about an AI crash given:
I will revise my previous forecast back to 80%+ chance.
Yes, I agree formalisation is needed. See comment by flandry39 in this thread on how one might go about doing so.
Worth considering is that there are actually two aspects that make it hard to define the term ‘alignment’ in a way that allows for sufficiently rigorous reasoning:
In my reply above, I did not help you much with (1.). Though even while still using the English lang...
For an overview of why such a guarantee would turn out impossible, I suggest taking a look at Will Petillo's post Lenses of Control.
Defining alignment (sufficiently rigorous so that a formal proof of (im)possibility of alignment is conceivable) is a hard thing!
It's less hard than you think, if you use a minimal-threshold definition of alignment:
That "AGI" continuing to exist, in some modified form, does not result eventually in changes to world conditions/contexts that fall outside the ranges that existing humans could survive under.
The question is more if it can ever be truly proved at all, or if it doesn't turn out to be an undecidable problem.
Control limits can show that it is an undecidable problem.
A limited scope of control can in turn be used to prove that a dynamic convergent on human lethality is uncontrollable. That would be a basis for an impossibility proof by contradiction (AGI effects cannot be controlled to stay in line with human safety).
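The rough shape of that argument, in a loose schematic of my own (not the formal write-up itself):

```latex
% Safe:    long-term safety per the minimal-threshold definition above
% Control: the AGI can keep its own downstream effects within human-survivable ranges
%   P1 (premise):        Safe -> Control
%   P2 (control limits): not Control
%   C  (conclusion):     not Safe
% (Modus tollens; equivalently, assume Safe for contradiction, derive Control, contradict P2.)
\[
  \big(\mathrm{Safe} \rightarrow \mathrm{Control}\big) \;\wedge\; \neg\,\mathrm{Control}
  \;\;\vdash\;\; \neg\,\mathrm{Safe}
\]
```

Establishing P2 rigorously is where the control-limit results would have to come in.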
Awesome directions. I want to bump this up.
This might include AGI predicting its own future behaviour, which is kind of essential for it to stick to a reliably aligned course of action.
There is a simple way of representing this problem that already shows the limitations.
Assume that AGI continues to learn new code from observations (inputs from the world) – since learning is what allows the AGI to stay autonomous and adaptable in acting across changing domains of the world.
Then in order for AGI code to be run to make predictions about relev...
Just found your insightful comment. I've been thinking about this for three years. Some thoughts expanding on your ideas:
my idea is more about whether alignment could require that the AGI is able to predict its own results and effects on the world (or the results and effects of other AGIs like it, as well as humans)...
In other words, alignment requires sufficient control. Specifically, it requires AGI to have a control system with enough capacity to detect, model, simulate, evaluate, and correct outside effects propagated by the AGI's own components....
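To picture that capacity constraint, here is a toy simulation of my own (the capacity and reliability numbers are made up, purely illustrative): any effect that falls outside what the control system can detect and model never enters the correction loop, so some fraction of effects goes uncorrected no matter how reliable the in-capacity loop is.

```python
# Toy sketch (my own illustration): a control loop can only correct
# the effects it can actually detect and model.
import random

DETECTION_CAPACITY = 0.8       # assumed fraction of propagated effects the system can sense/model
CORRECTION_RELIABILITY = 0.99  # assumed chance an in-capacity effect gets evaluated and corrected
STEPS = 10_000

uncorrected = 0
for _ in range(STEPS):
    within_capacity = random.random() < DETECTION_CAPACITY
    if within_capacity and random.random() < CORRECTION_RELIABILITY:
        continue  # detected, modelled, simulated, evaluated, corrected
    uncorrected += 1  # fell outside detection/modelling capacity, or slipped through

print(f"uncorrected effects after {STEPS} steps: {uncorrected}")
```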
we could create aligned ASI by simulating the most intelligent and moral people
This is not an existence proof, because it does not take into account the difference in physical substrates.
Artificial General Intelligence would be artificial, by definition. In fact, what allows for the standardisation of hardware components is that the (silicon) substrate stays hard at the temperatures and pressures humans live under. That allows configurations to stay compartmentalised and stable.
Human “wetware” has a very different substrate. It’s a soup of bouncing org...
Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.
For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html
The claims made will feel unfamiliar, and so will the reasoning paths. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that conclusion.
BTW if anyone does want to get into the argument, Will Petillo’s Lenses of Control post is a good entry point.
It’s concise and correct – a difficult combination to achieve here.
Appreciating your inquisitive question!
One way to think about it:
For OpenAI to scale more toward “AGI”, the corporation needs more data, more automatable work, more profitable uses for working machines, and more hardware to run those machines.
If you look at how OpenAI has been increasing those four variables, you can notice that there are harms associated with each. Scaling up those inputs therefore tends to scale up the harms.
One obvious example: if they increase hardware, this also increases pollution (from mining, producing, installing, and running the hardware)...
what signals you send to OAI execs seems not relevant.
Right, I don’t occupy myself much with what the execs think. I do worry about stretching the “Overton window” for concerned/influential stakeholders broadly. Like, if no-one (not even AI Safety folk) acts to prevent OpenAI from continuing to violate its charter, then everyone kinda gets used to it being this way and maybe assumes it can’t be helped or is actually okay.
...i don't see why this would lead them to downsize, if "the gap between industry investment in deep learning and actual revenue has balloon
Donation opportunities for restricting AI companies
When you say failures will "build up toward lethality at some unknown rate", why would failures build up toward lethality? We have lots of automated systems e.g. semiconductor factories, and failures do not accumulate until everyone at the factory dies, because humans and automated systems can notice errors and correct them.
Let's take your example of semiconductor factories.
There are several ways to think about failures here. For one, we can talk about local failures in the production of the semiconductor chips. These especially will get corrected for.
A le...
I agree that with superficial observations, I can't conclusively demonstrate that something is devoid of intellectual value.
Thanks for recognising this, and for taking some time now to consider the argument.
However, the nonstandard use of words like "proof" is a strong negative signal on someone's work.
Yes, this made us move away from using the term “proof”, and instead write “formal reasoning”.
Most proofs nowadays are done using mathematical notation. So it is understandable that when people read “proof”, they automatically think “...
How about I assume there is some epsilon such that the probability of an agent going off the rails
Got it. So we are both assuming that there would be some accumulative failure rate [per point 3.].
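To make "accumulative" concrete, under the simplifying assumption that the per-period failure probability is some fixed epsilon and roughly independent across periods:

```latex
% epsilon: per-period probability of a critical (uncorrectable) failure
% N:       number of periods the system keeps operating
\[
  \Pr\big[\text{at least one critical failure within } N \text{ periods}\big]
  \;=\; 1 - (1-\epsilon)^{N}
  \;\longrightarrow\; 1
  \quad \text{as } N \to \infty, \text{ for any fixed } \epsilon > 0 .
\]
```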
Why can't the agent split into multiple ~uncorrelated agents and have them each control some fraction of resources (maybe space) such that one off-the-rails agent can easily be fought and controlled by the others?
I tried to adopt this ~uncorrelated agents framing, and then argue from within it. But I ran up against some problems with this framing:
I usually do not take authorities too seriously before I understand their reasoning in a particular question. And understanding a person's reasoning may occasionally mean that I disagree in particular points as well. In my experience, even the most respectful people are still people, which means they often think in messy ways and they are good just on average
Right – this comes back to actually examining people’s reasoning.
Relying on the authority status of an insider (who dismissed the argument) or on your ‘crank vibe’ of the outsider (who made the a...
As I understand the issue, the case for barricading AI rests on:
Great list! Basically agreeing with the claims under 1. and the structure of what needs to be covered under 2.
Meanwhile, the value of disruptive protest is left to the reader to determine.
You're right. Usually when people hear about a new organisation on the forum, they expect some long write-up of the theory of change and the considerations around what to prioritise.
I don't think I have time right now for writing a neat public write-up. This is just me being realistic...
So it's the AI being incompetent?
Yes, but in the sense that there are limits to the AGI's capacity to sense, model, simulate, evaluate, and correct the effects of its own components propagating through a larger environment.
You don't have to simulate something to reason about it.
If you cannot simulate (and therefore predict) a failure mode that is likely to happen by default, then you cannot counterfactually act to prevent that failure mode.
...You could walk me though how one of these theorems is relevant to capping self-improvement of reliabil
claiming to have a full mathematical proof that safe AI is impossible,
I have never claimed that there is a mathematical proof. I have claimed that the researcher I work with has done his own reasoning in formal analytical notation (just not maths). Also, that based on his argument – which I probed and have explained here as carefully as I can – AGI cannot be controlled enough to stay safe, and actually converges on extinction.
That researcher is now collaborating with Anders Sandberg to formalise an elegant model of AGI uncontainability in mathematical no...
Let me recheck the AI Impacts paper.
I definitely made a mistake in quickly checking that number shared by a colleague.
The 2023 AI Impacts survey shows a mean risk of 14.4% for the question “What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species within the next 100 years?”.
Whereas the other, smaller-sample survey gives a median estimate of 30%.
I already thought using those two figures as a range did not make sense, but putting a mean and a median in the same range i...
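As a quick illustration of why a mean and a median are not interchangeable endpoints (made-up numbers below, not the actual survey data): on a right-skewed set of probability estimates, the two can diverge a lot.

```python
# Made-up, right-skewed estimates (not the survey data), just to show mean vs median divergence.
from statistics import mean, median

estimates = [0.01, 0.02, 0.05, 0.05, 0.10, 0.10, 0.20, 0.50, 0.80, 0.90]
print(f"mean:   {mean(estimates):.2f}")    # 0.27
print(f"median: {median(estimates):.2f}")  # 0.10
```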
Thanks, as far as I can tell this is a mix of critiques of strategic approach (fair enough), critiques of communication style (fair enough), and partial misunderstandings of the technical arguments.
instead of a succession of events which need to go your way, I think you should aim for incremental marginal gains. There is no cost-effectiveness analysis…
I agree that we should not get hung up on a succession of events to go a certain way. IMO, we need to get good at simultaneously broadcasting our concerns in a way that’s relatable to other concerned communities, a...
An obvious consideration for any reflective agent is to find ways to reduce the risk of goal-related failure.
…
by "goal-related systems" I just meant whatever is keeping track of the outcomes being optimized for.
So the argument for 3. is that just by the AGI continuing to operate and maintain its components, as adapted to a changing environment, the machinery can accidentally end up causing destabilising effects that go untracked or are otherwise insufficiently corrected for.
You could call this a failure of the AGI’s goal-related systems if you mean with tha...
Even if you know a certain market is a bubble, it's not exactly trivial to exploit if you don't know when it's going to burst, which prices will be affected, and to what degree. "The market can remain irrational longer than you can remain solvent" and all that.
Yes, all of this. I didn't know how to time it, and good point too that operationalising it (which AI stocks to target, at what strike prices) could be tricky.
If there's less demand from cloud users to rent GPU's Google/Microsoft/Amazon would likely use the GPU's in their datacenters for their own projects (or projects like Antrophic/OpenAI).
That’s a good point. Those big tech companies are probably prepared to pay for the energy use if they have the hardware lying around anyway.
To clarify for future reference, I do think it's likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding market crash in AI company stocks, etc, and that both will last for at least three months.
Update: I now think this is 90%+ likely to happen (from original prediction date).
Good to know that this is why you think AI Safety Camp is not worth funding.
Once a core part of the AGI non-safety argument is put into maths, so that it is comprehensible for people in your circle, it'd be interesting to see how you respond.