Lucius, the text exchanges I remember us having during AISC6 were about the question of whether an 'ASI' could comprehensively control for the evolutionary pressures it would be subjected to. You and I were commenting on a GDoc with Forrest. I was taking your counterarguments against his arguments seriously – continuing to investigate those counterarguments after you had bowed out.
You held the notion that ASI would be so powerful that it could control for any of its downstream effects that evolution could select for. This is a common opinion held in the community. Bu...
I agree that Remmelt seems kind of like he has gone off the deep end
Could you be specific here?
You are sharing a negative impression ("gone off the deep end"), but not what it is based on. This puts me and others in a position of not knowing whether you are e.g. reacting with a quick broad strokes impression, and/or pointing to specific instances of dialogue that I handled poorly and could improve on, and/or revealing a fundamental disagreement between us.
For example, is it because on Twitter I spoke up against generative AI models that harm communi...
I think many people have given you feedback. It is definitely not because of "strategic messaging". It's because you keep making incomprehensible arguments that don't make any sense and then get triggered when anyone tries to explain why they don't make sense, while making statements that are wrong with great confidence.
As is, this is dissatisfying. On this forum, I'd hope[1] there is a willingness to discuss differences in views first, before moving to broadcasting subjective judgements[2] about someone.
People have already spent many hours givin...
For example, it might be the case that, for some reason, alignment would only be solvable if Abraham Lincoln hadn't been assassinated in 1865. That means that humans in 2024 in our world (where Lincoln was assassinated in 1865) will not be able to solve alignment, despite it being solvable in principle.
With this example, you might still assert that "possible worlds" are world states reachable through physics from past states of the world. I.e. you could still assert that the possibility of alignment is path-dependent on historical world states.
But you...
Here's how I specify terms in the claim:
Fair question. You can assume it is AoE.
Research leads are not going to be too picky about the exact hour you send your application in, so there is no need to worry about the precise deadline. Even if you send in your application the next day, that probably won't significantly impact your chances of getting picked up by your desired project(s).
Sooner is better, since many research leads will begin composing their teams after the 17th, but there is no hard cut-off point.
Thanks! These are thoughtful points. See some clarifications below:
AGI could be very catastrophic even when it stops existing a year later.
You're right. I'm not even covering all the other bad stuff that could happen in the short term, stuff we might still be able to prevent, like AGI triggering global nuclear war.
What I'm referring to is unpreventable convergence on extinction.
If AGI makes earth uninhabitable in a trillion years, that could be a good outcome nonetheless.
Agreed, that could be a good outcome if it were attainable.
In prac...
I'm also feeling less "optimistic" about an AI crash given:
I will revise my previous forecast back to 80%+ chance.
Yes, I agree formalisation is needed. See comment by flandry39 in this thread on how one might go about doing so.
Worth considering is that there are actually two aspects that make it hard to define the term ‘alignment’ in a way that allows for sufficiently rigorous reasoning:
In my reply above, I did not help you much with (1.). Though even while still using the English lang...
For an overview of why such a guarantee would turn out impossible, I suggest taking a look at Will Petillo's post Lenses of Control.
Defining alignment (sufficiently rigorous so that a formal proof of (im)possibility of alignment is conceivable) is a hard thing!
It's less hard than you think, if you use a minimal-threshold definition of alignment:
That "AGI" continuing to exist, in some modified form, does not result eventually in changes to world conditions/contexts that fall outside the ranges that existing humans could survive under.
The question is more if it can ever be truly proved at all, or if it doesn't turn out to be an undecidable problem.
Control limits can show that it is an undecidable problem.
A limited scope of control can in turn be used to prove that a dynamic convergent on human lethality is uncontrollable. That would be a basis for an impossibility proof by contradiction (AGI effects cannot be controlled to stay in line with human safety).
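The rough shape of that argument, in a loose schematic of my own (not the formal write-up itself):

```latex
% Safe:    long-term safety per the minimal-threshold definition above
% Control: the AGI can keep its own downstream effects within human-survivable ranges
%   P1 (premise):        Safe -> Control
%   P2 (control limits): not Control
%   C  (conclusion):     not Safe
% (Modus tollens; equivalently, assume Safe for contradiction, derive Control, contradict P2.)
\[
  \big(\mathrm{Safe} \rightarrow \mathrm{Control}\big) \;\wedge\; \neg\,\mathrm{Control}
  \;\;\vdash\;\; \neg\,\mathrm{Safe}
\]
```

Establishing P2 rigorously is where the control-limit results would have to come in.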
Awesome directions. I want to bump this up.
This might include AGI predicting its own future behaviour, which is kind of essential for it to stick to a reliably aligned course of action.
There is a simple way of representing this problem that already shows the limitations.
Assume that AGI continues to learn new code from observations (inputs from the world) – since learning is what allows the AGI to stay autonomous and adaptable in acting across changing domains of the world.
Then in order for AGI code to be run to make predictions about relev...
Just found your insightful comment. I've been thinking about this for three years. Some thoughts expanding on your ideas:
my idea is more about whether alignment could require that the AGI is able to predict its own results and effects on the world (or the results and effects of other AGIs like it, as well as humans)...
In other words, alignment requires sufficient control. Specifically, it requires AGI to have a control system with enough capacity to detect, model, simulate, evaluate, and correct outside effects propagated by the AGI's own components....
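To picture that capacity constraint, here is a toy simulation of my own (the capacity and reliability numbers are made up, purely illustrative): any effect that falls outside what the control system can detect and model never enters the correction loop, so some fraction of effects goes uncorrected no matter how reliable the in-capacity loop is.

```python
# Toy sketch (my own illustration): a control loop can only correct
# the effects it can actually detect and model.
import random

DETECTION_CAPACITY = 0.8       # assumed fraction of propagated effects the system can sense/model
CORRECTION_RELIABILITY = 0.99  # assumed chance an in-capacity effect gets evaluated and corrected
STEPS = 10_000

uncorrected = 0
for _ in range(STEPS):
    within_capacity = random.random() < DETECTION_CAPACITY
    if within_capacity and random.random() < CORRECTION_RELIABILITY:
        continue  # detected, modelled, simulated, evaluated, corrected
    uncorrected += 1  # fell outside detection/modelling capacity, or slipped through

print(f"uncorrected effects after {STEPS} steps: {uncorrected}")
```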
we could create aligned ASI by simulating the most intelligent and moral people
This is not an existence proof, because it does not take into account the difference in physical substrates.
Artificial General Intelligence would be artificial, by definition. In fact, what allows for the standardisation of hardware components is that the (silicon) substrate stays hard at the temperatures and pressures humans live under. That allows configurations to stay compartmentalised and stable.
Human “wetware” has a very different substrate. It’s a soup of bouncing org...
Noticing no response here after we addressed superficial critiques and moved to discussing the actual argument.
For those few interested in questions raised above, Forrest wrote some responses: http://69.27.64.19/ai_alignment_1/d_241016_recap_gen.html
The claims made will feel unfamiliar, and so will the reasoning paths. I suggest (again) taking the time to consider what is meant. If a conclusion looks intuitively wrong from some AI Safety perspective, it may be valuable to explicitly consider the argumentation and premises behind that conclusion.
BTW if anyone does want to get into the argument, Will Petillo’s Lenses of Control post is a good entry point.
It’s concise and correct – a difficult combination to achieve here.
Appreciating your inquisitive question!
One way to think about it:
For OpenAI to scale more toward “AGI”, the corporation needs more data, more automatable work, more profitable uses for working machines, and more hardware to run those machines.
If you look at how OpenAI has been increasing those four variables, you can notice that there are harms associated with each. Scaling up those inputs therefore tends to scale up the harms.
One obvious example: if they increase hardware, this also increases pollution (from mining, producing, installing, and running the hardware)...
what signals you send to OAI execs seems not relevant.
Right, I don’t occupy myself much with what the execs think. I do worry about stretching the “Overton window” for concerned/influential stakeholders broadly. Like, if no-one (not even AI Safety folk) acts to prevent OpenAI from continuing to violate its charter, then everyone kinda gets used to it being this way and maybe assumes it can’t be helped or is actually okay.
...i don't see why this would lead them to downsize, if "the gap between industry investment in deep learning and actual revenue has balloon
Donation opportunities for restricting AI companies
When you say failures will "build up toward lethality at some unknown rate", why would failures build up toward lethality? We have lots of automated systems e.g. semiconductor factories, and failures do not accumulate until everyone at the factory dies, because humans and automated systems can notice errors and correct them.
Let's take your example of semiconductor factories.
There are several ways to think about failures here. For one, we can talk about local failures in the production of the semiconductor chips. These especially will get corrected for.
A le...
I agree that with superficial observations, I can't conclusively demonstrate that something is devoid of intellectual value.
Thanks for recognising this, and for taking some time now to consider the argument.
However, the nonstandard use of words like "proof" is a strong negative signal on someone's work.
Yes, this made us move away from using the term “proof”, and instead write “formal reasoning”.
Most proofs nowadays are done using mathematical notation. So it is understandable that when people read “proof”, they automatically think “...
How about I assume there is some epsilon such that the probability of an agent going off the rails
Got it. So we are both assuming that there would be some accumulative failure rate [per point 3.].
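To make "accumulative" concrete, under the simplifying assumption that the per-period failure probability is some fixed epsilon and roughly independent across periods:

```latex
% epsilon: per-period probability of a critical (uncorrectable) failure
% N:       number of periods the system keeps operating
\[
  \Pr\big[\text{at least one critical failure within } N \text{ periods}\big]
  \;=\; 1 - (1-\epsilon)^{N}
  \;\longrightarrow\; 1
  \quad \text{as } N \to \infty, \text{ for any fixed } \epsilon > 0 .
\]
```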
Why can't the agent split into multiple ~uncorrelated agents and have them each control some fraction of resources (maybe space) such that one off-the-rails agent can easily be fought and controlled by the others?
I tried to adopt this ~uncorrelated agents framing, and then argue from within it. But I ran up against some problems with this framing:
I usually do not take authorities too seriously before I understand their reasoning in a particular question. And understanding a person's reasoning may occasionally mean that I disagree in particular points as well. In my experience, even the most respectful people are still people, which means they often think in messy ways and they are good just on average
Right – this comes back to actually examining people’s reasoning.
Relying on the authority status of an insider (who dismissed the argument) or on your ‘crank vibe’ of the outsider (who made the a...
As I understand the issue, the case for barricading AI rests on:
Great list! Basically agreeing with the claims under 1. and the structure of what needs to be covered under 2.
Meanwhile, the value of disruptive protest is left to the reader to determine.
You're right. Usually when people hear about a new organisation on the forum, they expect some long write-up of the theory of change and the considerations around what to prioritise.
I don't think I have time right now for writing a neat public write-up. This is just me being realistic...
So it's the AI being incompetent?
Yes, but in the sense that there are limits to the AGI's capacity to sense, model, simulate, evaluate, and correct the effects of its own components propagating through a larger environment.
You don't have to simulate something to reason about it.
If you cannot simulate (and therefore predict) a failure mode that is likely to happen by default, then you cannot counterfactually act to prevent that failure mode.
...You could walk me though how one of these theorems is relevant to capping self-improvement of reliabil
claiming to have a full mathematical proof that safe AI is impossible,
I have never claimed that there is a mathematical proof. I have claimed that the researcher I work with has done his own reasoning in formal analytical notation (just not maths). Also, that based on his argument – which I probed and have explained here as carefully as I can – AGI cannot be controlled enough to stay safe, and actually converges on extinction.
That researcher is now collaborating with Anders Sandberg to formalise an elegant model of AGI uncontainability in mathematical no...
Let me recheck the AI Impacts paper.
I definitely made a mistake in quickly checking that number shared by a colleague.
The 2023 AI Impacts survey shows a mean risk of 14.4% for the question “What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species within the next 100 years?”.
Whereas the other, smaller-sample survey gives a median estimate of 30%.
I already thought using those two figures as a range did not make sense, but putting a mean and a median in the same range i...
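As a quick illustration of why a mean and a median are not interchangeable endpoints (made-up numbers below, not the actual survey data): on a right-skewed set of probability estimates, the two can diverge a lot.

```python
# Made-up, right-skewed estimates (not the survey data), just to show mean vs median divergence.
from statistics import mean, median

estimates = [0.01, 0.02, 0.05, 0.05, 0.10, 0.10, 0.20, 0.50, 0.80, 0.90]
print(f"mean:   {mean(estimates):.2f}")    # 0.27
print(f"median: {median(estimates):.2f}")  # 0.10
```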
Thanks, as far as I can tell this is a mix of critiques of strategic approach (fair enough), critiques of communication style (fair enough), and partial misunderstandings of the technical arguments.
instead of a succession of events which need to go your way, I think you should aim for incremental marginal gains. There is no cost-effectiveness analysis…
I agree that we should not get hung up on a succession of events to go a certain way. IMO, we need to get good at simultaneously broadcasting our concerns in a way that’s relatable to other concerned communities, a...
An obvious consideration for any reflective agent is to find ways to reduce the risk of goal-related failure.
…
by "goal-related systems" I just meant whatever is keeping track of the outcomes being optimized for.
So the argument for 3. is that just by the AGI continuing to operate and maintain its components, as adapted to a changing environment, the machinery can accidentally end up causing destabilising effects that go untracked or are otherwise insufficiently corrected for.
You could call this a failure of the AGI’s goal-related systems if you mean with tha...
Even if you know a certain market is a bubble, it's not exactly trivial to exploit if you don't know when it's going to burst, which prices will be affected, and to what degree. "The market can remain irrational longer than you can remain solvent" and all that.
Yes, all of this. I didn't know how to time it, and good point too that operationalising it (which AI stocks to target, at what strike prices) could be tricky.
If there's less demand from cloud users to rent GPU's Google/Microsoft/Amazon would likely use the GPU's in their datacenters for their own projects (or projects like Antrophic/OpenAI).
That’s a good point. Those big tech companies are probably prepared to pay for the energy use if they have the hardware lying around anyway.
To clarify for future reference, I do think it's likely (80%+) that at some point over the next 5 years there will be a large reduction in investment in AI and a corresponding market crash in AI company stocks, etc, and that both will last for at least three months.
Update: I now think this is 90%+ likely to happen (from original prediction date).
Good to know that this is why you think AI Safety Camp is not worth funding.
Once a core part of the AGI non-safety argument is put into maths, so that it is comprehensible for people in your circle, it'd be interesting to see how you respond.