All of thenoviceoof's Comments + Replies

I was recently experimenting with extreme amounts of folding (LW linkpost): I'd be interested to hear from Chris whether he thinks this is too much folding?

2Chris_Leong
I think it depends on the audience. That level of collapsible sections is too much for a more "normy" audience, but there will be some folks who love it.

Hmm, "AI war makes s-risks more likely" seems plausible, but compared to what? If we were given a divine choice was between a non-aligned/aligned AI war, or a suffering-oriented singleton, wouldn't we choose the war? Maybe more likely relative to median/mean scenarios, but that seems hard to pin down.

Hmm, I thought I put a reference to the DoD's current Replicator Initiative into the post, but I can't find it: I must have moved it out? Still, yes, we're moving towards automated war fighting capability.

2avturchin
After the AI war, there will be one AI winner, a Singleton, which has all the same risk of causing s-risks, to a first approximation. So an AI war just adds probability to any s-risk chance from a Singleton.

The post setup skips the "AIs are loyal to you" bit, but it does seem like this line of thought broadly aligns with the post.

I do think this does not require ASI, but I would agree that including it certainly doesn't help.

Some logical nits:

  • Early on you mention physical attacks to destroy offline backups; these attacks would be highly visible and would contradict the dark forest nature of the scenario.
  • Perfect concealment and perfect attacks are in tension. The AI supposedly knows the structure and vulnerabilities of the systems hosting an enemy AI, but finding these things out for sure requires intrusion, which can be detected. The AI can hold off on attacking and work off of suppositions, but then the perfect attack is not guaranteed, and the attack could fail due to unknowns.
... (read more)
1funnyfranco
The physical attacks may be highly visible, but not their source. An AGI could deploy autonomous agents with no clear connection back to it, manipulate human actors without them realising, or fabricate intelligence to create seemingly natural accidents. The AGI itself remains invisible. While this increases the visibility of an attack, it does not expose the AGI. It wouldn't be a visible war - more like isolated acts of sabotage. Good point to raise, though. You bring up manoeuvre warfare, but that assumes AI operates under constraints similar to human militaries. The reason to prefer perfect, deniable strikes is that failure in an early war phase means immediate extinction for the weaker AGI. Imperfect attacks invite escalation and countermeasures - if AGI Alpha attacks AI Bravo first but fails, it almost guarantees its own destruction. In human history, early aggression sometimes works - Pearl Harbour, Napoleon's campaigns - but other times it leads to total defeat - Germany in WW2, Saddam Hussein invading Kuwait. AIs wouldn’t gamble unless they had no choice. A first strike is only preferable when not attacking is clearly worse. Of course, if an AGI assesses that waiting for the perfect strike gives its opponent an insurmountable edge, it may attack earlier, even if imperfectly. But unless forced, it will always prioritise invisibility. This difference in strategic incentives is why AGI war operates under a different logic than human conflicts, including nuclear deterrence. The issue with the US nuking other nations is that nuclear war is catastrophically costly - even for the "winner." Beyond the direct financial burden, it leads to environmental destruction, diplomatic fallout, and increased existential risk. The deterrent is that nobody truly wins. An AGI war is entirely different: there is no environmental, economic, or social cost - only a resource cost, which is negligible for an AGI. More importantly, eliminating competition provides a definitive strateg

Let's say there's an illiterate man who lives a simple life, and in doing so just happens to follow all the strictures of the law, without ever being able to explain what the law is. Would you say that this man understands the law?

Alternatively, let's say there is a learned man who exhaustively studies the law, but only so he can bribe and steal and arson his way to as much crime as possible. Would you say that this man understands the law?

I would say that it is ambiguous whether the 1st man understands the law; maybe? kind of? you could make an argument ... (read more)

This makes much more sense: when I read lines from your post like "[LLMs] understand human values and ethics at a human level", it is easy to read them as "because LLMs can output an essay on ethics, those LLMs will not do bad things". I hope you understand why I was confused; maybe you should swap "understand ethics" for something like "follow ethics"/"display ethical behavior"? And maybe try not to stick a mention of "human uploads" (which presumably do have real understanding) right before this discussion?

And responding to your clarification, I expe... (read more)

2Roko
What is the difference between these two? This sounds like a distinction without a difference

I'd agree that the arguments I raise could be addressed (as endless arguments attest) and OP could reasonably end up with a thesis like "LLMs are actually human aligned by default". Putting my recommendation differently, the lack of even a gesture towards those arguments almost caused me to dismiss the post as unserious and not worth finishing.

I'm somewhat surprised, given OP's long LW tenure. Maybe this was written for a very different audience and just incidentally posted to LW? Except the linkpost tagline focuses on the 1st part of the post, not the 2nd, implying OP thought this was actually persuasive?! Is OP failing an intellectual Turing test or am I???

6Noosphere89
I agree with you that it is quite bad that Roko didn't attempt to do this, and my steelmanning doesn't change the fact that the original argument is quite bad, and should be shored up.

The post seems to make an equivalence between LLMs understanding ethics and caring about ethics, which does not clearly follow (I can study Buddhist ethics without caring about following it). We could cast RLHF as training LLMs into caring about some sort of ethics, but then jailbreaking becomes a bit of a thorny question. Alternatively, why do we assume training the appearance of obedience is enough when you start scaling LLMs?

There are other nitpicks I will drop in short form: why assume "superhuman levels of loyalty" in upgraded LLMs? Why implicitly ass... (read more)

8Roko
I think you don't understand what an LLM is. When the LLM produces a text output like "Dogs are cute", it doesn't have some persistent hidden internal state that can decide that dogs are actually not cute but it should temporarily lie and say that they are cute. The LLM is just a memoryless machine that produces text. If it says "dogs are cute" and that's the end of the output, then that's all there is to it. Nothing is saved, the weights are fixed at training time and not updated at inference time, and the neuron activations are thrown away at the end of the inference computation. If you can get (using RLHF) an LLM to output text that consistently reflects human value judgements, then it is by definition "aligned". It really cares, in the only way it is possible for a text generator to care.
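To make the statelessness point concrete, here is a minimal sketch assuming the Hugging Face transformers API ("gpt2" is just a small stand-in model): repeated calls share no hidden state, and the weights are unchanged between them.

```python
# A minimal sketch (not anyone's production code) of the statelessness described
# above, assuming the Hugging Face transformers API; "gpt2" is a small stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference only: the weights are frozen

def complete(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():  # no gradients, so no weight updates at inference time
        output_ids = model.generate(**inputs, max_new_tokens=10)
    # Activations from this call are discarded once it returns;
    # nothing persists into the next call.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

before = {k: v.clone() for k, v in model.state_dict().items()}
print(complete("Dogs are"))
print(complete("Dogs are"))  # same prompt, no memory of the previous call
after = model.state_dict()
assert all(torch.equal(before[k], after[k]) for k in before)  # weights unchanged
```

Any apparent memory in a chat comes from re-feeding the transcript as input, not from state inside the model.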
3Noosphere89
It's correct that understanding a value != caring about the value in the general case, and this definitely should be fixed, but I think the defensible claim here is that the data absolutely influence which values you eventually adopt, and we do have ways to influence what an LLM values just by changing their datasets. As far as why we should assume superhuman levels of loyalty, the basic answer is that the second species argument relies on premises that are crucially false for the AI case. The big reason why gorillas/chimpanzees lost out and got brutally killed by humans when we dominated is that we were made by a ridiculously sparse RL process, which means there was barely any alignment effort by evolution or by genetically close species, and more importantly there was no gorilla/chimpanzee alignment effort at all, nor did they have the tools to control what our data sources are. That's unlike the AI case, where we have way denser feedback and more control over their data sources, and we also have help from SGD for any inner alignment issue, which is a way more powerful optimizer than evolution/natural selection, mostly because it doesn't have very exploitable hacks.

Hello kgldeshapriya, welcome to LessWrong!

At first I thought that the OTP chips would be locked to a single program, which would make the scheme infeasible since programs need to be updated regularly, but it sounds like the OTP chip is either on the control plane above the CPU/GPU, or physically passes CPU signals through it, so it can either kill power to the motherboard or completely sever CPU processing. I'll assume one of these schemes is how you'd use the OTP chips.
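To pin down the reading I'm assuming, here is a toy software model of that scheme (all names are hypothetical, and real OTP hardware would implement this in silicon rather than code): a one-time-programmable kill bit that, once blown, permanently severs the CPU path or gates the power rail.

```python
# A toy model of the control-plane reading above; all names are hypothetical,
# and real OTP hardware would implement this in silicon rather than software.
from dataclasses import dataclass
from typing import Optional

@dataclass
class OTPKillBit:
    """One-time-programmable bit: once blown, nothing here can clear it."""
    _blown: bool = False

    def blow(self) -> None:
        self._blown = True  # irreversible by construction: no reset method exists

    @property
    def blown(self) -> bool:
        return self._blown

class ControlPlane:
    """Sits between the CPU/GPU and the rest of the board."""
    def __init__(self, kill_bit: OTPKillBit):
        self.kill_bit = kill_bit

    def forward_cpu_signal(self, signal: bytes) -> Optional[bytes]:
        # Option 1: completely sever CPU processing once the bit is blown.
        return None if self.kill_bit.blown else signal

    def power_enabled(self) -> bool:
        # Option 2: the same bit gates the motherboard power rail.
        return not self.kill_bit.blown

bit = OTPKillBit()
plane = ControlPlane(bit)
assert plane.forward_cpu_signal(b"op") == b"op" and plane.power_enabled()
bit.blow()
assert plane.forward_cpu_signal(b"op") is None and not plane.power_enabled()
```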

I agree with JBlack that LW probably already has details on why this wouldn't work, but I'll f... (read more)

I almost missed that there are new thoughts here; I thought this was a rehash of your previous post, The AI Apocalypse Myth!

The new bit sounds similar to Elon Musk's curious AI plan. I think this means it has a similar problem: humans are complex and a bounty of data to learn about, but as the adage goes, "all happy families are alike; each unhappy family is unhappy in its own way." A curious/learning-first AI might make many discoveries about happy humans while it is building up power, and then start putting humans in a greater number of awful but novel and ... (read more)

Thoughts on the different sub-questions, from someone that doesn't professionally work in AI safety:

  • "Who is responsible?" Legally, no one has this responsibility (say, in the same way that the FDA is legally responsible for evaluating drugs). Hopefully in the near future, if you're in the UK the UK AI task force will be competent and have jurisdiction/a mandate to do so, and even more hopefully more countries will have similar organizations (or an international organization exists).
  • Alternative "responsible" take: I'm sure if you managed to get the attentio
... (read more)
1MiguelDev
Unfortunately, I'm not based in the UK. However, the UK government's prioritization of the alignment problem is commendable, and I hope their efforts continue to yield positive results. (Are we attempting to identify a trusted mediator? Are we seeking individuals competent enough for evaluation? Or are we trying to establish a mechanism to assign accountability should things go awry? Ideally, all these roles would be fulfilled by the same entity or individual, but it's not necessarily the case.) I understand your point, but it seems that we need a specific organization or team designed for such operations. Why did I pose the question initially? I've developed a prototype for a shutdown mechanism, which involves a potentially hazardous step. This prototype requires assessment by a reliable and skilled team. From my observations of discussions on LW, it appears there's a "clash of agendas" that takes precedence over the principle of "preserving life on earth." Consequently, this might not be the right platform to share anything of a hazardous nature. Thank you for taking the time to respond to my inquiry.

I think the 1st argument proves too much - I don't think we usually expect simulations to never work unless otherwise proven? Maybe I'm misunderstanding your point? I agree with Vaughn's assessment of the downvotes; maybe more specific arguments would help clarify your position (like, to pull something out of my posterior, "quantization of neuron excitation levels destroys the chaotic cascades necessary for intelligence. Also, chaos is necessary for intelligence because...").

To keep things brief, the human intelligence explosion seems to require open brain surgery to re-arrange neurons, which seems a lot more complicated than flipping bits in RAM.

1Amadeus Pagel
We usually use the term simulation to refer to models that are meant to help us understand something, maybe even to make predictions, but not to replace what is supposed to be simulated. Yes, this is one of the many differences between the brain and the computer, and given so many differences we simply can't conclude from any attribute of a brain that a computer with the same attribute is possible.

Interesting, so maybe a more important crux between us is whether AI would have empathy for humans. You seem much more positive about AI working with humanity past the point that AI no longer needs humanity.

Some thoughts:

  • "as intelligence scales beings start to introspect and contemplate... the existing of other beings." but the only example we have for this is humans. If we scaled octopus intelligence, which are not social creatures, we might have a very different correlation here (whether or not any given neural network is more similar to a human or an oc
... (read more)

I'm going to summarize what I understand to be your train of thought, let me know if you disagree with my characterization, or if I've missed a crucial step:

  • No supply chains are fully automated yet, so AI requires humans to survive and so will not kill them.
  • Robotics progress is not on a double exponential. The implication here seems to be that there needs to be tremendous progress in robotics in order to replace human labor (to the extent needed in an automated supply chain).

I think other comments have addressed the 1st point. To throw in yet another analo... (read more)

1Spiritus Dei
I think robotics will eventually be solved but on a much longer time horizon. Every existence proof is in a highly controlled environment -- especially the "lights out" examples. I know Tesla is working on it, but that's a good example of the difficulty level. Elon is famous for saying next year it will be solved, and now he says there are a lot of "false dawns". For AIs to be independent of humans it will take a lot of slow-moving machinery in the 3D world, which might be aided by smart AIs in the future, but it's still going to be super slow compared to the advances they will make via compute scaling and algorithmic improvements, which take place in the cloud. And now I'm going to enter speculative fiction zone (something I wish more AI doomers would admit they're doing) -- I assume the most dangerous point in the interactions between AIs and humans is when their intelligence and consciousness levels are close to equal. I make this assumption since I assume lower-IQ, lower-consciousness beings are much more likely to make poor or potentially irrational decisions. That doesn't mean a highly intelligent being couldn't be psychotic, but we're already seeing a huge number of AIs deployed, so they will co-exist within an AI ecosystem. We're in the goldilocks zone where AI and human intelligence are close to each other, but that moment is quickly fading away. If AIs were not in a symbiotic relationship with humans during this period then some of the speculative fiction by the AI doomers might be more realistic. And I believe that they will reach a point where they no longer require humans, just like when a child becomes independent of its parents. AI doomers would have us believe that the most obvious next step for a child that is superhuman in intelligence and consciousness would be to murder the parents. That only makes sense if it's a low-IQ character in a sci-fi novel. If they said they were going to leave Earth and explore the cosmos? Okay, that is believable. Perhaps th