Hmm, "AI war makes s-risks more likely" seems plausible, but compared to what? If we were given a divine choice was between a non-aligned/aligned AI war, or a suffering-oriented singleton, wouldn't we choose the war? Maybe more likely relative to median/mean scenarios, but that seems hard to pin down.
Hmm, I thought I put a reference to the DoD's current Replicator Initiative into the post, but I can't find it; I must have moved it out? Still, yes, we're moving towards automated war-fighting capability.
The post's setup skips the "AIs are loyal to you" bit, but this line of thought does seem to broadly align with the post.
I do think this does not require ASI, but I would agree that including it certainly doesn't help.
Some logical nits:
Let's say there's an illiterate man who lives a simple life, and in doing so just happens to follow all the strictures of the law, without ever being able to explain what the law is. Would you say that this man understands the law?
Alternatively, let's say there is a learned man who exhaustively studies the law, but only so he can bribe, steal, and burn his way to as much crime as possible. Would you say that this man understands the law?
I would say that it is ambiguous whether the 1st man understands the law; maybe? kind of? you could make an argument ...
This makes much more sense: when I read lines from your post like "[LLMs] understand human values and ethics at a human level", it is easy to read them as "because LLMs can output an essay on ethics, those LLMs will not do bad things". I hope you understand why I was confused; maybe you should swap "understand ethics" for something like "follow ethics"/"display ethical behavior"? And maybe try not to stick a mention of "human uploads" (which presumably do have real understanding) right before this discussion?
And responding to your clarification, I expe...
I'd agree that the arguments I raise could be addressed (as endless arguments attest) and that OP could reasonably end up with a thesis like "LLMs are actually human-aligned by default". To put my recommendation differently: the lack of even a gesture towards those arguments almost caused me to dismiss the post as unserious and not worth finishing.
I'm somewhat surprised, given OP's long LW tenure. Maybe this was written for a very different audience and just incidentally posted to LW? Except the linkpost tagline focuses on the 1st part of the post, not the 2nd, implying OP thought this was actually persuasive?! Is OP failing an intellectual Turing test or am I???
The post seems to equate LLMs understanding ethics with caring about ethics, which does not clearly follow (I can study Buddhist ethics without caring about following it). We could cast RLHF as training LLMs into caring about some sort of ethics, but then jailbreaking becomes a bit of a thorny question. Alternatively, why do we assume that training the appearance of obedience is enough once you start scaling LLMs up?
There are other nitpicks I will drop in short form: why assume "superhuman levels of loyalty" in upgraded LLMs? Why implicitly ass...
Hello kgldeshapriya, welcome to LessWrong!
At first I thought that the OTP chips would be locked to a single program, which would make the scheme infeasible since programs need to be updated regularly. But it sounds like the OTP chip either sits on the control plane above the CPU/GPU, or the CPU's signals physically pass through it, so it can either kill power to the motherboard or completely sever CPU processing. I'll assume one of these schemes is how you'd use the OTP chips.
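To check that I'm picturing the same thing, here's a minimal sketch of the control-plane variant as I imagine it: the OTP chip holds a write-once secret, and a small supervisor cuts motherboard power unless it keeps receiving keep-alive messages authenticated with that secret. Every function name here is a placeholder I made up, not anything from your post:

```python
# A minimal sketch, purely my own guess at the "control plane" variant.
# read_otp_secret(), receive_message(), and cut_power() are hypothetical
# placeholders for whatever the real hardware interface would look like.

import hashlib
import hmac
import time
from typing import Optional, Tuple

KEEPALIVE_WINDOW_S = 60.0  # cut power if no valid keep-alive arrives in this window


def read_otp_secret() -> bytes:
    """Hypothetical: read the secret burned into the one-time-programmable chip."""
    raise NotImplementedError


def receive_message(timeout_s: float) -> Optional[Tuple[bytes, bytes]]:
    """Hypothetical: wait up to timeout_s for a (payload, tag) keep-alive pair."""
    raise NotImplementedError


def cut_power() -> None:
    """Hypothetical: assert the line that kills power to the motherboard."""
    raise NotImplementedError


def supervisor_loop() -> None:
    secret = read_otp_secret()
    deadline = time.monotonic() + KEEPALIVE_WINDOW_S
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The secret can't be rewritten, so this policy can't be patched
            # away in software; you'd have to physically replace the chip.
            cut_power()
            return
        msg = receive_message(timeout_s=remaining)
        if msg is None:
            continue  # timed out; the deadline check above fires on the next pass
        payload, tag = msg
        expected = hmac.new(secret, payload, hashlib.sha256).digest()
        if hmac.compare_digest(expected, tag):
            deadline = time.monotonic() + KEEPALIVE_WINDOW_S  # extend the leash
```

The appeal, as I read it, is just that the check lives in hardware that can't be re-flashed like ordinary firmware; correct me if that's not the scheme you had in mind.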
I agree with JBlack that LW probably already has details on why this wouldn't work, but I'll f...
I almost missed that there are new thoughts here; I thought this was a rehash of your previous post, The AI Apocalypse Myth!
The new bit sounds similar to Elon Musk's curious AI plan. I think this means it has a similar problem: humans are complex and a bounty of data to learn about, but as the adage goes, "all happy families are alike; each unhappy family is unhappy in its own way." A curious/learning-first AI might make many discoveries about happy humans while it is building up power, and then start putting humans in a greater number of awful but novel and ...
Thoughts on the different sub-questions, from someone who doesn't work professionally in AI safety:
I think the 1st argument proves too much - I don't think we usually expect simulations to never work unless proven otherwise? Maybe I'm misunderstanding your point? I agree with Vaughn's assessment of the downvotes; maybe more specific arguments would help clarify your position (like, to pull something out of my posterior, "quantization of neuron excitation levels destroys the chaotic cascades necessary for intelligence. Also, chaos is necessary for intelligence because...").
To keep things brief, the human intelligence explosion seems to require open brain surgery to re-arrange neurons, which seems a lot more complicated than flipping bits in RAM.
Interesting, so maybe a more important crux between us is whether AI would have empathy for humans. You seem much more positive about AI working with humanity past the point that AI no longer needs humanity.
Some thoughts:
I'm going to summarize what I understand to be your train of thought; let me know if you disagree with my characterization, or if I've missed a crucial step:
I think other comments have addressed the 1st point. To throw in yet another analo...
I was recently experimenting with extreme amounts of folding (LW linkpost): I'd be interested to hear from Chris whether he thinks this is too much folding.