All of Peter Twieg's Comments + Replies

I agree that there seems to be a lot of handwaving about the nanotech argument, but I can't say that I agree here:

>But for the sake of argument, let's say that the AGI does manage to create a nanotech factory, retain control, and still remain undetected by the humans. 

>It doesn't stay undetected long enough to bootstrap and mass produce human replacement infrastructure. 

It seems like the idea is that the AI would create nanomachines that it could host itself on while starting to grey goo enough of the Earth to overtake humanity. While human... (read more)

>then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.

Yes, but do I take it for granted that an AI will be able to manipulate the human into creating a virus that will kill literally everyone on Earth, or at least a sufficient number to allow the AI to enact some secondary plans to take over the world? Without being detected? Not with anywhere near 100% probability. I just think these sorts of arguments should be subject to Drake equation-style reasoning that will dilute the likelihood of doom under most circumstances.

This isn't an argument for being complacent. But it does allow us to push back against the idea that "we only have one shot at this."
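The Drake equation-style point above can be sketched numerically: a takeover plan that requires many steps to all succeed gets diluted fast when you multiply the conditional probabilities through. Every probability below is made up purely for illustration, not an estimate of anything.

```python
# Toy sketch of conjunctive-plan dilution. All numbers are invented.
steps = {
    "stays undetected while planning":   0.9,
    "manipulates a human collaborator":  0.8,
    "collaborator assembles the device": 0.7,
    "device works on the first try":     0.6,
    "secondary takeover plans succeed":  0.7,
}

p_plan = 1.0
for step, p in steps.items():
    p_plan *= p  # each conjunct shrinks the plan's overall probability

print(f"P(this particular plan succeeds) = {p_plan:.3f}")  # 0.212
```

Of course, this only bounds one particular plan; summing over many disjoint plans pushes the total back up, which is the usual counterargument.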

3Rob Bensinger
You have to weigh the conjunctive aspects of particular plans against the disjunctiveness of 'there are many different ways to try to do this, including ways we haven't thought of'.
8CronoDAS
I mean, the human doesn't have to know that it's creating a doomsday virus. The AI could be promising them a cure for their daughter's cancer, or something.

I outlined my expectations, not a "plan".

>You lack imagination, its painfully easy, also cost + required IQ has been dropping steadily every year.

Conversely, it's possible that doomers are suffering from an overabundance of imagination here. To be a bit blunt, I don't take it for granted that an arbitrarily smart AI would be able to manipulate a human into developing a supervirus or nanomachines in a risk-free fashion.

The fast takeoff doom scenarios seem like they should be subject to Drake equation-style analyses to determine P(doom). Even if we develo... (read more)

2CronoDAS
Well, as Eliezer said, today you can literally order custom DNA strings by email, as long as they don't match anything in the "known dangerous virus" database. And the AI's task is a little easier than you might suspect, because it doesn't need to be able to fool everyone into doing arbitrary weird stuff, or even most people. If it can do ordinary Internet things like "buy stuff on Amazon.com", then it just needs to find one poor schmuck to accept deliveries and help it put together its doomsday weapon.
1JNS
How did you reach that conclusion? What does that ontology look like? What is your p(doom)? Is that acceptable? If yes, why is it acceptable? If no, what is the acceptable p(doom)?

>The likely result of humanity facing down an opposed superhuman intelligence is a total loss. Valid metaphors include “a 10-year-old trying to play chess against Stockfish 15”, “the 11th century trying to fight the 21st century,” and “Australopithecus trying to fight Homo sapiens“.

But obviously these metaphors are not very apt, since humanity kinda has a massive incumbent advantage that would need to be overcome. Rome Sweet Rome is a fun story not because 21st century soldiers and Roman legionnaires are intrinsically equals but because the technologica... (read more)

JNS118

I just want to be clear I understand your "plan".

We are going to build a powerful self-improving system, then let it try to end humanity with some p(doom) < 1 (hopefully), and then do that iteratively?

My gut reaction to a plan like that looks like this: "Eff you. You want to play Russian roulette? Fine, do that on your own. But leave me and everyone else out of it."

>AI will be able to invent highly-potent weapons very quickly and without risk of detection, but it seems at least pretty plausible that... this is just too difficult

You lack imagination, i... (read more)

7CronoDAS
Remember when some people, in order to see what would happen, modified a "drug discovery" AI system to search for maximally toxic molecules instead of minimizing toxicity and it ended up "inventing" molecules very similar to VX nerve gas?

>As it turns out, the only thing that matters was scale.

I mean, in some sense yes. But AlphaGo wasn't trained by finding a transcript of every Go game that had ever been played; it was trained via self-play RL. And attempts to create general game-playing agents via similar methods haven't worked out very well, in my understanding. I don't assume that if we just threw 10x or 100x data at them that this would change...

>The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that ... (read more)

2[anonymous]
GPT-4 did RL feedback that was self-evaluation across all the inputs users fed to ChatGPT. Self-play would be having it practice leetcode problems, with the RL feedback being the score. The software support is there and the RL feedback worked, so why do you think it is even evidence to say "the obvious thing that works well hasn't been done yet, or maybe it has, OpenAI won't say"? There is also a tremendous amount of self-play possible now with the new plugin interface.

>The finished system should be able to extend shoggoth tentacles into a given computer, identify what that computer is doing and make it do it better or differently.

Sure. GPT-X will probably help optimize a lot of software. But I don't think having more resource efficiency should be assumed to lead to recursive self-improvement beyond where we'd be at given a "perfect" use of current software tools. Will GPT-X be able to break out of those current set of tools, only having been trained to complete text and not to actually optimize systems? I don't take this for granted, and my view is that LLMs are unlikely to devise radically new software architectures on their own.

1anithite
<rant>It really pisses me off that the dominant "AI takes over the world" story is more or less "AI does technological magic". Nanotech assemblers, superpersuasion, basilisk hacks and more. Skeptics who doubt this are met with "well if it can't, it just improves itself until it can". The skeptics' obvious rebuttal, that RSI seems like magic too, is not usually addressed.</rant>

Note: RSI is in my opinion an unpredictable black swan. My belief is RSI will yield somewhere between a 1.5-5x speed improvement to a nascent AGI from improvements in GPU utilisation and sparsity/quantisation, requiring significant cognition spent to achieve speedups. AI is still dangerous in worlds where RSI does not occur.

Self-play generally gives superhuman performance (Go, chess, etc.) even in more complicated imperfect-information games (DOTA, Starcraft). Turning a field of engineering into a self-playable game likely leads to (superhuman (80%), top-human equiv (18%), no change (2%)) capabilities in that field. Superhuman or top-human software engineering (vulnerability discovery and programming) is one relatively plausible path to AI takeover. https://googleprojectzero.blogspot.com/2023/03/multiple-internet-to-baseband-remote-rce.html

Can an AI take over the world if it can:

* do end-to-end software engineering
* find vulnerabilities about as well as the researchers at Project Zero
* generate reasonable plans on par with a +1sd-int human (IE: not Hollywood-style movie plots like GPT-4 seems fond of)

AI does not need to be even superhuman to be an existential threat. Hack >95% of devices, extend shoggoth tentacles, hold all the data/tech hostage, present as not-Skynet so humans grudgingly cooperate, build robots to run the economy (some humans will even approve of this), kill all humans, done. That's one of the easier routes, assuming the AI can scale vulnerability discovery. With just software engineering and a bit of real-world engineering (potentially outsourceable) other violent/coercive optio

Sure, this is useful. To your other posts, I don't think we're really disagreeing about what AGI is - I think we'd agree that if you took a model with GPT4-like capabilities and hooked it up to a chess API to reinforce it, you would end up with a GPT4 model that's very good at playing chess, not something that has strongly improved its general underlying world model and thus would also be able to, say, improve its LSAT score. And this is what I'm imagining most self-play training would accomplish... but I'm open to being wrong. To your point about having a "ben... (read more)

2[anonymous]
>To your point about having a "benchmark of many tasks", I guess maybe I could imagine hooking it up to like 100 different self-playing games which are individually easy to run but require vastly different skills to master, but I could also see this just... not working as well. Teams have been trying this for a decade or so already, right? A breakthrough is possible though for sure.

No, nobody has been trying anything for decades that matters. As it turns out, the only thing that matters was scale. So there are 3 companies that had enough money for scale, and they are the only efforts that count, and all combined have done a small enough number of full-scale experiments you can count them up with 2 hands. @gwern has expressed the opinion that we probably didn't even need the transformer; other neural networks likely would have worked at these scales.

As for the rest of it, no, we're saying at massive scales we abdicate trying to understand AGI architectures - since they are enormously complex and coupled machines - and just iteratively find some that work by trial and error. "Work" includes generality. The architecture that can play 100 games and does extremely well at game 101 the first try gets way more points than one that doesn't. The one that has never read a book on the topic of the LSAT but still does well on the exam is exactly what we are looking for (though this can be tough to filter, since obviously it's simply easier to train on all text in existence). One that has controlled a robot to manipulate fine wire and many object-manipulation tasks, and has passed the exams for a course on electronics, and then first try builds a working circuit in a simulated world, is what we're looking for. So more points on that.

That's the idea. Define what we want the machine to do and what we mean by "generality", then iterate over the search space a very large number of times. In an unbiased way, pick the most distinct n winners and have those winners propo

I'm asking specifically about the assertion that "RL style self play" could be used to iterate to AGI. I don't see what sort of game could lead to this outcome. You can't have this sort of self-play with "solve this math problem" as far as I can tell, and even if you could I don't see why it would promote AGI as opposed to something that can solve a narrow class of math problems.

Obviously LLMs have amazing generalist capabilities. But as far as I can tell you can't iterate on the next version of these models by hooking them up to some sort of API that prov... (read more)

2[anonymous]
Anyways, here's how to get an AGI this way: https://www.lesswrong.com/posts/Aq82XqYhgqdPdPrBA/full-transcript-eliezer-yudkowsky-on-the-bankless-podcast?commentId=Mvyq996KxiE4LR6ii

This will work; the only reason it won't get used is that it is possibly not the computationally cheapest option. (This proposal is incredibly expensive in compute unless we do a lot of reuse of components between iterations.)

Whether you consider a machine that has a score heuristic that forces generality - by negatively weighting complex specialized architectures and heavily weighting zero-shot multimodal/multi-skill tasks - and is able to do hundreds of thousands of tasks an "AGI" is up to your definition. Since the machine would be self-replicating and capable of all industrial, construction, driving, logistics, and software-writing tasks - all things that conveniently fall into the scope of 'can be objectively evaluated' - I say it's an AGI. It's capable of everything needed to copy itself forever and to self-improve; it's functionally a sentient new civilization. The things you mentioned - like beating GRRM at writing a good story - do not matter.
2[anonymous]
You can connect them to such an API, and it's not hard: we already have the things to make the API, and you can start with LLMs. It's a fairly simple and obvious recursive benchmark. The main limit is just money.
2[anonymous]
I think you need to define what you think AGI is first. I think with a reasonable, grounded, and measurable version of AGI it is trivial to do with self play. Please tell me what you think AGI means. I don't think it matters if there are subjective things the AGI can't do well.

RL isn't magic, though. It works in the Go case because we can simulate Go games quickly, easily score the results, and then pit adversarial AIs against each other to iteratively learn.

I don't think this sort of process lends itself to the kind of tasks that we can only see an AGI accomplishing. You can't train it to, say, write a better version of The Winds of Winter than GRRM could, because you don't have a good algorithm to score each iteration.

So what I'm really trying to ask is what specific sort of open ended problems do we see being particularly conducive to fostering AGI, as opposed to a local maximizer that's highly specialized towards the particular problem?
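The contrast in the question above can be made concrete with a deliberately trivial game: self-play works when episodes are cheap to simulate and the reward is unambiguous, and even a toy game supplies both for free, which is exactly what "write a better novel" lacks. This sketch uses the pick-numbers-summing-to-15 game (isomorphic to tic-tac-toe) with random players; the game choice is my illustration, not anything from the thread.

```python
import random
from itertools import combinations

def wins(held):
    # Win condition: any 3 of the player's numbers sum to 15.
    return any(sum(c) == 15 for c in combinations(held, 3))

def play_episode(rng):
    # Two random players alternately pick numbers from 1..9.
    pool = list(range(1, 10))
    hands = [[], []]
    player = 0
    while pool:
        pick = rng.choice(pool)
        pool.remove(pick)
        hands[player].append(pick)
        if wins(hands[player]):
            return player      # clear, instantly discernible reward
        player = 1 - player
    return None                # draw

# Thousands of scored episodes in well under a second -- the two
# ingredients (cheap simulation, objective score) Go-style RL needs.
rng = random.Random(0)
results = [play_episode(rng) for _ in range(10_000)]
print("P0 wins:", results.count(0),
      "P1 wins:", results.count(1),
      "draws:", results.count(None))
```

A real self-play pipeline would replace the random players with a learned policy updated from these outcomes; the point here is only that the reward signal is free, whereas open-ended tasks have no such scorer.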

3[anonymous]
A generality maximizer, where the machine has a large set of "skills" it has learned on many different tasks, can allow it to perform well on zero-shot untrained tasks. This was seen in PaLM-E and GPT-4. A machine that can do a very large number of tasks that are evaluatable, and at least do OK by mimicking the average human or by weighting the text it learned from by scoring estimates, is still an AGI. I think you moved the goalposts from "machine as capable as an average human", or even "as capable as a top 1 percent human and superintelligent in any task with a narrow metric", to "beats humans at EVERYTHING". That is an unreasonable goal, and high-performing ASIs may not be able to write better than GRRM either.

>First problem, A lot of future gains may come from RL style self play (IE:let the AI play around solving open ended problems) 

How do people see this working? I understand the value of pointing to AI dominance in Chess/Go as illustrating how we should expect AI to recursively exceed humans at tasks, but I can't see how RL would be similarly applied to "open-ended problems" to promote similar explosive learning. What kind of open problems with a clear and instantly discernible reward function would promote AGI growth, rather than a narrower type of growth geared towards solving the particular problem well?

1anithite
Note: This is an example of how to do the bad thing (extensive RL fine-tuning/training). If you do it, the result may be misalignment, killing you/everyone.

To name one good example that is very relevant: programming, specifically having the AI complete easy-to-verify small tasks. The general pattern is to take existing horribly bloated software/data and extract useful subproblems from it (EG: find the parts of this code that are taking the most time), and then turn those into problems for the AI to solve (EG: here is a function + examples of it being called, make it faster). Ground-truth metrics would be simple things that are easy to measure (EG: execution time, code quality/smallness, code coverage, is the output the same?), and then credit assignment for sub-task usefulness can be handled by an expected-value estimator trained on that ground truth, as is done in traditional game-playing RL. Possibly it's just one AI with different prompts.

Basically, Microsoft takes all the repositories on GitHub that build successfully and have some unit tests, and builds an AI-augmented pipeline to extract problems from that software. Alternatively, a large company that runs lots of code takes snapshots + IO traces of production machines, and derives examples from that. You need code in the wild doing its thing.

Some example sub-tasks in the domain of software engineering:

* make a piece of code faster
* make this pile of code smaller
* is f(x)==g(x)? If not, find a counterexample (useful for grading the above)
* find a vulnerability and write an exploit
* fix the bug while preserving functionality
* identify invariants/data structures/patterns in memory (EG: linked lists, reference counts) - useful as a building block for further tasks (EG: finding use-after-free bugs)

GPT-4 can already use a debugger to solve a dead simple reverse engineering problem, albeit stupidly[1] https://arxiv.org/pdf/2303.12712.pdf#page=119

Larger problems could be approached by identifying u
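A minimal sketch of the kind of ground-truth grader described above, for the "make this function faster while keeping the output the same" sub-task. The function names, the example task, and the scoring rule (speedup ratio, zero for wrong answers) are all invented for illustration; a real pipeline would sandbox the candidate code rather than call it directly.

```python
import time

# "Extracted" reference implementation -- slow on purpose.
def ref_sum_sq(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# Hypothetical candidate rewrite the model proposes:
# closed form for 0^2 + 1^2 + ... + (n-1)^2.
def candidate_sum_sq(n):
    return (n - 1) * n * (2 * n - 1) // 6 if n > 0 else 0

def best_time(f, arg, trials=5):
    # Best-of-N wall-clock time, a crude but objective metric.
    best = float("inf")
    for _ in range(trials):
        t0 = time.perf_counter()
        f(arg)
        best = min(best, time.perf_counter() - t0)
    return best

def grade(ref, cand, test_inputs):
    # Ground truth #1: is the output the same on every test input?
    if any(ref(x) != cand(x) for x in test_inputs):
        return 0.0  # wrong answers score nothing
    # Ground truth #2: execution time on the largest input.
    big = max(test_inputs)
    return best_time(ref, big) / max(best_time(cand, big), 1e-9)

score = grade(ref_sum_sq, candidate_sum_sq, [0, 1, 10, 1000, 200_000])
print(f"speedup score: {score:.0f}x")
```

In the scheme sketched in the comment, scores like this would be the ground truth that the expected-value estimator is trained against.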
1baturinsky
Math problems, physical problems, doing stuff in simulations, playing games.