In "My Childhood Role Model", Eliezer Yudkowsky says that the difference in intelligence between a village idiot and Einstein is tiny relative to the difference between a chimp and a village idiot. This seems to imply (I could be misreading) that {the time between the first AI with chimp intelligence and the first AI with village idiot intelligence} will be much larger than {the time between the first AI with village idiot intelligence and the first AI with Einstein intelligence}. If we consider GPT-2 to be roughly chimp-level, and GPT-4 to be above village idiot level, then this would seem to predict that we'll get an Einstein-level AI within the next year or so. This seems really unlikely, and I don't think even Eliezer currently believes it. If my interpretation is correct, this seems like an important prediction that he got wrong and that I haven't seen acknowledged.
So my question is: Is this a fair representation of Eliezer's beliefs at the time? If so, has this prediction been acknowledged as wrong, or was it actually not wrong and there's something I'm missing? If the prediction was wrong, what might the implications be for fast vs slow takeoff? (Initial thought...
The machines playing chess and Go are a mixed example. I suck at chess, so machines better than me already existed decades ago. But at some point they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than a calculator does; if I remember correctly, the machine that is superhuman at Go is very similar to the machine that is superhuman at chess.
I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when thinking about takeoff.
The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers at a modest budget, and the best machine built (Deep Thought) was grandmaster-level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some aut...
From 38:58 of the podcast:
So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that's because it's so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.
I regularly find myself in situations where I want to convince people that AI safety is important but I have very little time before they lose interest. If you had one minute to convince someone with no or almost no previous knowledge, how would you do it? (I have considered printing Eliezer's tweet about nuclear)
A survey was conducted in the summer of 2022 of approximately 4271 researchers who published at NeurIPS or ICML in 2021, and received 738 responses, some partial, for a 17% response rate. When asked about the impact of high-level machine intelligence in the long run, 48% of respondents gave at least a 10% chance of an extremely bad outcome (e.g. human extinction).
Anonymous #4 asks:
How large is the space of possible minds? How was its size calculated? Why does EY think that human-like minds do not fill most of this space? What is the evidence for it? What would be possible evidence against "giant Mind Design Space, and human-like minds are a tiny dot in it"?
Anonymous #3 asks:
Can AIs be anything but utility maximisers? Most existing programs are something like finite-step executors (like Witcher 3 or a calculator). So what's the difference?
Is there work attempting to show that alignment of a superintelligence by humans (as we know them) is impossible in principle; and if not, why isn’t this considered highly plausible? For example, not just in practice but in principle, a colony of ants as we currently understand them biologically, and their colony ecologically, cannot substantively align a human. Why should we not think the same is true of any superintelligence worthy of the name? “Superintelligence" is vague. But even if we minimally define it as an entity with 1,000x the knowledge, speed,...
What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.
Video of him explaining it here for reference, and thanks in advance:
I have been surprised by how extreme the predicted probability is that AGI will end up making the decision to eradicate all life on earth. I think Eliezer said something along the lines of "most optima don't include room for human life." This is obviously something that has been well worked out and understood by the Less Wrong community; it just isn't very intuitive for me. Any advice on where I can start reading?
Some background on my general AI knowledge: I took Andrew Ng's Coursera course on machine learning, so I have some basic understanding of n...
Is there a trick to write a utility satisficer as a utility maximizer?
By "utility maximizer" I mean the ideal Bayesian agent from decision theory that outputs those actions which maximize some expected utility over states of the world.
By "utility satisficer" I mean an agent that searches for actions whose expected utility is greater than some threshold short of the ideally attainable maximum, and contents itself with the first such action found. For reference, let's fix that and set the satisficer threshold to .
The satisficer is not someth...
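The two definitions above can be made concrete. The sketch below is a minimal illustration, assuming a finite list of candidate actions, a known expected-utility function, and a fixed search order; the names `maximizer` and `satisficer`, the toy utilities, and the threshold 0.7 are hypothetical placeholders, not the values from the question.

```python
from typing import Callable, Optional, Sequence

def maximizer(actions: Sequence[str], expected_utility: Callable[[str], float]) -> str:
    """Ideal maximizer: return the action with the highest expected utility."""
    return max(actions, key=expected_utility)

def satisficer(actions: Sequence[str], expected_utility: Callable[[str], float],
               threshold: float) -> Optional[str]:
    """Satisficer: scan actions in search order and return the first one whose
    expected utility is greater than the threshold (None if none qualifies)."""
    for action in actions:
        if expected_utility(action) > threshold:
            return action
    return None

# Hypothetical toy expected utilities for three candidate actions.
EU = {"a": 0.5, "b": 0.8, "c": 1.0}

print(maximizer(list(EU), EU.__getitem__))        # prints c
print(satisficer(list(EU), EU.__getitem__, 0.7))  # prints b
```

Note that the satisficer's output depends on the order in which actions are searched, whereas the maximizer's does not (apart from ties).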
Anonymous #7 asks:
I am familiar with the concept of a utility function, which assigns numbers to possible world states and considers larger numbers to be better. However, I am unsure how to apply this function in order to make decisions that take time into account. For example, we may be able to achieve a world with higher utility over a longer period of time, or a world with lower utility but in a shorter amount of time.
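One standard way to bring time into a utility function (an assumption here; the question does not commit to any particular scheme) is to assign a utility to the world state at each time step and sum those utilities with an exponential discount factor `gamma`, so that reaching a good state sooner counts for more. The function name and the two trajectories below are hypothetical.

```python
def discounted_utility(per_step_utilities, gamma=0.9):
    """Sum of gamma**t * u_t over a trajectory of per-time-step utilities.
    gamma < 1 makes earlier utility count for more than later utility."""
    return sum((gamma ** t) * u for t, u in enumerate(per_step_utilities))

# Hypothetical trajectories over 10 time steps:
# reach a utility-5 world right away, or a utility-20 world only after a delay.
fast_modest = [0] + [5] * 9
slow_better = [0] * 6 + [20] * 4
```

Which trajectory wins depends on `gamma`: with `gamma=0.9` the delayed, higher-utility trajectory scores higher, while with `gamma=0.5` the faster one does.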
Is there a primer on what the difference between training LLMs and doing RLHF on those LLMs post-training is? They both seem fundamentally to be doing the same thing: move the weights in the direction that increases the likelihood that they output the given text. But I gather that there are some fundamental differences in how this is done and RLHF isn't quite a second training round done on hand-curated datapoints.
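A toy contrast between the two kinds of update, as a simplified sketch rather than any lab's actual pipeline: a three-way softmax over whole "completions" stands in for an LLM, and all function names are hypothetical. Pretraining always raises the probability of text it is given. An RLHF-style update (shown here in plain REINFORCE form; real pipelines typically use PPO with a KL penalty against the pretrained model, which is omitted) acts on text the model itself sampled and scales the step by reward minus a baseline, so a low-reward sample has its probability pushed down.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, index):
    """Gradient of log softmax(logits)[index] w.r.t. the logits: one_hot - probs."""
    probs = softmax(logits)
    return [(1.0 if i == index else 0.0) - p for i, p in enumerate(probs)]

def pretraining_step(logits, target, lr=0.1):
    """Maximum-likelihood step on given text: always pushes its probability up."""
    g = grad_log_prob(logits, target)
    return [x + lr * gi for x, gi in zip(logits, g)]

def rlhf_step(logits, sampled, reward, baseline, lr=0.1):
    """REINFORCE-style step on a completion the model sampled itself.
    `reward` would come from a reward model trained on human preference
    comparisons; the step is scaled by (reward - baseline), so it can be negative."""
    advantage = reward - baseline
    g = grad_log_prob(logits, sampled)
    return [x + lr * advantage * gi for x, gi in zip(logits, g)]

logits = [0.0, 0.0, 0.0]  # toy "model": three possible completions, equally likely
```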
Anonymous #5 asks:
How can programmers build something without understanding its inner workings? Are they closer to biologists doing cross-breeding than to car designers?
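A toy illustration of the point behind the question, as a sketch rather than how any real model is trained: with machine learning, the programmer writes the training procedure (a loss and an update rule), not the learned parameters. Below, plain gradient descent finds a weight `w` that fits hypothetical data generated by y = 3x; nobody types the 3 into the model. In a large neural network the same thing happens across billions of parameters, which is part of why the result is hard to inspect. The function name `fit_slope` is hypothetical.

```python
def fit_slope(xs, ys, lr=0.01, steps=1000):
    """Fit y ~ w * x by gradient descent on mean squared error.
    The programmer specifies the loss and update rule; the value of w is found, not written."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # hypothetical data generated by y = 3x
w = fit_slope(xs, ys)
```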
I know the answer to "couldn't you just-" is always "no", but couldn't you just make an AI that doesn't try very hard? i.e., it seeks the smallest possible intervention that ensures 95% chance of whatever goal it's intended for.
This isn't a utility maximizer, because it cares about intermediate states. Some of the coherence theorems wouldn't apply.
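The proposal can be written down as a selection rule. The sketch below is a minimal illustration, assuming each candidate plan comes with an estimated probability of achieving the goal and some measure of how large an intervention it is; the `Plan` fields, plan names, and numbers are hypothetical, and the code simply takes both estimates as given.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Plan:
    name: str
    p_success: float          # estimated probability the plan achieves the goal
    intervention_size: float  # hypothetical measure of how much the plan changes the world

def least_effort_plan(plans: List[Plan], target: float = 0.95) -> Optional[Plan]:
    """Among plans whose estimated success probability is at least `target`,
    return the one with the smallest intervention size (None if none qualify)."""
    adequate = [p for p in plans if p.p_success >= target]
    if not adequate:
        return None
    return min(adequate, key=lambda p: p.intervention_size)

plans = [
    Plan("ask politely", 0.60, 1.0),
    Plan("do the task directly", 0.96, 5.0),
    Plan("seize all resources", 0.999, 1000.0),
]
```

Here a rule that maximized success probability alone would pick the 0.999 plan, while `least_effort_plan(plans)` stops at the 95% bar and returns "do the task directly". Everything still hinges on how `p_success` and `intervention_size` are estimated.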
Anonymous #1 asks:
...This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I think about my career choices, primarily as a computer technician? I'm not a hard worker, and I consider my intelligence to be just a little above average, so I'm not going to pretend that I'll become a systems analyst or software engineer. But now that programming and content creation are increasingly being automated, how should I update my decisions?
Sure, this qu
What could be done if a rogue version of AutoGPT gets loose on the internet?
OpenAI can invalidate a specific API key; if they don't know which one, they can revoke all of them. This should halt the thing immediately.
If it were using a local model the problem is harder. Copies of local models may be distributed around the internet. I don't know how one could stop the agent in this situation. Can we take inspiration from how viruses and worms have been defeated in the past?
Anonymous #6 asks:
Why hasn't an alien superintelligence within our light cone already killed us?
I have noticed in discussions of AI alignment here that there is a particular emphasis on scenarios where there is a single entity which controls the course of the future. In particular, I have seen the idea of a pivotal act (an action which steers the state of the universe in a billion years such that it is better than it otherwise would be) floating around rather a lot, and the term seems to be primarily used in the context of "an unaligned AI will almost certainly steer the future in ways that do not include living humans, and the only way to prevent th...
Anonymous #2 asks:
A footnote in 'Planning for AGI and beyond' says "Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination" - why do shorter timelines seem more amenable to coordination?
Why is there so little mention of the potential role of the military-industrial complex in developing AGI, rather than a public AI lab? The money, the will, and the history are all there (ARPANET was the precursor to the internet). I am vaguely aware there isn't much to suggest the MIC is on the cutting edge of AI, but there wouldn't be if it were all black-budget projects. If that is the case, it presumably implies a very difficult situation, because the broader alignment community would have no idea when crucial thresholds were being crossed.
I've been wondering what ways we can buy ourselves time to figure out alignment. Could a large government organization equipped with many copies of potent tool AI oversee and regulate significant compute pools well enough to avoid rogue AGI catastrophes? Is there any writing specifically on this subject?
Intuitively, I assume that LLMs trained on human data are unlikely to become much smarter than humans, right? Without some additional huge breakthrough, other than just being a language model?
Hello, this concerns an idea I had back in ~2014 which I abandoned because I didn't see anyone else talking about it and I therefore assumed was transparently stupid. After talking to a few researchers, I have been told the idea is potentially novel and potentially useful, so here I go (sweating violently trying to suppress my sense of transgression).
The idea concerns how one might build safety margin into AI or lesser AGI systems in a way that they can be safely iterated on. It is not intended as anything resembling a solution to alignment, just an easy-t...
What is the connection between the concepts of intelligence and optimization?
I see that optimization implies intelligence (that optimizing a sufficiently hard task sufficiently well requires sufficient intelligence). But it feels like the case for existential risk from superintelligence depends on the idea that intelligence is optimization, or implies optimization, or something like that. (If I remember correctly, sometimes people suggest creating "non-agentic AI", or "AI with no goals/utility", and EY says that they are trying to invent non-wet water o...
If a superintelligent AI is guaranteed to be manipulative (instrumental convergence), how can we validate any solution to the alignment problem? AFAIK, we can't even guarantee that a model optimizes the defined objective, due to mesa-optimizers. That adds more complexity to a seemingly unanswerable problem.
My other question: people here seem to think of intelligence as a single-dimensional thing. But I have always believed that the type of reasoning useful in scientific discovery does not necessarily unlock the secret of human communicat...
Is this a plausible take?
tl;dr: Ask questions about AGI Safety as comments on this post, including ones you might otherwise worry seem dumb!
Asking beginner-level questions can be intimidating, but everyone starts out not knowing anything. If we want more people in the world who understand AGI safety, we need a place where it's accepted and encouraged to ask about the basics.
We'll be putting up monthly FAQ posts as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI Safety discussion, but which until now they didn't feel able to ask.
It's okay to ask uninformed questions, and not worry about having done a careful search before asking.
AISafety.info - Interactive FAQ
Additionally, this will serve as a way to spread the project Rob Miles' team[1] has been working on: Stampy and his professional-looking face, aisafety.info. This will provide a single point of access into AI Safety, in the form of a comprehensive interactive FAQ with lots of links to the ecosystem. We'll be using questions and answers from this thread for Stampy (under these copyright rules), so please only post if you're okay with that!
You can help by adding questions (type your question and click "I'm asking something else") or by editing questions and answers. We welcome feedback and questions on the UI/UX, policies, etc. around Stampy, as well as pull requests to his codebase and volunteer developers to help with the conversational agent and front end that we're building.
We've got more to write before he's ready for prime time, but we think Stampy can become an excellent resource for everyone from skeptical newcomers, through people who want to learn more, right up to people who are convinced and want to know how they can best help with their skillsets.
Guidelines for Questioners:
Guidelines for Answerers:
Finally: please think very carefully before downvoting any questions; remember, this is the place to ask stupid questions!
If you'd like to join, head over to Rob's Discord and introduce yourself!