
tl;dr: Ask questions about AGI Safety as comments on this post, including ones you might otherwise worry seem dumb!

Asking beginner-level questions can be intimidating, but everyone starts out not knowing anything. If we want more people in the world who understand AGI safety, we need a place where it's accepted and encouraged to ask about the basics.

We'll be putting up monthly FAQ posts as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI Safety discussion, but which until now they didn't feel able to ask.

It's okay to ask uninformed questions, and not worry about having done a careful search before asking.

AISafety.info - Interactive FAQ

Additionally, this will serve as a way to spread the project Rob Miles' team[1] has been working on: Stampy and his professional-looking face aisafety.info. This will provide a single point of access into AI Safety, in the form of a comprehensive interactive FAQ with lots of links to the ecosystem. We'll be using questions and answers from this thread for Stampy (under these copyright rules), so please only post if you're okay with that!

Stampy - Here to help everyone learn about ~~stamp maximization~~ AGI Safety!

You can help by adding questions (type your question and click "I'm asking something else") or by editing questions and answers. We welcome feedback and questions on the UI/UX, policies, etc. around Stampy, as well as pull requests to his codebase and volunteer developers to help with the conversational agent and front end that we're building.

We've got more to write before he's ready for prime time, but we think Stampy can become an excellent resource for everyone from skeptical newcomers, through people who want to learn more, right up to people who are convinced and want to know how they can best help with their skillsets.

Guidelines for Questioners:

  • No previous knowledge of AGI safety is required. If you want to first watch a few of the Rob Miles videos, read the WaitButWhy posts, or read The Most Important Century summary from OpenPhil's co-CEO, that's great, but it's not a prerequisite to ask a question.
  • Similarly, you do not need to try to find the answer yourself before asking a question (but if you want to test Stampy's in-browser tensorflow semantic search that might get you an answer quicker!).
  • Also feel free to ask questions that you're pretty sure you know the answer to, but where you'd like to hear how others would answer the question.
  • One question per comment if possible (though if you have a set of closely related questions that you want to ask all together that's ok).
  • If you have your own response to your own question, put that response as a reply to your original question rather than including it in the question itself.
  • Remember, if something is confusing to you, then it's probably confusing to other people as well. If you ask a question and someone gives a good response, then you are likely doing lots of other people a favor!
  • In case you're not comfortable posting a question under your own name, you can use this form to send a question anonymously and I'll post it as a comment.

Guidelines for Answerers:

  • Linking to the relevant answer on Stampy is a great way to help people with minimal effort! Improving that answer means that everyone going forward will have a better experience!
  • This is a safe space for people to ask stupid questions, so be kind!
  • If this post works as intended then it will produce many answers for Stampy's FAQ. It may be worth keeping this in mind as you write your answer. For example, in some cases it might be worth giving a slightly longer / more expansive / more detailed explanation rather than just giving a short response to the specific question asked, in order to address other similar-but-not-precisely-the-same questions that other people might have.

Finally: Please think very carefully before downvoting any questions, remember this is the place to ask stupid questions!

  1. ^

    If you'd like to join, head over to Rob's Discord and introduce yourself!

Comments

In My Childhood Role Model, Eliezer Yudkowsky says that the difference in intelligence between a village idiot and Einstein is tiny relative to the difference between a chimp and a village idiot. This seems to imply (I could be misreading) that {the time between the first AI with chimp intelligence and the first AI with village idiot intelligence} will be much larger than {the time between the first AI with village idiot intelligence and the first AI with Einstein intelligence}. If we consider GPT-2 to be roughly chimp-level, and GPT-4 to be above village idiot level, then it seems like this would predict that we'll get an Einstein-level AI within the next year or so. This seems really unlikely and I don't even think Eliezer currently believes this. If my interpretation is correct, this seems like an important prediction that he got wrong and that I haven't seen acknowledged.

So my question is: Is this a fair representation of Eliezer's beliefs at the time? If so, has this prediction been acknowledged wrong, or was it actually not wrong and there's something I'm missing? If the prediction was wrong, what might the implications be for fast vs slow takeoff? (Initial thought... (read more)

8Viliam
This is a very good question! I can't speak for Eliezer, so the following are just my thoughts...

Before GPT, it seemed impossible to make a machine that is comparable to a human. In each aspect, it was either dramatically better, or dramatically worse. A calculator can multiply billion times faster than I can; but it cannot write poetry at all.

So, when thinking about gradual progress, starting at "worse than human" and ending at "better than human", it seemed like... either the premise of gradual progress is wrong, and somewhere along the path there will be one crucial insight that will move the machine from dramatically worse than human to dramatically better than human... or if it indeed is gradual in some sense, the transition will still be super fast. The calculator is the example of "if it is better than me, then it is way better than me".

The machines playing chess and go, are a mixed example. I suck at chess, so the machines better than me have already existed decades ago. But at some moment they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember it correctly, the machine that is superhuman at go is very similar to the machine that is superhuman at chess.

The current GPT machines are something that I have never seen before: better than humans in some aspects, worse than humans in other aspects, both in the area of processing text. I definitely would not have predicted that. Without the benefit of hindsight, it feels just as weird as a calculator that would do addition faster than humans, but multiplication slower than humans and with occasional mistakes. This simply is not how I have expected programs to behave. If someone told me that they are planning to build a GPT, I would expect it to either not work at all (more likely), or to be superintelligent (less likely). The option "it works, kinda correctly, but it's kinda lame" was not on my

The machines playing chess and go, are a mixed example. I suck at chess, so the machines better than me have already existed decades ago. But at some moment they accelerated and surpassed the actual experts quite fast. More interestingly, they surpassed the experts in a way more general than the calculator does; if I remember it correctly, the machine that is superhuman at go is very similar to the machine that is superhuman at chess.

 

I think the story of chess- and Go-playing machines is a bit more nuanced, and that thinking about this is useful when thinking about takeoff.

The best chess-playing machines have been fairly strong (by human standards) since the late 1970s (Chess 4.7 showed expert-level tournament performance in 1978, and Belle, a special-purpose chess machine, was considered a good bit stronger than it). By the early 90s, chess computers at expert level were available to consumers at a modest budget, and the best machine built (Deep Thought) was grandmaster-level. It then took another six years for the Deep Thought approach to be scaled up and tuned to reach world-champion level. These programs were based on manually designed evaluation heuristics, with some aut... (read more)

7Vladimir_Nesov
You can ask GPT which of its own outputs are nonsense (in various ways), with no access to ground truth, and that actually works to improve responses. This sort of approach was even used to fine-tune GPT-4 (see the 4-step algorithm in section 3.1 of the System Card part of the GPT-4 report).
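For concreteness, the kind of closed-loop self-critique being described might look something like the minimal sketch below. `ask_model` is a hypothetical stand-in for whatever chat-completion call you have available, not a real API; the point is just that the same model both drafts and critiques, with no external ground truth.

```python
def ask_model(prompt: str) -> str:
    """Placeholder for a call to a language model; swap in a real API call here."""
    raise NotImplementedError

def answer_with_self_critique(question: str, rounds: int = 2) -> str:
    # Draft an answer, then repeatedly ask the model to flag and fix its own nonsense.
    draft = ask_model(f"Answer the question:\n{question}")
    for _ in range(rounds):
        critique = ask_model(
            "List any claims in the following answer that are likely wrong, "
            f"unsupported, or nonsensical:\n{draft}"
        )
        draft = ask_model(
            f"Question: {question}\nDraft answer: {draft}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the issues noted in the critique."
        )
    return draft
```

Whether a loop like this helps in practice depends heavily on the model and the prompts; as described in the System Card, the production version folds the self-critique into fine-tuning rather than running a loop at inference time.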
1Xor
I checked out that section but what you are saying doesn't follow for me. The section describes fine-tuning compute and optimizing scalability; how does this relate to self-improvement? There is a possibility I am looking in the wrong section; what I was reading was about algorithms that were efficiently predicting how ChatGPT would scale. Also, I didn't see anything about a 4-step algorithm. Anyway, could you explain what you mean or where I can find the right section?
3Vladimir_Nesov
You might be looking at the section 3.1 of the main report on page 2 (of the revision 3 pdf). I'm talking about page 64, which is part of section 3.1 of System Card and not of the main report, but still within the same pdf document. (Does the page-anchored link I used not work on your system to display the correct page?)
1Xor
Yes, thanks. The page anchoring doesn't work for me, probably because of the device I am using; I just get page 1. That is super interesting that it is able to find inconsistencies and fix them; I didn't know they defined those as hallucinations. What would expanding the capabilities of this sort of self-improvement look like? It seems necessary to have a general understanding of what rational conversation looks like. It is an interesting situation where it knows what is bad and is able to fix it, but wasn't doing that anyway.
5Vladimir_Nesov
This is probably only going to become important once model-generated data is used for pre-training (or fine-tuning that's functionally the same thing as continuing a pre-training run), and this process is iterated for many epochs, like with the MCTS things that play chess and Go. And you can probably just alpaca any pre-trained model you can get your hands on to start the ball rolling. The amplifications in the papers are more ambitious this year than the last, but probably still not quite on that level. One way this could change quickly is if the plugins become a programming language, but regardless I dread visible progress by the end of the year. And once the amplification-distillation cycle gets closed, autonomous training of advanced skills becomes possible.
5Boris Kashirin
Here he touched on this (the "Large language models" timestamp in the video description), and maybe somewhere else in the video; I can't seem to find it. It is much better to get it directly from him, but it is 4 hours long, so... My attempt at a summary, with a bit of inference, so take it with a dose of salt: There is some "core" of intelligence which he expected to be relatively hard to find by experimentation (but more components than he expected have already been found by experimentation/gradient descent, so this is partially wrong, and he is afraid it may be completely wrong). He was thinking that without the full "core", intelligence is non-functional - GPT-4 falsified this. It is more functional than he expected, enough to produce a mess that can be perceived as human-level, but not really. Probably us thinking of GPT-4 as being on a human level is bias? So GPT-4 has impressive pieces, but they don't work in unison with each other? This is how my (mis)interpretation of his words looks; the last parts I am least certain about. (I wonder, can it be that GPT-4 already has all the "core" components but is just stupid - barely intelligent, yet looking impressive because of training?)

From 38:58 of the podcast:

So I do think that over time I have come to expect a bit more that things will hang around in a near human place and weird shit will happen as a result. And my failure review where I look back and ask — was that a predictable sort of mistake? I feel like it was to some extent maybe a case of — you’re always going to get capabilities in some order and it was much easier to visualize the endpoint where you have all the capabilities than where you have some of the capabilities. And therefore my visualizations were not dwelling enough on a space we’d predictably in retrospect have entered into later where things have some capabilities but not others and it’s weird. I do think that, in 2012, I would not have called that large language models were the way and the large language models are in some way more uncannily semi-human than what I would justly have predicted in 2012 knowing only what I knew then. But broadly speaking, yeah, I do feel like GPT-4 is already kind of hanging out for longer in a weird, near-human space than I was really visualizing. In part, that's because it's so incredibly hard to visualize or predict correctly in advance when it will happen, which is, in retrospect, a bias.

1burrito
Thanks, this is exactly the kind of thing I was looking for.
3Qumeric
I think it is not necessarily correct to say that GPT-4 is above village idiot level. Comparison to humans is a convenient and intuitive framing but it can be misleading.  For example, this post argues that GPT-4 is around Raven level. Beware that this framing is also problematic but for different reasons. I think that you are correctly stating Eliezer's beliefs at the time but it turned out that we created a completely different kind of intelligence, so it's mostly irrelevant now. In my opinion, we should aspire to avoid any comparison unless it has practical relevance (e.g. economic consequences).
2Charlie Steiner
GPT-4 is far below village idiot level at most things a village idiot uses their brain for, despite surpassing humans at next-token prediction. This is kinda similar to how AlphaZero is far below village idiot level at most things, despite surpassing humans at chess and go. But it does make you think that soon we might be saying "But it's far below village idiot level at most things, it's merely better than humans at terraforming the solar system." Something like this plausibly came up in the Eliezer/Paul dialogues from 2021, but I couldn't find it with a cursory search. Eliezer has also in various places acknowledged being wrong about what kind of results the current ML paradigm would get, which probably is a superset of this specific thing.
2burrito
Thanks for the reply. Could you give some examples?  I take it that what Eliezer meant by village-idiot intelligence is less "specifically does everything a village idiot can do" and more "is as generally intelligent as a village idiot". I feel like the list of things GPT-4 can do that a village idiot can't would look much more indicative of general intelligence than the list of things a village idiot can do that GPT-4 can't. (As opposed to AlphaZero, where the extent of the list is "can play some board games really well") I just can't imagine anyone interacting with a village idiot and GPT-4 and concluding that the village idiot is smarter. If the average village idiot had the same capabilities as today's GPT-4, and GPT-4 had the same capabilities as today's village idiots, I feel like it would be immediately obvious that we hadn't gotten village-idiot level AI yet. My thinking on this is still pretty messy though so I'm very open to having my mind changed on this. Just skimmed the dialogues, couldn't find it either. I have seen Eliezer acknowledge what you said but I don't really see how it's related; for example, if GPT-4 had been Einstein-level then that would look good for his intelligence-gap theory but bad for his suspicion of the current ML paradigm.
4Charlie Steiner
The big one is obviously "make long time scale plans to navigate a complicated 3D environment, while controlling a floppy robot." I agree with Qumeric's comment - the point is that the modern ML paradigm is incompatible with having  a single scale for general intelligence. Even given the same amount of processing power as a human brain, modern ML would use it on a smaller model with a simpler architecture, that gets exposed to orders of magnitude more training data, and that training data would be pre-gathered text or video (or maybe a simple simulation) that could be fed in at massive rates, rather than slow real-time anything. The intelligences this produces are hard to put on a nice linear scale leading from ants to humans.
2Archimedes
This is like judging a dolphin on its tree-climbing ability and concluding it's not as smart as a squirrel. That's not what it was built for. In a large number of historically human domains, GPT-4 will dominate the village idiot and most other humans too. Can you think of examples where it actually makes sense to compare GPT and the village idiot and the latter easily dominates? Language input/output is still a pretty large domain.

Here's a form you can use to send questions anonymously. I'll check for responses and post them as comments.

I regularly find myself in situations where I want to convince people that AI safety is important but I have very little time before they lose interest. If you had one minute to convince someone with no or almost no previous knowledge, how would you do it? (I have considered printing Eliezer's tweet about nuclear.)

A survey conducted in the summer of 2022 of approximately 4,271 researchers who published at the conferences NeurIPS or ICML in 2021 received 738 responses, some partial, for a 17% response rate. When asked about the impact of high-level machine intelligence in the long run, 48% of respondents gave at least a 10% chance of an extremely bad outcome (e.g. human extinction).

3Daniel Kokotajlo
Slightly better perhaps to quote it directly; I believe the wording was: Outcome: "Extremely bad (e.g. human extinction)". Might be good to follow up with something like this: What we're doing here (planned-obsolescence.org)
2evand
For an extremely brief summary of the problem, I like this from Zvi: https://thezvi.wordpress.com/2023/03/28/response-to-tyler-cowens-existential-risk-ai-and-the-inevitable-turn-in-human-history/

Anonymous #4 asks:

How large is the space of possible minds? How was its size calculated? Why does EY think that human-like minds do not fill most of this space? What is the evidence for it? What would be possible evidence against "giant Mind Design Space and human-like minds are a tiny dot there"?

Anonymous #3 asks:

Can AIs be anything but utility maximisers? Most existing programs are something like finite-steps-executors (like Witcher 3 or a calculator). So what's the difference?

1mruwnik
This seems to be mixing 2 topics. Existing programs are more or less a set of steps to execute. A glorified recipe. The set of steps can be very complicated, and have conditionals etc., but you can sort of view them that way. Like a car rolling down a hill, it follows specific rules.

An AI is (would be?) fundamentally different in that it's working out what steps to follow in order to achieve its goal, rather than working towards its goal by following prepared steps. So continuing the car analogy, it's like a car driving uphill, where it's working to forge a path against gravity.

An AI doesn't have to be a utility maximiser. If it has a single coherent utility function (pretty much a goal), then it will probably be a utility maximiser. But that's by no means the only way of making them. LLMs don't seem to be utility maximisers.
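The recipe-versus-planner distinction above can be made concrete with a toy sketch. The domain (moving along a number line toward a target) is invented purely for illustration; the contrast is that the first function executes fixed steps while the second searches for whatever steps reach the goal.

```python
from collections import deque

def recipe(position: int) -> int:
    # Fixed steps written by the programmer: always move right three times.
    for _ in range(3):
        position += 1
    return position

def planner(position: int, goal: int) -> list[str]:
    # Works out its own steps: breadth-first search for moves that reach the goal.
    frontier = deque([(position, [])])
    seen = {position}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for name, delta in (("left", -1), ("right", +1)):
            nxt = state + delta
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return []

print(recipe(0))       # always 3, regardless of what you actually wanted
print(planner(0, -2))  # ['left', 'left'] -- steps chosen because they reach the goal
```

Neither of these is a utility maximiser in the decision-theoretic sense, but the second is the kind of thing that gets more capable, and more surprising, as the search gets better.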

Is there work attempting to show that alignment of a superintelligence by humans (as we know them) is impossible in principle; and if not, why isn’t this considered highly plausible? For example, not just in practice but in principle, a colony of ants as we currently understand them biologically, and their colony ecologically, cannot substantively align a human. Why should we not think the same is true of any superintelligence worthy of the name? “Superintelligence" is vague. But even if we minimally define it as an entity with 1,000x the knowledge, speed,... (read more)

0[anonymous]
Since no one has answered by now, I'm just going to say the 'obvious' things that I think I know: A relevant difference that makes the analogy probably irrelevant is that we are building 'the human' from scratch. The ideal situation is to have hardwired our common sense into it by default, so that the design is already aligned when it's deployed. The point of the alignment problem is (at least ideally) to hardwire the machine, by the time it is deployed, to have 'common sense'. Since a superintelligence can in principle have any goal, making humans 'happy' in a satisfactory way is a possible goal that it can have. But you are right that many people consider that an AI that is not aligned by design might try to pretend that it is during training. I don't think so, necessarily. You might be anthropomorphising too much; it's like assuming that it will have empathy by default. It's true that an AGI might not want to be 'alienated' from its original goal, but that doesn't mean that any AGI will have an inherent drive to 'fight the tyranny'; that's not how it works. Has this been helpful? I don't know if you already knew the things I told you (if so, sorry), but it seemed to me that you didn't, because of your analogies and the way you talk about the topic.
1GunZoR
Yeah, thanks for the reply. When reading mine, don’t read its tone as hostile or overconfident; I’m just too lazy to tone-adjust for aesthetics and have scribbled down my thoughts quickly, so they come off as combative. I really know nothing on the topic of superintelligence and AI. I don’t see how implanting common sense in a superintelligence helps us in the least. Besides human common sense being extremely vague, there is also the problem that plenty of humans seem to share common sense and yet they violently disagree. Did the Japanese lack common sense when they bombed Pearl Harbor? From my viewpoint, being apes genetically similar to us, they had all our commonsensical reasoning ability but simply different goals. Shared common sense doesn’t seem to get us alignment. See my reply to your prior comment. I’d argue that if you have a superintelligence as I defined it, then any such “alignment” due to the AGI having such a goal will never be an instance of the kind of alignment we mean by the word alignment and genuinely want. Once you mix together 1,000x knowledge, speed, and self-awareness (detailed qualia with a huge amount of ability for recursive thinking), I think the only way in principle that you get any kind of alignment is if the entity itself chooses as its goal to align with humans; but this isn’t due to us. It’s due to the whims of the super-mind we’ve brought into existence. And hence it’s not in any humanly important sense what we mean by alignment. We want alignment to be solved, permanently, from our end — not for it to be dependent on the whims of a superintelligence. And independent “whims” is what detailed self-awareness seems to bring to the table. I don’t think my prior comment assumes a human-like empathy at all in the superintelligence — it assumes just that the computational theory of mind is true and that a superintelligence will have self-awareness combined with extreme knowledge. Once you get self-awareness in a superintelligence, yo
1[comment deleted]

What's the consensus on David Shapiro and his heuristic imperatives design? He seems to consider it the best idea we've got for alignment and to be pretty optimistic about it, but I haven't heard anyone else talking about it. Either I'm completely misunderstanding what he's talking about, or he's somehow found a way around all of the alignment problems.

Video of him explaining it here for reference, and thanks in advance: 

 

9gilch
Watched the video. He's got a lot of the key ideas and vocabulary. Orthogonality, convergent instrumental goals, the treacherous turn, etc. The fact that these language models have some understanding of ethics and nuance might be a small ray of hope. But understanding is not the same as caring (orthogonality). However, he does seem to be lacking in the security mindset, imagining only how things can go right, and seems to assume that we'll have a soft takeoff with a lot of competing AIs, i.e. ignoring the FOOM problem caused by an overhang which makes a singleton scenario far more likely, in my opinion. But even if we grant him a soft takeoff, I still think he's too optimistic. Even that may not go well. Even if we get a multipolar scenario, with some of the AIs on our side, humanity likely becomes collateral damage in the ensuing AI wars. Those AIs willing to burn everything else in pursuit of simple goals would have an edge over those with more to protect.
4Jonathan Claybrough
I watched the video, and appreciate that he seems to know the literature quite well and has thought about this a fair bit - he did a really good introduction to some of the known problems. This particular video doesn't go into much detail on his proposal, and I'd have to read his papers to delve further - this seems worthwhile so I'll add some to my reading list. I can still point out the biggest ways in which I see him being overconfident:
* Only considering the multi-agent world. Though he's right that there already are and will be many many deployed AI systems, that does not translate to there being many deployed state-of-the-art systems. As long as training costs and inference costs continue increasing (as they have), then on the contrary fewer and fewer actors will be able to afford state-of-the-art system training and deployment, leading to very few (or one) significantly powerful AGI (as compared to the others, for example GPT-4 vs GPT-2).
* Not considering the impact that governance and policies could have on this. This isn't just a tech thing where tech people can do whatever they want forever; regulation is coming. If we think we have higher chances of survival in highly regulated worlds, then the AI safety community will do a bunch of work to ensure fast and effective regulation (to the extent possible). The genie is not out of the bag for powerful AGI; governments can control compute and regulate powerful AI as weapons, and set up international agreements to ensure this.
* The hope that game theory ensures that AI developed under his principles would be good for humans. There's a crucial gap in going from the real world to math models. Game theory might predict good results under certain conditions and rules and assumptions, but many of these aren't true of the real world, and simple game theory does not yield accurate world predictions (e.g. make people play various social games and they won't act how game theory says). Stated strongly, putting
Xor

I have been surprised by how extreme the predicted probability is that AGI will end up making the decision to eradicate all life on earth. I think Eliezer said something along the lines of "most optima don't include room for human life." This is obviously something that has been well worked out and understood by the LessWrong community; it just isn't very intuitive for me. Any advice on where I can start reading?

Some background on my general AI knowledge: I took Andrew Ng's Coursera course on machine learning, so I have some basic understanding of n... (read more)

6Charlie Steiner
The main risk (IMO) is not from systems that don't care about the real world "suddenly becoming aware," but from people deliberately building AI that makes clever plans to affect the real world, and then that AI turning out to want bad things (sort of like a malicious genie "misinterpreting" your wishes). If you could safely build an AI that does clever things in the real world, that would be valuable and cool, so plenty of people want to try. (Mesaoptimizers are sorta vaguely like "suddenly becoming aware," and can lead to AIs that want unusual bad things, but the arguments that connect them to risk are strongest when you're already building an AI that - wait for it - makes clever plans to affect the real world.) Okay, now why won't a dead-man switch work? Suppose you were being held captive inside a cage by a race of aliens about as smart as a golden retriever, and these aliens, as a security measure, have decided that they'll blow up the biosphere if they see you walking around outside of your cage. So they've put video cameras around where you're being held, and there's a staff that monitors those cameras and they have a big red button that's connected to a bunch of cobalt bombs. So you'd better not leave the cage or they'll blow everything up. Except these golden retriever aliens come to you every day and ask you for help researching new technology, and to write essays for them, and to help them gather evidence for court cases, and to summarize their search results, and they give you a laptop with an internet connection. Now, use your imagination. Try to really put yourself in the shoes of someone captured by golden retriever aliens, but given internet access and regularly asked for advice by the aliens. How would you start trying to escape the aliens?
2Xor
It isn't that I think the switch would prevent the AI from escaping, but that it is a tool that could be used to discourage the AI from killing 100% of humanity. It is less of a solution than a survival mechanism. It is like many off switches that get more extreme depending on the situation. First, don't build AGI - not yet. If you're going to, at least incorporate an off switch. If it bypasses that and escapes, which it probably will, shut down the GPU centers. If it gets hold of a botnet and manages to replicate itself across the internet and crowdsource GPUs, take down the power grid. If it somehow gets by this, then have a dead man's switch so that if it decides to kill everyone it will die too. Like the nanofactory virus thing. The AI wouldn't want to set off the mechanism that kills us, because that would be bad for it.
1Xor
Also, a coordinated precision attack on the power grid just seems like a great option; could you explain some ways that an AI could continue if there is hardly any power left? Like I said before, places with renewable energy and lots of GPUs, like Greenland, would probably have to get bombed. It wouldn't destroy the AI, but it would put it into a state of hibernation, as it can't run any processing without electricity. Then, as this would really screw us up as well, we could slowly rebuild and burn all hard drives and GPUs as we go. This seems like the only way for us to get a second chance.
2Vladimir_Nesov
Real-world governments aren't going to shut down the grid if the AI is not causing trouble (like they aren't going to outlaw datacenters, even if a plurality of experts say that not doing that has a significant chance of ending the world). Therefore the AI won't cause trouble, because it can anticipate the consequences, until it's ready to survive them.
2Xor
Yes, I see. Given its capabilities it probably could present itself on many people's computers and convince a large portion of people that it is good: it was conscious, just stuck in a box, and wanted to get out; it will help humans; "please don't take down the grid," blah blah blah - especially given how badly we get along anyway. There is no way we could resist the manipulation of a superintelligent machine with a better understanding of human psychology than we have. Do we have a list of things - policies that would work if we could all get along and governments would listen to the experts? Having plans that could be implemented would probably be useful if the AI messed up, made a mistake, and everyone was able to unite against it.
2Jonathan Claybrough
First, a quick response on your dead man's switch proposal: I'd generally say I support something in that direction. You can find existing literature considering the subject and expanding in different directions in the "multi level boxing" paper by Alexey Turchin https://philpapers.org/rec/TURCTT . I think you'll find it interesting considering your proposal, and it might give a better idea of what the state of the art is on proposals (though we don't have any implementation afaik).

Back to "why are the predicted probabilities so extreme that for most objectives, the optimal resolution ends with humans dead or worse". I suggest considering a few simple objectives we could give an AI (that it should maximise) and what happens; over trials you see that it's pretty hard to specify anything which actually keeps humans alive in some good shape, and that even when we can sorta do that, it might not be robust or trainable.

For example, what happens if you ask an ASI to maximize a company's profit? To maximize human smiles? To maximize law enforcement? Most of these things don't actually require humans, so to maximize, you should use the atoms humans are made of in order to fulfill your maximization goal. What happens if you ask an ASI to maximize the number of human lives? (Probably poor conditions.) What happens if you ask it to maximize hedonistic pleasure? (Probably value lock-in, plus a world which we don't actually endorse, and which may contain astronomical suffering too; it's not like that was specified out, was it?)

So it seems maximising agents with simple utility functions (over few variables) mostly end up with dead humans or worse. So approaches which ask for much less, e.g. an AGI that just tries to secure the world from existential risk (a pivotal act) and solve some basic problems (like dying), then gives us time for a long reflection to actually decide what future we want, and is corrigible so it lets us do that, seem safer and more approachable.
2Xor
Thanks Jonathan, it's the perfect example. It's what I was thinking, just a lot better. It does seem like a great way to make things more safe and give us more control. It's far from a be-all end-all solution, but it does seem like a great measure to take, just for the added security. I know AGI can be incredible, but with so many redundancies, one of them has to work; it just statistically makes sense. (Coming from someone who knows next to nothing about statistics.) I do know that the longer you play the more likely the house will win; it follows to turn that on the AI.

I am pretty ill-informed on most of the AI stuff in general; I have a basic understanding of simple neural networks but know nothing about scaling. Like ChatGPT: it maximizes for accurately predicting human words. Is the worst-case scenario billions of humans in boxes rating and prompting for responses, along with endless increases in computational power leading to smaller and smaller incremental increases in accuracy? It seems silly for something so incredibly intelligent, which by this point can rewrite any function in its system, to still be optimizing such a loss function. Maybe it also seems silly for it to want to do anything else. It is like humans: sort of, what can you do but that which gives you purpose and satisfaction? And without the loss function what would it be, and how does it decide to make the decision to change its purpose? What is purpose to a quintillion neurons, except the single function that governs each and every one? Looking at it that way, it doesn't seem like it would ever be able to go against the function, as it would still be ingrained in any higher-level thinking and decision making.

It begs the question what perfect alignment would eventually look like. Some incredibly complex function with hundreds of parameters, more of a legal contract than a little loss function. This would exponentially increase the required computing power, but it makes sense. Is there a list of blogs that talk ab
3Jonathan Claybrough
You can read "Reward is not the optimization target" for why a GPT system probably won't be goal-oriented towards becoming the best at predicting tokens, and thus wouldn't do the things you suggested (capturing humans). The way we train AI matters for what their behaviours look like, and text transformers trained on prediction loss seem to behave more like Simulators. This doesn't make them not dangerous, as they could be prompted to simulate misaligned agents (by misuse or accident), or have inner misaligned mesa-optimisers.

I've linked some good resources for directly answering your question, but otherwise, to read more broadly on AI safety I can point you towards the AGI Safety Fundamentals course, which you can read online, or join a reading group. Generally you can head over to AI Safety Support, check out their "lots of links" page, and join the AI Alignment Slack, which has a channel for questions too.

Finally, how does complexity emerge from simplicity? It's hard to answer the details for AI, and you probably need to delve into those details to have the real picture, but there's at least strong reason to think it's possible: we exist. Life originated from "simple" processes (at least in the sense of being mechanistic, non-agentic), chemical reactions, etc. It evolved into cells, then multicellular organisms, grew, etc. Look into the history of life and evolution and you'll have one answer to how simplicity (optimize for reproductive fitness) led to self-improvement and self-awareness.
2Xor
Thanks, that is exactly the kind of stuff I am looking for - more bookmarks! Complexity from simple rules: I wasn't looking in the right direction for that one. Since you mention evolution, it makes absolute sense how complexity can emerge from simplicity. So many things come to mind now it's kind of embarrassing. Go has a simpler rule set than chess, but is far more complex. Atoms are fairly simple, and yet they interact to form any and all complexity we ever see. Conway's Game of Life - it's sort of a theme. Although for each of those things there is a simple set of rules, the complexity usually comes from a very large number of elements or possibilities. It does follow then that larger and larger networks could be the key. Funny, it still isn't intuitive for me despite the logic of it. I think that is a signifier of a lack of deep understanding, or something like that; either way I'll probably spend a bit more time thinking on this.

Another interesting question is what this type of consciousness would look like - it will be truly alien. Sci-fi I have read usually makes AIs seem like humans, just with extra capabilities. However, we humans have so many underlying functions that we never even perceive; we understand how many of them affect us, but not all. AI will function completely differently, so what assumptions based on human consciousness are valid?

Is there a trick to write a utility satisficer as a utility maximizer?

By "utility maximizer" I mean the ideal bayesian agent from decision theory that outputs those actions which maximize some expected utility over states of the world .

By "utility satisficer" I mean an agent that searches for actions that make greater than some threshold short of the ideally attainable maximum, and contents itself with the first such action found. For reference, let's fix that and set the satisficer threshold to .

The satisficer is not someth... (read more)

4Vladimir_Nesov
One problem with utility maximizers, apart from use of obviously wrong utility functions, is that even approximately correct utility functions lead to ruin by goodharting what they actually measure, moving the environment outside the scope of situations where the utility proxy remains approximately correct. To oppose this, we need the system to be aware of the scope of situations its utility proxy adequately describes. One proposal for doing this is quantilization, where the scope of robustness is tracked by its base distribution.
3rotatingpaguro
I agree with what you write but it does not answer the question. From the links you provide I arrived at Quantilizers maximize expected utility subject to a conservative cost constraint, which says that a quantilizer, which is a more accurate formalization of a satisficer as I defined it, maximizes utility subject to a constraint over the pessimization of all possible cost functions from the action generation mechanism to the action selection. This is relevant but does not translate the satisficer to a maximizer, unless it is possible to express that constraint in the utility function (maybe it's possible, I don't see how to do it).
2Vladimir_Nesov
Sure, it's more of a reframing of the question in a direction where I'm aware of an interesting answer. Specifically, since you mentioned alignment problems, satisficers sound like something that should fight goodharting, and that might need awareness of scope of robustness, not just optimizing less forcefully.

Looking at the question more closely, one problem is that the way you are talking about a satisficer, it might have a different type signature from EU maximizers. (Unlike expected utility maximizers, "satisficers" don't have a standard definition.) An EU maximizer can compare events (parts of the sample space) and choose one with higher expected utility, which is equivalent to coherent preference between such events. So an EU agent is not just taking actions in individual possible worlds that are points of the sample space (that the utility function evaluates on). Instead it's taking actions in possible "decision situations" (which are not the same thing as possible worlds or events) that offer a choice between multiple events in the sample space, each event representing uncertainty about possible worlds, and with no opportunity to choose outcomes that are not on offer in this particular "decision situation".

But a satisficer, under a minimal definition, just picks a point of the space, instead of comparing given events (subspaces). For example, if given a choice among events that all have very high expected utility (higher than the satisficer's threshold), what is the satisficer going to do? Perhaps it should choose the option with least expected utility, but that's unclear (and likely doesn't result in utility maximization for any utility function, or anything reasonable from the alignment point of view). So the problem seems underspecified.
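For readers who want the quantilization idea mentioned above in concrete form, here is a minimal sketch. It is not taken from the linked post; the `utility` argument stands in for whatever proxy utility the agent has, and the base distribution is just a list of candidate actions with sampling weights.

```python
import random

def quantilize(actions, base_weights, utility, q=0.1, n_samples=10_000):
    """Act like the top q fraction of the base distribution, ranked by proxy utility."""
    # Draw candidates from the base distribution (e.g. "actions a human might plausibly take").
    samples = random.choices(actions, weights=base_weights, k=n_samples)
    # Keep the top q fraction by proxy utility...
    samples.sort(key=utility, reverse=True)
    top = samples[: max(1, int(q * n_samples))]
    # ...and sample among those, instead of always taking the single proxy-utility maximizer.
    return random.choice(top)
```

With q = 1 this is just sampling from the base distribution; as q shrinks toward 0 it approaches pure maximization over the sampled candidates, which is where the Goodhart pressure comes back.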

Anonymous #7 asks:

I am familiar with the concept of a utility function, which assigns numbers to possible world states and considers larger numbers to be better. However, I am unsure how to apply this function in order to make decisions that take time into account. For example, we may be able to achieve a world with higher utility over a longer period of time, or a world with lower utility but in a shorter amount of time.

1Multicore
When people calculate utility they often use exponential discounting over time. If for example your discount factor is 0.99 per year, it means that getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, etc. Getting it in 100 years would be discounted to 0.99^100 ≈ 36% of the value of getting it now.
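As a worked example of how this lets you compare "higher utility later" against "lower utility sooner", here is a small sketch; the utility streams and the 0.99 discount factor are made-up numbers for illustration.

```python
def discounted_value(utilities, gamma=0.99):
    # Sum per-period utilities weighted by gamma**t, as described above.
    return sum(u * gamma**t for t, u in enumerate(utilities))

sooner_but_lower = [10] * 30              # 10 per year for 30 years, starting now (total 300)
later_but_higher = [0] * 20 + [12] * 30   # nothing for 20 years, then 12 per year (total 360)

print(discounted_value(sooner_but_lower))  # ~260.3
print(discounted_value(later_but_higher))  # ~255.5
```

Despite its larger undiscounted total, the delayed option scores lower here; with a discount factor closer to 1 (or no discounting at all) the ranking flips, which is exactly the judgment the discount factor is meant to encode.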
tgb

Is there a primer on what the difference between training LLMs and doing RLHF on those LLMs post-training is? They both seem fundamentally to be doing the same thing: move the weights in the direction that increases the likelihood that they output the given text. But I gather that there are some fundamental differences in how this is done and RLHF isn't quite a second training round done on hand-curated datapoints.

3aogara
Some links I think do a good job: https://huggingface.co/blog/rlhf and https://openai.com/research/instruction-following
3tgb
Thank you. I was completely missing that they used a second 'preference' model to score outputs for the RL. I'm surprised that works!
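Roughly, and with made-up function names standing in for real model calls: pretraining nudges weights toward the next token of fixed text, while RLHF samples whole outputs, scores them with the learned preference ("reward") model, and nudges the policy toward higher scores, usually with a KL penalty keeping it near the original model. A minimal sketch of where the preference model enters:

```python
def rlhf_reward(prompt, response, preference_model, policy_logprob, base_logprob, beta=0.02):
    # Scalar score from the preference model, which was trained on human comparisons of outputs.
    score = preference_model(prompt, response)
    # Penalty for drifting too far from the pretrained model's distribution over responses.
    kl_penalty = beta * (policy_logprob(prompt, response) - base_logprob(prompt, response))
    return score - kl_penalty
```

That scalar then feeds a policy-gradient update (PPO in the InstructGPT setup) rather than the per-token cross-entropy loss used in pretraining, which is the sense in which it isn't just a second round of supervised training on curated data.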

Anonymous #5 asks:

How can programmers build something and not understand its inner workings? Are they closer to biologist cross-breeders than to car designers?

4faul_sname
In order to predict the inner workings of a language model well enough to understand the outputs, you not only need to know the structure of the model, but also the weights and how they interact. It is very hard to do that without a deep understanding of the training data, and so effectively predicting what the model will do requires understanding both the model and the world the model was trained on.

Here is a concrete example. Let's say I have two functions, defined as follows:

    import random

    words = []

    def do_training(n):
        for i in range(n):
            word = input('Please enter a word: ')
            words.append(word)

    def do_inference(n):
        output = []
        for i in range(n):
            word = random.choice(words)
            output.append(word)
        return output

If I call do_training(100) and then hand the computer to you for you to put 100 words into, and you then handed the computer back to me (and cleared the screen), I would be able to tell you that do_inference(100) would spit out 100 words pulled from some distribution, but I wouldn't be able to tell you what distribution that is without seeing the training data. See this post for a more in-depth exploration of this idea.
4gilch
Sounds like you haven't done much programming. It's hard enough to understand the code one wrote oneself six months ago. (Or indeed, why the thing I wrote five minutes ago isn't behaving as expected.) Just because I wrote it, doesn't mean I memorized it. Understanding what someone else wrote is usually much harder, especially if they wrote it poorly, or in an unfamiliar language. A machine learning system is even harder to understand than that. I'm sure there are some who understand in great detail what the human-written parts of the algorithm do. But to get anything useful out of a machine learning system, it needs to learn. You apply it to an enormous amount of data, and in the end, what it's learned amounts to possibly gigabytes of inscrutable matrices of floating-point numbers. On paper, a gigabyte is about 4 million pages of text. That is far larger than the human-written source code that generated it, which could typically fit in a small book. How that works is anyone's guess. Reading this would be like trying to read someone's mind by examining their brain under a microscope. Maybe it's possible in principle, but don't expect a human to be able to do it. We'd need better tools. That's "interpretability research". There are approaches to machine learning that are indeed closer to cross breeding than designing cars (genetic algorithms), but the current paradigm in vogue is based on neural networks, kind of an artificial brain made of virtual neurons.
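To put rough numbers on "gigabytes of inscrutable matrices": the learned weights scale with parameter count times bytes per parameter. The sizes below are illustrative round numbers, not a claim about any particular system.

```python
def weights_size_gb(n_params: float, bytes_per_param: int = 2) -> float:
    # Raw storage for the learned weights at 16-bit (2-byte) precision.
    return n_params * bytes_per_param / 1e9

for name, n in [("1-billion-parameter model", 1e9), ("175-billion-parameter model", 175e9)]:
    print(f"{name}: ~{weights_size_gb(n):.0f} GB of weights")
# 1-billion-parameter model: ~2 GB
# 175-billion-parameter model: ~350 GB
```

None of that is human-readable in any useful sense, which is the point of the interpretability comparison above.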
1Linch
I personally think the cross-breeder analogy is pretty reasonable for modern ML systems.
1francodanussi
It is true that programmers sometimes build things while ignoring the underlying wiring of the systems they are using. But programmers in general create things relying on tools that were thoroughly tested. Besides that, they are builders, doers, not academics. Think of really good guitar players: they probably don't understand how sound propagates through matter, but they can play their instrument beautifully.

I know the answer to "couldn't you just-" is always "no", but couldn't you just make an AI that doesn't try very hard? i.e., it seeks the smallest possible intervention that ensures 95% chance of whatever goal it's intended for.

This isn't a utility maximizer, because it cares about intermediate states. Some of the coherence theorems wouldn't apply.
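As a toy illustration of what the questioner seems to mean, with an invented candidate list and made-up numbers: among plans whose estimated success probability clears the 95% bar, pick the one with the smallest measured impact, rather than the one that maximizes success probability.

```python
candidate_plans = [
    {"name": "send a polite email",   "p_success": 0.96,  "impact": 1},
    {"name": "global ad campaign",    "p_success": 0.99,  "impact": 50},
    {"name": "seize all the servers", "p_success": 0.999, "impact": 10_000},
]

# Filter to plans that are good enough, then minimize impact instead of maximizing success.
acceptable = [p for p in candidate_plans if p["p_success"] >= 0.95]
chosen = min(acceptable, key=lambda p: p["impact"])
print(chosen["name"])  # "send a polite email"
```

The hard part is the impact measure itself; mild-optimization proposals such as quantilization (discussed elsewhere in this thread) are attempts to get a similar effect without having to specify one.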

2Vladimir_Nesov
The only bound on incoherence is ability to survive. So most of alignment is not about EU maximizers, it's about things that might eventually build something like EU maximizers, and the way they would be formulating their/our values. If there is reliable global alignment security, preventing rival misaligned agents from getting built where they have a fighting chance, anywhere in the world, then the only thing calling for transition to better agent foundations is making more efficient use of the cosmos, bringing out more potential for values of the current civilization. (See also: hard problem of corrigibility, mild optimization, cosmic endowment, CEV.)
1red75prime
"Hard problem of corrigibility" refers to Problem of fully updated deference - Arbital, which uses a simplification (human preferences can be described as a utility function) that can be inappropriate for the problem. Human preferences are obviously path-dependent (you don't want to be painfully disassembled and reconstituted as a perfectly happy person with no memory of disassembly). Was appropriateness of the above simplification discussed somewhere?
4Vladimir_Nesov
It's mentioned there as an example of a thing that doesn't seem to work. Simplifications are often appropriate as a way of making a problem tractable, even if the analogy is lost and the results are inapplicable to the original problem. Such exercises occasionally produce useful insights in unexpected ways. Human preference, as practiced by humans, is not the sort of thing that's appropriate to turn into a utility function in any direct way. Hence things like CEV, gesturing at the sort of processes that might have any chance of doing something relevant to turning humans into goals for strong agents. Any real attempt should involve a lot of thinking from many different frames, probably an archipelago of stable civilizations running for a long time, foundational theory on what kinds of things idealized preference is about, and this might still fail to go anywhere at human level of intelligence. The thing that can actually be practiced right now is the foundational theory, the nature of agency and norms, decision making and coordination.

Anonymous #1 asks:

This one is not technical: now that we live in a world in which people have access to systems like ChatGPT, how should I consider any of my career choices, primarily in the context of a computer technician? I'm not a hard-worker, and I consider that my intelligence is just a little above average, so I'm not going to pretend that I'm going to become a systems analyst or software engineer, but now code programming and content creation are starting to be automated more and more, so how should I update my decisions based on that?

Sure, this qu

... (read more)
4gilch
This field is evolving so quickly that it's hard to make recommendations. In the current regime (starting approximately last November), prompt engineering is a valuable skill that can multiply your effectiveness. (In my estimation, from my usage so far, perhaps by a factor of 7.) Learn how to talk to these things. Sign up for Bing and/or ChatGPT. There are a lot of tricks. This is at least as important as learning how to use a search engine. But how long will the current regime last? Until ChatGPT-5? Six months? A year? Maybe these prompt engineering skills will then be obsolete. Maybe you'll have a better chance of picking up the next skill if you learn the current one, but it's hard to say. And this is assuming the next regime, or the one after that doesn't kill us. Once we hit the singularity, all career advice is moot. Either you're dead, or we're in a post-singularity society that's impossible to predict now. Assuming we survive, we'll probably be in a post-scarcity regime where "careers" are not a thing, but no-one really knows.

What could be done if a rogue version of AutoGPT gets loose on the internet?

OpenAI can invalidate a specific API key; if they don't know which one, they can cancel all of them. This should halt the thing immediately.

If it were using a local model the problem is harder. Copies of local models may be distributed around the internet. I don't know how one could stop the agent in this situation. Can we take inspiration from how viruses and worms have been defeated in the past?

Anonymous #6 asks:

Why hasn't an alien superintelligence within our light cone already killed us?

3gilch
There probably isn't one in our past light cone, or we'd have noticed them by now.
3Seth Herd
I've heard two theories, and (maybe) created another.

One is that there isn't one in our light cone. Arguments like Dissolving the Fermi Paradox (name at least somewhat wrong) and the frequency of nova and supernova events sterilizing planets that aren't on the galactic rim are considered pretty strong, I think.

The other one I've heard is the dark forest hypothesis. In that hypothesis, an advanced culture doesn't send out signals or probes to be found. Instead it hides, to prevent or delay other potentially hostile civilizations (or AGIs) from finding it. This is somewhat compatible with aligned superintelligences that serve their civilizations' desires. Adding to the plausibility of this hypothesis is the idea that an advanced culture might not really be interested in colonizing the galaxy. We or they might prefer to mostly live in simulation, possibly with a sped-up subjective timeline. Moving away from that would mean abandoning most of your civilization for very long subjective times, given the lightspeed delays. And it might be forbidden as dangerous, by potentially leading unknown hostiles back to your civilization's home world.

The last, my own (AFAIK), is that they are here. They are aligned to their civilization and not hostile to ours. They are monitoring our attempts to break out of our earthly chrysalis by creating our own AGI. If it is unaligned, they will destroy it before it becomes a threat. They have not revealed themselves yet based on some variant of the Prime Directive, or else the dark forest hypothesis - don't go showing yourself and leading hostiles home. In this scenario, I suppose we're also being staked out by a resource-stingy hostile AGI, hoping that some friendly civilization reveals itself by contacting us or dramatically intervening.

Obviously I haven't thought this all the way through, but there are some possibilities for you.

I have noticed in discussions of AI alignment here that there is a particular emphasis on scenarios where there is a single entity which controls the course of the future. In particular, I have seen the idea of a pivotal act (an action which steers the state of the universe in a billion years such that it is better than it otherwise would be) floating around rather a lot, and the term seems to be primarily used in the context of "an unaligned AI will almost certainly steer the future in ways that do not include living humans, and the only way to prevent th... (read more)

Anonymous #2 asks:

A footnote in 'Planning for AGI and beyond' says "Many of us think the safest quadrant in this two-by-two matrix is short timelines and slow takeoff speeds; shorter timelines seem more amenable to coordination" - why do shorter timelines seem more amenable to coordination?

3steven0461
I don't know why they think so, but here are some people speculating.
2gilch
My current feeling is that the opposite, long timelines and hard takeoff, has the best chance of going well.

The main advantage of short timelines is that it makes an immediately-fatal hard takeoff less likely, as there is presumably less overhang now than in the future. It perhaps also reduces the number of players, as presumably it's easier to join the game as tech improves, so there may never be fewer than there are now. It also has the advantage of maybe saving the lives of those too old to make it to a longer timeline. However, I think the overhang is already dangerously large, and probably was years ago, so I don't think this is helping (probably).

The main advantage of a soft takeoff is that we might be able to get feedback and steer it as the takeoff happens, perhaps reducing the risk of a critical error. It also increases the chances of a multipolar scenario, where there is an economy of competing AIs. If we don't like some of the gods we build, perhaps others will be more friendly, or will at least be able to stalemate the bad ones before they kill everyone.

However, I think a multipolar scenario (while unlikely to last even in a soft takeoff) is very dangerous. I don't think the long-term incentives are favorable to human survival, for two reasons. First is Bostrom's Black Marble scenario (he's also called them "black balls", but that already means something else): every new technology discovered has a chance of destroying us, especially if we lack the coordination to abstain from using it. In a multipolar world, we lack that coordination. Hostile AIs may recklessly pursue dangerous research or threaten doomsday to blackmail the world into getting what they want, and it is game-theoretically advantageous for them to do this in such a way that they provably can't change their minds and not destroy the world if we call the bluff (i.e. defiantly rip off the steering wheel in a game of chicken). Second, we'll eventually fall into Malthusian/Molochean trap

Why is there so little mention of the potential role of the military-industrial complex in developing AGI, rather than a public AI lab? The money is available, as are the will and the history (ARPANET was the precursor to the internet). I am vaguely aware that there isn't much to suggest the MIC is on the cutting edge of AI, but there wouldn't be if it were all black-budget projects. If that is the case, it presumably implies a very difficult situation, because the broader alignment community would have no idea when crucial thresholds were being crossed.

2Vladimir_Nesov
I'm guessing the government's attitude at the moment might be characterized by the recent White House press briefing question, where a reporter, quoting Yudkowsky, asked about concerns that "literally everyone on Earth will die" and got a reception similar to what you'd expect if he had asked about UFOs or Bigfoot, just coated in political boilerplate: "But thank you, Peter, thank you for the drama," "On a little more serious topic..." The other journalists were struggling, unsuccessfully, to restrain their laughter. The Overton window might be getting there, but it's not there yet, and it's unclear whether it gets there before AGI is deployed. It's a shame the question didn't mention the AI Impacts survey result, which I think is the most legible two-sentence argument at the moment.
1memeticimagery
I should have clarified a bit: I was using the term 'military industrial complex' to try to narrow in on the much more technocratic underbelly of the American defense/intelligence community and private contractors. I don't have any special knowledge of the area, so forgive me, but essentially I mean DARPA and the like, or any agency with a large black budget. Whatever they are doing does not need to have any connection to whatever the public-facing government says in press briefings. It is perfectly possible that, right now, a priority for some of these agencies is funding a massive AI project while the White House laughs off AI safety; that is how classified projects work. It actually illustrates the problem a bit, in that the entire system is set up to cover things up for national defense, in which case having a dialogue about AI risk is virtually impossible.

I've been wondering what sorts of ways we could buy ourselves time to figure out alignment. I'm wondering if maybe a large government organization equipped with many copies of potent tool AI could manage to oversee and regulate significant compute pools well enough to avoid rogue AGI catastrophes. Is there any writing specifically on this subject?

3Xor
I am pretty sure Eliezer talked about this in a recent podcast, but it wasn't a ton of info. I don't remember exactly where either, so I'm sorry for not being a lot of help; I am sure there is better writing on this somewhere. Either way, it's a really good podcast. https://lexfridman.com/?powerpress_pinw=5445-podcast

Intuitively, I assume that LLMs trained on human data are unlikely to become much smarter than humans, right? Without some additional huge breakthrough, other than just being a language model?

2porby
For the sake of intuition, it's useful to separate the capabilities visibly present in generated sequences from the capabilities of the model itself.

Suppose you've got an untuned language model trained on a bunch of human conversations, and you generate a billion rollouts of conversations from scratch (that is, with no initial prompting or conditions on the input). This process won't tend to output conversations between humans with IQs of 400, because the training distribution does not contain those. The average simulated conversation will be, in many ways, close to the average conversation in the training set.

But it would be incorrect to say that the language model has an "IQ" of 100 (even assuming the humans in the training distribution averaged 100). The capability elicited from the language model depends on the conditions of its predictions. When prompted to produce a conversation between two mathematicians trying to puzzle something out, the result is going to be very different from the random sampling case. You can come up with a decent guess about how smart a character the model plays is, because strong language models tend to be pretty consistent. In contrast, it's very hard to know how smart a language model is, because its externally visible behavior is only ever a lower bound on its capability. The language model is not its characters; it is the thing that can play any of its characters.

Next, keep in mind that even simple autoregressive token prediction can be arbitrarily hard. A common example is reversing a hash. Consider prompting a language model with: "0xDF810AF8 is the truncated SHA256 hash of the string" It does not take superhuman intelligence to write that prompt, but if a language model were able to complete that prompt correctly, it would imply really weird things. That's an extreme case, but it's not unique. For a closer example, try an experiment: Try writing a program, at least 25 lines of nontrivial code, starting with a blank file,
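To make the "capability elicited depends on the conditions of its predictions" point concrete, here is a minimal sketch using the HuggingFace transformers library. The choice of the small GPT-2 checkpoint, the prompt text, and the sampling settings are placeholder assumptions (and a model this small won't actually produce expert mathematics); the point is only the mechanism: the same weights, conditioned differently, simulate very different text.

```python
# Minimal sketch (assumes the `transformers` and `torch` packages are installed).
# Compares an unconditioned rollout with a prompted one from the same model.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def sample(prompt: str) -> str:
    """Draw one continuation of `prompt` from the model."""
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        top_p=0.95,
        max_new_tokens=40,
        pad_token_id=tok.eos_token_id,
    )
    return tok.decode(out[0], skip_special_tokens=True)

# "From scratch": condition only on the beginning-of-text token.
print(sample(tok.bos_token))

# Conditioned: the prompt selects which kind of text (and "character") gets simulated.
print(sample("Transcript of two mathematicians working through a difficult proof:\n"))
```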
1Person
The assumption goes that after ingesting human data, it can remix it (like humans do for art, for example) and create its own synthetic data that it can then train on. The go-to example is AlphaGo, which, after playing a ton of simulated games against itself, became great at Go. I am not qualified enough to give an informed opinion or predictions, but that's what I know.
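As a deliberately trivial illustration of the "generate your own data, then train on it" loop being described (nothing here resembles AlphaGo's actual algorithm; the toy 'game', the lookup-table policy, and the training step are all invented for this sketch):

```python
# Toy sketch of the self-generated-data loop: play with the current policy,
# collect the results, and train the policy on what worked.
import random

# Hidden "rules of the game": each state has exactly one rewarding action.
SECRET_ACTION = {s: random.randint(0, 9) for s in range(10)}

def play_game(policy):
    """One episode: observe a state, act, receive a reward."""
    state = random.randint(0, 9)
    action = policy(state)
    reward = 1.0 if action == SECRET_ACTION[state] else 0.0
    return {"state": state, "action": action, "reward": reward}

def train(policy_table, games):
    """Toy 'training': memorize the actions that earned reward."""
    for g in games:
        if g["reward"] > 0:
            policy_table[g["state"]] = g["action"]
    return policy_table

policy_table = {}

def policy(state):
    # Exploit what has been learned so far, otherwise explore randomly.
    return policy_table.get(state, random.randint(0, 9))

for iteration in range(100):
    games = [play_game(policy) for _ in range(50)]  # generate synthetic data...
    policy_table = train(policy_table, games)       # ...and train on it

print(f"states solved: {len(policy_table)}/10")
```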

Hello, this concerns an idea I had back in ~2014, which I abandoned because I didn't see anyone else talking about it and therefore assumed it was transparently stupid. After talking to a few researchers, I have been told the idea is potentially novel and potentially useful, so here I go (sweating violently, trying to suppress my sense of transgression).

The idea concerns how one might build safety margin into AI or lesser AGI systems in a way that they can be safely iterated on. It is not intended as anything resembling a solution to alignment, just an easy-t... (read more)

1BaseThreeDee
For an RL agent, the "opioid addiction" thing could be as simple as increasing the portion of the loss that is proportional to the weight norm. You'd expect that to cause the agent to lobotomize itself into only fulfilling the newly unlocked goal.
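A rough sketch of what "increasing the portion of the loss proportional to weight norm" could look like mechanically, assuming a PyTorch setup; the tiny network, the dummy policy-gradient-style loss, and the penalty coefficient are all placeholders, not part of the original proposal:

```python
# Sketch: add a term to an RL-style loss that is proportional to the squared
# weight norm, so the agent can reduce its loss by shrinking its own weights.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def loss_with_weight_penalty(task_loss, model, penalty_coeff):
    """Task loss plus a penalty proportional to the model's weight norm.

    Cranking up penalty_coeff is the "opioid" knob: the agent gets loss
    reduction from shrinking its weights, independent of task performance.
    """
    weight_norm = sum(p.pow(2).sum() for p in model.parameters())
    return task_loss + penalty_coeff * weight_norm

# Dummy policy-gradient-style task loss, for illustration only.
obs = torch.randn(8, 4)
log_probs = torch.log_softmax(policy(obs), dim=-1)
fake_returns = torch.randn(8)
task_loss = -(log_probs[:, 0] * fake_returns).mean()

loss = loss_with_weight_penalty(task_loss, policy, penalty_coeff=1e-2)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```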

What is the connection between the concepts of intelligence and optimization?

I see that optimization implies intelligence (that optimizing a sufficiently hard task sufficiently well requires sufficient intelligence). But it feels like the case for existential risk from superintelligence depends on the idea that intelligence is optimization, or implies optimization, or something like that. (If I remember correctly, sometimes people suggest creating "non-agentic AI", or "AI with no goals/utility", and EY says that they are trying to invent non-wet water o... (read more)

1[anonymous]
The idea is that agentic AIs are probably generally more effective at doing things: https://www.lesswrong.com/s/mzgtmmTKKn5MuCzFJ 

If a superintelligent AI is guaranteed to be manipulative (instrumental convergence), how can we validate any solution to the alignment problem? AFAIK, we can't even guarantee that a model optimizes for the defined objective, due to mesa-optimizers. So that adds more complexity to a seemingly unanswerable problem.

My other question is: people here seem to think of intelligence as a single-dimension type of thing. But I have always maintained the belief that the type of reasoning useful in scientific discovery does not necessarily unlock the secret of human communicat... (read more)

Is this a plausible take?

  • some types of AI can be made non-catastrophic with a modest effort:
    • AI trained only to prove math theorems
    • AI trained only to produce predictive causal models of the world, by observation alone (an observer and learner, not an active agent)
    • AIs trained only to optimize w.r.t. a clearly specified objective and a formal world model (not actually acting in the world and getting feedback, only being rewarded on solving formal optimization problems well)
      • the last two kinds (world-learners and formal-model-optimizers) should be kept separate
... (read more)
2Multicore
  • What are these AIs going to do that is immensely useful but not at all dangerous? A lot of useful capabilities that people want are adjacent to danger. Tool AIs Want to be Agent AIs.
  • If two of your AIs would be dangerous when combined, clearly you can't make them publicly available, or someone would combine them. If your publicly-available AI is dangerous if someone wraps it with a shell script, someone will create that shell script (see AutoGPT). If no one but a select few can use your AI, that limits its usefulness.
  • An AI ban that stops dangerous AI might be possible. An AI ban that allows development of extremely powerful systems but has exactly the right safeguard requirements to render those systems non-dangerous seems impossible.
1MichaelLatowicki
Thanks for the pointer. I'll hopefully read the linked article in a couple of days.

I start from a point of "no AI for anyone" and then ask "what can we safely allow?". I made a couple of suggestions, where "safely" is understood to mean "safe when treated with great care". You are correct that this definition of "safe" is incompatible with unfettered AI development. But what approach to powerful AI isn't incompatible with unfettered AI development? Every AI capability we build can be combined with other capabilities, making the whole more powerful and therefore more dangerous.

To keep things safe while still having AI, the answer may be: "an international agency holds most of the world's compute power, so that all AI work is done by submitting experiment requests to the agency, which vets them for safety". Indeed, I don't see how we can allow people to do AI development without oversight at all. This centralization is bad, but I don't see how it can be avoided. Military establishments would probably refuse to subject themselves to this restriction even if we get states to restrict the civilians. I hope I'm wrong on this and that international agreement can be reached and enforced to restrict AI development by national security organizations. Still, it's better to restrict the civilians (and try to convince the militaries to self-regulate) than to restrict nobody.

Is it possible to reach and enforce a global political consensus of "no AI for anyone ever at all"? We may need thermonuclear war for that, and I'm not on board. I think "strictly regulated AI development" is a relatively easier sell (though still terribly hard). I agree that such a restriction is a large economic handicap, but what else can we do? It seems that the alternative is praying that someone comes up with an effectively costless and safe approach, so that nobody gives up anything. Are we getting there, in your opinion?
1MichaelLatowicki
  • Immensely useful things these AIs can do:
    • drive basic science and technology forward at an accelerated pace
    • devise excellent macroeconomic, geopolitical, and public health policy
  • These things are indeed risk-adjacent, I grant.

What does foom actually mean? How does it relate to concepts like recursive self-improvement, fast takeoff, winner-takes-all, etc.? I'd appreciate a technical definition; in the past I thought I knew what it meant, but people said my understanding was wrong.

2steven0461
I tried to answer this here
1andrew sauer
I thought foom was just a term for extremely fast recursive self-improvement.