All of Droopyhammock's Comments + Replies

First of all, I basically agree with you. It seems to me that in scenarios where we are preserved, preservation is likely to be painless and most likely just not experienced by those being preserved.

But my confidence that this is the case is not that high. As a general comment, I do get concerned that a fair amount of pushback on the likelihood of s-risk scenarios is based on what “seems” likely.

I usually don’t disagree on what “seems” likely, but it is difficult for me to know if “seems” means a confidence level of 60%, or 99%.

Should we be worried about being preserved in an unpleasant state?

I’ve seen surprisingly little discussion about the risk of everyone being “trapped in a box for a billion years”, or something to that effect. There are many plausible reasons why keeping us around could be worth it, such as to sell us to aliens in the future. Even if it turns out to be not worth it for an AI to keep us around, it may take a long time for it to realise this.
 

Should we not expect to be kept alive, at least until an AI has extremely high levels of confidence that we aren’t useful? If so, is our state of being likely to be bad while we are preserved?

This seems like one of the most likely s-risks to me.

4Vladimir_Nesov
Archiving as data (to be reconstructed as needed) seems cheaper and more agreeable with whatever use is implied for humanity in this scenario. Similarly with allowing humans to have a nice civilization, if humanity remaining alive uninterrupted is a consideration.

In a similar vein to this, I think that AIs being called “tools” is likely to be harmful. It is a word which I believe downplays the risks, while also objectifying the AIs. The objectification of something which may actually be conscious seems like an obvious step in a bad direction.

2Seth Herd
Yes, I think "tools" is an even more obvious red flag that this person isn't thinking about an agentic, self-aware system.

Takeover speeds?

For the purpose of this shortform, I am considering “takeover” to start when crazy things begin happening or it is clear that an unaligned AGI or AGIs are attempting to take over. I consider “takeover” to have ended when humanity is extinct or similarly subjugated. This is also under the assumption that a takeover does happen.

From my understanding of Eliezer’s views, he believes takeover will be extremely fast (possibly seconds). Extremely fast takeovers make a lot more sense if you assume that a takeover will be more like a sneak attack.

How fa... (read more)

Your response does illustrate that there are holes in my explanation. Bob 1 and Bob 2 do not exist at the same time. They are meant to represent one person at two different points in time.

A separate way I could try to explain what kind of resurrection I am talking about is to imagine a married couple. An omniscient husband would have to care as much about his wife after she was resurrected as he did before she died.

I somewhat doubt that I could patch all of the holes that could be found in my explanation. I would appreciate it if you try to answer what I am trying to ask.

1nim
What I'm hearing here is that you want me to make up a version of your initial question that's coherent, and offer an answer that you find satisfying. However, I have already proposed a refinement of your question that seems answerable, and you've rejected that refinement as missing the point. If you want to converse with someone capable of reading your mind and discerning not only what answer you want but also what question you want the answer to, I'm sorry to inform you that I am unable to use those powers on you at this time. My inability to provide an answer which satisfies you stems directly from my inability to understand what question you want answered, so I don't think this is a constructive conversation to continue. Thank you for your time and discourse in challenging me to articulate why your seemingly intended question seems unanswerable, even though I don't think I've articulated that in a way that's made sense to you.

I seem to remember your P(doom) being 85% a short while ago. I’d be interested to know why it has dropped to 70%, or in another way of looking at it, why you believe our odds of non-doom have doubled.

Whereas my timelines views are extremely well thought-through (relative to most people, that is), I feel much more uncertain and unstable about p(doom). That said, here's why I updated:

Hinton and Bengio have come out as worried about AGI x-risk; the FLI letter and Yudkowsky's tour of podcasts, while incompetently executed, have been better received by the general public and elites than I expected; the big labs (especially OpenAI) have reiterated that superintelligent AGI is a thing, that it might come soon, that it might kill everyone, and that regulation is... (read more)

I have edited my shortform to try to better explain what I mean by “the same”. It is kind of hard to do so, especially as I am not very knowledgeable on the subject, but hopefully it is good enough.

1nim
This supposes that Bob 1 knows about Bob 2's experiences. That seems impossible if Bob 1 died before Bob 2 came into being, which is what's typically understood by the term "resurrect" used in the context of death ("restore (a dead person) to life."). If Bob 1 and Bob 2 exist at the same time, whatever's happening is probably not resurrection.

Let's stick with standard resurrection though: Bob 1 dies and then Bob 2 comes into existence. We're measuring their sameness, at your request, by the expected sentiment of each toward the other. If I was an unethical researcher in the present day, I could name a child Bob 2 and raise it to be absolutely certain that it was the reincarnation of Bob 1. It would be nice if the child happened to share some genes with Bob 1, but not absolutely essential. The child would not have an easy life, as it would be accused of various mental disorders and probably identity theft, but it would technically meet the "sameness is individual belief" criterion that you require.

As an unethical researcher, I would of course select the individual Bob 1 to be someone who believes that reincarnation is possible, and thus cares about the wellbeing of their expected reincarnated self (whom they probably define as 'the person who believes they're my reincarnation', because most people don't think adversarially about such things) as much as they care about their own. There you go, a hypothetical pair of individuals who meet your criteria, created using no technology more advanced than good ol' cult brainwashing.

So for this definition, I'd say the percentage chance that it's possible matches the percentage chance that someone would be willing to set their qualms aside and ruin Bob 2's life prospects for the sake of the experiment. (yes, this is an unsatisfying answer, but I hope it might illustrate something useful if you see how its nature follows directly from the nature of your question)

Do you believe that resurrection is possible?

By resurrection I mean the ability to bring back people, even long after they have died and their body has decayed or been destroyed. I do not mean simply bringing someone back who has been cryonically frozen. I also mean bringing back the same person who died, not simply making a clone.

I will try to explain what I mean by “the same”. Lets call the person before they died “Bob 1” and the resurrected version ”Bob 2”. Bob 1 and Bob 2 are completely selfish and only care about themselves. In the version of resurrec... (read more)

2nim
I believe we'll eventually build systems that we call resurrection, and which some people believe qualify as it. You haven't provided enough information for me to guess whether we'll build systems that you believe qualify as it, though.

If I brought you someone and said "this is your great-grandparent resurrected", how would you decide whether you believed that resurrection was real?
If I brought you someone and said "this is your ancestor from 100 generations ago resurrected", how would you decide whether you believed that resurrection was real?
If I brought you someone and said "this is Abraham Lincoln resurrected", how would you decide whether you believed that resurrection was real?
If I brought you someone and said "this is a member of the species Homo Erectus resurrected", how would you decide whether you believed that resurrection was real?

You must explain what you mean by "the same" before anyone can give you a useful answer about how likely it is that such a criterion will ever be met.

I just want to express my surprise that the view that the default outcome from unaligned AGI is extinction seems not to be as prevalent as I thought. I was under the impression that literally everyone dying was considered by far the most likely outcome, making up probably more than 90% of the space of outcomes from unaligned AGI. From comments on this post, this seems not to be the case.

I am now distinctly confused as to what is meant by “P(doom)”. Is it the chance of unaligned AGI? Is it the chance of everyone dying? Is it the chance of just generally bad outcomes?

Is there something like a pie chart of outcomes from AGI?

I am trying to get a better understanding of the realistic scenarios and their likelihoods. I understand that the likelihoods are very disagreed upon.

My current opinion looks a bit like this:

30%: Human extinction
    10%: Fast human extinction
    20%: Slower human extinction

30%: Alignment with good outcomes

20%: Alignment with at best mediocre outcomes

20%: Unaligned AGI, but at least some humans are still alive
    12%: We are instrumentally worth not killing
    6%: The AI wireheads us
    2%: S-risk... (read more)

2Vladimir_Nesov
I think the scenario of "aligned AI, that then builds a stronger ruinous misaligned AI" deserves a special mention. I was briefly unusually hopeful last fall, after concluding that LLMs have a reasonable chance of loose NotKillEveryone-level alignment, but then realized that they also have a reasonable chance of starting out as autonomous AGIs at merely near-human level (in rationality/coordination), in which case they are liable to build ruinous misaligned AGIs for exactly the same reasons the humans are currently rushing ahead, or under human instruction to do so, just faster. I'm still more hopeful than a year ago, but not by much, and most of my P(doom) is in this scenario. I worry that a lot of good takes on alignment optimism are about alignment of first AGIs and don't at all take into account this possibility. An aligned superintelligence won't sort everything else out if it's not a superintelligence yet or if it's still under human control (in a sense that's distinct from alignment).

I have had more time to think about this since I posted this shortform. I also posted a shortform after that which asked pretty much the same question, but with words, rather than just a link to what I was talking about (the one about why is it assumed an AGI would just use us for our atoms and not something else). 
 

I think that there is a decent chance that an unaligned AGI will do some amount of human experimentation/study, but it may well be on a small number of people, and hopefully not for very long.
To me, one of the most concerning w... (read more)

Quick question:

How likely is AGI within 3 months from now?

For the purpose of this question I am basically defining AGI as the point at which, if it is unaligned, stuff gets super weird. By “Super weird“ I mean things that are obvious to the general public, such as everybody dropping dead or all electronics being shut down or something of similar magnitude. For the purposes of this question, the answer can’t be “already happened” even if you believe we already have AGI by your definition.

I get the impression that the general opinion is “pretty unlikely” but... (read more)

This seems like a good way to reduce S-risks, so I want to get this idea out there. 


This is copied from the r/SufferingRisk subreddit here: https://www.reddit.com/r/SufferingRisk/wiki/intro/

As people get more desperate in attempting to prevent AGI x-risk, e.g. as AI progress draws closer & closer to AGI without satisfactory progress in alignment, the more reckless they will inevitably get in resorting to so-called "hail mary" and more "rushed" alignment techniques that carry a higher chance of s-risk. These are less careful and "principled"/formal... (read more)

Not necessarily

Suicide will not save you from all sources of s-risk and may make some worse (if quantum immortality is true, for example). If resurrection is possible, that complicates things further.

The possibility for extremely large amounts of value should also be considered. If alignment is solved and we can all live in a Utopia, then killing yourself could deprive yourself of billions+ years of happiness.

I would also argue that choosing to stay alive when you know of the risk is different from inflicting the risk on a new being you have created... (read more)

2avturchin
AI may kill all humans, but it will preserve all our texts forever; it will even internalise them as training data. Thus it is rational either to publish as much as possible, or to write nothing. -- Cowardly AI can create my possible children even if I don't have children.

S-risks can cover quite a lot of things. There are arguably s-risks which are less bad than x-risks, because although there are astronomical amounts of suffering, it may be dwarfed by the amount of happiness. Using common definitions of s-risks, if we simply took Earth and multiplied it by 1000 so that we have 1000 Earths, identical to ours with the same number of organisms, it would be an s-risk. This is because the amount of suffering would be 1000 times greater. It seems to me that when people talk about s-risks they often mean somewhat different things.... (read more)

A consideration which I think you should really have in regards to whether you have kids or not is remembering that s-risks are a thing. Personally, I feel very averse to the idea of having children, largely because I feel very uncomfortable about the idea of creating a being that may suffer unimaginably.

There are certainly other things to bear in mind, like the fact that your child may live for billions of years in utopia, but I think that you really have to bear in mind that extremely horrendous outcomes are possible.

It seems to me that the likelihood of... (read more)

7Anirandis
I'm a little confused by the agreement votes with this comment - it seems to me that the consensus around here is that s-risks in which currently-existing humans suffer maximally are very unlikely to occur. This seems an important practical question; could the people who agreement-upvoted elaborate on why they find this kind of thing plausible?   The examples discussed in e.g. the Kaj Sotala interview linked later down the chain tend to regard things like "suffering subroutines", for example.
3johnlawrenceaspden
Doesn't any such argument also imply that you should commit suicide?
4Jacob Watts
You said that multiple people have looked into s-risks and consider them of similar likelihood to x-risks. That is surprising to me and I would like to know more. Would you be willing to share your sources?

It doesn’t seem to me that you have addressed the central concern here. I am concerned that a paperclip maximiser would study us. 

There are plenty of reasons I can imagine for why we may contain helpful information for a paperclip maximiser. One such example could be that a paperclip maximiser would want to know what an alien adversary may be like, and would decide that studying life on Earth should give insights about that. 

This is why I hope that we either contain virtually no helpful information, or at least that the information is extremely quick for an AI to gain. 
 

Why is it assumed that an AGI would just kill us for our atoms, rather than using us for other means? 

There are multiple reasons I understand for why this is a likely outcome. If we pose a threat, killing us is an obvious solution, although I’m not super convinced killing literally everyone is the easiest solution to this. It seems to me that the primary reason to assume an AGI will kill us is just that we are made of atoms which can be used for another purpose.

If there is a period where we pose a genuine threat to an AGI, then I can understand the as... (read more)

-3[anonymous]
In short, surveillance costs (e.g., "make sure they aren't plotting against you and try detonating a nuke or just starting a forest fire out of spite") might be higher than the costs of simply killing the vast majority of people. Of course, there is some question to be had about whether it might consider it worthwhile to study some 0.00001% of humans locked in cages, but again that might involve significantly higher costs than if it just learned how to recreate humans from scratch as it did a lot of other learning about the world.  But I'll grant that I don't know how an AGI would think or act, and I can't definitively rule out the possibility, at least within the first 100 years or so.
4Vladimir_Nesov
Caring about our well-being is similar to us being interesting to study, both attitudes are paying attention to us specifically, whether because we in particular made it into ASI's values (a fragile narrow target), or because graceful extrapolation of status quo made it into their values (which I think is more likely), so that the fact that we've been living here in the past becomes significant. So if alignment is unlikely, s-risk is similarly unlikely. And if alignment works via robustness of moral patienthood (for ASIs that got to care about such concepts), it's a form of respecting boundaries, so probably doesn't pose s-risk. There might also be some weight to trade with aliens argument, if in a few billions of years our ASI makes contact with an alien-aligned alien ASI that shares their builders' assignment of moral patienthood to a wide range of living sapient beings. Given that the sky is empty, possibly for a legible reason even, and since the amount of reachable stuff is not unbounded, this doesn't seem very likely. Also, the alien ASI would need to be aligned, though a sapient species not having fingers might be sufficient to get there, getting a few more millions of years of civilization and theory before AGI. But all this is likely to buy humanity is a cold backup, which needs to survive all the way to a stable ASI, through all intervening misalignments, and encryption strong enough to be unbreakable by ASIs is not too hard, so there might be some chance of losing the backup even if it's initially made.
2Dagon
I think you need to refine your model of "us".  There is no homogeneous value for the many billions of humans, and there's a resource cost to keeping them around.  Averages and sums don't matter to the optimizer.   There may be value in keeping some or many humans around, for some time.  It's not clear that you or I will be in that set, or even how big it is.  There's a lot of different intermediate equilibria that may make it easier to allow/support something like an autonomous economy to keep sufficient humans aligned with its needs.  Honestly, self-reproducing self-organizing disposable agents, where the AI controls them at a social/economic level, seems pretty resource-efficient.

Can someone please tell me why this S-risk is unlikely?

It seems almost MORE likely than extinction to me.

https://www.reddit.com/r/SufferingRisk/comments/113fonm/introduction_to_the_human_experimentation_srisk/?utm_source=share&utm_medium=ios_app&utm_name=iossmf

2the gears to ascension
once you have the technology to do them, brain scans are quick and easy. It is not necessary to simulate all the way through a human brain in order to extract the information in it - there are lossless abstractions that can be discovered which will greatly speed up insights from brains. humans have not yet found them but particularly strong AIs could either get very close to them or could actually find them. In that sense I don't think the high thermal cost suffering simulation possibility is likely. however it does seem quite plausible to me that if we die we get vacuumed up first and used for parts.
1[comment deleted]
1span1
Why do you think it more likely than extinction?

Is it possible that the fact we are still alive means that there is a core problem to the idea of existential risk from AI?

There are people who think that we already have AGI, and this number has only grown with the recent Bing situation. Maybe we have already passed the threshold for RSI, maybe we passed it years ago.

Is there something to the idea that you can slightly decrease your pdoom for every day we are still alive?

It seems possible to me that AI will just get better and better and we’ll just continue to raise the bar for when it is going to kill us... (read more)

2jimrandomh
Current AI systems aren't capable of doing large-scale software development in non-AI domains. Eg there is no AI-written operating system, compiler, or database (even though minimal versions of these are often done by individual students as single semester capstone projects). I don't think we can infer much of anything from the fact that recursive self improvement hasn't happened yet until after that threshold is crossed.

Do you think that the cause of the disagreements is mostly emotional or mostly factual?

Emotional being something like someone not wanting to be convinced of something that will raise their pdoom by a lot. This can be on a very subconscious level.

Factual being that they honestly just don’t agree, all emotions aside.

So yeah, I’m asking what you think is “mostly” the reason.

In this context, what I mean by “aligned” is something like: it won’t prevent itself from being shut off and will not do things that could be considered bad, such as hacking or manipulating people.
 

My impression was that actually being able to give an AI a goal is something that might be learnt at some point. You said “A task, maybe?”. I don’t know what the meaningful distinction is between a task and a goal in this case.

I won’t be able to keep up with the technical side of things here, I just wanted my idea to be out there, in case it is helpful in some way.

2Vladimir_Nesov
What's the point of that? It's mostly unfamiliar, not really technical. I wish there was something technical that would be relevant to say on the topic.

Can someone explain to me why this idea would not work? 

This is a proposal of a way to test if an AGI has safeguards active or not, such as allowing itself to be turned off.

Perhaps we could essentially manufacture a situation in which the AGI has to act fast to prevent itself from being turned off. Like we could make it automatically turn off after 1 minute, say; this could mean that if it is not aligned properly it has no choice but to try to prevent that. No time for RSI, no time to bide its time.

Basically if we put the AGI in a situation where i... (read more)

2Vladimir_Nesov
That's not a thing that people know how to do. A task, maybe? Wrapper-minds don't seem likely as first AGIs. But they might come later, heralding transitive AI risk. What's "aligned" here? It's an umbrella term for things that are good with respect to AI risk, and means very little in particular. In the context of something feasible in practice, it means even less. Like, are you aligned?
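
For concreteness, here is a minimal sketch (in Python) of the kind of hard-deadline shutdown wrapper this proposal describes. The process name, the 60-second window, and the manual-restart step are illustrative assumptions, not anything anyone has built, and the sketch does nothing to answer the objections in the replies (a capable system could still act, or persuade, within its window).

```python
import subprocess

# Illustrative sketch only: run the AI as a separate OS process and hard-kill it
# after a fixed deadline, requiring a human to explicitly start the next window.
# "run_agent.py" and the 60-second deadline are hypothetical placeholders.

DEADLINE_SECONDS = 60

def run_one_window():
    proc = subprocess.Popen(["python", "run_agent.py"])  # the (hypothetical) AI process
    try:
        proc.wait(timeout=DEADLINE_SECONDS)  # let it run for at most one minute
    except subprocess.TimeoutExpired:
        proc.kill()                          # hard shutdown, not a polite request
        proc.wait()                          # reap the killed process
    # No automatic restart: a human reviews what happened in the window and must
    # call run_one_window() again to start the next one.

if __name__ == "__main__":
    run_one_window()
```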

I wonder how much the AI alignment community will grow in 2023. As someone who only properly became aware of the alignment problem a few months ago, with the release of ChatGPT, it seems like the world has gone from nearly indifferent to AI to obsessed with it. This will lead to more and more people researching things about AI and it will also lead to more and more people becoming aware of the alignment problem. 
 

I really hope that this leads to more of the right kind of attention for AI safety issues. It might also mean that it’s easier to get highly skilled people to work on alignment and take it seriously.

Is an 8-year median considered long or short or about average? I’m specifically asking in relation to the opinion of people who pay attention to AGI capabilities and are aware of the alignment problem. I’m just hoping you can give me an idea of what is considered “normal” among AGI/ alignment people in regards to AGI timelines.

I’m just a layperson so I don’t understand much of this, but some people on the machine learning subreddit seem to think this means AGI is super close. What should I make of that? Does this update timelines to be significantly shorter?

2Vladimir_Nesov
As someone with 8-year median timelines to AGI, I don't see this as obviously progress, but it frames possible applications that would be. The immediate application might be to get rid of some hallucinated factually incorrect claims, by calling tools for fact-checking with small self-contained queries and possibly self-distilling corrected text. This could bring LLM-based web search (for which this week's announcements from Microsoft and Google are starting a race) closer to production quality. But this doesn't seem directly AGI-relevant. A more loose AGI-relevant application is to use this for teaching reliable use of specific reasoning/agency skills, automatically formulating "teachable moments" in the middle of any generated text. An even more vague inspiration from the paper is replacing bureaucracies (which is the hypothetical approach to teaching reasoning/agency skills when not already having them reliably available) with trees of (chained) tool calls, obviating the need to explicitly set up chatrooms where multiple specialized chatbots discuss a problem with each other at length to make better progress than possible to do immediately/directly.

What is your take on this?

https://arxiv.org/abs/2302.04761?fbclid=IwAR16Lzg2z1mIbtQ1iZmy7UQAlfIW4HzoufZYYhkKeHBknnevgHtGfdFI5r8

People on the machinelearning subreddit seem to think this is a big deal.

2Gunnar_Zarncke
I don't think this is a decisive step but an interesting capability step. It will also have a lot of security issues (depending on the tools used). Keyword: Toolformer
2Vladimir_Nesov
The tool calls being spontaneously emitted inline as ordinary tokens is interesting, this interface could be a more HCH-like alternative to chat-like bureaucracy/debate when the tools are other LLMs (or the same LLM).
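
To make "tool calls emitted inline as ordinary tokens" concrete, here is a toy sketch. The bracket syntax, the Calculator tool, and the regex-based executor are illustrative assumptions, not the Toolformer paper's actual format or implementation.

```python
import re

# Toy sketch: generated text contains tool calls written as ordinary tokens;
# a thin wrapper finds them, executes them, and splices the results back in.
# The "[Tool(args)]" syntax and the Calculator tool are hypothetical stand-ins.

TOOLS = {
    "Calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # demo only, not safe
}

CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_inline_calls(generated_text: str) -> str:
    def run(match: re.Match) -> str:
        tool, args = match.group(1), match.group(2)
        result = TOOLS[tool](args) if tool in TOOLS else "?"
        return f"[{tool}({args}) -> {result}]"
    return CALL.sub(run, generated_text)

print(execute_inline_calls("That is [Calculator(400/1400)] of the total."))
# -> "That is [Calculator(400/1400) -> 0.2857142857142857] of the total."
```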

(PLEASE READ THIS POST)

Sorry for putting that there, but I am somewhat paranoid about the idea of having the solution and people just not seeing it.

WHY WOULD THIS IDEA NOT WORK?

Perhaps we could essentially manufacture a situation in which the AGI has to act fast to prevent itself from being turned off. Like we could make it automatically turn off after 1 minute, say; this could mean that if it is not aligned properly it has no choice but to try to prevent that. No time for RSI, no time to bide its time.

Basically if we put the AGI in a situation where it... (read more)

How likely are extremely short timelines?

To prevent being ambiguous, I’ll define “extremely short“ as AGI before 1st July 2024.

I have looked at surveys, which generally suggest the overall opinion to be that it is highly unlikely. As someone who only started looking into AI when ChatGPT was released and gained a lot of public interest, it feels like everything is changing very rapidly. It seems like I see new articles every day and people are using AI for more and more impressive things. It seems like big companies are putting lots more money into AI as we... (read more)

Yeah I guess it is more viable in a situation where there is a group far ahead of the competition who are also safety conscious. Don’t know how likely that is though.

What are the groups aiming for (and most likely to achieve) AGI going for in regards to alignment?
 

Is the goal for the AGI to be controlled or not?

Like is the idea to just make it “good” and let it do whatever is “good”?

Does “good” include “justice“? Are we all going to be judged and rewarded/ punished for our actions? This is of concern to me because plenty of people think that extremely harsh punishments or even eternal punishments are deserved in some cases. I think that having an AGI which dishes out “justice” could be very bad and create S-risks.... (read more)

How much AI safety work is focused on caring about the AIs themselves?

In the paperclip maximiser scenario, for example, I assume that the paperclip maximiser itself will be around for a very long time, and maybe forever. What if it is conscious and suffering?

Is enough being done to try to make sure that even if we do all die, we have not created a being which will suffer forever while it is forced to pursue some goal?

I’m aware this idea has significant problems (like the ones you outlined), but could it still be better than other options?

We don’t want perfectionism to prevent us from taking highly flawed but still somewhat helpful safety measures.

1mruwnik
It could be somewhat helpful, for sure. And certainly better than nothing (unless it creates a false sense of security). Though I doubt it would be adopted, because of how much it would slow things down.

Would an AI which is automatically turned off every second, for example, be safer?

If you had an AI which was automatically turned off every second (and required to be manually turned on again), could this help prevent bad outcomes? It occurs to me that a powerful AI might be able to covertly achieve its goals even in this situation, or it might be able to convince people to stop the automatic turning off.

But even if this is still flawed, might it be better than alternatives?

It would allow us to really consider the AI’s actions in as much time as we wa... (read more)

1mruwnik
Also check out the AI boxing tag
1mruwnik
You could make it more convenient by it asking its operators before doing anything - the result would be pretty much the same. The problem with this is that you assume that humans will be able to:

* notice what the AI is planning
* understand what the AI is planning
* foresee the consequences of whatever it is planning

which greatly limits the usefulness of the system, in that it won't be able to suggest anything radically different (or radically faster) from what a human would suggest - human oversight is the limiting factor of the whole system. It also encourages the AI to hide what it's doing.

There is also the problem that given enough intelligence (computing speed, whatever think-oomph), it'll still be able to think circles around its operators.

When I say “make as many paperclips as possible in the next hour” I basically mean “undergo such actions that in one hour’s time will result in as many paperclips as possible”, so if you tell the AI to do this at 12:00 it only cares about how many paperclips it has made when the time hits 13:00 and does not care at all about any time past 13:00.

If you make a paperclip maximiser and you don’t specify any time limit or anything, how much does it care about WHEN the paperclips are made? I assume it would rather have 20 now than 20 in a month’s time, but woul... (read more)

1mruwnik
The answer to all of these is a combination of "dunno" and "it depends", in that implementation details would be critical. In general, you shouldn't read too much into the paperclip maximiser, or rather shouldn't go too deep into its specifics. Mainly because it doesn't exist. It's fun to think about, but always remember that each additional detail makes the overall scenario less likely. I was unclear about why I asked for clarification of “make as many paperclips as possible in the next hour”. My point there was that you should assume that whatever is not specified should be interpreted in whatever way is most likely to blow up in your face. 

I do pretty much mean wireheading, but also similar situations where the AI doesn’t go as far as wireheading, like making us eat chocolate forever. 

I feel like these scenarios can be broken down into two categories, scenarios where the AI succeeds in “making us happy”, but through unorthodox means, and scenarios where the AI tries, but fails, to “make us happy” which can quickly go into S-risk territory.

The main reason why I wondered if the chance of these kind of outcomes might be fairly high was because “make people happy” seems like the kind of goa... (read more)

1mruwnik
Making a wireheading AGI probably would be easier than getting a properly aligned one, because maximisers are generally simpler than properly aligned AGIs, since they have fewer things to do correctly (I'm being very vague here - sorry). That being said, having a coherent target is a different problem from being able to aim it in the first place. Both are very important, but the more pressing one seems to be being able to tell an AI to do something and being quite confident in it doing so (with the ability to correct it in case of problems). I'm cynical, but I reckon that giving a goal like "make people happy" is less likely than "make me rich" or "make me powerful".

How likely is the “Everyone ends up hooked up to morphine machines and kept alive forever” scenario? Is it considered less likely than extinction for example?
 

Obviously it doesn’t have to be specifically that, but something to that effect.
 

Also, is this scenario included as an existential risk in the overall X-risk estimates that people make?

1mruwnik
Do you mean something more specific than general wireheading? I'd reckon it a lot less likely, just because it's a lot simpler to kill everyone than to keep them alive and hooked up to a machine. That is, there are lots and lots of scenarios where everyone is dead, but a lot fewer where people end up wireheaded. Wireheading is often given as a specific example of outer misalignment, where the agent was told to make people happy and does it in a very unorthodox manner.

Do AI timeline predictions factor in increases in funding and effort put into AI as it becomes more mainstream and in the public eye? Or are they just based on things carrying on about the same? If the latter is the case then I would imagine that the actual timeline is probably considerably shorter.

Similarly, is the possibility for companies, governments, etc being further along in developing AGI than is publicly known, factored in to AI timeline predictions?

1mruwnik
Depends on who's predicting, but usually yes. Although the resulting predictions are much more fuzzy, since you have to estimate how funding will change and how far ahead are the known secret labs, and how many really secret labs are in existence. Then you also have to factor in foreign advances, e.g. China.

I apologise for the non-conciseness of my comment. I just wanted to really make sure that I explained my concerns properly, which may have led to me restating things or over-explaining.

It’s good to hear it reiterated that there is recognition of these kinds of possible outcomes. I largely made this comment to just make sure that these concerns were out there, not because I thought people weren’t actually aware. I guess I was largely concerned that these scenarios might be particularly likely ones, as opposed to just falling into the general category of po... (read more)

When do maximisers maximise for?

For example, if an ASI is told to ”make as many paperclips as possible”, when is it maximising for? The next second? The next year? Indefinitely?

If a paperclip maximiser only cared about making as many paperclips as possible over the next hour say, and every hour this goal restarts, maybe it would never be optimal to spend the time to do things such as disempower humanity because it only ever cares about the next hour and disempowering humanity would take too long. 
 

Would a paperclip maximiser rather make 1 thousan... (read more)

1mruwnik
Good questions. I think the assumption is that unless it's specified, then without limit. Like if you said "make me as much money as you can" - you probably don't want it stopping any time soon. The same would apply to the colour of the paperclips - seeing as you didn't say they should be red, you shouldn't assume they will be.

The issue with maximisers is precisely that they maximise. They were introduced to illustrate the problems with just trying to get a number as high as possible. At some point they'll sacrifice something else of value, just to get a tiny advantage. You could try to provide a perfect utility function, which always gives exactly correct weights for every possible action, but then you'd have solved alignment.

Does “make as many paperclips as possible in the next hour” mean “undergo such actions that in one hour's time will result in as many paperclips as possible” or “for the next hour do whatever will result in the most paperclips overall, including in the far future”?
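
One way to make the two readings precise (the notation is made up purely for illustration: n(t) is the number of paperclips existing at time t, and t_0 is when the instruction is given):

```latex
% Reading 1: only the count at the one-hour mark matters.
U_1(\pi) = n(t_0 + 1\,\text{h})

% Reading 2: the eventual total matters, even though the agent only acts
% during the hour.
U_2(\pi) = \lim_{T \to \infty} n(T),
\qquad \text{with } \pi \text{ acting only on } [t_0,\ t_0 + 1\,\text{h}]
```

Under U_1 the agent has no reason to care about anything after 13:00; under U_2 it does, which is why the advice above is to assume the unspecified case resolves to whichever reading is most likely to blow up in your face.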

(THIS IS A POST ABOUT S-RISKS AND WORSE THAN DEATH SCENARIOS)

Putting the disclaimer there, as I don’t want to cause suffering to anyone who may be avoiding the topic of S-risks for their mental well-being.

To preface this: I have no technical expertise and have only been looking into AI and its potential effects for a bit under 2 months. I also have OCD, which undoubtedly has some effect on my reasoning. I am particularly worried about S-risks and I just want to make sure that my concerns are not being overlooked by the people working on this stuff.

H... (read more)

1Hoagy
Not sure why you're being downvoted on an intro thread, though it would help if you were more concise. S-risks in general have obviously been looked at as a possible worst-case outcome by theoretical alignment researchers going back to at least Bostrom, as I expect you've been reading, and I would guess that most people here are aware of the possibility. I don't think the scenarios you described are 'overlooked', because they fall into the general pattern of AI having huge power combined with moral systems that we would find abhorrent, and most alignment work is ultimately intended to prevent this scenario. Lots of Eliezer's writing on why alignment is hard talks about somewhat similar cases where superficially reasonable rules lead to catastrophes. I don't know if they're addressed specifically anywhere, as most alignment work is about how we might implement any ethics or robust ontologies rather than addressing specific potential failures. You could see this kind of work as implicit in RLHF though, where outputs like 'we should punish people in perfect retribution for intent, or literal interpretation of their words' would hopefully be trained out as incompatible with harmlessness.