All of dsj's Comments + Replies

dsj40

I don’t know much background here so I may be off base, but it’s possible that the motivation of the trust isn’t to bind leadership’s hands to avoid profit-motivated decision making, but rather to free their hands to do so, ensuring that shareholders have no claim against them for such actions, as traditional governance structures might have provided.

3Zac Hatfield-Dodds
Incorporating as a Public Benefit Corporation already frees directors' hands; Delaware Title 8, §365 requires them to "balance the pecuniary interests of the stockholders, the best interests of those materially affected by the corporation’s conduct, and the specific public benefit(s) identified in its certificate of incorporation".
dsj81

(Unless "employees who signed a standard exit agreement" is doing a lot of work — maybe a substantial number of employees technically signed nonstandard agreements.)

Yeah, what about employees who refused to sign? Have we gotten any clarification on their situation?

dsj32

Thank you, I appreciated this post quite a bit. There's a paucity of historical information about this conflict which isn't colored by partisan framing, and you seem to be coming from a place of skeptical, honest inquiry. I'd look forward to reading what you have to say about 1967.

dsjΩ175

Thanks for doing this! I think a lot of people would be very interested in the debate transcripts if you posted them on GitHub or something.

1Ansh Radhakrishnan
Just pasted a few transcripts into the post, thanks for the nudge!
1Sam Bowman
Is there anything you'd be especially excited to use them for? This should be possible, but cumbersome enough that we'd default to waiting until this grows into a full paper (date TBD). My NYU group's recent paper on a similar debate setup includes a data release, FWIW.
dsj32

Okay. I do agree that one way to frame Matthew’s main point is that MIRI thought it would be hard to specify the human value function, and an LM that understands human values and reliably tells us the truth about that understanding is such a specification, and hence falsifies that belief.

To your second question: MIRI thought we couldn’t specify the value function to do the bounded task of filling the cauldron, because any value function we could naively think of writing, when given to an AGI (which was assumed to be a utility argmaxer), leads to all sorts ... (read more)

dsj32

I think this reply is mostly talking past my comment.

I know that MIRI wasn't claiming we didn't know how to safely make deep learning systems, GOFAI systems, or what-have-you fill buckets of water, but my comment wasn't about those systems. I also know that MIRI wasn't issuing a water-bucket-filling challenge to capabilities researchers.

My comment was specifically about directing an AGI (which I think GPT-4 roughly is), not deep learning systems or other software generally. I *do* think MIRI was claiming we didn't know how to make AGI systems safely do mun... (read more)

dsj4-5

Okay, that clears things up a bit, thanks. :) (And sorry for delayed reply. Was stuck in family functions for a couple days.)

This framing feels a bit wrong/confusing for several reasons.

  1. I guess by “lie to us” you mean act nice on the training distribution, waiting for a chance to take over the world while off distribution. I just … don’t believe GPT-4 is doing this; it seems highly implausible to me, in large part because I don’t think GPT-4 is clever enough that it could keep up the veneer until it’s ready to strike if that were the case.

  2. The term “l

... (read more)
2Lauro Langosco
I'm not saying that GPT-4 is lying to us - that part is just clarifying what I think Matthew's claim is. Re cauldron: I'm pretty sure MIRI didn't think that. Why would they?

I think the old school MIRI cauldron-filling problem pertained to pretty mundane, everyday tasks. No one said at the time that they didn’t really mean that it would be hard to get an AGI to do those things, that it was just an allegory for other stuff like the strawberry problem. They really seemed to believe, and said over and over again, that we didn’t know how to direct a general-purpose AI to do bounded, simple, everyday tasks without it wanting to take over the world. So this should be a big update to people who held that view, even if there are still

... (read more)
dsj2-1

Hmm, you say “your claim, if I understand correctly, is that MIRI thought AI wouldn't understand human values”. I’m disagreeing with this. I think Matthew isn’t claiming that MIRI thought AI wouldn’t understand human values.

5Lauro Langosco
I think maybe there's a parenthesis issue here :) I'm saying "your claim, if I understand correctly, is that MIRI thought AI wouldn't (understand human values and also not lie to us)".
dsj51

I think you’re misunderstanding the paragraph you’re quoting. I read Matthew, in that paragraph, as acknowledging the difference between the two problems, and saying that MIRI thought value specification (not value understanding) was much harder than it’s looking to actually be.

1Lauro Langosco
I think we agree - that sounds like it matches what I think Matthew is saying.
dsj117

I know this is from a bit ago now so maybe he’s changed his tune since, but I really wish he and others would stop repeating the falsehood that all international treaties are ultimately backed by force against the signatory countries. There are countless trade, emissions-reduction, and nuclear disarmament agreements which are not backed by force. I’d venture to say that the large majority of agreements are backed merely by the promise of continued good relations and tit-for-tat mutual benefit or defection.

dsj40

A key distinction is between linearity in the weights vs. linearity in the input data.

For example, the function $f(w_1, w_2, x_1, x_2) = w_1 \sin(x_1) + w_2 \cos(x_2)$ is linear in the arguments $w_1$ and $w_2$ but nonlinear in the arguments $x_1$ and $x_2$, since $\sin$ and $\cos$ are nonlinear.

Similarly, we have evidence that wide neural networks $f(\theta, x)$ are (almost) linear in the parameters $\theta$, despite being nonlinear in the input data $x$ (due e.g. to nonlinear activation functions such as ReLU). So nonlinear activati... (read more)
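To make the "linear in the parameters" claim concrete, here is a sketch (my notation) of the first-order expansion around the initialization usually used to state it:

```latex
% Wide-network "linearity in parameters", NTK-style: around the initial weights
% \theta_0, the map \theta \mapsto f(\theta, x) is approximately affine,
% while x \mapsto f(\theta, x) remains nonlinear.
\[
  f(\theta, x) \;\approx\; f(\theta_0, x) \;+\; \nabla_{\theta} f(\theta_0, x)^{\top}\,(\theta - \theta_0)
\]
```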

dsj3013

The more I stare at this observation, the more it feels potentially more profound than I intended when writing it.

Consider the “cauldron-filling” task. Does anyone doubt that, with at most a few very incremental technological steps from today, one could train a multimodal, embodied large language model (“RobotGPT”), to which you could say, “please fill up the cauldron”, and it would just do it, using a reasonable amount of common sense in the process — not flooding the room, not killing anyone or going to any other extreme lengths, and stopping if asked? I... (read more)

7arabaga
Indeed, isn't PaLM-SayCan an early example of this?
1cubefox
It is worth thinking about why ChatGPT, an Oracle AI which can execute certain instructions, does not fail text equivalents of the cauldron task. It seems the reason why it doesn't fail is that it is pretty good at understanding the meaning of expressions. (If an AI floods the room with water because this maximizes the probability that the cauldron will be filled, then the AI hasn't fully understood the instruction "fill the cauldron", which only asks for a satisficing solution.) And why is ChatGPT so good at interpreting the meaning of instructions? Because its base model was trained with some form of imitation learning, which gives it excellent language understanding and the ability to mimic the linguistic behavior of human agents. This requires special prompting in a base model, but supervised learning on dialogue examples (instruction tuning) lets it respond adequately to instructions. (Of course, at this stage it would not refuse any dangerous requests, which comes only in with RLHF, which seems a rather imperfect tool.)
4quanticle
On the flip side, as gwern pointed out in his Clippy short story, it's possible for a "neutral" GPT-like system to discover agency and deception in its training data and execute upon those prompts without any explicit instruction to do so from its human supervisor. The actions of a tool-AI programmed with a more "obvious" explicit utility function is easier to predict, in some ways, than the actions of something like ChatGPT, where the actions that it's making visible to you may be a subset (and a deliberately deceptively chosen subset) of all the actions that it is actually taking.
dsj135

Though interestingly, aligning a langchainesque AI to the user’s intent seems to be (with some caveats) roughly as hard as stating that intent in plain English.
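As a minimal sketch of what I mean by a langchainesque setup (hypothetical names throughout; llm() stands in for whatever chat-completion call you use, and the tools are toy placeholders), the alignment-relevant content is entirely the plain-English instruction, and the rest is plumbing:

```python
# Hypothetical stand-in for any chat-completion API call; not a real library function.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a model call here")

# The entire "specification of intent" is this plain-English instruction.
USER_INTENT = (
    "Fill the cauldron with water. Don't flood the room, don't harm anyone, "
    "and stop immediately if asked."
)

# Illustrative tools the agent loop can invoke by name.
TOOLS = {
    "turn_on_tap": lambda: "tap is on",
    "turn_off_tap": lambda: "tap is off",
    "check_water_level": lambda: "cauldron is 80% full",
}

def run_agent(max_steps: int = 10) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Instruction: {USER_INTENT}\n"
            f"Available tools: {', '.join(TOOLS)}\n"
            f"History so far: {history}\n"
            "Reply with exactly one tool name, or DONE."
        )
        choice = llm(prompt).strip()
        if choice == "DONE":
            break
        # Execute the chosen tool (if known) and record the observation.
        history.append(f"{choice} -> {TOOLS[choice]() if choice in TOOLS else 'unknown tool'}")
    return history
```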

dsj1-5

My guess is “today” was supposed to refer to some date when they were doing the investigation prior to the release of GPT-4, not the date the article was published.

Minerva (from June 2022) used 3e24; there's no way "several orders of magnitude larger" was right when the article was being written. I think the author just made a mistake.

Epoch says 2.2e25. Skimming that page, it seems like a pretty unreliable estimate. They say their 90% confidence interval is about 1e25 to 5e25.
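A quick check with those two figures (taking Epoch's point estimate at face value despite its wide interval):

```latex
\[
  \frac{2.2 \times 10^{25}}{3 \times 10^{24}} \;\approx\; 7.3
\]
```

i.e. under one order of magnitude above Minerva, not several.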

dsj21

Nitpick: the paper from Eloundou et al is called “GPTs are GPTs”, not “GPTs and GPTs”.

1Stephen Fowler
Cheers
dsj10

Probably I should get around to reading CAIS, given that it made these points well before I did.

I found it's a pretty quick read, because the hierarchical/summary/bullet point layout allows one to skip a lot of the bits that are obvious or don't require further elaboration (which is how he endorsed reading it in this lecture).

dsj20

We don’t know with confidence how hard alignment is, and whether something roughly like the current trajectory (even if reckless) leads to certain death if it reaches superintelligence.

There is a wide range of opinion on this subject from smart, well-informed people who have devoted themselves to studying it. We have a lot of blog posts and a small number of technical papers, all usually making important (and sometimes implicit and unexamined) theoretical assumptions which we don’t know are true, plus some empirical analysis of much weaker systems.

We do not have an established, well-tested scientific theory like we do with pathogens such as smallpox. We cannot say with confidence what is going to happen.

dsj10

I agree that if you're absolutely certain AGI means the death of everything, then nuclear devastation is preferable.

I think the absolute certainty that AGI does mean the death of everything is extremely far from called for, and is itself a bit scandalous.

(As to whether Eliezer's policy proposal is likely to lead to nuclear devastation, my bottom line view is it's too vague to have an opinion. But I think he should have consulted with actual AI policy experts and developed a detailed proposal with them, which he could then point to, before writing up an emotional appeal, with vague references to air strikes and nuclear conflict, for millions of lay people to read in TIME Magazine.)

1dr_s
I think the absolute certainty in general terms would not be warranted; the absolute certainty if AGI is being developed in a reckless manner is more reasonable. Compare someone researching smallpox in a BSL-4 lab versus someone juggling smallpox vials in a huge town square full of people, and what probability does each of them make you assign to a smallpox pandemic being imminent. I still don't think AGI would mean necessarily doom simply because I don't fully buy that its ability to scale up to ASI is 100% guaranteed. However, I also think in practice that would matter little, because states might still see even regular AGI as a major threat. Having infinite cognitive labour is such a broken hax tactic it basically makes you Ruler of the World by default if you have an exclusive over it. That alone might make it a source of tension.
dsj10

One's credibility would be less of course, but Eliezer is not the one who would be implementing the hypothetical policy (that would be various governments), so it's not his credibility that's relevant here.

I don't have much sense he's holding back his real views on the matter.

1dr_s
But on the object level, if you do think that AGI means certain extinction, then that's indeed the right call (consider also that a single strike on a data centre might mean a risk of nuclear war, but that doesn't mean it's a certainty. If one listened to Putin's barking, every bit of help given to Ukraine is a risk of nuclear war, but in practice Russia just swallows it up and lets it go, because no one is actually very eager to push that button, and they still have way too much to lose from it). The scenario in which Eliezer's approach is just wrong is if he is vastly overestimating the risk of an AGI extinction event or takeover. This might be the case, or might become so in the future (for example imagine a society in which the habit is to still enforce the taboo, but alignment has actually advanced enough to make friendly AI feasible). It isn't perfect, it isn't necessarily always true, but it isn't particularly scandalous. I bet you lots of hawkish pundits during the Cold War have said that nuclear annihilation would have been preferable to the worldwide victory of Communism, and that is a substantially more nonsensical view.
dsj41

Did you intend to say risk off, or risk of?

If the former, then I don't understand your comment and maybe a rewording would help me.

If the latter, then I'll just reiterate that I'm referring to Eliezer's explicitly stated willingness to trade off the actuality of (not just some risk of) nuclear devastation to prevent the creation of AGI (though again, to be clear, I am not claiming he advocated a nuclear first strike). The only potential uncertainty in that tradeoff is the consequences of AGI (though I think Eliezer's been clear that he thinks it means certain doom), and I suppose what follows after nuclear devastation as well.

dsj128

Right, but of course the absolute, certain implication from “AGI is created” to “all biological life on Earth is eaten by nanotechnology made by an unaligned AI that has worthless goals” requires some amount of justification, and that justification for this level of certainty is completely missing.

In general such confidently made predictions about the technological future have a poor historical track record, and there are multiple holes in the Eliezer/MIRI story, and there is no formal, canonical write up of why they’re so confident in their apparently sec... (read more)

2Amalthea
The trade-off you're gesturing at is really risk of AGI vs. risk of nuclear devastation. So you don't need absolute certainty on either side in order to be willing to make it.
dsj65

There’s a big difference between pre-committing to X so you have a credible threat against Y, vs. just outright preferring X over Y. In the quoted comment, Eliezer seems to have been doing the latter.

2dr_s
And how credible would your precommitment be if you made it clear that you actually prefer Y, you're just saying you'd do X for game theoretical reasons, and you'd do it, swear? These are the murky cognitive waters in which sadly your beliefs (or at least, your performance of them) affects the outcome.
4CronoDAS
"Most humans die in a nuclear war, but human extinction doesn't happen" is presumably preferable to "all biological life on Earth is eaten by nanotechnology made by an unaligned AI that has worthless goals". It should go without saying that both are absolutely terrible outcomes, but one actually is significantly more terrible than the other. Note that this is literally one of the examples in the OP - discussion of axiology in philosophy.
dsj80

I don’t agree billions dead is the only realistic outcome of his proposal. Plausibly it could just result in actually stopping large training runs. But I think he’s too willing to risk billions dead to achieve that.

dsj88

In response to the question,

“[Y]ou’ve gestured at nuclear risk. … How many people are allowed to die to prevent AGI?”,

he wrote:

“There should be enough survivors on Earth in close contact to form a viable reproductive population, with room to spare, and they should have a sustainable food supply. So long as that's true, there's still a chance of reaching the stars someday.”

He later deleted that tweet because he worried it would be interpreted by some as advocating a nuclear first strike.

I’ve seen no evidence that he is advocating a nuclear first strike, but it does seem to me to be a fair reading of that tweet that he would trade nuclear devastation for preventing AGI.

4dr_s
Most nuclear powers are willing to trade nuclear devastation for preventing the other side's victory. If you went by sheer "number of surviving humans", your best reaction to seeing the ICBMs fly towards you should be to cross your arms, make your peace, and let them hit without lifting a finger. Less chance of a nuclear winter and extinction that way. But the way deterrence prevents that from happening is by pre-commitment to actually just blowing it all up if someone ever tries something funny. That is hardly less insane than what EY suggests, but it kinda makes sense in context (but still, with a God's eye view on humanity, it's insane, and just the best way we could solve our particular coordination problem).
1Noosphere89
Yeah, at the very least it's calling for billions dead across the world, because once we realize what Eliezer wants, this is the only realistic outcome.
dsj10

If there were a game-theoretically reliable way to get everyone to pause all together, I'd support it.

dsjΩ110

In §3.1–3.3, you look at the main known ways that altruism between humans has evolved — direct and indirect reciprocity, as well as kin and group selection[1] — and ask whether we expect such altruism from AI towards humans to be similarly adaptive.

However, as observed in R. Joyce (2007). The Evolution of Morality (p. 5),

Evolutionary psychology does not claim that observable human behavior is adaptive, but rather that it is produced by psychological mechanisms that are adaptations. The output of an adaptation need not be adaptive.

This is a subtle dist... (read more)

dsjΩ8100

A similar point is (briefly) made in K. E. Drexler (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence, §18 “Reinforcement learning systems are not equivalent to reward-seeking agents”:

Reward-seeking reinforcement-learning agents can in some instances serve as models of utility-maximizing, self-modifying agents, but in current practice, RL systems are typically distinct from the agents they produce … In multi-task RL systems, for example, RL “rewards” serve not as sources of value to agents, but as signals that guide trai

... (read more)
5TurnTrout
Thanks so much for these references. Additional quotes: Probably I should get around to reading CAIS, given that it made these points well before I did.
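A toy sketch of the distinction in the Drexler quote above (my own illustration: a hypothetical two-action bandit trained with a REINFORCE-style update). Reward appears only inside the training loop, as a signal shaping the parameters; the deployed policy is a fixed function that never receives or represents reward:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # parameters of a tiny two-action softmax policy

def policy_probs(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def environment_reward(action):
    # Hypothetical toy environment: action 1 pays off more often than action 0.
    return float(rng.random() < (0.3 if action == 0 else 0.8))

# Training: reward enters only here, as a signal that shapes theta (REINFORCE update).
for _ in range(2000):
    probs = policy_probs(theta)
    action = int(rng.choice(2, p=probs))
    reward = environment_reward(action)
    grad_log = -probs
    grad_log[action] += 1.0  # gradient of log pi(action | theta) for a softmax policy
    theta += 0.1 * reward * grad_log

# Deployment: the trained policy is just a fixed map to an action distribution.
# It takes no reward as input and contains no explicit representation of "reward".
def deployed_policy():
    return int(rng.choice(2, p=policy_probs(theta)))

print(policy_probs(theta))  # strongly favors action 1 after training
```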
dsj20

So the way that you are like taking what is probably basically the same architecture in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4.

Indeed, GPT-3 is almost exactly the same architecture as GPT-2, and only a little different from GPT.

dsj91

X-risks tend to be more complicated beasts than lions in bushes, in that successfully avoiding them requires a lot more than reflexive action: we’re not going to navigate them by avoiding carefully understanding them.

2James B
I actually agree entirely. I just don't think that we need to explore those x-risks by exposing ourselves to them. I think we've already advanced AI enough to start understanding and thinking about those x-risks, and an indefinite (perhaps not permanent) pause in development will enable us to get our bearings.   Say what you need to say now to get away from the potential lion. Then back at the campfire, talk it through.
dsj35

Thanks, I appreciate the spirit with which you've approached the conversation. It's an emotional topic for people I guess.

dsj43

The negation of the claim would not be "There is definitely nothing to worry about re AI x-risk." It would be something much more mundane-sounding, like "It's not the case that if we go ahead with building AGI soon, we all die."

I debated with myself whether to present the hypothetical that way. I chose not to, because of Eliezer's recent history of extremely confident statements on the subject. I grant that the statement I quoted in isolation could be interpreted more mundanely, like the example you give here.

When the stakes are this high and the policy pr... (read more)

1Daniel Kokotajlo
Thank you for your service! You may be interested to know that I think Yudkowsky writing this article will probably have on balance more bad consequences than good; Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously. Alas. I don't blame him too much for it because I sympathize with his frustration & there's something to be said for the policy of "just tell it like it is, especially when people ask." But yeah, I wish this hadn't happened. (Also, sorry for the downvotes, I at least have been upvoting you whilst agreement-downvoting)
dsj32

Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they're going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?

Yes, I do in fact say the same thing to professions of absolute certainty that there is nothing to worry about re: AI x-risk.

The negation of the claim would not be "There is definitely nothing to worry about re AI x-risk." It would be something much more mundane-sounding, like "It's not the case that if we go ahead with building AGI soon, we all die." 

That said, yay -- insofar as you aren't just applying a double standard here, then I'll agree with you. It would have been better if Yud added in some uncertainty disclaimers.

dsj3029

There simply don't exist arguments with the level of rigor needed to justify a claim such as this one without any accompanying uncertainty:

If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

I think this passage, meanwhile, rather misrepresents the situation to a typical reader:

When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a s

... (read more)
1James B
This is a case where the precautionary principle grants a great deal of rhetorical license. If you think there might be a lion in the bush, do you have a long and nuanced conversation about it, or do you just tell your tribe, “There’s a lion in that bush. Back away.”?
-4JNS
Proposition 1: Powerful systems come with no x-risk. Proposition 2: Powerful systems come with x-risk. You can prove / disprove 2 by proving or disproving 1. Why is it that a lot of [1,0] people believe that the [0,1] group should prove their case?[1] [1] And also ignore all the arguments that have been offered.

Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they're going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?

Re: the insider conversation: Yeah, I guess it depends on what you mean by 'the insider conversation' and whether you think the impression random members of the public will get from these passages brings... (read more)

dsj20

A somewhat reliable source has told me that they don't have the compute infrastructure to support making a more advanced model available to users.

That might also reflect limited engineering efforts to optimize state-of-the-art models for real world usage (think of the performance gains from GPT-3.5 Turbo) as opposed to hitting benchmarks for a paper to be published.

dsj10

I believe Anthropic is committed to not pushing at the state-of-the-art, so they may not be the most relevant player in discussions of race dynamics.

dsj40

Yes, although the chat interface was necessary but insufficient. They also needed a capable language model behind it, which OpenAI already had, and Google still lacks months later.

dsj180

I agree that those are possibilities.

On the other hand, why did news reports[1] suggest that Google was caught flat-footed by ChatGPT and re-oriented to rush Bard to market?

My sense is that Google/DeepMind's lethargy in the area of language models is due to a combination of a few factors:

  1. They've diversified their bets to include things like protein folding, fusion plasma control, etc. which are more application-driven and not on an AGI path.
  2. They've focused more on fundamental research and less on productizing and scaling.
  3. Their language model experts m
... (read more)
5Douglas_Knight
I think talking about Google/DeepMind as a unitary entity is a mistake. I'm gonna guess that Peter agrees, and that's why he specified DeepMind. Google's publications identify at least two internal language models superior to Lambda, so their release of Bard based on Lambda doesn't tell us much. They are certainly behind in commercializing chatbots, but that is a weak claim. How DeepMind compares to OpenAI is difficult to say. Four people going to OpenAI is damning, though.
9Kaj_Sotala
OpenAI seems to also have been caught flat-footed by ChatGPT, or more specifically by the success it got. It seems like the success came largely from the chat interface that made it intuitive for people on the street to use - and none of the LLM techies at any company realized what a difference that would make.
dsj101

If a major fraction of all resources at the top 5–10 labs were reallocated to "us[ing] this pause to jointly develop and implement a set of shared safety protocols", that seems like it would be a good thing to me.

However, the letter offers no guidance as to what fraction of resources to dedicate to this joint safety work. Thus, we can expect that DeepMind and others might each devote a couple teams to that effort, but probably not substantially halt progress at their capabilities frontier.

The only player who is effectively being asked to halt progress at its capabilities frontier is OpenAI, and that seems dangerous to me for the reasons I stated above.

dsj233

Currently, OpenAI has a clear lead over its competitors.[1] This is arguably the safest arrangement as far as race dynamics go, because it gives OpenAI some breathing room in case they ever need to slow down later on for safety reasons, and also because their competitors don't necessarily have a strong reason to think they can easily sprint to catch up.

So far as I can tell, this petition would just be asking OpenAI to burn six months of that lead and let other players catch up. That might create a very dangerous race dynamic, where now you have multip... (read more)

2Ben Pace
It's not clear to me that OpenAI has a clear lead over Anthropic in terms of capabilities.

DeepMind might be more cautious about what it releases, and/or developing systems whose power is less legible than GPT. I have no real evidence here, just vague intuitions.

4Evan R. Murphy
That might be true if nothing is actually done in the 6+ months to improve AI safety and governance. But the letter proposes:
dsj61

Another interesting section:

Silva-Braga: Are we close to the computers coming up with their own ideas for improving themselves?

Hinton: Uhm, yes, we might be.

Silva-Braga: And then it could just go fast?

Hinton: That's an issue, right. We have to think hard about how to control that.

Silva-Braga: Yeah. Can we?

Hinton: We don't know. We haven't been there yet, but we can try.

Silva-Braga: Okay. That seems kind of concerning.

Hinton: Uhm, yes.

dsj10

Of course the choice of what sort of model we fit to our data can sometimes preordain the conclusion.

Another way to interpret this is there was a very steep update made by the community in early 2022, and since then it’s been relatively flat, or perhaps trending down slowly with a lot of noise (whereas before the update it was trending up slowly).
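A toy illustration of how the model class can preordain the conclusion (synthetic numbers, not the actual forecast series): the same noisy points can be read as a gradual trend or as a single sharp update, depending on which model you fit.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(36)                        # months, e.g. Jan 2020 .. Dec 2022
y = np.where(t < 26, 40.0, 15.0)         # hypothetical forecasts with one sharp drop
y = y + rng.normal(0, 3, size=t.shape)   # noise

# Model A: a single straight line through everything.
slope, intercept = np.polyfit(t, y, 1)

# Model B: flat before and after one breakpoint, chosen by least squares.
def sse_for_break(k):
    return ((y[:k] - y[:k].mean()) ** 2).sum() + ((y[k:] - y[k:].mean()) ** 2).sum()

best_k = min(range(2, len(t) - 2), key=sse_for_break)

print(f"Model A: slope {slope:.2f} per month (reads as a gradual decline)")
print(f"Model B: breakpoint at month {best_k} (reads as one steep update)")
```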

dsj21

Seems to me there's too much noise to pinpoint the break at a specific month. There are some predictions made in early 2022 with an even later date than those made in late 2021.

But one pivotal thing around that time might have been the chain-of-thought stuff which started to come to attention then (even though there was some stuff floating around Twitter earlier).

dsj51

It's a terribly organized and presented proof, but I think it's basically right (although it's skipping over some algebraic details, which is common in proofs). To spell it out:

Fix any $x$ and $y^*$. We then have,

$x^2 - 2xy^* + (y^*)^2 = (x - y^*)^2 \ge 0$.

Adding $2xy^*$ to both sides,

$x^2 + (y^*)^2 \ge 2xy^*$.

Therefore, if (by assumption in that line of the proof) $g(x) > x^2$ and $g(y^*) \ge (y^*)^2$, we'd have,

$g(x) + g(y^*) > x^2 + (y^*)^2 \ge 2xy^*$,

which contradicts our assumption that $g(x) + g(y^*) \le 2xy^*$.

5hold_my_fish
Thanks. When it's written as $g(x)+g(y) > x^2+y^2 \ge 2xy$, I can see what's going on. (That one intermediate step makes all the difference!) I was wrong then to call the proof "incorrect". I think it's fair to call it "incomplete", though. After all, it could have just said "the whole proof is an exercise for the reader", which is in some sense correct I guess, but not very helpful (and doesn't tell you much about the model's ability), and this is a bit like that on a smaller scale. (Although, reading again, "...which contradicts the existence of $y^*$ given $x$" is a quite strange thing to say as well. I'm not sure I can exactly say it's wrong, though. Really, that whole section makes my head hurt.) If a human wrote this, I would be wondering if they actually understand the reasoning or are just skipping over a step they don't know how to do. The reason I say that is that $g(x)+g(y^*) > 2xy^*$ is the obvious contradiction to look for, so the section reads a bit like "I'd really like $g(y^*) < (y^*)^2$ to be true, and surely there's a contradiction somehow if it isn't, but I don't really know why, but this is probably the contradiction I'd get if I figured it out". The typo-esque use of $y$ instead of $y^*$ bolsters this impression.
dsj30

Thanks! This is clearer. (To be pedantic, the $L^2$ distance should have a square root around the integral, but it's clear what you mean.)

3Jesse Hoogland
Thank you. Pedantic is good (I fixed the root)!
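For reference, the distance in question under the standard definition (notation mine): the root goes around the whole integral.

```latex
\[
  d_{L^2}(f, g) \;=\; \left( \int \lvert f(x) - g(x) \rvert^{2} \, d\mu(x) \right)^{1/2}
\]
```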
dsj205

Perhaps of interest to this community is GPT-4 using a Linux terminal to iteratively problem-solve locating and infiltrating a poorly-secured machine on a local network:

dsj42

That's because the naive inner product suggested by the risk is non-informative,

Hmm, how is this an inner product? I note that it lacks, among other properties, positive definiteness:

Edit: I guess you mean a distance metric induced by an inner product (similar to the examples later on, where you have distance metrics induced by a norm), not an actual inner product? I'm confused by the use of standard inner product notation if that's the intended meaning. Also, in this case, this ... (read more)

3Jesse Hoogland
You're right, thanks for pointing that out! I fixed the notation. Like you say, the difference of risks doesn't even qualify as a metric (the other choices mentioned do, however).
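For reference, the axiom at issue, stated in standard form (notation mine): an inner product must be positive definite, which a difference of risks need not be.

```latex
% Positive definiteness, required of any inner product:
\[
  \langle f, f \rangle \;\ge\; 0 , \qquad \langle f, f \rangle = 0 \iff f = 0 .
\]
% A difference of risks can vanish (or go negative) for two distinct functions,
% so it cannot arise from an inner product, and on its own it is not a metric either.
```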
dsj76

I think this is overcomplicating things.

We don't have to solve any deep philosophical problems here finding the one true pointer to "society's values", or figuring out how to analogize society to an individual.

We just have to recognize that the vast majority of us really don't want a single rogue to be able to destroy everything we've all built, and we can act pragmatically to make that less likely.

1Noosphere89
I agree with this, in a nutshell. After all, you can put almost whatever values you like and it will work, which is the point of my long comment. My point is that once you have the instrumental goals, like survival and technological progress, down for everyone, alignment in practice should reduce to this: And the alignment problem is simple enough: How do you brainwash an AI to have your goals?
dsj70

Well, admittedly “alignment to humanity as a whole” is open to interpretation. But would you rather everyone have their own personal superintelligence that they can brainwash to do whatever they want?

1Noosphere89
Basically, yes. This is mostly because I think even a best case alignment scenario can't ever be more than "everyone have their own personal superintelligence that they can brainwash to do whatever they want." This is related to fundamental disagreements I have around morality and values that make me pessimistic about trying to align groups of people, or indeed trying to align with the one true morality/values. To state the disagreements I have: 1. I think to the extent that moral realism is right, morality/values is essentially trivial, in that every morality is correct, and I suspect that there is no non-arbitrary way to restrict morality or values without sneaking in your own values. Essentially, it's trivialism, applied to morality, with a link below: https://en.m.wikipedia.org/wiki/Trivialism 1. The reason reality doesn't face the problem of being trivial is because for our purposes, we don't have the power to warp reality to what we want (often talked about by different names, including omnipotence, administrator access to reality, and more), whereas in morality, we do have the power to change our values to anything else, thus generating inconsistent, but complete, values, in contrast to the universe we find ourselves in, which is probably consistent and incomplete. 2. There is no way to coherently talk about something like a society or humanity's values in the general case, and in the case where everyone is aligned, all we can talk about is optimal redistribution of goods. This makes a lot of attempts to analogize society or humanity's values to, say, an individual person rely on two techniques that are subjective: That means it is never a nation or humanity that acts on morals or values, but specific people with their own values take those actions. Here's a link to it. https://www.lesswrong.com/posts/YYuB8w4nrfWmLzNob/thatcher-s-axiom So my conclusion is, yes, I do really bite the bullet here and support "everyone have their own personal sup