
All of dsj's Comments + Replies

Did Christopher Hitchens change his mind about waterboarding?

And mine.

Maybe Anthropic's Long-Term Benefit Trust is powerless

I don’t know much background here so I may be off base, but it’s possible that the motivation of the trust isn’t to bind leadership’s hands to avoid profit-motivated decision making, but rather to free their hands to do so, ensuring that shareholders have no claim against them for such actions, as traditional governance structures might have provided.

3Zac Hatfield-Dodds9mo

Incorporating as a Public Benefit Corporation already frees directors' hands; Delaware Title 8, §365 requires them to "balance the pecuniary interests of the stockholders, the best interests of those materially affected by the corporation’s conduct, and the specific public benefit(s) identified in its certificate of incorporation".

Zach Stein-Perlman's Shortform

dsj10mo81

(Unless "employees who signed a standard exit agreement" is doing a lot of work — maybe a substantial number of employees technically signed nonstandard agreements.)

Yeah, what about employees who refused to sign? Have we gotten any clarification on their situation?

Book Review: 1948 by Benny Morris

dsj1y32

Thank you, I appreciated this post quite a bit. There's a paucity of historical information about this conflict which isn't colored by partisan framing, and you seem to be coming from a place of skeptical, honest inquiry. I'd look forward to reading what you have to say about 1967.

Anthropic Fall 2023 Debate Progress Update

dsj1yΩ175

Thanks for doing this! I think a lot of people would be very interested in the debate transcripts if you posted them on GitHub or something.

1Ansh Radhakrishnan1y

Just pasted a few transcripts into the post, thanks for the nudge!

1Sam Bowman1y

Is there anything you'd be especially excited to use them for? This should be possible, but cumbersome enough that we'd default to waiting until this grows into a full paper (date TBD). My NYU group's recent paper on a similar debate setup includes a data release, FWIW.

Evaluating the historical value misspecification argument

dsj1y32

Okay. I do agree that one way to frame Matthew’s main point is that MIRI thought it would be hard to specify the human value function, and an LM that understands human values and reliably tells us the truth about that understanding is such a specification, and hence falsifies that belief.

To your second question: MIRI thought we couldn’t specify the value function to do the bounded task of filling the cauldron, because any value function we could naively think of writing, when given to an AGI (which was assumed to be a utility argmaxer), leads to all sorts ... (read more)

Evaluating the historical value misspecification argument

dsj1y32

I think this reply is mostly talking past my comment.

I know that MIRI wasn't claiming we didn't know how to safely make deep learning systems, GOFAI systems, or what-have-you fill buckets of water, but my comment wasn't about those systems. I also know that MIRI wasn't issuing a water-bucket-filling challenge to capabilities researchers.

My comment was specifically about directing an AGI (which I think GPT-4 roughly is), not deep learning systems or other software generally. I *do* think MIRI was claiming we didn't know how to make AGI systems safely do mun... (read more)

Evaluating the historical value misspecification argument

dsj1y4-5

Okay, that clears things up a bit, thanks. :) (And sorry for delayed reply. Was stuck in family functions for a couple days.)

This framing feels a bit wrong/confusing for several reasons.

I guess by “lie to us” you mean act nice on the training distribution, waiting for a chance to take over the world while off distribution. I just … don’t believe GPT-4 is doing this; it seems highly implausible to me, in large part because I don’t think GPT-4 is clever enough that it could keep up the veneer until it’s ready to strike if that were the case.
The term “l

... (read more)

2Lauro Langosco1y

I'm not saying that GPT-4 is lying to us - that part is just clarifying what I think Matthew's claim is. Re cauldron: I'm pretty sure MIRI didn't think that. Why would they?

Rob Bensinger1y123

I think the old school MIRI cauldron-filling problem pertained to pretty mundane, everyday tasks. No one said at the time that they didn’t really mean that it would be hard to get an AGI to do those things, that it was just an allegory for other stuff like the strawberry problem. They really seemed to believe, and said over and over again, that we didn’t know how to direct a general-purpose AI to do bounded, simple, everyday tasks without it wanting to take over the world. So this should be a big update to people who held that view, even if there are still

dsj1y2-1

Hmm, you say “your claim, if I understand correctly, is that MIRI thought AI wouldn't understand human values”. I’m disagreeing with this. I think Matthew isn’t claiming that MIRI thought AI wouldn’t understand human values.

5Lauro Langosco1y

I think maybe there's a parenthesis issue here :) I'm saying "your claim, if I understand correctly, is that MIRI thought AI wouldn't (understand human values and also not lie to us)".

Evaluating the historical value misspecification argument

dsj1y51

I think you’re misunderstanding the paragraph you’re quoting. I read Matthew, in that paragraph, as acknowledging the difference between the two problems, and saying that MIRI thought value specification (not value understanding) was much harder than it’s looking to actually be.

1Lauro Langosco1y

I think we agree - that sounds like it matches what I think Matthew is saying.

A transcript of the TED talk by Eliezer Yudkowsky

dsj2y117

I know this is from a bit ago now so maybe he’s changed his tune since, but I really wish he and others would stop repeating the falsehood that all international treaties are ultimately backed by force on the signatory countries. There are countless trade, climate reduction, and nuclear disarmament agreements which are not backed by force. I’d venture to say that the large majority of agreements are backed merely by the promise of continued good relations and tit-for-tat mutual benefit or defection.

Deep learning models might be secretly (almost) linear

dsj2y*40

A key distinction is between linearity in the weights vs. linearity in the input data.

For example, the function $f(a, b, x, y) = a \sin(x) + b \cos(y)$ is linear in the arguments $a$ and $b$ but nonlinear in the arguments $x$ and $y$, since $\sin$ and $\cos$ are nonlinear.
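As a rough numerical illustration (a minimal sketch, not part of the original comment; the helper names and test values are made up for the example), one can check additivity, one of the two defining properties of linearity, separately in the weights and in the inputs:

```python
import math


def f(a, b, x, y):
    """The example function: a*sin(x) + b*cos(y)."""
    return a * math.sin(x) + b * math.cos(y)


def is_additive_in_weights(a1, b1, a2, b2, x, y):
    """Check f(a1+a2, b1+b2, x, y) == f(a1, b1, x, y) + f(a2, b2, x, y)."""
    lhs = f(a1 + a2, b1 + b2, x, y)
    rhs = f(a1, b1, x, y) + f(a2, b2, x, y)
    return math.isclose(lhs, rhs, abs_tol=1e-12)


def is_additive_in_inputs(a, b, x1, y1, x2, y2):
    """Check f(a, b, x1+x2, y1+y2) == f(a, b, x1, y1) + f(a, b, x2, y2)."""
    lhs = f(a, b, x1 + x2, y1 + y2)
    rhs = f(a, b, x1, y1) + f(a, b, x2, y2)
    return math.isclose(lhs, rhs, abs_tol=1e-12)


# Arbitrary illustrative values (hypothetical, chosen only for the demo).
weights_ok = is_additive_in_weights(1.5, -2.0, 0.3, 4.0, x=0.7, y=1.1)
inputs_ok = is_additive_in_inputs(1.5, -2.0, 0.7, 1.1, 0.2, 0.4)

print("additive in (a, b):", weights_ok)
print("additive in (x, y):", inputs_ok)
```

Additivity holds for any choice of weights at fixed inputs, while a single failing pair of inputs is enough to show the function is not linear in $x$ and $y$. A passing check on one pair of weights is of course only suggestive; the algebra is what establishes linearity in $a$ and $b$.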

Similarly, we have evidence that wide neural networks $f(x; \theta)$ are (almost) linear in the parameters $\theta$, despite being nonlinear in the input data $x$ (due e.g. to nonlinear activation functions such as ReLU). So nonlinear activati... (read more)

Can we evaluate the "tool versus agent" AGI prediction?

dsj2y*3013

The more I stare at this observation, the more it feels potentially more profound than I intended when writing it.

Consider the “cauldron-filling” task. Does anyone doubt that, with at most a few very incremental technological steps from today, one could train a multimodal, embodied large language model (“RobotGPT”), to which you could say, “please fill up the cauldron”, and it would just do it, using a reasonable amount of common sense in the process — not flooding the room, not killing anyone or going to any other extreme lengths, and stopping if asked? I... (read more)

7arabaga2y

Indeed, isn't PaLM-SayCan an early example of this?

1cubefox2y

It is worth thinking about why ChatGPT, an Oracle AI which can execute certain instructions, does not fail text equivalents of the cauldron task. It seems the reason why it doesn't fail is that it is pretty good at understanding the meaning of expressions. (If an AI floods the room with water because this maximizes the probability that the cauldron will be filled, then the AI hasn't fully understood the instruction "fill the cauldron", which only asks for a satisficing solution.) And why is ChatGPT so good at interpreting the meaning of instructions? Because its base model was trained with some form of imitation learning, which gives it excellent language understanding and the ability to mimic the linguistic behavior of human agents. This requires special prompting in a base model, but supervised learning on dialogue examples (instruction tuning) lets it respond adequately to instructions. (Of course, at this stage it would not refuse any dangerous requests; refusal only comes in with RLHF, which seems a rather imperfect tool.)

4quanticle2y

On the flip side, as gwern pointed out in his Clippy short story, it's possible for a "neutral" GPT-like system to discover agency and deception in its training data and execute upon those prompts without any explicit instruction to do so from its human supervisor. The actions of a tool-AI programmed with a more "obvious" explicit utility function is easier to predict, in some ways, than the actions of something like ChatGPT, where the actions that it's making visible to you may be a subset (and a deliberately deceptively chosen subset) of all the actions that it is actually taking.

Can we evaluate the "tool versus agent" AGI prediction?

dsj2y135

Though interestingly, aligning a langchainesque AI to the user’s intent seems to be (with some caveats) roughly as hard as stating that intent in plain English.


Anthropic is further accelerating the Arms Race?

dsj2y1-5

My guess is “today” was supposed to refer to some date when they were doing the investigation prior to the release of GPT-4, not the date the article was published.

Zach Stein-Perlman2y125

Minerva (from June 2022) used 3e24; there's no way "several orders of magnitude larger" was right when the article was being written. I think the author just made a mistake.

Anthropic is further accelerating the Arms Race?

dsj2y40

Got a source for this estimate?

Zach Stein-Perlman2y135

Epoch says 2.2e25. Skimming that page, it seems like a pretty unreliable estimate. They say their 90% confidence interval is about 1e25 to 5e25.

No Summer Harvest: Why AI Development Won't Pause

dsj2y21

Nitpick: the paper from Eloundou et al is called “GPTs are GPTs”, not “GPTs and GPTs”.

1Stephen Fowler2y

Cheers

Reward is not the optimization target

dsj2y10

Probably I should get around to reading CAIS, given that it made these points well before I did.

I found it's a pretty quick read, because the hierarchical/summary/bullet point layout allows one to skip a lot of the bits that are obvious or don't require further elaboration (which is how he endorsed reading it in this lecture).

Policy discussions follow strong contextualizing norms

dsj2y20

We don’t know with confidence how hard alignment is, and whether something roughly like the current trajectory (even if reckless) leads to certain death if it reaches superintelligence.

There is a wide range of opinion on this subject from smart, well-informed people who have devoted themselves to studying it. We have a lot of blog posts and a small number of technical papers, all usually making important (and sometimes implicit and unexamined) theoretical assumptions which we don’t know are true, plus some empirical analysis of much weaker systems.

We do not have an established, well-tested scientific theory like we do with pathogens such as smallpox. We cannot say with confidence what is going to happen.

Policy discussions follow strong contextualizing norms

dsj2y10

I agree that if you're absolutely certain AGI means the death of everything, then nuclear devastation is preferable.

I think the absolute certainty that AGI does mean the death of everything is extremely far from called for, and is itself a bit scandalous.

(As to whether Eliezer's policy proposal is likely to lead to nuclear devastation, my bottom line view is it's too vague to have an opinion. But I think he should have consulted with actual AI policy experts and developed a detailed proposal with them, which he could then point to, before writing up an emotional appeal, with vague references to air strikes and nuclear conflict, for millions of lay people to read in TIME Magazine.)

1dr_s2y

I think the absolute certainty in general terms would not be warranted; the absolute certainty if AGI is being developed in a reckless manner is more reasonable. Compare someone researching smallpox in a BSL-4 lab versus someone juggling smallpox vials in a huge town square full of people, and what probability does each of them make you assign to a smallpox pandemic being imminent. I still don't think AGI would mean necessarily doom simply because I don't fully buy that its ability to scale up to ASI is 100% guaranteed. However, I also think in practice that would matter little, because states might still see even regular AGI as a major threat. Having infinite cognitive labour is such a broken hax tactic it basically makes you Ruler of the World by default if you have an exclusive over it. That alone might make it a source of tension.

Policy discussions follow strong contextualizing norms

dsj2y10

One's credibility would be less of course, but Eliezer is not the one who would be implementing the hypothetical policy (that would be various governments), so it's not his credibility that's relevant here.

I don't have much sense he's holding back his real views on the matter.

1dr_s2y

But on the object level, if you do think that AGI means certain extinction, then that's indeed the right call (consider also that a single strike on a data centre might mean a risk of nuclear war, but that doesn't mean it's a certainty. If one listened to Putin's barking, every bit of help given to Ukraine is a risk of nuclear war, but in practice Russia just swallows it up and lets it go, because no one is actually very eager to push that button, and they still have way too much to lose from it). The scenario in which Eliezer's approach is just wrong is if he is vastly overestimating the risk of an AGI extinction event or takeover. This might be the case, or might become so in the future (for example imagine a society in which the habit is to still enforce the taboo, but alignment has actually advanced enough to make friendly AI feasible). It isn't perfect, it isn't necessarily always true, but it isn't particularly scandalous. I bet you lots of hawkish pundits during the Cold War have said that nuclear annihilation would have been preferable to the worldwide victory of Communism, and that is a substantially more nonsensical view.

Policy discussions follow strong contextualizing norms

dsj2y41

Did you intend to say risk off, or risk of?

If the former, then I don't understand your comment and maybe a rewording would help me.

If the latter, then I'll just reiterate that I'm referring to Eliezer's explicitly stated willingness to trade off the actuality of (not just some risk of) nuclear devastation to prevent the creation of AGI (though again, to be clear, I am not claiming he advocated a nuclear first strike). The only potential uncertainty in that tradeoff is the consequences of AGI (though I think Eliezer's been clear that he thinks it means certain doom), and I suppose what follows after nuclear devastation as well.

Policy discussions follow strong contextualizing norms

dsj2y128

Right, but of course the absolute, certain implication from “AGI is created” to “all biological life on Earth is eaten by nanotechnology made by an unaligned AI that has worthless goals” requires some amount of justification, and that justification for this level of certainty is completely missing.

In general such confidently made predictions about the technological future have a poor historical track record, and there are multiple holes in the Eliezer/MIRI story, and there is no formal, canonical write up of why they’re so confident in their apparently sec... (read more)

2Amalthea2y

The trade-off you're gesturing at is really risk of AGI vs. risk of nuclear devastation. So you don't need absolute certainty on either side in order to be willing to make it.

Policy discussions follow strong contextualizing norms

dsj2y65

There’s a big difference between pre-committing to X so you have a credible threat against Y, vs. just outright preferring X over Y. In the quoted comment, Eliezer seems to have been doing the latter.

2dr_s2y

And how credible would your precommitment be if you made it clear that you actually prefer Y, you're just saying you'd do X for game theoretical reasons, and you'd do it, swear? These are the murky cognitive waters in which sadly your beliefs (or at least, your performance of them) affects the outcome.

4CronoDAS2y

"Most humans die in a nuclear war, but human extinction doesn't happen" is presumably preferable to "all biological life on Earth is eaten by nanotechnology made by an unaligned AI that has worthless goals". It should go without saying that both are absolutely terrible outcomes, but one actually is significantly more terrible than the other. Note that this is literally one of the examples in the OP - discussion of axiology in philosophy.

Policy discussions follow strong contextualizing norms

dsj2y80

I don’t agree billions dead is the only realistic outcome of his proposal. Plausibly it could just result in actually stopping large training runs. But I think he’s too willing to risk billions dead to achieve that.

Policy discussions follow strong contextualizing norms

dsj2y88

In response to the question,

“[Y]ou’ve gestured at nuclear risk. … How many people are allowed to die to prevent AGI?”,

he wrote:

“There should be enough survivors on Earth in close contact to form a viable reproductive population, with room to spare, and they should have a sustainable food supply. So long as that's true, there's still a chance of reaching the stars someday.”

He later deleted that tweet because he worried it would be interpreted by some as advocating a nuclear first strike.

I’ve seen no evidence that he is advocating a nuclear first strike, but it does seem to me to be a fair reading of that tweet that he would trade nuclear devastation for preventing AGI.

4dr_s2y

Most nuclear powers are willing to trade nuclear devastation for preventing the other side's victory. If you went by sheer "number of surviving humans", your best reaction to seeing the ICBMs fly towards you should be to cross your arms, make your peace, and let them hit without lifting a finger. Less chance of a nuclear winter and extinction that way. But the way deterrence prevents that from happening is by pre-commitment to actually just blowing it all up if someone ever tries something funny. That is hardly less insane than what EY suggests, but it kinda makes sense in context (but still, with a God's eye view on humanity, it's insane, and just the best way we could solve our particular coordination problem).

1Noosphere892y

Yeah, at the very least it's calling for billions dead across the world, because once we realize what Eliezer wants, this is the only realistic outcome.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y10

If there were a game-theoretically reliable way to get everyone to pause all together, I'd support it.

AI and Evolution

dsj2yΩ110

In §3.1–3.3, you look at the main known ways that altruism between humans has evolved — direct and indirect reciprocity, as well as kin and group selection^[1] — and ask whether we expect such altruism from AI towards humans to be similarly adaptive.

However, as observed in R. Joyce (2007). The Evolution of Morality (p. 5),

Evolutionary psychology does not claim that observable human behavior is adaptive, but rather that it is produced by psychological mechanisms that are adaptations. The output of an adaptation need not be adaptive.

This is a subtle dist... (read more)

Reward is not the optimization target

dsj2yΩ8100

A similar point is (briefly) made in K. E. Drexler (2019). Reframing Superintelligence: Comprehensive AI Services as General Intelligence, §18 “Reinforcement learning systems are not equivalent to reward-seeking agents”:

Reward-seeking reinforcement-learning agents can in some instances serve as models of utility-maximizing, self-modifying agents, but in current practice, RL systems are typically distinct from the agents they produce … In multi-task RL systems, for example, RL “rewards” serve not as sources of value to agents, but as signals that guide trai

... (read more)

5TurnTrout2y

Thanks so much for these references. Additional quotes: Probably I should get around to reading CAIS, given that it made these points well before I did.

"Dangers of AI and the End of Human Civilization" Yudkowsky on Lex Fridman

dsj2y20

So the the way that you are like taking what is probably basically the same architecture in GPT-3 and throwing 20 times as much compute at it, probably, and getting out GPT-4.

Indeed, GPT-3 is almost exactly the same architecture as GPT-2, and only a little different from GPT.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y91

X-risks tend to be more complicated beasts than lions in bushes, in that successfully avoiding them requires a lot more than reflexive action: we’re not going to navigate them by avoiding carefully understanding them.

2James B2y

I actually agree entirely. I just don't think that we need to explore those x-risks by exposing ourselves to them. I think we've already advanced AI enough to start understanding and thinking about those x-risks, and an indefinite (perhaps not permanent) pause in development will enable us to get our bearings. Say what you need to say now to get away from the potential lion. Then back at the campfire, talk it through.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y35

Thanks, I appreciate the spirit with which you've approached the conversation. It's an emotional topic for people I guess.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y43

The negation of the claim would not be "There is definitely nothing to worry about re AI x-risk." It would be something much more mundane-sounding, like "It's not the case that if we go ahead with building AGI soon, we all die."

I debated with myself whether to present the hypothetical that way. I chose not to, because of Eliezer's recent history of extremely confident statements on the subject. I grant that the statement I quoted in isolation could be interpreted more mundanely, like the example you give here.

When the stakes are this high and the policy pr... (read more)

1Daniel Kokotajlo2y

Thank you for your service! You may be interested to know that I think Yudkowsky writing this article will probably have on balance more bad consequences than good; Yudkowsky is obnoxious, arrogant, and most importantly, disliked, so the more he intertwines himself with the idea of AI x-risk in the public imagination, the less likely it is that the public will take those ideas seriously. Alas. I don't blame him too much for it because I sympathize with his frustration & there's something to be said for the policy of "just tell it like it is, especially when people ask." But yeah, I wish this hadn't happened. (Also, sorry for the downvotes, I at least have been upvoting you whilst agreement-downvoting)

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y32

Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they're going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?

Yes, I do in fact say the same thing to professions of absolute certainty that there is nothing to worry about re: AI x-risk.

Daniel Kokotajlo2y1610

The negation of the claim would not be "There is definitely nothing to worry about re AI x-risk." It would be something much more mundane-sounding, like "It's not the case that if we go ahead with building AGI soon, we all die."

That said, yay -- insofar as you aren't just applying a double standard here, then I'll agree with you. It would have been better if Yud added in some uncertainty disclaimers.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down by Eliezer Yudkowsky

dsj2y3029

There simply don't exist arguments with the level of rigor needed to justify a claim such as this one without any accompanying uncertainty:

If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.

I think this passage, meanwhile, rather misrepresents the situation to a typical reader:

When the insider conversation is about the grief of seeing your daughter lose her first tooth, and thinking she’s not going to get a chance to grow up, I believe we are past the point of playing political chess about a s

... (read more)

1James B2y

This is a case where the precautionary principle grants a great deal of rhetorical license. If you think there might be a lion in the bush, do you have a long and nuanced conversation about it, or do you just tell your tribe, “There’s a lion in that bush. Back away.”?

-4JNS2y

Proposition 1: Powerful systems come with no x-risk.

Proposition 2: Powerful systems come with x-risk.

You can prove / disprove 2 by proving or disproving 1.

Why is it that a lot of [1,0] people believe that the [0,1] group should prove their case?^[1]

^[1] And also ignore all the arguments that have been offered.

Daniel Kokotajlo2y4740

Would you say the same thing about the negations of that claim? If you saw e.g. various tech companies and politicians talking about how they're going to build AGI and then [something that implies that people will still be alive afterwards] would you call them out and say they need to qualify their claim with uncertainty or else they are being unreasonable?

Re: the insider conversation: Yeah, I guess it depends on what you mean by 'the insider conversation' and whether you think the impression random members of the public will get from these passages brings... (read more)

FLI open letter: Pause giant AI experiments

dsj2y20

A somewhat reliable source has told me that they don't have the compute infrastructure to support making a more advanced model available to users.

That might also reflect limited engineering efforts to optimize state-of-the-art models for real world usage (think of the performance gains from GPT-3.5 Turbo) as opposed to hitting benchmarks for a paper to be published.

FLI open letter: Pause giant AI experiments

dsj2y10

I believe Anthropic is committed to not pushing at the state-of-the-art, so they may not be the most relevant player in discussions of race dynamics.

FLI open letter: Pause giant AI experiments

dsj2y40

Yes, although the chat interface was necessary but insufficient. They also needed a capable language model behind it, which OpenAI already had, and Google still lacks months later.

FLI open letter: Pause giant AI experiments

dsj2y180

I agree that those are possibilities.

On the other hand, why did news reports^[1] suggest that Google was caught flat-footed by ChatGPT and re-oriented to rush Bard to market?

My sense is that Google/DeepMind's lethargy in the area of language models is due to a combination of a few factors:

They've diversified their bets to include things like protein folding, fusion plasma control, etc. which are more application-driven and not on an AGI path.
They've focused more on fundamental research and less on productizing and scaling.
Their language model experts m

... (read more)

5Douglas_Knight2y

I think talking about Google/DeepMind as a unitary entity is a mistake. I'm gonna guess that Peter agrees, and that's why he specified DeepMind. Google's publications identify at least two internal language models superior to Lambda, so their release of Bard based on Lambda doesn't tell us much. They are certainly behind in commercializing chatbots, but that is a weak claim. How DeepMind compares to OpenAI is difficult to judge. Four people going to OpenAI is damning, though.

9Kaj_Sotala2y

OpenAI seems to also have been caught flat-footed by ChatGPT, or more specifically by the success it got. It seems like the success came largely from the chat interface that made it intuitive for people on the street to use - and none of the LLM techies at any company realized what a difference that would make.

FLI open letter: Pause giant AI experiments

dsj2y*101

If a major fraction of all resources at the top 5–10 labs were reallocated to "us[ing] this pause to jointly develop and implement a set of shared safety protocols", that seems like it would be a good thing to me.

However, the letter offers no guidance as to what fraction of resources to dedicate to this joint safety work. Thus, we can expect that DeepMind and others might each devote a couple teams to that effort, but probably not substantially halt progress at their capabilities frontier.

The only player who is effectively being asked to halt progress at its capabilities frontier is OpenAI, and that seems dangerous to me for the reasons I stated above.

FLI open letter: Pause giant AI experiments

dsj2y*233

Currently, OpenAI has a clear lead over its competitors.^[1] This is arguably the safest arrangement as far as race dynamics go, because it gives OpenAI some breathing room in case they ever need to slow down later on for safety reasons, and also because their competitors don't necessarily have a strong reason to think they can easily sprint to catch up.

So far as I can tell, this petition would just be asking OpenAI to burn six months of that lead and let other players catch up. That might create a very dangerous race dynamic, where now you have multip... (read more)

2Ben Pace2y

It's not clear to me that OpenAI has a clear lead over Anthropic in terms of capabilities.

PeterMcCluskey2y262

DeepMind might be more cautious about what it releases, and/or developing systems whose power is less legible than GPT. I have no real evidence here, just vague intuitions.

4Evan R. Murphy2y

That might be true if nothing is actually done in the 6+ months to improve AI safety and governance. But the letter proposes:

Geoffrey Hinton - Full "not inconceivable" quote

dsj2y61

Another interesting section:

Silva-Braga: Are we close to the computers coming up with their own ideas for improving themselves?
Hinton: Uhm, yes, we might be.
Silva-Braga: And then it could just go fast?
Hinton: That's an issue, right. We have to think hard about how to control that.
Silva-Braga: Yeah. Can we?
Hinton: We don't know. We haven't been there yet, but we can try.
Silva-Braga: Okay. That seems kind of concerning.
Hinton: Uhm, yes.

Metaculus Predicts Weak AGI in 2 Years and AGI in 10

dsj2y10

Of course the choice of what sort of model we fit to our data can sometimes preordain the conclusion.

Another way to interpret this is there was a very steep update made by the community in early 2022, and since then it’s been relatively flat, or perhaps trending down slowly with a lot of noise (whereas before the update it was trending up slowly).

Metaculus Predicts Weak AGI in 2 Years and AGI in 10

dsj2y21

Seems to me there's too much noise to pinpoint the break at a specific month. There are some predictions made in early 2022 with an even later date than those made in late 2021.

But one pivotal thing around that time might have been the chain-of-thought stuff which started to come to attention then (even though there was some stuff floating around Twitter earlier).

Microsoft Research Paper Claims Sparks of Artificial Intelligence in GPT-4

dsj2y51

It's a terribly organized and presented proof, but I think it's basically right (although it's skipping over some algebraic details, which is common in proofs). To spell it out:

Fix any $x$ and $y$. We then have,

$x^{2} - 2xy + y^{2} = (x - y)^{2} \geq 0$.

Adding $2xy$ to both sides,

$x^{2} + y^{2} \geq 2xy$.

Therefore, if (by assumption in that line of the proof) $g(x) > x^{2}$ and $g(y) \geq y^{2}$, we'd have,

$g(x) + g(y) > x^{2} + y^{2} \geq 2xy$,

which contradicts our assumption that $g(x) + g(y) \leq 2xy$.

5hold_my_fish2y

Thanks. When it's written as $g(x) + g(y) > x^{2} + y^{2} \geq 2xy$, I can see what's going on. (That one intermediate step makes all the difference!) I was wrong then to call the proof "incorrect". I think it's fair to call it "incomplete", though. After all, it could have just said "the whole proof is an exercise for the reader", which is in some sense correct I guess, but not very helpful (and doesn't tell you much about the model's ability), and this is a bit like that on a smaller scale. (Although, reading again, "...which contradicts the existence of $y^{*}$ given $x$" is a quite strange thing to say as well. I'm not sure I can exactly say it's wrong, though. Really, that whole section makes my head hurt.) If a human wrote this, I would be wondering if they actually understand the reasoning or are just skipping over a step they don't know how to do. The reason I say that is that $g(x) + g(y^{*}) > 2xy^{*}$ is the obvious contradiction to look for, so the section reads a bit like "I'd really like $g(y^{*}) < (y^{*})^{2}$ to be true, and surely there's a contradiction somehow if it isn't, but I don't really know why, but this is probably the contradiction I'd get if I figured it out". The typo-esque use of $y$ instead of $y^{*}$ bolsters this impression.

Empirical risk minimization is fundamentally confused

dsj2y30

Thanks! This is clearer. (To be pedantic, the $ℓ^{p}$ distance should have a $p^{th}$ root around the integral, but it's clear what you mean.)
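
For reference, a sketch of the form with the root in place, assuming two functions $f$ and $g$ on a domain $X$ integrated against plain Lebesgue measure (the symbols $d_{p}$ and $X$ are assumptions here; the post's own notation and measure may differ):

$d_{p}(f, g) = \left( \int_{X} |f(x) - g(x)|^{p} \, dx \right)^{1/p}$

Without the outer root, scaling $f - g$ by a constant $c$ scales the integral by $|c|^{p}$ rather than $|c|$, so for $p \neq 1$ the expression is not the distance induced by a norm.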

3Jesse Hoogland2y

Thank you. Pedantic is good (I fixed the root)!

Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Microsoft Research

dsj2y205

Perhaps of interest to this community is GPT-4 using a Linux terminal to iteratively problem-solve locating and infiltrating a poorly-secured machine on a local network:

Empirical risk minimization is fundamentally confused

dsj2y42

That's because the naive inner product suggested by the risk is non-informative,
$⟨ f, g ⟩ = R (f) - R (g) = \int (ℓ (f (x), y) - ℓ (g (x), y)) q (x, y) d x d y .$

Hmm, how is this an inner product? I note that it lacks, among other properties, positive definiteness:

$⟨ f, f ⟩ = R (f) - R (f) = 0$

Edit: I guess you mean a distance metric induced by an inner product (similar to the examples later on, where you have distance metrics induced by a norm), not an actual inner product? I'm confused by the use of standard inner product notation if that's the intended meaning. Also, in this case, this ... (read more)
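
A minimal numerical sketch of the objection above, assuming a squared-error loss and a toy dataset (both are illustrative assumptions, as are the helper names `risk` and `risk_difference`; none of this comes from the original post). It checks that the difference of risks vanishes when a function is paired with itself, vanishes for two distinct functions of equal risk, and flips sign when its arguments are swapped:

```python
import numpy as np


def risk(f, xs, ys):
    """Empirical risk of predictor f on samples (xs, ys), using squared-error loss (an assumption)."""
    return float(np.mean((f(xs) - ys) ** 2))


def risk_difference(f, g, xs, ys):
    """The pairing from the quoted passage: R(f) - R(g)."""
    return risk(f, xs, ys) - risk(g, xs, ys)


# Hypothetical toy data: the target is identically zero.
xs = np.linspace(-1.0, 1.0, 101)
ys = np.zeros_like(xs)

f = lambda x: np.ones_like(x)    # constant +1
g = lambda x: -np.ones_like(x)   # constant -1: a different function with the same risk as f
h = lambda x: np.zeros_like(x)   # matches the target exactly

self_pairing = risk_difference(f, f, xs, ys)  # R(f) - R(f)
equal_risk = risk_difference(f, g, xs, ys)    # R(f) - R(g) with f != g
forward = risk_difference(f, h, xs, ys)       # R(f) - R(h)
backward = risk_difference(h, f, xs, ys)      # R(h) - R(f)

print(self_pairing, equal_risk, forward, backward)
```

Because the pairing is zero for every function paired with itself, can be zero for two different functions, and can be negative, it satisfies neither positive definiteness nor symmetry, and it is not a metric either.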

3Jesse Hoogland2y

You're right, thanks for pointing that out! I fixed the notation. Like you say, the difference of risks doesn't even qualify as a metric (the other choices mentioned do, however).

A tension between two prosaic alignment subgoals

dsj2y76

I think this is overcomplicating things.

We don't have to solve any deep philosophical problems here finding the one true pointer to "society's values", or figuring out how to analogize society to an individual.

We just have to recognize that the vast majority of us really don't want a single rogue to be able to destroy everything we've all built, and we can act pragmatically to make that less likely.

1Noosphere892y

I agree with this, in a nutshell. After all, you can put in almost whatever values you like and it will work, which is the point of my long comment. My point is that once you have instrumental goals like survival and technological progress down for everyone, alignment in practice should reduce to this: And the alignment problem is simple enough: How do you brainwash an AI to have your goals?

A tension between two prosaic alignment subgoals

dsj2y70

Well, admittedly “alignment to humanity as a whole” is open to interpretation. But would you rather everyone have their own personal superintelligence that they can brainwash to do whatever they want?

1Noosphere892y

Basically, yes. This is mostly because I think even a best case alignment scenario can't ever be more than "everyone have their own personal superintelligence that they can brainwash to do whatever they want."

This is related to fundamental disagreements I have around morality and values that make me pessimistic around trying to align groups of people, or indeed trying to align with the one true morality/values. To state the disagreements I have:

1. I think to the extent that moral realism is right, morality/values is essentially trivial, in that every morality is correct, and I suspect that there is no non-arbitrary way to restrict morality or values without sneaking in your own values. Essentially, it's trivialism, applied to morality, with a link below: https://en.m.wikipedia.org/wiki/Trivialism

   1. The reason reality doesn't face the problem of being trivial is that, for our purposes, we don't have the power to warp reality into whatever we want (often talked about by different names, including omnipotence, administrator access to reality, and more), whereas in morality, we do have the power to change our values to anything else, thus generating inconsistent, but complete values, in contrast to the universe we find ourselves in, which is probably consistent and incomplete.

2. There is no way to coherently talk about something like a society or humanity's values in the general case, and in the case where everyone is aligned, all we can talk about is optimal redistribution of goods. This makes a lot of attempts to analogize society or humanity's values to say, an individual person rely on two techniques that are subjective: That means it is never a nation or humanity that acts on morals or values, but specific people with their own values take those actions. Here's a link to it. https://www.lesswrong.com/posts/YYuB8w4nrfWmLzNob/thatcher-s-axiom

So my conclusion is, yes I do really bite the bullet here and support "everyone have their own personal sup