Random note: Congressman Brad Sherman just held up If Anyone Builds It, Everyone Dies in a Congressional hearing and recommended it, saying (rough transcript, might be slight paraphrase): "they're [the AI companies] not really focused on the issue raised by this book, which I recommend, but the title tells it all, If Anyone Builds It Everyone Dies"
I think this is a clear and unambiguous example of the theory of change of the book having at least one success -- being an object that can literally be held and pointed to by someone in power.
It's important context that Sherman was concerned about Superintelligence risks, broadly construed, decades ago.
In 2007 he gave a speech in which he said:
There is one issue that I think is more explosive than even the spread of nuclear weapons: engineered intelligence. By that I mean, the efforts of computer engineers and bio-engineers who may create intelligence beyond that of a human being. In testimony at the House Science Committee, the consensus of experts testifying was that in roughly 25 years we would have a computer that passed the Turing Test, and more importantly, exceeded human intelligence.
As we develop more intelligent computers, we will find them useful tools in creating ever more intelligent computers, a positive feedback loop. I don't know whether we will create the maniacal Hal from 2001, or the earnest Data from Star Trek --- or perhaps both.
There are those who say don't worry, even if a computer is intelligent and malevolent --- it is in a box and it cannot affect the world. But I believe that there are those of our species who would sell their hands to Beelzebub, in return for a good stock tip.
How the heck has this guy been in Congress the whole time and we've not heard about him / he's not been in contact with the AI x-risk scene?
I think this was a major dropped ball. We had mostly ruled out political advocacy, so there was no one trying to do the "make connections with congresspeople" work that would have caused us to discover that someone had been thinking of this as an important issue for years.
That said, I know that several x-risk orgs have been in contact with his office in recent years.
That's totally right, until like 2020 or something the community was small and underresourced, such that things were gonna get dropped.
But I think we also did a somewhat bad job of effectively strategizing about how to approach the problem such that we ended up making worse allocation-of-effort choices than we could have, given the (unfair) benefit of hindsight.
I have a draft post about how I think we should have spent the period before takeoff started, in retrospect.
This is such a funny coincidence! I just wrote a post where Claude does research on every member of congress individually.
https://www.lesswrong.com/posts/WLdcvAcoFZv9enR37/what-washington-says-about-agi
It was actually inspired by Brad Sherman holding up the book. I just saw this shortform and it's funny because this thread roughly corresponds to my own thought process when seeing the original image!
Heads up -- if you're 1. on an H-1B visa AND 2. currently outside the US, there is VERY IMPORTANT, EXTREMELY TIME SENSITIVE stuff going on that might prevent you from getting back into the US after 21 September.
If this applies to you, immediately stop looking at LessWrong and look at the latest news. (I'm not providing a summary of it here because there are conflicting stories about who it will apply to and it's evolving hour by hour and I don't want this post to be out of date)
USCIS says this does not apply to existing H-1B visas, only to new applications: https://www.uscis.gov/sites/default/files/document/memos/H1B_Proc_Memo_FINAL.pdf
(I am not a lawyer nor a spokesperson for the US government and cannot advise on how likely it is that they will somehow backtrack on this.)
The Trump administration (or, more specifically, the White House Office of Science and Technology Policy, but they seem to be in the lead on most AI policy) is asking for comment on what its AI Action Plan should include. Literally anyone can comment on it. You should consider commenting: comments are due Saturday at 8:59pm PT/11:59pm ET via an email address. These comments will actually be read, and a large number of comments on an issue usually does influence any White House's policy. I encourage you to submit comments!
regulations.gov/document/NSF_FRDOC_0001-3479… (Note that all submissions are public and will be published)
(Disclosure: I am working on a submission for this for my dayjob but this particular post is in my personal capacity)
(Edit note: I originally said this was due Friday; I cannot read a calendar, it is in fact due 24 hours later. Consider this a refund that we have all received for being so good at remembering the planning fallacy all these years.)
In the future, there should be some organization or some group of individuals in the LW community who raise awareness about these sorts of opportunities and offer content and support to ensure submissions from the most knowledgeable and relevant actors. This seems like a very low-hanging fruit and is something several groups I know are doing.
I think on net, there are relatively fewer risks related to getting governments more AGI-pilled vs. them continuing on their current course; governments are broadly AI-pilled even if not AGI/ASI-pilled and are doing most of the accelerating actions an AGI-accelerator would want.
Epistemic status: not a lawyer, but I've worked with a lot of them.
As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...
Okay, I spent much more time with the Anthropic RSP revisions today. Overall, I think it has two big thematic shifts for me:
1. It's way more "professionally paranoid," but needs to be even more so on non-cyber risks. A good start, but it needs more on being able to stop human intelligence (i.e., good old-fashioned spies).
2. It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result." You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously, to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet.)
Something I believe:
The reason that society isn't currently freaking out about AI taking artists' jobs is mainly that we've historically thought of artists' jobs as inherently precarious, and so a new report of them being precarious for new reasons doesn't surprise anyone. The moment it takes a "so stable your mom wants you to study it in school" job, that will all change for the stable folks, but not the artists, unfortunately. After all, they're just artists...
(If you say that this means that I don't care about artists or their financial challenges, you're wrong. This sucks, I'm predicting, not endorsing, a likely scenario.)
I hope you're right that job losses in more 'stable' fields will catalyze interest in a constructive response, but I was surprised over the last 20 years or so as the market power of workers in various traditionally stable industries collapsed for mundane economic reasons and not much changed in the policy world. Professors, lawyers, accountants, civil servants, and even some types of physicians have all been squeezed fairly heavily in the US just from globalization, monopolization, and deregulation. There was some brief pushback around the time of Occupy Wall Street, and then after that increased job insecurity became part of the new status quo.
Entry level software engineers are now facing serious pressure. It's debatable whether this is from pandemic-era over-hiring or from AI, but until last year "software engineering" was the paradigmatic example of a job more stable than art, to the point where artists went to coding bootcamp if they wanted to sell out. Now bootcamps seem mostly dead, but I don't hear serious cries to save them.
Wasn't translator supposed to be a normal, stable job? Allegedly translators were hit hard by Google Translate even pre-LLM.
Another factor here is that artists are mostly self-employed, so you don't see headlines like "ArtCo closes factory; lays off 3,500 workers." Instead, a diffuse group of people spread all over the country just have a harder time finding work.
I think one thing that is poorly-understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.
(I am reporting, not endorsing. For example, it is deeply unclear to me why we should take another country's statements about the year they're gonna do a war at face value.)
"want to pick a war with America" is really strange wording because China's strategic goals are not "win a war against nuclear-armed America", but things like "be able to control its claims in the South China Sea including invading Taiwan without American interference". Likewise Russia doesn't want to "pick a war with the EU" but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don't know whether national security people relate to the phrasing the same way but they do understand this.
It's a small but positive sign that Anthropic sees taking 3 days beyond their RSP's specified timeframe to conduct a process without a formal exception as an issue. Signals that at least some members of the team there are extremely attuned to normalization of deviance concerns.
Ok, so it seems clear that we are, for better or worse, likely going to try to get AGI to do our alignment homework.
Who has thought through all the other homework we might give AGI that is as good an idea, assuming a model that isn't an instant-game-over for us? E.g., I remember @Buck rattling off a list of other ideas that he had in his The Curve talk, but I feel like I haven't seen the list of, e.g., "here are all the ways I would like to run an automated counterintelligence sweep of my organization" ideas.
(Yes, obviously, if the AI is sneakily misaligned, you're just dead because it will trick you into firing all your researchers, etc.; this is written in a "playing to your outs" mentality, not an "I endorse this as a good plan" mentality.)
@ryan_greenblatt is working on a list of alignment research applications. For control applications, you might enjoy the long list of control techniques in our original post.
At LessOnline, there was a big discussion one night around the picnic tables with @Eliezer_Yudkowsky, @habryka, and some interlocutors from the frontier labs (you'll momentarily see why I'm being vague on the latter names).
One question was: "does DC actually listen to whistleblowers?" and I contributed that, in fact, DC does indeed have a script for this, and resigning in protest is a key part of it, especially ever since the Nixon years.
Here is a usefully publicly-shareable anecdote on how strongly this norm is embedded in national security decision-making, from the New Yorker article "The U.S. Spies Who Sound the Alarm About Election Interference" by David Kirkpatrick, Oct 21, 2024:
(https://archive.ph/8Nkx5)
The experts’ chair insisted that in this cycle the intelligence agencies had not withheld information “that met all five of the criteria”—and did not risk exposing sources and methods. Nor had the leaders’ group ever overruled a recommendation by the career experts. And if they did? It would be the job of the chair of the experts’ group to stand up or speak out, she told me: “That is why we pick a career civil servant who is retirement-eligible.” In other words, she can resign in protest.
Also of relevance is the wave of resignations from the DC newspaper The Washington Post the past few days over Jeff Bezos suddenly exerting control.
I'm about to embark on the classic exercise of "think a bunch about AI policy."
Does anyone actually have an up to date collection of "here are all the existing AI safety policy proposals out there"?
(Yes, I know, your existing proposal is already great and we should just implement it as-is. Think of the goal of this exercise being to convince someone else who needs to see a spreadsheet of "here are all the ideas, here is why idea number three is the best one")
The elites do want you to know it: you can just email a Congressional office and get a meeting
You can definitely meet your own district's staff locally (e.g., if you're in Berkeley, Congresswoman Simon has an office in Oakland, Senator Padilla has an office in SF, and Senator Schiff's offices look not to be finalized yet but undoubtedly will include a Bay Area Office).
You can also meet most Congressional offices' staff via Zoom or phone (though some offices strongly prefer in-person meetings).
There is also indeed a meaningful rationalist presence in DC, though opinions vary as to whether the enclave is in Adams Morgan-Columbia Heights, Northern Virginia, or Silver Spring.*
*This trichotomy is funny, but hard to culturally translate unless you want a 15,000 word thesis on DC-area housing and federal office building policy since 1945 and its related cultural signifiers. Just...just trust me on this.
Basic Q: has anyone written much down about what sorts of endgame strategies you'd see just-before-ASI from the perspective of "it's about to go well, and we want to maximize the benefits of it" ?
For example: if we saw OpenPhil suddenly make a massive push to just mitigate mortality at the cost of literally every other development goal they have, I might suspect that they suspect that we're about to all be immortal under ASI, and they're trying to get as many people as possible to that future...
A random observation from a think tank event last night in DC -- the average person in those rooms is convinced there's a problem, but that it's the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I'm not sure what to make of it, honestly.
We're hiring at ControlAI for folks who want to work on UK and US policy advocacy. Come talk to Congress and Parliament and stop risks from unsafe superintelligences! controlai.com/careers
(Admins: I don't tend to see many folks posting this sort of thing here, so feel free to nuke this post if not the sort of content you're going for. But given audience here, figured might be of interest)
Zach Stein-Perlman's recent quick take is confusing. It just seems like an assertion, followed by condemnation of Anthropic conditioned on us accepting his assertion blindly as true.
It is definitely the case that "insider threat from a compute provider" is a key part of Anthropic's threat model! They routinely talk about it in formal and informal settings! So what precisely is his threat model here that he thinks they're not defending adequately against?
(He has me blocked from commenting on his posts for some reason, which is absolutely his right, but insofar as he hasn't blocked me from seeing his posts, I wanted to explicitly register in public my objection to this sort of low-quality argument.)
I agree that whether Anthropic has handled insider threat from compute providers is a crux. My guess is that Anthropic and humans-at-Anthropic wouldn't claim to have handled this (outside of the implicit claim for ASL-3) and they would say something more like that's out of scope for ASL-3 or oops.
Separately, I just unblocked you. (I blocked you because I didn't like this thread in my shortform, not directly to stifle dissent. I have not blocked anyone else. I mention this because hearing about disagreement being hidden/blocked should make readers suspicious but that's mostly not correct in this case.)
Edit: also, man, I tried to avoid "condemnation" and I think I succeeded. I was just making an observation. I don't really condemn Anthropic for this.
I basically agree with Zach that based on public information it seems like it would be really hard for them to be robust to this and it seems implausible that they have justified confidence in such robustness.
I agree that he doesn't say the argument in very much depth. Obviously, I think it'd be great if someone made the argument in more detail. I think Zach's point is a positive contribution even though it isn't that detailed.
I am a bit confused why it's an assertion and not an "argument"? The argument is relatively straightforward:
Anthropic is currently shipping its weights to compute providers for inference. Those compute providers almost certainly do not comply with Anthropic's ASL-3 security standard, and the inference setup is likely not structured in a way that makes it impossible for the compute provider to somehow get access to the weights if they really wanted to. This means Anthropic is violating its RSP, as their ASL-3 security standard required them to be robust against this kind of attack.
It is true that "insider threat from a compute provider" is a key part of Anthropic's threat model! Anthropic is clearly not unaware of this attack chain. Indeed, in the whitepaper linked in Zach's shortform they call for various changes that would need to happen at compute providers to enable a zero-trust relationship here, but also implicitly in calling for these changes they admit that they are very likely not currently in place!
My guess is what happened here is that at least some people at Anthropic are probably aware that their RSP commits them to a higher level of security than they can currently rea...
Anthropic has released their own whitepaper where they call out what kind of changes would need to be required. Can you please engage with my arguments?
I have now also heard from 1-2 Anthropic employees about this. The specific people weren't super up-to-date on what Anthropic is doing here, and didn't want to say anything committal, but nobody I talked to had a reaction that suggested that they thought it was likely that Anthropic is robust to high-level insider threats at compute providers.
Like, if you want you can take a bet with me here, I am happy to offer you 2:1 odds on the opinion of some independent expert we both trust on whether Anthropic is likely robust to that kind of insider threat. I can also start going into all the specific technical reasons, but that would require restating half of the state of the art of computer security, which would be a lot of work. I really think that not being robust here is a relatively straightforward inference that most people in the field will agree with (unless Anthropic has changed operating procedure substantially from what is available to other consumers in their deals with cloud providers, which I currently think is unlikely, but not impossible).
I really dislike the term "warning shot," and I'm trying to get it out of my vocabulary. I understand how it came to be a term people use. But if we think it might actually be something that happens, and when it happens, it plausibly and tragically results in the deaths of many folks, isn't the right term "mass casualty event"?
I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn't involve mass casualties.
I do agree with what I think you are pointing at, which is that there is something Orwellian about the "warning shot" language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in particular inasmuch as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).
Has anyone thought about the things that governments are uniquely good at when it comes to evaluating models?
Here are at least 3 things I think they have as benefits:
1. Just an independent 3rd-party perspective generally
2. The ability to draw insights across multiple labs' efforts, and identify patterns that others might not be able to
3. The ability to draw on classified threat intelligence to inform its research (e.g., Country X is using model Y for bad behavior Z) and to test the model for classified capabilities (bright line example: "can you design an accurate classified nuclear explosive lensing arrangement").
Are there others that come to mind?
One point that maybe someone's made, but I haven't run across recently: if you want to turn AI development into a Manhattan Project, you will by-default face some real delays from the reorganization of private efforts into one big national effort. In a close race, you might actually see pressures not to do so, because you don't want to give up 6 months to a year on reorg drama -- so in some possible worlds, the Project is actually a deceleration move in the short term, even if it accelerates in the long term!
It seems like the current meta is to write a big essay outlining your opinions about AI (see, e.g., Gladstone Report, Situational Awareness, various essays recently by Sam Altman and Dario Amodei, even the A Narrow Path report I co-authored).
Why do we think this is the case?
I can imagine at least 3 hypotheses:
1. Just path-dependence; someone did it, it went well, others imitated
2. Essays are High Status Serious Writing, and people want to obtain that trophy for their ideas
3. This is a return to the true original meaning of an essay, under Mont...
Well, what's the alternative? If you think there is something weird enough and suboptimal about essay formats that you are reaching for 'random chance' or 'monkey see monkey do' level explanations, that implies you think there is some much superior format they ought to be using instead. But I can't see what. I think it might be helpful to try to make the case for doing these things via some of the alternatives:
I think those are the meta because they have just enough space to not only give opinions but to mention reasons for those opinions and expertise/background to support the many unstated judgment calls.
Note that the essays by Altman and Amodei are popular because their positions are more central than the others': they have not only demonstrable backgrounds in AI but lots of name recognition (we're mostly assuming Altman has bothered learning a lot about how Transformers work even if we don't like him). And the Gladstone report got itself commissioned by at least a little piece of the government.
A Narrow Path just demonstrates in the text that you and your co-authors have thought deeply about the topic. Shorter essays leave more guesswork on the authors' expertise and depth of consideration.
I just reread https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer by Paul Christiano from 2022 for somewhat random reasons[1] and wow is this a fascinating historical snapshot document, especially in the comment section.
Many of the main characters in AI from 2022 to 2025 swing by and say, essentially, "hello! I would like to foreshadow my character arc for the next 3 years!"
Too many open tabs, need to clean up or computer no do videoconference good
Incidentally, spurred by @Mo Putera's posting of Vernor Vinge's A Fire Upon The Deep annotations, I want to remind folks that Vinge's Rainbows End is very good and doesn't get enough attention, and will give you a less-incorrect understanding of how national security people think.
I'm pretty sure that I think "infohazard" is a conceptual dead-end concept that embeds some really false understandings of how secrets are used by humans. It is an orphan of a concept -- it doesn't go anywhere. Ok, the information's harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?
That "so now what" doesn't sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
I have a few weeks off coming up shortly, and I'm planning on spending some of it monkeying around with AI and code stuff. I can think of two obvious tacks: 1. go do some fundamentals learning on technical stuff I don't have hands-on experience with, or 2. go build some new fun stuff.
Does anyone have particular lists of learning topics / syllabi / similar things like that that would be a good fit for "fairly familiar with the broad policy/technical space, but his largest shipped chunk of code is a few hundred lines of python" person like me?
If you're someone who has[1], or will have, read If Anyone Builds It, Everyone Dies, I encourage you to post your sincere and honest review of the book on Amazon once you have read it -- I think it would be useful to the book's overall reputation.
But be a rationalist! Give your honest opinion.
When:
If you've already read it: Once Amazon accepts reviews, likely starting on the book launch date tomorrow.
If you haven't read it: Once you've read it. Especially if you've ordered a copy from Amazon so they know the review is coming from a ...
Idle musing: should we all be writing a Claude Constitution-esque set of posts about our hopes for how AIs help humans around the dangerous moments in takeoff, in hopes that this influences how the models advise not only us, but people who are coming into these issues fresher than us?
(Yes, I know that from the outside I have exponentially less influence on model behavior than Anthropic, and for MIRI-like reasons maybe this doesn't go well at all. But, you know, play to all of your outs.)
Preregistering intent to write "Every Bay Area Walled Compound" (hat tip: Emma Liddell)
Unrelatedly, I am in Berkeley through Wednesday afternoon, let me know if you're around and would like to chat
I feel like one of the trivially most obvious signs that AI safety comms hasn't gone actually mainstream yet is that we don't say, "yeah, superintelligent AI is very risky. No, I don't mean Terminator. I'm thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?"
I'll be in Berkeley Weds evening through next Monday, would love to chat with, well, basically anyone who wants to chat. (I'll be at The Curve Fri-Sun, so if you're already gonna be there, come find me there between the raindrops!)
Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely?
(or is this just an expected-harm metric rather than a probability metric?)
I am (speaking personally) pleasantly surprised by Anthropic's letter. https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf
I'll be at LessOnline this upcoming weekend -- would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.
Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the "grieving the possibility that we might all die" moment happened, on average, more years ago for the national security policy person than the AI safety person.
This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the "oh, what happens if we get nuked by Russia" moment in their careers in the Literal 1980s...
Ok, so Anthropic's new policy post (explicitly NOT linkposting it properly since I assume @Zac Hatfield-Dodds or @Evan Hubinger or someone else from Anthropic will, and figure the main convo should happen there, and don't want to incentivize fragmenting of conversation) seems to have a very obvious implication.
Unrelated, I just slammed a big AGI-by-2028 order on Manifold Markets.