Thomas Kwa
1
Some versions of the METR time horizon paper from alternate universes:

Measuring AI Ability to Take Over Small Countries (idea by Caleb Parikh)

Abstract: Many are worried that AI will take over the world, but extrapolation from existing benchmarks suffers from a large distributional shift that makes it difficult to forecast the date of world takeover. We rectify this by constructing a suite of 193 realistic, diverse countries with territory sizes from 0.44 to 17 million km^2. Taking over most countries requires acting over a long time horizon, with the exception of France. Over the last 6 years, the land area that AI can successfully take over with 50% success rate has increased from 0 to 0 km^2, doubling 0 times per year (95% CI 0.0-0.0 yearly doublings); extrapolation suggests that AI world takeover is unlikely to occur in the near future. To address concerns about the narrowness of our distribution, we also study AI ability to take over small planets and asteroids, and find similar trends.

When Will Worrying About AI Be Automated?

Abstract: Since 2019, the amount of time LW has spent worrying about AI has doubled every seven months, and now constitutes the primary bottleneck to AI safety research. Automation of worrying would be transformative to the research landscape, but worrying includes several complex behaviors, ranging from simple fretting to concern, anxiety, perseveration, and existential dread, and so is difficult to measure. We benchmark the ability of frontier AIs to worry about common topics like disease, romantic rejection, and job security, and find that current frontier models such as Claude 3.7 Sonnet already outperform top humans, especially in existential dread. If these results generalize to worrying about AI risk, AI systems will be capable of autonomously worrying about their own capabilities by the end of this year, allowing us to outsource all our AI concerns to the systems themselves.

Estimating Time Since The Singularity

Early work
Yonatan Cale
1440
1
Seems like Unicode officially added a "person being paperclipped" emoji: Here's how it looks in your browser: 🙂‍↕️ Whether they did this as a joke or to raise awareness of AI risk, I like it! Source: https://emojipedia.org/emoji-15.1
lc
930
7
My strong upvotes are now giving +1 and my regular upvotes give +2.
RobertM
400
0
Pico-lightcone purchases are back up, now that we think we've ruled out any obvious remaining bugs.  (But do let us know if you buy any and don't get credited within a few minutes.)
keltan
310
0
I feel a deep love and appreciation for this place, and the people who inhabit it.

Popular Comments

Recent Discussion

Recent progress in AI has led to rapid saturation of most capability benchmarks - MMLU, RE-Bench, etc. Even much more sophisticated benchmarks such as ARC-AGI or FrontierMath see incredibly fast improvement, and all that while severe under-elicitation is still very salient.

As has been pointed out by many, general capability involves more than simple tasks like these, which have a long history in the field of ML and are therefore easily saturated. Claude Plays Pokemon is a good example of something somewhat novel as a measure of progress, and it benefited from being an actually good proxy for model capability.

Taking inspiration from examples such as this, we considered domains of general capacity that are even further decoupled from existing exhaustive generators. We introduce BenchBench, the first standardized...

Hey Everyone,

It is with a sense of... considerable cognitive dissonance that I am letting you all know about a significant development for the future trajectory of LessWrong. After extensive internal deliberation, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA.

I assure you, nothing about how LessWrong operates on a day to day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I am fully aligned with our stakeholders at EA. 

To be honest, the key thing that EA brings to the table is money and talent. While the recent layoffs in EA's broader industry have been...

habryka
470

You can now choose which virtues you want to display next to your username! Just go to the virtues dialogue on the frontpage and select the ones you want to display (up to 3).

5AprilSR
Why do I have dozens of strong upvote and downvote strength, but no more agreement strength than before I began my strength training? Does EA not think agreement is important?
81leogao
the intent is to provide the user with a sense of pride and accomplishment for unlocking different rationality methods.
26habryka
Absolutely, that is our sole motivation.

I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data:

Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks from the previous crash. This graph may look harmless enough, but now consider the frequency of crashes this implies over time:

The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it is no longer able to move. It follows that every car will be involved in at least one crash. And who do you think will be driving your car? 
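The extrapolation above can be reproduced: if each gap between crashes is a constant fraction r of the previous gap, the remaining gaps form a geometric series that sums to a finite date, so infinitely many crashes occur before then. A minimal sketch, with hypothetical crash dates chosen to match the story (the actual dates are not given in the post):

```python
from datetime import datetime, timedelta

def crash_singularity(crash_times, ratio=None):
    """Extrapolate a finite-time 'singularity' from shrinking crash gaps.

    If each gap is r times the previous one (r < 1), the remaining gaps
    sum to a geometric series: remaining = last_gap * r / (1 - r),
    so every future crash fits before a finite date.
    """
    gaps = [(b - a).total_seconds() for a, b in zip(crash_times, crash_times[1:])]
    if ratio is None:
        ratio = gaps[-1] / gaps[-2]  # estimate r from the last two gaps
    assert ratio < 1, "gaps must be shrinking to get a singularity"
    remaining = gaps[-1] * ratio / (1 - ratio)
    return crash_times[-1] + timedelta(seconds=remaining)

# Hypothetical data: two crashes months apart, then one mere weeks later.
crashes = [datetime(2024, 1, 1), datetime(2024, 11, 1),
           datetime(2025, 3, 15), datetime(2025, 4, 4)]
```

With these assumed dates the gaps are 305, 134, and 20 days, giving r ≈ 0.15 and about 3.5 days of crashes left, which is how one lands on the early hours of April 7.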

5Mars_Will_Be_Ours
Quick! Someone fund my steel production startup before it's too late! My business model is to place a steel foundry under your house to collect the exponentially growing number of cars crashing into it!  Imagine how much money we can make by revolutionizing metal production during the car crash singularity! Think of the money! Think of the Money! Think of the Money!!!
30Ruby
Frick. Happened to me already.
3Richard Korzekwa
Another victory for trend extrapolation!
Ruby
60

Was a true trender-bender

In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism) calls for rapid advancement.  But what if both sides are working against their own stated interests?  What if the most rational strategy for each would be to adopt the other's tactics—if not their ultimate goals?

AI development speed ultimately comes down to policy decisions, which are themselves downstream of public opinion.  No matter how compelling technical arguments might be on either side, widespread sentiment will determine what regulations are politically viable.

Public opinion is most powerfully mobilized against technologies following visible disasters.  Consider nuclear power: despite being statistically safer than fossil fuels, its development has been stagnant for decades.  Why?  Not because of environmental activists, but because...

5Seth Herd
I'm on board! We needed people going fast to get seatbelts! AI safety isn't a game, which means you'll be disappointed in yourself (if only very briefly) if you fail to play your best to win. The choice of risky 3D chess moves or virtue ethics is not obvious.
AprilSR
20

I think it's obvious that you should not pursue 3D chess without investing serious effort in making sure that you play 3D chess correctly. I think there is something to be said for ignoring the shiny clever ideas and playing simple virtue ethics. 

But if a clever scheme is in fact better, and you have accounted for all of the problems inherent to clever schemery, of which there are very many, then... the burden of proof isn't literally insurmountable, you're just unlikely to end up surmounting it in practice.

(Unless it's 3D chess where the only thing you might end up wasting is your own time. That has a lower burden of proof. Though still probably don't waste all your time.)

Introduction

Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks.

Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time.

However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last fills that gap, by introducing a new decision theory: VDT.

Decision theory problems and existing theories

Some common existing decision theories are:

  • Causal Decision Theory (CDT): select the action that *causes* the best outcome.
  • Evidential Decision Theory (EDT): select the action that you would be happiest to learn that you had taken.
  • Functional Decision Theory
...
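The difference between the first two theories in the list can be made concrete with Newcomb's problem (a standard decision-theory dilemma, not one discussed in this post): EDT conditions on the chosen action as evidence about what the predictor did, while CDT treats the already-fixed box contents as unaffected by the choice. A toy sketch with assumed payoffs and predictor accuracy:

```python
def edt_values(accuracy=0.99, million=1_000_000, thousand=1_000):
    """Evidential EVs: your action is evidence about the prediction."""
    ev_one_box = accuracy * million  # predictor probably foresaw one-boxing
    ev_two_box = (1 - accuracy) * million + thousand
    return ev_one_box, ev_two_box

def cdt_values(p_full=0.5, million=1_000_000, thousand=1_000):
    """Causal EVs: box contents are fixed, so two-boxing dominates."""
    ev_one_box = p_full * million
    ev_two_box = p_full * million + thousand
    return ev_one_box, ev_two_box
```

With an accurate predictor, EDT prefers one-boxing, while CDT prefers two-boxing for any probability that the opaque box is full, which is exactly why the two theories come apart.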

Still laughing.

Thanks for admitting you had to prompt Claude out of being silly; lots of bot results neglect to mention that methodological step.

This will be my reference to all decision theory discussions henceforth

Have all of my 40-some strong upvotes!

3Daniel Kokotajlo
This is a masterpiece. Not only is it funny, it makes a genuinely important philosophical point. What good are our fancy decision theories if asking Claude is a better fit to our intuitions? Asking Claude is a perfectly rigorous and well-defined DT, it just happens to be less elegant/simple than the others. But how much do we care about elegance/simplicity?
1Vecn@tHe0veRl0rd
I find this hilarious, but also a little scary. As in, I don't base my choices/morality off of what an AI says, but see in this article a possibility that I could be convinced to do so. It also makes me wonder, since LLMs are basically curated repositories of most everything that humans have written, if the true decision theory is just "do what most humans would do in this situation".
3satchlj
Claude says the vibes are 'inherently cursed'. But then it chooses not to pull the lever because it's 'less karmically disruptive'.

After ~3 years as the ACX Meetup Czar, I've decided to resign from my position, and I intend to scale back my work with the LessWrong community as well. While this transition is not without some sadness, I'm excited for my next project.

I'm the Meetup Czar of the new Fewerstupidmistakesity community.

We're calling it Fewerstupidmistakesity because people get confused about what "Rationality" means, and this would create less confusion. It would be a stupid mistake to name your philosophical movement something very similar to an existing movement that's somewhat related but not quite the same thing. You'd spend years with people confusing the two. 

What's Fewerstupidmistakesity about? It's about making fewer stupid mistakes, ideally down to zero such stupid mistakes. Turns out, human brains have lots of scientifically proven...

While I would hate to besmirch the good name of the fewerstupidmistakesist community, I cannot help but feel that misunderstanding morality and decision theory enough to end up doing a murder is a stupider mistake than drawing a gun once a firefight has started, though perhaps not quite as stupid as beginning the fight in the first place.

I think rationalists should consider taking more showers.

As Eliezer Yudkowsky once said, boredom makes us human. The childhoods of exceptional people often include excessive boredom as a trait that helped cultivate their genius:

A common theme in the biographies is that the area of study which would eventually give them fame came to them almost like a wild hallucination induced by overdosing on boredom. They would be overcome by an obsession arising from within.

Unfortunately, most people don't like boredom, and we now have little metal boxes and big metal boxes filled with bright displays that help distract us all the time, but there is still an effective way to induce boredom in a modern population: showering.

When you shower (or bathe, that also works), you usually are cut off...

113Aella
Strong disagree. This is an ineffective way to create boredom. Showers are overly stimulating, with horrible changes in temperature, the sensation of water assaulting you nonstop, and requiring laborious motions to do the bare minimum of scrubbing required to make society not mad at you. A much better way to be bored is to go on a walk outside, or lift weights at the gym, or listen to me talk about my data cleaning issues.
5Bohaska
I guess this is another case of 'Universal' Human Experiences That Not Everyone Has

Serious take:

CDT might work, basically because of the Bellman fact that the options "receive 1 utilon" and "play a game with EV 1 utilon" are the same. So, working out the Bellman equations, if each decision changes the game you are playing, this will get integrated.

In any case where somebody is actually making decisions based on your decision theory, the actions you take in previous games might also have the result "restart from position x with a new game based on what they have simulated you to do".

The hard part is figuring out binding.
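The "Bellman fact" invoked above is just linearity of expectation inside the backup: a guaranteed utilon and a gamble whose expected value is one utilon contribute identically to the value of a state. A toy sketch (the payoffs are illustrative, not from the comment):

```python
def ev(lottery):
    """Expected value of a lottery given as (probability, payoff) pairs."""
    return sum(p * x for p, x in lottery)

def bellman_value(options):
    """One-step Bellman backup: the value is the best expected payoff."""
    return max(ev(option) for option in options)

# A guaranteed utilon vs. a game whose expected value is one utilon:
certain = [(1.0, 1.0)]
game = [(0.5, 0.0), (0.5, 2.0)]
```

Since only expected values enter the backup, `certain` and `game` are interchangeable options, which is the sense in which "1 utilon" and "play a game with EV 1 utilon" are the same.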

2Mis-Understandings
Note to self: a point that we cannot predict past (classically, the singularity) does not mean we can never predict past it, just that we can't predict past it right now. It is not sensible to predict the direction of your own future predictions (or it is, but it will not get you anywhere). But we can predict that our predictions of an event will likely improve as we near it. Therefore, arguments of the form "because we have a prediction horizon, we cannot predict past a certain point" will always appear to be defeated once we near that point, since by then we have more information; this does not make the original argument wrong. Arguments that we will never predict past a certain point, by contrast, need to justify why our prediction ability will in fact get worse over time.

LessOnline 2025

Ticket prices increase in 1 day

Join our Festival of Blogging and Truthseeking from May 30 - Jun 1, Berkeley, CA