Andrew Critch lists several research areas that seem important to AI existential safety, and evaluates them for direct helpfulness, educational value, and neglect. Along the way, he argues that the main way he sees present-day technical research helping is by anticipating, legitimizing and fulfilling governance demands for AI technology that will arise later.
I appreciate you writing this, and think it was helpful. I don't have a strong take on Nate's object-level decisions here, why TurnTrout said what he said, etc. But I wanted to flag that the following seems like a huge understatement:
...The concerns about Nate's conversational style, and the impacts of the way he comports himself, aren't nonsense. Some people in fact manage to never bruise another person, conversationally, the way Nate has bruised more than one person.
But they're objectively overblown, and they're objectively overblown in exactly the way...
Acknowledgments: The core scheme here was suggested by Prof. Gabriel Weil.
There has been growing interest in the dealmaking agenda: humans make deals with AIs (misaligned but lacking decisive strategic advantage) where they promise to be safe and useful for some fixed term (e.g. 2026-2028) and we promise to compensate them in the future, conditional on (i) verifying the AIs were compliant, and (ii) verifying the AIs would spend the resources in an acceptable way.[1]
I think the dealmaking agenda breaks down into two main subproblems:
There are other issues, but when I've discussed dealmaking with people, (1) and (2) are the most common issues raised. See footnote for some other issues in...
Would you agree that what we have now is nothing like that?
Yes.
This is a token of small non-appreciation for how unnecessarily (imo only!) long this post is for a relatively simple concept. I found your bag of tricks helpful, but didn't enjoy how it dragged on and on and on.
By 'bedrock liberal principles', I mean things like: respect for individual liberties and property rights, respect for the rule of law and equal treatment under the law, and a widespread / consensus belief that authority and legitimacy of the state derive from the consent of the governed.
Note that "consent of the governed" is distinct from simple democracy / majoritarianism: a 90% majority that uses state power to take all the stuff of the other 10% might be democratic but isn't particularly liberal or legitimate according to the principle of consent of the governed.
I believe a healthy liberal society of humans will usually tend towards some form of democracy, egalitarianism, and (traditional) social justice, but these are all secondary to the more foundational kind of thing I'm getting...
I only partially agree. I wouldn't be surprised if "free speech" is now on the road to suffering the same fate as the word "democracy": China calls itself a democracy, and they too have the words "free speech" in their constitution. I think the Trump administration's definition of and aspiration for free speech (the legal animosity towards media and academics) is not what past US liberals would recognise as such, and is a departure from that tradition. What use is free speech if your critics are indirectly being suppressed? Even authoritarian governments give citizens enough "...
I'd like to finetune or (maybe more realistically) prompt engineer a frontier LLM to imitate me. Ideally it would not just match my style but reason like me, drop anecdotes like me, etc., so it performs at something like my 20th percentile of usefulness/insightfulness.
Is there a standard setup for this?
Examples of use cases include receiving an email and sending[1] a reply that sounds like me (rather than a generic email), reading Google Docs or EA Forum posts and giving relevant comments/replies, etc.
More concretely, things I do that I think current generation LLMs are in th...
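There isn't one standard setup, but the most common starting point is few-shot prompting: put a pile of your own writing samples in a system prompt and ask the model to imitate them. Here is a minimal sketch, assuming the OpenAI Python SDK; the model name, the placeholder samples, and the `draft_reply` helper are all illustrative, not a recommendation of any particular stack:

```python
# Minimal sketch of few-shot "imitate me" prompting, assuming the OpenAI
# Python SDK (pip install openai). Model name and samples are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A few writing samples (emails, forum comments) you actually wrote.
# In practice you'd load many more from a file.
MY_SAMPLES = [
    "Thanks for the draft! Two quick reactions: ...",
    "I broadly agree, though I'd flag one caveat: ...",
]

SYSTEM_PROMPT = (
    "You are drafting text as [NAME]. Imitate their style, reasoning habits, "
    "and typical anecdotes, based on the writing samples below. Aim for a "
    "plausible draft, not a perfect one.\n\n"
    + "\n---\n".join(MY_SAMPLES)
)

def draft_reply(incoming_email: str) -> str:
    """Draft a reply to an email in the imitated voice."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any frontier chat model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Draft a reply to this email:\n\n{incoming_email}"},
        ],
    )
    return response.choices[0].message.content

print(draft_reply("Hi, could you take a look at my EA Forum post draft?"))
```

For the finetuning route, the analogous move is to format (incoming email, your actual reply) pairs as chat transcripts and train on those; prompting is usually the cheaper first experiment before committing to that.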
Not saying we should pause AI, but consider the following argument:
The basic contention here seems to be that the biggest danger of LLMs is not the systems themselves, but the overreliance, excessive trust, etc., that societies and institutions place in them. Another is that "hyping LLMs" (which I assume includes folks here expressing concerns that AI will go rogue and take over the world) increases perceptions of AI's abilities, which feeds into this overreliance. A conclusion is that promoting "x-risk" as a reason for pausing AI will have the unintended side effect of increasing (catastrophic, but no...
I would find that reasonably convincing, yes (especially because my prior is already that true ems would not have a tendency to report their experiences in a different way from us).
(Crossposted from my blog--formatting is better there).
Very large numbers of people seem to think that climate change is likely to end the world. Biden and Harris both called it an existential threat. AOC warned a few years ago that “the world is going to end in 12 years if we don’t address climate change.” Thunberg once approvingly cited a supposed “top climate scientist” making the claim that “climate change will wipe out humanity unless we stop using fossil fuels over the next five years.” Around half of Americans think that climate change will destroy the planet (though despite this, most don’t consider it a top political issue, which means that a sizeable portion of the population thinks climate change will destroy the planet, but doesn’t think...
Do you think you can learn something useful about existential risk from reading the IPCC report?
FWIW I only briefly looked at the latest report, but from what I saw it seemed hard to learn anything about existential risk from it, except for some obvious things like "humans will not go extinct in the median outcome". I didn't see any direct references to human extinction in the report, nor any references to runaway warming.
Meet inside The Shops at Waterloo Town Square - we will congregate in the indoor seating area next to the Your Independent Grocer with the trees sticking out in the middle of the benches (pic) at 7:00 pm for 15 minutes, and then head over to my nearby apartment's amenity room. If you've been around a few times, feel free to meet up at the front door of the apartment at 7:30 instead.
This topic benefits an unusual amount from the attendees being familiar and comfortable with each other enough to engage in earnest dialogue. For that reason, attendance will be restricted to people with the "irregular" and "regular" roles in the discord, and others may be turned away at the door. Apologies...
(Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.)
“There comes a moment when the children who have been playing at burglars hush suddenly: was that a real footstep in the hall?”
- C.S. Lewis
Sometimes, my thinking feels more “real” to me; and sometimes, it feels more “fake.” I want to do the real version, so I want to understand this spectrum better. This essay offers some reflections.
I give a bunch of examples of this “fake vs. real” spectrum below -- in AI, philosophy, competitive debate, everyday life, and religion. My current sense is that it brings together a cluster of related dimensions, namely: