I think if Trump torpedoes the American economy and America's international reputation, that could be a very good thing from the perspective of AI x-risk.
Torpedoes our economy: Could "pop the AI bubble" if the US economy crashes or just becomes less attractive as an investment destination.
Torpedoes our international reputation: If Europeans start believing that American AI companies are on a path to omnicide (arguably a fairly accurate belief), they might put pressure on ASML to cut off the supply of chips to American AI companies.
If AI is aligned, you seem to expect that to mean some kind of alignment to the moral good: an AI which "genuinely has humanity's interests at heart", so much so that it redistributes all wealth. This is possible, but it's very hard, it's not what current mainstream alignment research is working on, and companies have no reason to switch to this new paradigm.
Eliezer Yudkowsky has repeatedly stated he does not think "moral good" is the hard part of alignment. He thinks the hard part is getting the AI to do anything at all without subverting the creator's intent somehow.
Eliezer: I mean, I wouldn't say that it's difficult to align an AI with our basic notions of morality. I'd say that it's difficult to align an AI on a task like “take this strawberry, and make me another strawberry that's identical to this strawberry down to the cellular level, but not necessarily the atomic level”. So it looks the same under like a standard optical microscope, but maybe not a scanning electron microscope. Do that. Don't destroy the world as a side effect.
Now, this does intrinsically take a powerful AI. There's no way you can make it easy to align by making it stupid. To build something that's cellular identical to a strawberry—I mean, mostly I think the way that you do this is with very primitive nanotechnology, but we could also do it using very advanced biotechnology. And these are not technologies that we already have. So it's got to be something smart enough to develop new technology.
Never mind all the subtleties of morality. I think we don't have the technology to align an AI to the point where we can say, “Build me a copy of the strawberry and don't destroy the world.”
I often post comments criticizing or disagreeing with Eliezer, but I think he is probably correct on this particular point.
To elaborate on this, I think one of the arguments for verbal consent was along these lines: "Some women panic-freeze and become silent when they're uncomfortable; they aren't capable of saying no in that state."
I think it's worth considering the needs of such women. However, I suspect they are a minority of the population, and it seems like common sense to stop if your partner is unresponsive. I feel we may have prematurely decided that verbal consent is the best way to address this situation. Maybe a better approach, especially with subdued women, would be asking something like: "Can I trust you to let me know if you're becoming uncomfortable?", then adjusting going forward depending on her answer. This approach doesn't put her on the spot in the same way the "big consent" approach does.
Another thing. I think consent discussion is hotter when framed in terms of desire. Either "Do you want to make out?", or "I want to make out with you" (and waiting for her verbal reply/nonverbal makeout initiation). Asking "May I make out with you?" puts the man in a subservient petitioner position which is not as erotic in my view. (Of course, if that's what you're into, then go ahead.)
I've also heard it claimed that the "big consent" approach can actually be a bit of a masculine power play. The idea is that a typical man will say something like: "Would you like to come up and see my etchings?" If instead you say "Would you like to come up and have sex?", that's hot because you're being direct, assertive, virile, and demonstrating a willingness to violate a (mild) taboo. This approach seems best to me if you feel fairly certain she will either assent in some manner ("maybe, let's see") or be comfortable rejecting you. It seems like a worse fit for subdued or anxious women.
Very interested to hear feedback on all my thoughts in this thread, especially from women.
I wonder how ladybrain/hornybrain corresponds to other classic brain dichotomies like near/far, system 1/system 2, right/left, etc.
Honestly, the idea of trying to activate hornybrain and suppress ladybrain feels a tad manipulative or ethically dubious to me; I'd be interested to hear how women besides Aella think about this.
Maybe it would be useful to discuss a variation on the claim from the OP? Something like: "Given the choice between a smooth, slow, gradual escalation, where a woman feels comfortable pausing or stopping at any point in time, vs being put on the spot with a yes-or-no question, most women prefer the former."
If the objective is to minimize her discomfort, one could argue that a yes-or-no question is less than ideal. If she says "no", she might have an unhappy man on her hands. If she says "yes", changing her mind later may become awkward. Communicating that you behave in a predictable way, and can be trusted to continuously check in for discomfort, creates ongoing optionality for her.
This hypothesis could explain some of the observations in the OP, while being less vulnerable to harmful misinterpretation. It's still potentially controversial, insofar as the approach conflicts with verbal-consent absolutism which is trendy in some circles. But in terms of minimizing female discomfort, this approach might work better than verbal-consent absolutism in practice. It also accords with conventional dating wisdom to some degree (successful guys tend to be men who "make women feel safe", who are "smooth").
Another way to frame this is that you should aim for lots of micro-consents rather than one big consent. The most successful guys are said to have an incredible ability to read their partner, combining masculine leadership with emotional safety, which allows her to collapse into her feminine. Perhaps it's best to scaffold with a lot of explicit discussion until this becomes more intuitive?
A related idea from Mark Manson is that women deeply want to be desired, and that both rape fantasies and marriage proposal fantasies are facets of this. If you're going to carry a man's child for nine months, you'd like him to be sufficiently obsessed with you that he won't get bored with you during that time. The takeaway for guys would be to focus on the women you truly desire most, and let them know what you like about them in a way that is suave, contextually appropriate, and forthright without being threatening, overbearing, pathetic, or unpredictable.
Pournelle's Iron Law of Bureaucracy states that in any bureaucratic organization there will be two kinds of people:
First, there will be those who are devoted to the goals of the organization. Examples are dedicated classroom teachers in an educational bureaucracy, many of the engineers and launch technicians and scientists at NASA, even some agricultural scientists and advisors in the former Soviet Union collective farming administration.
Secondly, there will be those dedicated to the organization itself. Examples are many of the administrators in the education system, many professors of education, many teachers union officials, much of the NASA headquarters staff, etc.
The Iron Law states that in every case the second group will gain and keep control of the organization. It will write the rules, and control promotions within the organization.
I'm concerned that this "law" may apply to Anthropic. People devoted to Anthropic as an organization will have more power than people devoted to the goal of creating aligned AI.
I would encourage people at Anthropic to leave a line of retreat and consider the "least convenient possible world" where alignment is too hard. What's the contingency plan for Anthropic in that scenario?
Next, devise a collective decision-making procedure for activating that contingency plan. For example, maybe the plan should be activated if X% of the technical staff votes to activate it. Perhaps after a week of discussion first? What would trigger that week of discussion? You can answer these questions and come up with a formal procedure.
If you had both a contingency plan and a formal means to activate it, I would feel a lot better about Anthropic as an organization.
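To illustrate what I mean by a formal procedure, here's a minimal sketch in Python. The 20% discussion trigger, 60% activation threshold, and one-week discussion period are all made-up placeholders; the real numbers, and the real governance machinery, would be for Anthropic to decide.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Placeholder parameters -- the real values would be up to Anthropic to decide.
DISCUSSION_TRIGGER = 0.20      # fraction of technical staff needed to open a discussion week
ACTIVATION_THRESHOLD = 0.60    # fraction of technical staff needed to activate the plan
DISCUSSION_PERIOD = timedelta(days=7)

@dataclass
class ContingencyPetition:
    opened_on: date             # day the discussion request was filed
    technical_staff: int        # total eligible voters
    signatures: int             # staff requesting a discussion week
    votes_to_activate: int = 0  # votes cast once the discussion week ends

    def discussion_triggered(self) -> bool:
        """Enough staff have asked to spend a week discussing activation."""
        return self.signatures / self.technical_staff >= DISCUSSION_TRIGGER

    def plan_activated(self, today: date) -> bool:
        """After the discussion week, activate on a supermajority of technical staff."""
        discussion_over = today >= self.opened_on + DISCUSSION_PERIOD
        vote_passed = self.votes_to_activate / self.technical_staff >= ACTIVATION_THRESHOLD
        return self.discussion_triggered() and discussion_over and vote_passed

# Example: 1000 technical staff, 250 request discussion, 700 later vote to activate.
petition = ContingencyPetition(date(2030, 1, 1), technical_staff=1000,
                               signatures=250, votes_to_activate=700)
print(petition.plan_activated(today=date(2030, 1, 9)))  # True
```

The point isn't this specific mechanism; it's that the trigger conditions can be pinned down precisely in advance rather than improvised during a crisis.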
I wonder if Google is optimizing harder for benchmarks, to try and prop up its stock price against possible deflation of an AI bubble.
It occurs to me that an AI alignment organization should create comprehensive private alignment benchmarks and start releasing the scores. They would have to be constructed in a non-traditional way so they're less vulnerable to standard Goodharting. If these benchmarks became popular with AI users and AI investors, they could be a powerful lever for steering AI development in a more responsible direction. By keeping them private, you could make it harder for AI companies to optimize against the benchmarks, and nudge them towards actually solving deeper alignment issues. It would also be a vivid illustration of the point that advanced AI will need to handle unforeseen/out-of-distribution alignment challenges. @Eliezer Yudkowsky
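To sketch the shape I have in mind (this is my own rough invention, not a description of any existing benchmark; `grader` and `model` are stand-ins for whatever rubric and model-query function the organization actually uses): the org keeps the items private, publishes only aggregate scores, and commits to the item set with a hash so the results can be audited once the benchmark is retired.

```python
import hashlib
import json
import random

class PrivateAlignmentBenchmark:
    """Held-out alignment eval: items stay private, only aggregate scores are published."""

    def __init__(self, items, grader, seed=0):
        self.items = items      # list of {"prompt": ..., "rubric": ...}, never released
        self.grader = grader    # function (item, model_output) -> score in [0, 1]
        self.rng = random.Random(seed)

    def evaluate(self, model, sample_size=200):
        """Score a model on a random subset so no fixed test set can leak or be memorized."""
        subset = self.rng.sample(self.items, k=min(sample_size, len(self.items)))
        scores = [self.grader(item, model(item["prompt"])) for item in subset]
        return sum(scores) / len(scores)

    def commitment(self):
        """Publish this hash now; reveal the items only after the benchmark is retired."""
        blob = json.dumps(self.items, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()
```

Publishing only a score plus a commitment hash gives labs nothing concrete to optimize against, while still letting outsiders verify after retirement that the items weren't cherry-picked.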
Thanks for making this list!
Having written all this down in one place, it's hard not to feel some hopelessness that all of these problems can be made legible to the relevant people, even with a maximum plausible effort.
I think that a major focus should be on prioritizing these problems based on how plausible a story you can tell for a catastrophic outcome if the problem remains unsolved, conditional on an AI that is corrigible and aligned in the ordinary sense.
I suppose coming up with such a clear catastrophe story for a problem is more or less the same thing as legibilizing it, which reinforces my point from the previous thread: a priori, it seems likely to me that illegible problems won't tend to be as important to solve.
The longer a problem has been floating around without anyone generating a clear catastrophe story for it, the greater probability we should assign that it's a "terminally illegible" problem which just won't cause a catastrophe if it's unsolved.
Maybe it would be good to track how much time has been spent attempting to come up with a clear catastrophe story for each problem, so people can get a sense of when diminishing returns set in for a given problem. Perhaps researchers who make such attempts should leave a comment in this thread indicating how much time they spent?
Perhaps it's worth concluding with a point from a discussion between @WillPetillo and me under the previous post: a potentially more impactful approach (compared to trying to make illegible problems more legible) is to make key decisionmakers realize that important safety problems which are illegible to them (and even to their advisors) probably exist, and that it's therefore very risky to make highly consequential decisions (such as about AI development or deployment) based only on the status of legible safety problems.
I still think the best way to do this is to identify at least one problem which initially seemed esoteric and illegible, and which eventually acquired a clear and compelling catastrophe story. Right now this discussion all seems rather hypothetical. From my perspective, the problems on your list fall into two rough categories: legible problems which seem compelling, and super-esoteric problems like "Beyond Astronomical Waste" which don't need to be solved prior to the creation of an aligned AI. Off the top of my head, I haven't noticed many problems moving from one category to the other. So just speaking for myself, this list hasn't convinced me that esoteric and illegible problems deserve a much larger share of scarce resources, though I admit I only took a quick skim.
I definitely like the directions you are exploring, and I agree they are improvements over the implicit AGI-lab-directed concept. That's a useful thing to keep in mind, but so is what keeps them from being final ideas.
+1
What do you think? Does that make sense at all, or maybe it seems more like a time wasting distraction? I have to admit I'm uncomfortable with the amount I have gotten stuck on the idea that championing this concept is a useful thing for me to be doing.
Glad you're self-aware about this. I would focus less on championing the concept, and more on treating it as a hypothesis about a research approach which may or may not deliver benefits. I wouldn't evangelize until you've got serious benefits to show; lead with those benefits, with the concept that delivered them as more of a footnote.
Jensen Huang says AI doomer rhetoric is dissuading people from making AI investments (see also: various other coverage). This seems like a pretty good sign to me. Wonder if it would be worthwhile for activists to find + contact all of the outlets which covered this story, respond to his statements, and try to engage Huang in a public debate.
Brainstorming further related ideas, given that investment appears to be a feasible point of leverage:
The price of silver has skyrocketed in the past ~year. I understand that the supply of silver is not all that elastic, and that demand is being driven in part by AI data centers. Buying physical silver could be a good way to get exposure to the AI boom (plus hedge inflation risk, etc.) while actually helping to choke the supply of critical inputs to that boom, instead of adding fuel to the fire. (Disclaimer: About 10% of my portfolio is in silver.)
I think most people in this community are already wary of holding AI stocks. But from what I understand, most data center buildout is actually funded by private equity, corporate bonds, etc. Maybe someone could launch corporate bond/private equity ETFs which screen out investments in AI data centers, and perhaps market them on an ESG basis too? Until then, could it make sense to divest US PE/corporate bonds in favor of foreign alternatives?
From a consumer perspective, how about nonprofit alternatives to products like ChatGPT? A nonprofit could sell access to open models through a consumer-friendly UI for a minimal cost, donating incidental profits to AI safety research, so people can use AI without funding further capabilities research. If this existed, it could provide a satisfying call-to-action for AI pause activism, help with norm-shaping and coalition-building, and build a mailing list of concerned citizens for future activism. ("Lessons for optimal philanthropists: Volunteer your time to an optimal charity. You may soon find yourself giving time and money." source) I think this could be a great way to harness ambient anti-AI sentiment among artists, authors, and other "normies": "I might use AI, I might even pay for it. But at least I'm not funding its development."
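For concreteness, here's a minimal sketch of the core plumbing for such a service. It assumes Flask and requests are installed and that MODEL_URL points at some self-hosted open-model backend exposing an OpenAI-compatible chat endpoint; all the names here are placeholders, not an existing product.

```python
# Minimal sketch of a nonprofit "pass-through" chat service: forward user messages
# to a self-hosted open model at cost, so no revenue flows to frontier labs.
import requests
from flask import Flask, jsonify, request

MODEL_URL = "http://localhost:8000/v1/chat/completions"  # placeholder open-model backend

app = Flask(__name__)

@app.post("/chat")
def chat():
    # Relay the user's conversation to the open-model backend and return its reply.
    payload = {"model": "open-model", "messages": request.json["messages"]}
    backend_reply = requests.post(MODEL_URL, json=payload, timeout=60)
    return jsonify(backend_reply.json())

if __name__ == "__main__":
    app.run(port=5000)
```

A real service would obviously need accounts, at-cost billing, and a proper front-end on top of this; the point is just that the core pass-through is thin.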