(Cross-posted from the Bountied Rationality Facebook group)

EDIT: Bounty Expired

Thanks everyone for thoughts so far! I do want to emphasize that we're actually highly interested in collecting even the most "obvious" evidence in favor of or against these ideas. In fact, in many ways we're more interested in the obvious evidence than in reframes or conceptual problems in the ideas here;  of course we want to be updating our beliefs, but we also want to get a better understanding of the existing state of concrete evidence on these questions. This is partly because we consider it part of our mission to expand the amount and quality of relevant evidence on these beliefs, and are trying to ensure that we're aware of existing work.

Here is a Google Doc that lists 21 important beliefs that Palisade Research has about AI. For each belief, we're looking for the strongest evidence that exists in favor of that idea, and the strongest evidence that exists against it. We'll award at least $20 for the best evidence in favor, and at least $20 for the best evidence against each idea. We'll use our discretion for what we consider the "best" evidence, but the kind of thing we're looking for includes empirical research or convincing arguments. Empirical research, or arguments clearly backed by empirical observations, will be preferred over pure arguments.

To submit a piece of evidence, you can either comment here, making it clear which specific idea(s) you're giving evidence for, or you can add a comment to the linked document. A piece of evidence should include a link, should be clearly associated with a specific idea, and should include a short sentence about how the evidence applies to that idea.

For example, you might write a comment on the idea "a strategic AI system will aim to appear convincingly aligned with human goals, or incapable of harming humans, whether it really is or not," that includes a link to a paper on AI sandbagging (e.g. "AI Sandbagging: Language Models can Strategically Underperform on Evaluations"), with a sentence like "This work on AI sandbagging shows that existing AI systems already strategically underperform when they can tell they are being evaluated."

Note: Only responses in the above format will be considered for bounties, though of course you can respond however you want in the LessWrong comments.

In addition to the base $20 bounty for the best evidence on each point, we'll also give bonuses of $50 for pieces of evidence that we think are especially strong. We'll give at least 4 of these bonuses, and up to 20 depending on our subjective sense of the quality of submissions.

So in total, we're offering at least 21 * 2 * $20 + 4 * $50 = $1040, and up to 21 * 2 * $20 + 20 * $50 = $1840 in bounties.
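For concreteness, here is a minimal Python sketch of that payout arithmetic; the counts and dollar amounts are taken directly from this post, and nothing else is assumed:

```python
# Bounty arithmetic from the post: 21 ideas, best evidence both for and against,
# at least $20 each, plus between 4 and 20 bonuses of $50 for especially strong evidence.
ideas = 21
awards_per_idea = 2      # one "for" and one "against" award per idea
base_award = 20          # dollars per best-evidence award
bonus = 50               # dollars per especially-strong-evidence bonus

base_total = ideas * awards_per_idea * base_award
minimum_total = base_total + 4 * bonus    # 1040
maximum_total = base_total + 20 * bonus   # 1840

print(minimum_total, maximum_total)
```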

Max bounty: $500 per person. All bounties paid via PayPal. Tentative deadline is October 1.


Against 1.c ("Humans need at least some resources that would clearly put us in life-or-death conflict with powerful misaligned AI agents in the long run"): The doc says that "Any sufficiently advanced set of agents will monopolize all energy sources, including solar energy, fossil fuels, and geothermal energy, leaving none for others." There are two issues with that statement:

First, the qualifier "sufficiently advanced" is doing a lot of work. Future AI systems, even if superintelligent, will be subject to physical constraints and economic concepts such as opportunity costs. The most efficient route for an unaligned ASI, or set of ASIs, to expand their energy capture may well sidestep current human energy sources, at least for a while. We don't fight ants to capture their resources.
Second, it assumes advanced agents will want to monopolize all energy sources. Even granting instrumental convergence, partial misalignment with some degree of concern for humanity's survival and autonomy is plausible. Most people in developed countries have a preference for preserving the existence of an autonomous population of chimpanzees, and our "business-as-usual-except-ignoring-AI" world seems on track to achieve that.

Taken together, these arguments paint a picture of a future ASI largely not taking over the resources we are currently using on Earth, because it's easier to take over other resources (for instance, getting minerals from asteroids and energy from orbital solar capture). It then takes over the lightcone except Earth, because it cares a little about preserving an independent humanity on Earth. In this scenario, the subset of humans who care about the lightcone loses spectacularly to an ASI in a conflict over the lightcone, but humanity is not in a life-or-death conflict with an ASI.

If the bounty isn't over yet, I'll likely submit several arguments tomorrow.

I wrote up about 15 arguments in this Google Doc.
