ukc10014

The linked post seemed to fit better on the EA Forum, but any comments (on the post itself, or on the object-level question) are welcome!

In this post, I argue that drafting a ‘Constitution for Superintelligence’ (CSI) could be a useful conceptual exercise, and I explore how existing ideas (CEV, Letters to AI, Long Reflection, and Constitutional AI) might inform such a project. Key considerations in writing such a document include:

a. How to craft a foundational document to guide ASI behavior, in ways that do not anchor purely upon human interests (e.g. relative to other entities) without a point-of-view-of-the-universe justification

b. Investigating the mechanisms by which such documents could influence the actions of current and future AI systems

c. Seeking input and collaboration from EAs/alignment people with relevant expertise and insights to further develop these concepts

The aim is to proactively but cautiously shape the trajectory of ASI development, in order to improve chances of maximally-inclusive beneficial outcomes.

ukc10014 2y Quick Take

I quite like this forecast from Andrew Critch on milestones in AI development. My reactions:

On the timeline he suggests, in ~10 years we face choice 6a/b, which implies at least 3 possibilities:

A) we need society-level consensus (which might be force-backed) that humans can/should control agents (or digital entities, more generally) who are, in all (economically/militarily) relevant aspects, superior to us. Assuming they fit within the moral circle as we currently conceive it (@davidchalmers42, @jeffrsebo, Thomas Metzinger, and Nick Bostrom / Carl Shulman have analysed this in various ways), and absent some novel claim about how AIs are different/lesser ethical beings, it is hard to see how this is essentially different from slavery, or animal cruelty, ...

ukc10014's Shortform

Replying to Adam Smith Meets AI Doomers

Thanks for this, a useful framing. A slightly more poetic framing would perhaps be the early writing of Nick Land (which has picked up notoriety owing to some combination of his political views and the people who have adopted/adapted his ideas). This article summarises Land's thoughts on capitalism and AGI.

Replying to An Appeal to AI Superintelligence: Reasons to Preserve Humanity

This is good, both for training-data reasons and because it tries to present reasons to the successor that appeal to (our projection of) "its" "self-interest" (quoted as those are pretty anthropomorphic terms and bake in assumptions about identity). I'm not using ASI as that seems a pretty loaded/unclear term, i.e. the post could be of interest to proto-AGIs/ASIs and could be a profitable conversation with something well before AGI.

I had a couple of comments:

The essay doesn't make a positive case for keeping us around (in the sense of 'try to preserve diversity in the biological world because 8 billion human consciousnesses are a "good" thing in their own right'). It ...

Replying to LLMs for Alignment Research: a safety priority?

On the overall point of using LLMs for reasoning, this (the output of a team at AI Safety Camp 2023) might be interesting. It is rather broad-ranging and specifically about argumentation in logic, but may be useful context: https://compphil.github.io/truth/

Replying to Unpicking Extinction

That’s really useful, thank you.

Unpicking Extinction

TL;DR

Human extinction is trending: there has been a lot of noise, mainly on X, about the apparent complacency amongst e/acc with respect to human extinction. Extinction also feels adjacent to another view (not particular to e/acc) that ‘the next step in human evolution is {AI/AGI/ASI}’. Many have pushed back robustly against the former, while the latter doesn’t seem very fleshed out. I thought it useful to briefly gather the various positions and summarise them, hopefully not too inaccurately, and perhaps pull out some points of convergence.

This is a starting point for my own research (on de-facto extinction via evolution). There is nothing particularly new in here: see the ...

Replying to Joscha Bach on Synthetic Intelligence [annotated]

This is really useful, thank you - Bach's views are quite hard to capture without sitting through hours of podcasts (though he has re-started writing).

Philosophical Cyborg (Part 2)...or, The Good Successor

This post is part of the output from AI Safety Camp 2023’s Cyborgism track, run by Nicholas Kees Dupuis - thank you to Nick, AISC organizers & funders for their support.

TL;DR

This post follows up on the cyborgism research/writing process documented in 'Upon the Philosophical Cyborg'. It attempts to analyse a 2018 post by Paul Christiano about the possibility that an unaligned AI may yet be a morally-valuable entity, by its own and even by our lights. Writing this essay has involved a back-and-forth between a human author and a few different versions of GPT-3/4, followed by extensive editing, as well as human-written additions. So, while this post contains LLM-written parts, ...

Philosophical Cyborg (Part 1)

ukc10014, Roman Leventov, NicholasKees

This post is part of the output from AI Safety Camp 2023’s Cyborgism track, run by Nicholas Kees Dupuis - thank you to AISC organizers & funders for their support. Thank you for comments from Peter Hroššo; and the helpful background of conversations about the possibilities (and limits) of LLM-assisted cognition with Julia Persson, Kyle McDonnell, and Daniel Clothiaux.

Epistemic status: this is not a rigorous or quantified study, and much of this might be obvious to people experienced with LLMs, philosophy, or both. It is mostly a writeup of my (ukc10014) investigations during AISC and is a companion to The Compleat Cybornaut.

TL;DR

This post documents research into using LLMs for domains such as ...

Replying to Collective Identity

In response to Roman’s very good points (I have so far only skimmed the linked articles), these are my thoughts:

I agree that human values are very hard to aggregate (or even to define precisely); we use politics/economy (of collectives ranging from the family up to the nation) as a way of doing that aggregation, but that is obviously a work in progress, and perhaps slipping backwards. In any case, (as Roman says) humans are (much of the time) misaligned with each other and their collectives, in ways little and large, and sometimes that is for good or bad reasons. By ‘good reason’ I mean that sometimes ‘misalignment’ might literally be that ...

The Compleat Cybornaut

ukc10014, Jozdien, NicholasKees

A cluster of conceptual frameworks and research programmes has coalesced around a 2022 post by janus, which introduced language models as ‘simulators’ (of other types of AIs such as agents, oracles, or genies). One such agenda, cyborgism, was coined in a post by janus and Nicholas Kees and is being researched as part of the 2023 editions of AI Safety Camp and SERI MATS. The objective of this document is to provide an on-ramp to the topic, one that is hopefully accessible to people not hugely familiar with simulator theory or language models.

So what is cyborgism?

Cyborgism proposes to use AIs, particularly language models (i.e. generative-pretrained transformers or GPTs), in ways that exploit their (increasingly) general-purpose intelligence, while ...

Collective Identity

NicholasKees, ukc10014, Garrett Baker

Thanks to Simon Celinder, Quentin Feuillade--Montixi, Nora Ammann, Clem von Stengel, Guillaume Corlouer, Brady Pelkey and Mikhail Seleznyov for feedback on drafts. This post was written in connection with the AI Safety Camp.

Executive Summary:

This document proposes an approach to corrigibility that focuses on training generative models to function as extensions to human agency. These models would be designed to lack independent values/preferences of their own, because they would not have an individual identity; rather they would identify as part of a unified system composed of both human and AI components.

The selfless soldier: This section motivates the difference between two kinds of group-centric behavior: altruism (which is based in individual identity) and ...

Replying to Some conceptual alignment research projects

I've taken a crack at #4 but it is more about thinking through how 'hundreds of millions of AIs' might be deployed in a world that looks, economically and geopolitically, something like today's (i.e. the argument in the OP is for 2036 so this seems a reasonable thing to do). It is presented as a flowchart which is more succinct than my earlier longish post.

Trajectories to 2036

TL;DR

This post follows on from an earlier, longer attempt to think through Holden Karnofsky's ‘AI Could Defeat All of Us Combined’ (AIDC). It spends less time on the specifics of precisely how a mass-copied human-level AI (HLAI) would kill, enslave, or disempower humans (i.e. who would copy the HLAIs, how they might establish an 'AI headquarters', how they might form malign plans, etc.), as both Karnofsky's and my earlier post consider these. The chart lays out a variety of (geopolitically-informed) trajectories, and the text below works through them to identify the different 'flavours' of AIDC or alignment that might result.

Gloss on AIDC

In another post I summarised Holden Karnofsky's ‘AI Could Defeat ...

Replying to Analysing a 2036 Takeover Scenario

Good catch, thank you - fixed & clarified!

Analysing a 2036 Takeover Scenario

TL;DR

This post digs through Holden Karnofsky's ‘AI Could Defeat All of Us Combined’ (AIDC) post and tries to work through some of the claims there, in light of other work by Ajeya Cotra and Richard Ngo. It was originally a response to the latter's suggestion that someone should recast that post as a scenario, but I struggled to do that without fleshing out the argument for myself. Anyway, I try to imagine the world around 2036, when the first human-level AI shows up (per AIDC), in the context of current/forecast geopolitics.

I think my (weakly-held) takeaway is that the AIDC notion of 'hundreds of millions' of AIs, nominally engaged in scientific R&D ...

Replying to [New Feature] Support for Footnotes!