Simon Goldstein

Replying to: Could space debris block access to outer space?


1. In my opinion, one of the likeliest motivations for deliberately creating debris would be as part of an escalation ladder in the early stages of WW3. Whichever side has weaker satellite intelligence and capabilities would have an incentive to trigger a cascade in order to erase its opponent's advantage. The underlying point is that space conflict is strongly offense-dominant because of debris cascades, and we know that offense-dominant dynamics tend in general to be very unstable.

2. Related to your discussion of totipotence, another dynamic I could imagine in the future is MAD (mutually assured destruction) dynamics between a moon colony and Earth, where each side has the capacity to create...

Replying to: Will AI and Humanity Go to War?

Simon Goldstein (1y)


The issue of unified AI parties is discussed, but not resolved, in section 2.2. There, I discuss some of the paths AIs may take to begin engaging in collective decision-making. I also flag that the key assumption is that one AI, or multiple AIs acting collectively, accumulate enough power to engage in strategic competition with human states.

Will AI and Humanity Go to War?

Simon Goldstein

[This post is the introduction to my full paper, available here https://philpapers.org/rec/GOLWAA. This post was partially inspired by a LW comment thread between @Matthew Barnett and @Wei Dai.]

Abstract. This paper offers the first careful analysis of the possibility that AI and humanity will go to war. The paper focuses on the case of artificial general intelligence, AI with broadly human capabilities. The paper uses a bargaining model of war to apply standard causes of war to the special case of AI/human conflict. The paper argues that information failures and commitment problems are especially likely in AI/human conflict. Information failures would be driven by the difficulty of measuring AI capabilities, by the uninterpretability of AI...

Simon Goldstein (1y)

I think there's a steady stream of philosophical work on various questions in metaphilosophy; metaethics is just the most salient example to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also received a lot of attention in the last 10-20 years (https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml). There is also some recent work in metaepistemology, though perhaps less, because its debates tend to recapitulate earlier work in metaethics (https://plato.stanford.edu/entries/metaepistemology/).

Sorry for being unclear: I meant that calling for a pause seems useless because it won't happen. I think calling for a pause has an opportunity cost because attention and signalling value are limited, and reputation can only be spent so many times; it is better to channel pressure towards asks that could plausibly get done.

Simon Goldstein (1y)

Great questions. Sadly, I don't have any really good answers for you.

1. I don't know of specific cases, but I think it is quite common, for example, for people to start studying metaethics out of frustration at failing to find answers to questions in normative ethics.
2. I do not, except for the end of Superintelligence.
3. Many of the philosophers I know who work on AI safety would love for there to be an AI pause, in part because they think alignment is very difficult. But I don't know whether any of us have explicitly called for one, in part because it seems useless and may carry an opportunity cost.
4. I think few of my friends in philosophy...

Simon Goldstein (1y)

I think most academic philosophers take the difficulty of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here: https://philpapers.org/browse/metaphilosophy. There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here: https://link.springer.com/collections/cadgidecih. The tone of specific papers sometimes sounds confident, but I think that reflects stylistic convention more than the authors' underlying credences. Finally, uncertainty and decision theory are persistent themes in recent philosophical work on AI safety and other issues in philosophy of AI; see, for example, this paper, which is quite sensitive to issues about chances of welfare: https://link.springer.com/article/10.1007/s43681-023-00379-1.

Replying to: AI Rights for Human Safety

Simon Goldstein (2y)


Good question, Seth. We begin to analyse this question in section II.b.i of the paper, 'Human labor in an AGI world', where we consider whether AGIs will have a long-term interest in trading with humans. We suggest that a key question will be whether humans can retain either an absolute or a comparative advantage in the production of some goods. We also point to some recent economics papers that address this question. One relevant factor, for example, is cost disease: as manufacturing became more productive in the 20th century, the share of GDP devoted to manufacturing fell. More generally, non-automatable tasks can, counterintuitively, make up a larger share of GDP as automatable tasks become more productive, because the prices of automatable goods fall.
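To make the cost-disease point concrete, here is a toy calculation. It is an illustrative assumption, not the model in the paper: a single wage is paid in both sectors, each good's price equals the wage divided by that sector's productivity, and consumers buy the two goods in fixed proportions. The function name `expenditure_share_manual` and all numbers are hypothetical.

```python
# Toy illustration of Baumol-style cost disease (hypothetical model, not the paper's).
# Assumptions: one wage in both sectors, price = wage / productivity,
# and consumers buy the two goods in fixed proportions (perfect complements).


def expenditure_share_manual(prod_auto: float, prod_manual: float,
                             wage: float = 1.0,
                             qty_auto: float = 1.0,
                             qty_manual: float = 1.0) -> float:
    """Share of total spending that goes to the non-automatable good."""
    price_auto = wage / prod_auto      # falls as automatable productivity rises
    price_manual = wage / prod_manual  # stays tied to the wage
    spend_auto = price_auto * qty_auto
    spend_manual = price_manual * qty_manual
    return spend_manual / (spend_auto + spend_manual)


if __name__ == "__main__":
    for prod_auto in (1.0, 10.0, 100.0):
        share = expenditure_share_manual(prod_auto=prod_auto, prod_manual=1.0)
        print(f"automatable productivity x{prod_auto:g}: manual share = {share:.3f}")
```

Under these assumptions the non-automatable good's share of spending rises from 0.50 to about 0.91 and then 0.99. The rising share depends on the goods being complements: under Cobb-Douglas preferences, for instance, expenditure shares would stay constant.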

AI Rights for Human Safety

Simon Goldstein

Just wanted to share a new paper on AI rights, co-authored with Peter Salib, that members of this community might be interested in. Here's the abstract:

AI companies are racing to create artificial general intelligence, or “AGI.” If they succeed, the result will be human-level AI systems that can independently pursue high-level goals by formulating and executing long-term plans in the real world. Leading AI researchers agree that some of these systems will likely be “misaligned”–pursuing goals that humans do not desire. This goal mismatch will put misaligned AIs and humans into strategic competition with one another. As with present-day strategic competition between nations with incompatible goals, the result could be violent and...

[Linkpost] A Case for AI Consciousness


cdkg, Simon Goldstein

Just wanted to share a new paper on AI consciousness with Simon Goldstein that members of this community might be interested in. Here's the abstract:

It is generally assumed that existing artificial systems are not phenomenally conscious, and that the construction of phenomenally conscious artificial systems would require significant technological progress if it is possible at all. We challenge this assumption by arguing that if Global Workspace Theory (GWT) — a leading scientific theory of phenomenal consciousness — is correct, then instances of one widely implemented AI architecture, the artificial language agent, might easily be made phenomenally conscious if they are not already. Along the way, we articulate an explicit methodology for thinking about how to apply scientific theories of consciousness to artificial systems and employ this methodology to arrive at a set of necessary and sufficient conditions for phenomenal consciousness according to GWT.

AI Deception: A Survey of Examples, Risks, and Potential Solutions


Simon Goldstein, Peter S. Park

By Peter S. Park, Simon Goldstein, Aidan O’Gara, Michael Chen, and Dan Hendrycks

[This post summarizes our new report on AI deception, available here]

Abstract: This paper argues that a range of current AI systems have learned how to deceive humans. We define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth. We first survey empirical examples of AI deception, discussing both special-use AI systems (including Meta's CICERO) built for specific competitive situations, and general-purpose AI systems (such as large language models). Next, we detail several risks from AI deception, such as fraud, election tampering, and losing control of AI systems. Finally, we outline several potential...

Replying to: Safety-First Agents/Architectures Are a Promising Path to Safe AGI

Simon Goldstein (3y*)


Thanks, Brendon, I agree with a lot of this! I do think there's a big open question about how capable AutoGPT-like systems will end up being compared to more straightforward RL approaches. It could turn out that systems with a clear cognitive architecture just don't work that well, even though they are safer.

Replying to: Thoughts on sharing information about language model capabilities

Simon Goldstein (3y)


Thanks for the thoughtful post, lots of important points here. For what it’s worth, here is a recent post where I’ve argued in detail (along with Cameron Domenico Kirk-Giannini) that language model agents are a particularly safe route to AGI: https://www.alignmentforum.org/posts/8hf5hNksjn78CouKR/language-agents-reduce-the-risk-of-existential-catastrophe

Replying to: Shutdown-Seeking AI

Simon Goldstein (3y)


I really liked your post! I linked to it elsewhere in the comment thread.

Replying to: Shutdown-Seeking AI

Simon Goldstein (3y)


I think one key point you're making is that if AI products have a radically different architecture from human agents, it could be very hard to align them or make them safe. Fortunately, I think recent research on language agents suggests that it may be possible to design AI products with a cognitive architecture similar to ours, with belief/desire folk psychology and a concept of self. In that case, it will make sense to think about what desires to give them, and I think shutdown goals could be quite useful during development to lower the chance of bad outcomes. If the resulting AIs have a psychology similar to our own, then I expect them to worry about the same safety and alignment problems that we worry about when deciding whether to make a successor. This article explains in detail why we should expect AIs to avoid self-improvement and unchecked successors.

Shutdown-Seeking AI

Simon Goldstein

This is a draft written by Simon Goldstein, associate professor at the Dianoia Institute of Philosophy at ACU, and Pamela Robinson, postdoctoral research fellow at the Australian National University, as part of a series of papers for the Center for AI Safety Philosophy Fellowship's midpoint.

Abstract: We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. 2015, that...

Language Agents Reduce the Risk of Existential Catastrophe


cdkg, Simon Goldstein

This post was written by Simon Goldstein, associate professor at the Dianoia Institute of Philosophy at ACU, and Cameron Domenico Kirk-Giannini, assistant professor at Rutgers University, for submission to the Open Philanthropy AI Worldviews Contest. Both authors are currently Philosophy Fellows at the Center for AI Safety.

Abstract: Recent advances in natural language processing have given rise to a new kind of AI architecture: the language agent. By repeatedly calling an LLM to perform a variety of cognitive tasks, language agents are able to function autonomously to pursue goals specified in natural language and stored in a human-readable format. Because of their architecture, language agents exhibit behavior that is predictable according to the laws...

The Polarity Problem [Draft]


Dan H, cdkg, Simon Goldstein

This is a draft written by Cameron Domenico Kirk-Giannini, assistant professor at Rutgers University, and Simon Goldstein, associate professor at the Dianoia Institute of Philosophy at ACU, as part of a series of papers for the Center for AI Safety Philosophy Fellowship's midpoint. Dan helped post to the Alignment Forum. This draft is meant to solicit feedback. The authors would especially welcome pointers to work related to the topic which they have not cited.

Abstract:

If it is possible to construct artificial superintelligences, it is likely that they will be extremely powerful. A natural question is how many such superintelligences to expect. Will the future be shaped by a single superintelligence (a unipolar outcome),...

Aggregating Utilities for Corrigible AI [Feedback Draft]


Dan H, Simon Goldstein

This is a draft written by Simon Goldstein, associate professor at the Dianoia Institute of Philosophy at ACU, as part of a series of papers for the Center for AI Safety Philosophy Fellowship. Dan helped post to the Alignment Forum. This draft is meant to solicit feedback.

PDF of this draft: https://www.dropbox.com/s/a85oip71jsfxfk7/Corrigibility_shared.pdf?dl=0

Abstract: An AI is corrigible if it lets humans change its goals. This post argues that the utility aggregation framework from Pettigrew 2019 is a promising approach to designing corrigible AIs. Utility aggregators do not simply maximize their current utility function. Instead, they can change their utility function, in order to maximize expected satisfaction across present and future utility functions. I also...
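As a rough illustration of the idea in the abstract, here is a toy sketch. It is a hypothetical simplification, not the framework as stated in Pettigrew 2019 or in this draft: each act is scored by a weighted sum of its expected utility under the present utility function and a future one. The act names, outcomes, weights and utility numbers are all invented for illustration, and the sketch treats the future utility function as fixed regardless of which act is chosen.

```python
# Hypothetical sketch of utility aggregation across present and future utility
# functions. Simplification: the future utility function does not depend on the act.
from typing import Dict

Distribution = Dict[str, float]  # P(outcome | act)
Utility = Dict[str, float]       # U(outcome)


def expected_utility(dist: Distribution, utility: Utility) -> float:
    return sum(p * utility[o] for o, p in dist.items())


def aggregated_value(dist: Distribution,
                     utilities: Dict[str, Utility],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of expected utility across the agent's utility functions."""
    return sum(weights[s] * expected_utility(dist, utilities[s]) for s in utilities)


def best_act(acts: Dict[str, Distribution],
             utilities: Dict[str, Utility],
             weights: Dict[str, float]) -> str:
    return max(acts, key=lambda a: aggregated_value(acts[a], utilities, weights))


# Toy corrigibility scenario (all numbers invented for illustration).
acts = {
    "resist_change": {"keep_old_goal": 1.0},
    "accept_change": {"adopt_new_goal": 1.0},
}
utilities = {
    "present": {"keep_old_goal": 1.0, "adopt_new_goal": 0.6},
    "future": {"keep_old_goal": 0.0, "adopt_new_goal": 1.0},
}

if __name__ == "__main__":
    # A pure maximizer of the present utility function only.
    print(best_act(acts, {"present": utilities["present"]}, {"present": 1.0}))
    # An aggregator that also weighs the future utility function.
    print(best_act(acts, utilities, {"present": 0.5, "future": 0.5}))
```

With these invented numbers, the present-only maximizer prints resist_change (1.0 vs 0.6), while the 50/50 aggregator prints accept_change (0.8 vs 0.5). Whether the aggregator accepts a goal change depends entirely on the weights and utilities assumed.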

LESSWRONG