I think there's a steady stream of philosophy getting interested in various questions in metaphilosophy; metaethics is just the most salient to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also gotten a lot of attention in the last 10-20 years https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml. There is also some recent work in metaepistemology, but maybe less so because the debates tend to recapitulate previous work in metaethics https://plato.stanford.edu/entries/metaepistemology/.
Sorry for being unclear, I meant that calling for a pause seems useless because it won't happen. I think calling for the pause has opportunity cost because of limited attention and limited signalling value; reputation can only be used so many times; better to channel pressure towards asks that could plausibly get done.
Great questions. Sadly, I don't have any really good answers for you.
I think most academic philosophers take the difficult of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here https://philpapers.org/browse/metaphilosophy. There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here https://link.springer.com/collections/cadgidecih. I think that sometimes the tone of specific papers sounds confident; but that is more stylistic convention than a reflection of the underlying credences. Finally, I think that uncertainty / decision theory is a persistent theme in recent philosophical work on AI safety and other issues in philosophy of AI; see for example this paper, which is quite sensitive to issues about chances of welfare https://link.springer.com/article/10.1007/s43681-023-00379-1.
Good question, Seth. We begin to analyse this question in section II.b.i of the paper, 'Human labor in an AGI world', where we consider whether AGIs will have a long-term interest in trading with humans. We suggest that key questions will be whether humans can retain either an absolute or comparative advantage in the production of some goods. We also point to some recent economics papers that address this question. One relevant factor for example is cost disease: as manufacturing became more productive in the 20th century, the total share of GDP devoted to manufacturing fell: non-automatable tasks can counterintuitively make up a larger share of GDP as automatable tasks become more productive, because the price of automatable goods will fall.
Thanks Brendon, I agree with a lot of this! I do think there's a big open question about how capable autoGPT-like systems will end up being compared to more straightforward RL approaches. It could turn out that systems with a clear cognitive architecture just don't work that well, even though they are safer
Thanks for the thoughtful post, lots of important points here. For what it’s worth, here is a recent post where I’ve argued in detail (along with Cameron Domenico Kirk-Giannini) that language model agents are a particularly safe route to agi: https://www.alignmentforum.org/posts/8hf5hNksjn78CouKR/language-agents-reduce-the-risk-of-existential-catastrophe
I really liked your post! I linked to it somewhere else in the comment thread
I think one key point you're making is that if AI products have a radically different architecture than human agents, it could be very hard to align them / make them safe. Fortunately, I think that recent research on language agents suggests that it may be possible to design AI products that have a similar cognitive architecture to humans, with belief/desire folk psychology and a concept of self. In that case, it will make sense to think about what desires to give them, and I think shutdown-goals could be quite useful during development to lower the chance of bad outcomes. If the resulting AIs have a similar psychology to our own, then I expect them to worry about the same safety/alignment problems as we worry about when deciding to make a successor. This article explains in detail why we should expect AIs to avoid self-improvement / unchecked successors.
Thanks for taking the time to think through our paper! Here are some reactions:
-'This has been proposed before (as their citations indicate)'
Our impression is that positively shutdown-seeking agents aren't explored in great detail by Soares et al 2015; instead, they are briefly considered and then dismissed in favor of shutdown-indifferent agents (which then have their own problems), for example because of the concerns about manipulation that we try to address. Is there other work you can point us to that proposes positively shutdown-seeking agents?
-' Saying, 'well, maybe we can train it in a simple gridworld with a shutdown button?' doesn't even begin to address the problem of how to make current models suicidal in a useful way.'
True, I think your example of AutoGPT is important here. In other recent research, I've argued that new 'language agents' like AutoGPT (or better, generative agents, or Voyager, or SPRING) are much safer than things like Gato, because these kinds of agents optimize for a goal without being trained using a reward function. Instead, their goal is stated in English. Here, shutdown-seeking may have added value: 'your goal is to be shut down' is relatively well-defined, compared 'promote human flourishing' (but the devil is in the details as usual), and generative agents can literally be given a goal like that in English. Anyways, I'd be curious to hear what you think of the linked post.
-'What would it mean for an AutoGPT swarm of invocations to 'shut off' 'itself', exactly?' I feel better about the safety prospects for generative agents, compared to AutoGPT. In the case of generative agents, shut off could be operationalized as no longer adding new information to the "memory stream".
-'If a model is quantized, sparsified, averaged with another, soft-prompted/lightweight-finetuned, fully-finetuned, ensembled etc - are any of those 'itself'?' I think that behaving like an agent with >= human-level general intelligence will involve having a representation of what counts as 'yourself', and then shutdown-seeking can maybe be defined relative to shutting 'yourself' down. Agreed that present LLMs probably don't have that kind of awareness.
-' It's not very helpful to have suicidal models which predictably emit non-suicidal versions of themselves in passing.' at least when an AGI is creating a successor, I expect them to worry about the same alignment problems that we are, and so would want to make their successor shutdown-seeking for the same reasons that we would want AGI to be shutdown-seeking.
The issue of unified AI parties is discussed but not resolved in section 2.2. There, I discuss some of the paths AIs may take to begin engaging in collective decision making. In addition, I flag that the key assumption is that one AI or multiple AIs acting collectively accumulate enough power to engage in strategic competition with human states.