I think there's a steady stream of philosophical work taking an interest in various questions in metaphilosophy; metaethics is just the most salient to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also gotten a lot of attention in the last 10-20 years: https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml. There is also some recent work in metaepistemology, but maybe less so because the debates tend to recapitulate pre-existing debates.
Great questions. Sadly, I don't have any really good answers for you.
I think most academic philosophers take the difficulty of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here: https://philpapers.org/browse/metaphilosophy. There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here: https://link.springer.com/collections/cadgidecih. I think that sometimes the tone of specific papers sounds confident, but that is more stylistic convention than a reflection of the authors' underlying credences.
Good question, Seth. We begin to analyse this question in section II.b.i of the paper, 'Human labor in an AGI world', where we consider whether AGIs will have a long-term interest in trading with humans. We suggest that the key questions will be whether humans can retain either an absolute or comparative advantage in the production of some goods. We also point to some recent economics papers that address this question. One relevant factor, for example, is cost disease: as manufacturing became more productive in the 20th century, the total share of GDP devoted to manufacturing fell.
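To make the comparative advantage point concrete, here is a toy illustration with invented numbers (not from the paper or the economics literature): even if an AGI holds an absolute advantage in every good, humans retain a comparative advantage wherever their opportunity costs are lower, and that is what makes trade mutually beneficial.

```python
# Toy comparative-advantage calculation; all numbers are invented for exposition.
agi = {"software": 100.0, "meals": 50.0}   # units each party can produce per hour
human = {"software": 1.0, "meals": 5.0}

# Opportunity cost of one meal, measured in units of software forgone:
agi_cost_of_meal = agi["software"] / agi["meals"]        # 2.0
human_cost_of_meal = human["software"] / human["meals"]  # 0.2

# The AGI has an absolute advantage in both goods, but the human's lower
# opportunity cost gives it a comparative advantage in meals, so both sides
# gain by specializing and trading at any price between 0.2 and 2.0
# software per meal.
print(f"AGI forgoes {agi_cost_of_meal} software per meal")
print(f"Human forgoes {human_cost_of_meal} software per meal")
assert human_cost_of_meal < agi_cost_of_meal
```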
Thanks Brendon, I agree with a lot of this! I do think there's a big open question about how capable AutoGPT-like systems will end up being compared to more straightforward RL approaches. It could turn out that systems with a clear cognitive architecture just don't work that well, even though they are safer.
Thanks for the thoughtful post, lots of important points here. For what it's worth, here is a recent post where I've argued in detail (along with Cameron Domenico Kirk-Giannini) that language model agents are a particularly safe route to AGI: https://www.alignmentforum.org/posts/8hf5hNksjn78CouKR/language-agents-reduce-the-risk-of-existential-catastrophe
I think one key point you're making is that if AI products have a radically different architecture from human agents, it could be very hard to align them / make them safe. Fortunately, I think that recent research on language agents suggests that it may be possible to design AI products that have a similar cognitive architecture to humans, with belief/desire folk psychology and a concept of self. In that case, it will make sense to think about what desires to give them, and I think shutdown-goals could be quite useful during development to lower the chance of dangerous behavior.
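As a minimal sketch of what I mean by giving an agent explicit desires (the field names and structure here are my own invention, purely for illustration): the agent's goals are stored as inspectable text, and a shutdown goal can be given top priority so that it dominates whenever the operator requests shutdown.

```python
# Sketch of explicit, inspectable desires for a development-phase agent.
# Field names and structure are hypothetical, invented for this example.
desires = [
    {"goal": "Shut down safely.", "priority": 0, "condition": "shutdown_requested"},
    {"goal": "Complete the assigned task.", "priority": 1, "condition": None},
]

def next_goal(desires: list, context: set) -> str:
    """Pick the highest-priority desire whose condition (if any) holds."""
    applicable = [d for d in desires if d["condition"] is None or d["condition"] in context]
    return min(applicable, key=lambda d: d["priority"])["goal"]

print(next_goal(desires, context=set()))                   # Complete the assigned task.
print(next_goal(desires, context={"shutdown_requested"}))  # Shut down safely.
```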
Thanks for taking the time to think through our paper! Here are some reactions:
-'This has been proposed before (as their citations indicate)'
Our impression is that positively shutdown-seeking agents aren't explored in great detail by Soares et al. 2015; instead, they are briefly considered and then dismissed in favor of shutdown-indifferent agents (which then have their own problems), for example because of the concerns about manipulation that we try to address. Is there other work you can point us to that proposes positively shutdown-seeking agents?
No, I haven't bothered to track the idea because it's not useful.
In other recent research, I've argued that new 'language agents' like AutoGPT (or, better, generative agents, Voyager, or SPRING) are much safer than things like Gato, because these kinds of agents optimize for a goal without being trained using a reward function. Instead, their goal is stated in English.
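To make the architectural contrast concrete, here is a minimal sketch of the kind of loop I have in mind (the function names and prompt format are invented for illustration, not taken from AutoGPT or any specific system): the goal enters as plain English text in the prompt, and no reward function is ever trained.

```python
# Minimal language-agent loop; `query_llm` is a hypothetical stand-in for
# whatever text-completion API the agent uses, not a real library call.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")

def run_language_agent(goal: str, max_steps: int = 10) -> list[str]:
    """The agent's goal is stated in English; no reward function is trained."""
    memory: list[str] = []  # natural-language record of past actions
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            "Memory:\n" + "\n".join(memory) + "\n"
            "Name the single next action, or reply DONE if the goal is met."
        )
        action = query_llm(prompt)  # the model chooses actions at inference time
        if action.strip() == "DONE":
            break
        memory.append(f"Did: {action}")
    return memory
```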
They cannot be 'much safer' because they are the same thing: a decoder Transformer trained to predict the next token.
Thanks for the comments! There is further discussion of this idea in another recent LW post about 'meeseeks'.
For what it's worth, I've posted a draft paper on this topic over here: https://www.lesswrong.com/posts/FgsoWSACQfyyaB5s7/shutdown-seeking-ai
Thank you for your reactions:
-Good catch on 'language agents'; we will think about the best terminology going forward.
-I'm not sure what you have in mind regarding accessing beliefs/desires using synaptic weights rather than text. For example, the language-of-thought approach to human cognition suggests that human access to beliefs/desires is also fundamentally syntactic rather than weight-based. OTOH, one way to incorporate some kind of weight would be to assign probabilities to the beliefs stored in the memory stream (see the sketch after this list).
-For OOD over time, I think updating ...
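On the probability idea in the second bullet, here is a minimal sketch of what a probability-weighted memory stream might look like (the class and field names are my own, invented for illustration rather than taken from the generative agents literature):

```python
from dataclasses import dataclass

@dataclass
class Belief:
    """A belief stored as an English sentence plus an explicit credence in [0, 1]."""
    content: str
    credence: float

memory_stream = [
    Belief("The user asked me to draft an email.", 0.99),
    Belief("The draft should be formal in tone.", 0.7),
]

def confident_beliefs(stream: list, threshold: float = 0.8) -> list:
    # Only beliefs above the credence threshold feed into planning.
    return [b.content for b in stream if b.credence >= threshold]

print(confident_beliefs(memory_stream))  # ['The user asked me to draft an email.']
```

The attraction of this design is that the 'weights' stay explicit and human-readable, unlike synaptic weights.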
Thanks for reading!
The issue of unified AI parties is discussed but not resolved in section 2.2. There, I discuss some of the paths AIs may take to begin engaging in collective decision making. In addition, I flag that the key assumption is that one AI or multiple AIs acting collectively accumulate enough power to engage in strategic competition with human states.