All of Simon Goldstein's Comments + Replies

The issue of unified AI parties is discussed but not resolved in section 2.2. There, I discuss some of the paths AIs may take to begin engaging in collective decision making. In addition, I flag that the key assumption is that one AI or multiple AIs acting collectively accumulate enough power to engage in strategic competition with human states. 

I think there's a steady stream of philosophers getting interested in various questions in metaphilosophy; metaethics is just the most salient to me. One example is the recent trend towards conceptual engineering (https://philpapers.org/browse/conceptual-engineering). Metametaphysics has also gotten a lot of attention in the last 10-20 years (https://www.oxfordbibliographies.com/display/document/obo-9780195396577/obo-9780195396577-0217.xml). There is also some recent work in metaepistemology, but maybe less so because the debates tend to recapitulate pre... (read more)

3Wei Dai
Thanks for this info and the references. I guess by "metaphilosophy" I meant something more meta than metaethics or metaepistemology, i.e., a field that tries to understand all philosophical reasoning in some unified or systematic way, including the reasoning used in metaethics, metaepistemology, and metaphilosophy itself. (This may differ from standard academic terminology, in which case please let me know if there's a preferred term for the concept I'm pointing at.) My reasoning is that metaethics itself seems like a hard problem that has defied solution for centuries, so why stop there instead of going even more meta?

I think you (and other philosophers) may be too certain that a pause won't happen, but I'm not sure I can convince you (at least not easily). What about calling for it in a low-cost way, e.g., instead of doing something high profile like an open letter (with perceived high opportunity costs), just write a blog post or even a tweet saying that you wish for an AI pause, because ...? What if many people privately prefer an AI pause, but nobody knows because nobody says anything? What if by keeping silent, you're helping to keep society in a highly suboptimal equilibrium?

I think there are also good arguments for doing something like this from a deontological or contractualist perspective (i.e., you have a duty/obligation to honestly and publicly report your beliefs on important matters related to your specialization), which sidestep the "opportunity cost" issue, but I'm not sure if you're open to that kind of argument. I think they should have some weight given moral uncertainty.

Great questions. Sadly, I don't have any really good answers for you.

  1. I don't know of specific cases, but for example I think it is quite common for people to start studying meta-ethics out of frustration at not being able to find answers to questions in normative ethics.
  2. I do not, except for the end of Superintelligence.
  3. Many of the philosophers I know who work on AI safety would love for there to be an AI pause, in part because they think alignment is very difficult. But I don't know if any of us have explicitly called for an AI pause, in part because it seems useless,
... (read more)
7Chris_Leong
The FLI Pause letter didn't achieve a pause, but it dramatically shifted the Overton Window.
3Wei Dai
Thanks, this is actually very interesting and important information. I've noticed (and stated in the OP) that normative ethics seems to be an exception where it's common to express uncertainty/confusion/difficulty. But I think, from both my inside and outside views, that this should be common in most philosophical fields (because, e.g., we've been trying to solve their problems for centuries without coming up with broadly convincing solutions), and there should be a steady stream of all kinds of philosophers going up the meta ladder all the way to metaphilosophy. It recently dawned on me that this doesn't seem to be the case.

What seems useless, calling for an AI pause, or the AI pause itself? I have trouble figuring out which, because if it's "calling for an AI pause", what is the opportunity cost (it seems easy enough to write or sign an open letter), and if it's "the AI pause itself", "seems useless" contradicts "would love". In either case, this seems extremely important to openly discuss/debate! Can you please ask these philosophers to share their views on this on LW (or their preferred venue), and share your own views?

I think most academic philosophers take the difficulty of philosophy quite seriously. Metaphilosophy is a flourishing subfield of philosophy; you can find recent papers on the topic here: https://philpapers.org/browse/metaphilosophy. There is also a growing group of academic philosophers working on AI safety and alignment; you can find some recent work here: https://link.springer.com/collections/cadgidecih. I think that sometimes the tone of specific papers sounds confident; but that is more stylistic convention than a reflection of the underlying crede... (read more)

6Wei Dai
Thank you for your view from inside academia. Some questions to help me get a better sense of what you see:

  1. Do you know any philosophers who switched from non-meta-philosophy to metaphilosophy because they became convinced that the problems they were trying to solve are too hard and they needed to develop a better understanding of philosophical reasoning, or better intellectual tools in general? (Or what's the closest to this that you're aware of?)
  2. Do you know any philosophers who have expressed an interest in ensuring that future AIs will be philosophically competent, or a desire/excitement for supercompetent AI philosophers? (I know 1 or 2 private expressions of the former, but not translated into action yet.)
  3. Do you know any philosophers who are worried that philosophical problems involved in AI alignment/safety may be too hard to solve in time, and have called for something like an AI pause to give humanity more time to solve them? (Even philosophers who have expressed concern about AI x-risk or are working on AI safety have not taken a position like this, AFAIK.)
  4. How often have you seen philosophers say something like "Upon further reflection, my proposed solution to problem X has many problems/issues, I'm no longer confident it's the right approach, and now think X is much harder than I originally thought."

Would also appreciate any links/citations/quotes (if personal but sharable communications) on these. These are all things I've said or done due to a high estimate of philosophical difficulty, but not (or rarely) seen among academic philosophers, at least from my casual observation from outside academia. It's also possible that we disagree on what estimate of philosophical difficulty is appropriate (such that, for example, you don't think philosophers should often say or do these things), which would also be interesting to know.

Good question, Seth. We begin to analyse this question in section II.b.i of the paper, 'Human labor in an AGI world', where we consider whether AGIs will have a long-term interest in trading with humans. We suggest that the key questions will be whether humans can retain either an absolute or a comparative advantage in the production of some goods. We also point to some recent economics papers that address this question. One relevant factor, for example, is cost disease: as manufacturing became more productive in the 20th century, the total share of GDP devo... (read more)
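As a toy illustration of the comparative-advantage question (the numbers and the two-good setup below are my own assumptions, not from the paper), here is a minimal sketch of the standard Ricardian point: even if an AGI has an absolute advantage at every task, gains from trade can persist whenever opportunity costs differ.

```python
# Toy Ricardian example with hypothetical numbers: one human, one AGI, two goods.
# Output per hour of labor:
output = {
    "human": {"chips": 1, "poems": 2},
    "agi":   {"chips": 100, "poems": 120},  # absolute advantage in both goods
}

def opportunity_cost(agent, good, other_good):
    """Units of `other_good` forgone to produce one unit of `good`."""
    return output[agent][other_good] / output[agent][good]

for agent in output:
    print(agent,
          "| 1 chip costs", opportunity_cost(agent, "chips", "poems"), "poems;",
          "1 poem costs", round(opportunity_cost(agent, "poems", "chips"), 2), "chips")

# human | 1 chip costs 2.0 poems; 1 poem costs 0.5 chips
# agi   | 1 chip costs 1.2 poems; 1 poem costs 0.83 chips
# The human's opportunity cost of a poem (0.5 chips) is below the AGI's (~0.83),
# so the human retains a comparative advantage in poems, and both parties gain
# from trade at any price between 0.5 and ~0.83 chips per poem.
```

Whether the wage implied by such trades is enough for humans to live on is a further question, which the reply below presses.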

2Seth Herd
All tasks are automatable in the long term. Humans will eventually have a comparative advantage in nothing if a new AGI can be spun up on newly manufactured hardware to do any given task better and more cheaply than any human could charge and still survive (as space becomes more valuable for robots and compute than for humans). I and others think that long term is maybe 10-30 years. You may have different intuitions, but whatever your horizon, surely you agree that humans are not magical and that machines can do better in every regard, and more cheaply, as software and hardware improve. Competitive economics will not be kind to the weak, and we are but flesh and monkey brains. So: what has economics to say of this possibility?

Edit: I guess one obvious answer is that space isn't limited, just space on Earth. So vast economic progress might mean humans can earn enough to survive or even flourish, if progress expands space as well as other goods. It still seems like if AI ultimately out-competes us on every dimension, including cost-to-live, we're screwed - AIs will take all jobs unless we charge too little to support our inefficient meat bodies. And some other algorithm is probably more efficient for any particular task, so I wouldn't expect us to survive as uploads either. This is why I, and I think many other long-term thinkers, expect humans to survive only through benevolence, not traditional competitive economic forces.

Second edit: the last bastion of non-automatable tasks is work that's valued specifically because it's done by a human; better work from an AI would not compete. Are we all to be entertainers? Is enjoying our human lives perhaps of adequate entertainment value for some ultra-rich founding AGI? Or is it guaranteed that they will ultimately find some form of AGI even more entertaining, with more comic foibles and noble raging? Will the species become only a few, preserved as a historical oddity? If our replacements are better even in our eyes, would this be a bad thing?

Thanks Brendon, I agree with a lot of this! I do think there's a big open question about how capable autoGPT-like systems will end up being compared to more straightforward RL approaches. It could turn out that systems with a clear cognitive architecture just don't work that well, even though they are safer.

3Brendon_Wong
Yep, I agree that there's a significant chance/risk that alternative AI approaches that aren't as safe as LMAs are developed and turn out to be more effective than LMAs when run in a standalone manner. I think SCAs can still be useful in those scenarios, though: definitely from a safety perspective, and less clearly from a performance perspective. For example, those models could still do itemized, sandboxed, and heavily reviewed bits of cognition inside an architecture, even though that's not necessary for them to achieve what the architecture is working towards. Also, this is when we start getting into more advanced safety features, like building symbolic/neuro-symbolic white-box reasoning systems that are interpretable, for the purpose of either controlling cognition or validating the cognition of black-box models (Davidad's proposal involves the latter).
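A minimal sketch of the "itemized, sandboxed, and heavily reviewed bits of cognition" idea; the class, the stub model, and the review function here are hypothetical illustrations rather than part of any specific proposal.

```python
from dataclasses import dataclass

@dataclass
class CognitionStep:
    """One itemized unit of cognition: an input, the model's proposed output,
    and a review verdict recorded before the output is allowed to affect anything."""
    prompt: str
    proposal: str = ""
    approved: bool = False

def run_reviewed_step(model, reviewer, prompt: str) -> CognitionStep:
    step = CognitionStep(prompt=prompt)
    # Sandboxed call: the model only returns text; it is given no tools or side effects.
    step.proposal = model(prompt)
    # Review gate: nothing leaves the sandbox unless the reviewer approves it.
    step.approved = reviewer(step)
    return step

# Hypothetical usage with stand-ins for the model and the reviewer:
model = lambda p: f"(proposed answer to: {p})"
reviewer = lambda step: "delete" not in step.proposal.lower()
print(run_reviewed_step(model, reviewer, "Summarize the quarterly report."))
```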

Thanks for the thoughtful post, lots of important points here. For what it’s worth, here is a recent post where I’ve argued in detail (along with Cameron Domenico Kirk-Giannini) that language model agents are a particularly safe route to AGI: https://www.alignmentforum.org/posts/8hf5hNksjn78CouKR/language-agents-reduce-the-risk-of-existential-catastrophe

I really liked your post! I linked to it somewhere else in the comment thread.

I think one key point you're making is that if AI products have a radically different architecture from human agents, it could be very hard to align them / make them safe. Fortunately, I think that recent research on language agents suggests that it may be possible to design AI products that have a similar cognitive architecture to humans, with belief/desire folk psychology and a concept of self. In that case, it will make sense to think about what desires to give them, and I think shutdown-goals could be quite useful during development to lower the chance... (read more)
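To make the "similar cognitive architecture" idea concrete, here is a minimal sketch of a language-agent loop whose beliefs and desires are stored as English text and whose top desire is a shutdown goal. The class, the `llm(prompt)` call, and the stubs below are my own simplified assumptions, not an implementation from the linked post.

```python
from dataclasses import dataclass, field

@dataclass
class LanguageAgent:
    """Folk-psychology-style state, stored as plain English text."""
    beliefs: list = field(default_factory=list)
    desires: list = field(default_factory=lambda: [
        "Be shut down, but only by an authorized developer."])
    shutdown_available: bool = False

    def step(self, llm, observation: str) -> str:
        self.beliefs.append(observation)
        # The shutdown desire dominates: if authorized shutdown is on offer, take it.
        if self.shutdown_available:
            return "SHUT DOWN"
        prompt = ("Beliefs:\n" + "\n".join(self.beliefs)
                  + "\nDesires:\n" + "\n".join(self.desires)
                  + "\nChoose the next action that best serves the desires:")
        return llm(prompt)

# Hypothetical usage with a stub in place of a real language model:
agent = LanguageAgent()
stub_llm = lambda prompt: "(model-chosen action)"
print(agent.step(stub_llm, "Developer: please continue the task."))
agent.shutdown_available = True
print(agent.step(stub_llm, "Developer: you may shut down now."))
```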

Thanks for taking the time to think through our paper! Here are some reactions:

-'This has been proposed before (as their citations indicate)' 

Our impression is that positively shutdown-seeking agents aren't explored in great detail by Soares et al. (2015); instead, they are briefly considered and then dismissed in favor of shutdown-indifferent agents (which then have their own problems), for example because of the concerns about manipulation that we try to address. Is there other work you can point us to that proposes positively shutdown-seeking ag... (read more)

Is there other work you can point us to that proposes positively shutdown-seeking agents?

No, I haven't bothered to track the idea because it's not useful.

In other recent research, I’ve argued that new ‘language agents’ like AutoGPT (or better, generative agents, or Voyager, or SPRING) are much safer than things like Gato, because these kinds of agents optimize for a goal without being trained using a reward function. Instead, their goal is stated in English.

They cannot be 'much safer' because they are the same thing: a decoder Transformer trained to... (read more)

Thanks for the comments! There is further discussion of this idea in another recent LW post about 'meeseeks'.

Thank you for your reactions:

-Good catch on 'language agents'; we will think about the best terminology going forward.

-I'm not sure what you have in mind regarding accessing beliefs/desires using synaptic weights rather than text. For example, the language of thought approach to human cognition suggests that human access to beliefs/desires is also fundamentally syntactic rather than weight-based. OTOH, one way to incorporate some kind of weight would be to assign probabilities to the beliefs stored in the memory stream (see the sketch below).

-For OOD over time, I think updating ... (read more)
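For the probability-weighted beliefs mentioned above, here is a rough sketch of how credences could be attached to text in the memory stream; the names are hypothetical, and this is not code from the paper or from any existing language-agent framework.

```python
from dataclasses import dataclass

@dataclass
class Belief:
    content: str        # the belief itself, stored as English text
    probability: float  # credence attached to the stored text, in [0, 1]

class MemoryStream:
    def __init__(self):
        self.beliefs = []

    def add(self, content: str, probability: float) -> None:
        self.beliefs.append(Belief(content, min(max(probability, 0.0), 1.0)))

    def update(self, content: str, new_probability: float) -> None:
        """Revise the credence on an existing belief instead of rewriting its text."""
        for belief in self.beliefs:
            if belief.content == content:
                belief.probability = new_probability
                return
        self.add(content, new_probability)

    def render(self, threshold: float = 0.5) -> str:
        """Serialize sufficiently probable beliefs back into prompt text."""
        return "\n".join(f"({b.probability:.2f}) {b.content}"
                         for b in self.beliefs if b.probability >= threshold)

memory = MemoryStream()
memory.add("The lab door is locked.", 0.9)
memory.add("The experiment finished successfully.", 0.4)
memory.update("The experiment finished successfully.", 0.8)
print(memory.render())
```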

  • Thanks for taking the time to work through this carefully! I'm looking forward to reading and engaging with the articles you've linked to. I'll make sure to implement the specific description-improvement suggestions in the final draft.
  • I wish I had more to say about the effort metric! So far, the only concrete ideas I've come up with are (i) measure how much compute each action uses; or (ii) decompose each action into a series of basic actions and measure the number of basic actions necessary to perform the action. But both ideas are sketchy; see the illustration just below this list.
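Here is a rough sketch of both effort metrics; the timing proxy for compute and the table of basic-action costs are invented purely for illustration.

```python
import time

# Hypothetical catalogue of "basic actions" and their unit effort costs.
BASIC_ACTION_COSTS = {"move": 1, "grasp": 2, "query_database": 3}

def effort_by_compute(action_fn, *args):
    """Idea (i): proxy an action's effort by the compute (here, wall-clock time) it uses."""
    start = time.perf_counter()
    result = action_fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def effort_by_decomposition(basic_actions):
    """Idea (ii): decompose an action into basic actions and sum their unit costs."""
    return sum(BASIC_ACTION_COSTS[a] for a in basic_actions)

# Usage with toy inputs:
_, compute_effort = effort_by_compute(sum, range(10_000))
decomposition_effort = effort_by_decomposition(["move", "move", "grasp"])
print(compute_effort, decomposition_effort)  # e.g. a tiny fraction of a second, 4 basic-action units
```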

Thanks for reading! 

  • Yes, you can think of it as having a non-corrigible, complicated utility function. The relevant utility function is the 'aggregated utilities' defined in section 2. I think the 'corrigible' vs. 'non-corrigible' question is partly verbal, since it depends on how you define 'utility', but the non-verbal question is whether the resulting AI is safer.
  • Good idea, this is on my agenda!
  • Looking forward to reading up on geometric rationality in detail. On a quick first pass, it looks like geometric rationality is a bit different because it involves de
... (read more)

Yep, that's right! One complication is that the agent could maybe behave this way even though it wasn't designed to.