Wei Dai

I think I need more practice talking with people in real time (about intellectual topics). (I've gotten much more used to text chat/comments, which I like because it puts less time pressure on me to think and respond quickly, but I feel like I now incur a large cost due to excessively shying away from talking to people, hence the desire for practice.) If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.

www.weidai.com

Comments

Wei Dai

Sorry about the delayed reply. I've been thinking about how to respond. One of my worries is that human philosophy is path dependent; another way of saying this is that we're prone to accepting wrong philosophical ideas/arguments and then it's hard to talk us out of them. The split of Western philosophy into the analytic and continental traditions seems to be an instance of this, and even within analytic philosophy, academic philosophers strongly disagree with each other, are each confident in their own positions, and rarely get talked out of them. I think/hope that humans collectively can still make philosophical progress over time (in some mysterious way that I wish I understood) if we're left to our own devices, but the process seems pretty fragile and probably can't withstand much external optimization pressure.

On formalizations, I agree they've stood the test of time in your sense, but is that enough to build them into AI? We can see that they're wrong on some questions, but we can't formally characterize the domain in which they are right. And even if we could, I don't see why we'd expect to muddle through... What if we built AI based on Debate, but used Newtonian physics to answer physics queries instead of human judgment, or had humans who are pretty bad at answering physics-related questions (including meta questions like how to do science)? That would be pretty disastrous, especially if there are any adversaries in the environment, right?

Wei Dai

MacAskill is probably the most prominent, with his "value lock-in" and "long reflection", but in general the notion of philosophical confusion/inadequacy seems a common component of various AI risk cases. I've been particularly impressed by John Wentworth.

That's true, but neither of them has talked about the more general problem ("maybe humans/AIs won't be philosophically competent enough, so we need to figure out how to improve human/AI philosophical competence"), or at least they haven't said this publicly or framed their positions this way.

The point is that it's impossible to do useful philosophy without close and constant contact with reality.

I see, but what if there are certain problems which by their nature just don't have clear and quick feedback from reality? One of my ideas about metaphilosophy is that this is a defining feature of philosophical problems or what makes a problem more "philosophical". Like for example, what should my intrinsic (as opposed to instrumental) values be? How would I get feedback from reality about this? I think we can probably still make progress on these types of questions, just very slowly. If your position is that we can't make any progress at all, then 1) how do you know we're not just making progress slowly and 2) what should we do? Just ignore them? Try to live our lives and not think about them?

Interesting. Who are they and what approaches are they taking? Have they said anything publicly about working on this, and if not, why?

My impression is that those few who at least understand that they're confused do that

Who else is doing this?

Not exactly an unheard of position.

All of your links are to people proposing better ways of doing philosophy, which contradicts the claim that it's impossible to make progress in philosophy.

policymakers aren't predisposed to taking arguments from those quarters seriously

There are various historical instances of philosophy having large effects on policy (not always in a good way), e.g., abolition of slavery, rise of liberalism ("the Enlightenment"), Communism ("historical materialism").

It seems clear enough to me that pretty much everybody is hopelessly confused about these issues, and sees no promising avenues for quick progress.

If that's the case, why aren't they at least raising the alarm for this additional AI risk?

"What kind of questions can you make progress on without constant grounding and dialogue with reality? This is the default of how we humans build knowledge and solve hard new questions, the places where we do best and get the least drawn astray is exactly those areas where we can have as much feedback from reality in as tight loops as possible, and so if we are trying to tackle ever more lofty problems, it becomes ever more important to get exactly that feedback wherever we can get it!"

It seems to me that we're able to make progress on questions "without constant grounding and dialogue with reality", just very slowly. (If this isn't possible, then what are philosophers doing? Are they all just wasting their time?) I also think it's worth working on metaphilosophy, even if we don't expect to solve it in time or make much progress, if only to provide evidence to policymakers that it really is a hard problem (and therefore an additional reason to pause/stop AI development). But I would be happier even if nobody worked on this, as long as more people publicly/prominently stated that this is an additional concern for them about AGI.

Wei Dai

Given the typical pace and trajectory of human philosophical progress, I think we're unlikely to make much headway on the relevant problems (i.e., not enough to have high justified confidence that we've correctly solved them) before we really need the solutions, but various groups will likely convince themselves that they have, and become overconfident in their own proposed solutions. The subject will likely end up polarized and politicized, or perhaps ignored by most as they take the lack of consensus as license to do whatever is most convenient.

Even if the question of AI moral status is somehow solved, in a definitive way, what about all of the follow-up questions? If current or future AIs are moral patients, what are the implications of that in terms of, e.g., what we concretely owe them in rights and welfare considerations? How to allocate votes to AI copies? How to calculate and weigh the value/disvalue of some AI experience vs another AI experience vs a human experience? Interpersonal utility comparison has been an unsolved problem since utilitarianism was invented, and now we also have to deal with the massive distributional shift of rapidly advancing artificial minds...

One possible way to avoid this is if we get superintelligent and philosophically supercompetent AIs, then they solve the problems and honestly report the solutions to us. (I'm worried that they'll instead just be superpersuasive and convince us of their moral patienthood (or lack thereof, if controlled by humans) regardless of what's actually true.) Or alternatively, humans become much more philosophically competent, such as via metaphilosophical breakthroughs, cognitive enhancements, or social solutions (perhaps mass identification/cultivation of philosophical talent).

It seems very puzzling to me that almost no one is working on increasing AI and/or human philosophical competence in these ways, or even publicly expressing the worry that AIs and/or humans collectively might not be competent enough to solve important philosophical problems that will arise during and after the AI transition. Why is AI's moral status (and other object level problems like decision theory for AIs) considered worthwhile to talk about, but this seemingly more serious "meta" problem isn't?

Wei Dai

urged that it be retracted

This seems substantially different from "was retracted" in the title. Also, arXiv apparently hasn't yet followed MIT's request to remove the paper, presumably following its own policy and waiting for the author to issue his own request.

Wei Dai

How do you decide what to set ε to? You mention "we want assumptions about humans that are sensible a priori, verifiable via experiment", but I don't see how ε can be verified via experiment, given that for many of the questions we'd want the human oracle to answer, there isn't a source of ground-truth answers that we can compare the human answers to.

With unbounded Alice and Bob, this results in an equilibrium where Alice can win if and only if there is an argument that is robust to an ε-fraction of errors.

How should I think about, or build up some intuitions about, what types of questions have an argument that is robust to an ε-fraction of errors?

Here's an analogy that leads to a pessimistic conclusion (but I'm not sure how relevant it is): replace the human oracle with a halting oracle, let the top-level question being debated be whether some Turing machine T halts, and let the distribution over which ε is defined be the uniform distribution. Then it seems like Alice has a very tough time (for any T that she can't prove halts or doesn't halt herself), because Bob can reject/rewrite all the oracle answers that are relevant to T in some way, and these make up a tiny fraction of all possible Turing machines. (This assumes that Bob gets to pick the classifier after seeing the top-level question. Is this right?)
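To make this pessimistic intuition concrete, here's a minimal numerical sketch (my own toy framing with made-up numbers, not anything taken from the debate protocol itself): if Bob's corruption budget is an ε fraction of the whole query distribution, while the queries actually bearing on the top-level question make up less than that fraction, Bob can afford to rewrite every relevant answer.

```python
# Toy model (hypothetical numbers): an adversary whose corruption budget is
# measured over the *whole* query distribution can rewrite every answer
# relevant to a sufficiently "narrow" top-level question.

N = 1_000_000            # size of the query space (e.g., all TMs up to some length)
epsilon = 0.01           # fraction of oracle answers the adversary may corrupt
budget = epsilon * N     # number of answers the adversary can rewrite

relevant_queries = 500   # queries that actually bear on the top-level claim

if relevant_queries <= budget:
    print("Adversary can corrupt every relevant answer; no argument survives.")
else:
    # Alice only needs an argument robust to `budget` corruptions among the
    # relevant queries, e.g., via redundant independent lines of evidence.
    print(f"At least {int(relevant_queries - budget)} relevant answers stay intact.")
```

Under these assumptions the robustness condition only helps Alice when the relevant queries carry more probability mass than ε, which is exactly what seems to fail in the halting-oracle analogy above.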

Wei Dai

I think the most dangerous version of 3 is a sort of Chesterton's fence, where people get rid of seemingly unjustified social norms without realizing that they were socially beneficial. (Decline in high-g birthrates might be an example.) Though social norms are instrumental values, not beliefs, and when a norm was originally motivated by a mistaken belief, it can still be motivated by recognizing that the norm is useful, which doesn't require holding on to the mistaken belief.

I think that makes sense, but sometimes you can't necessarily motivate a useful norm "by recognizing that the norm is useful" to the same degree that you can with a false belief. For example, there may be situations where someone has an opportunity to violate a social norm in an unobservable way, and they could be more strongly motivated to comply by the idea of potential punishment from God than by just following the norm for the greater (social) good.

Do you have an example for 4? It seems rather abstract and contrived.

Hard not to sound abstract and contrived here, but to say a bit more, maybe there is no such thing as philosophical progress (outside of some narrow domains), so by doing philosophical reflection you're essentially just taking a random walk through idea space. Or philosophy is a memetic parasite that exploits bug(s) in human minds to spread itself, perhaps similar to (some) religions.

Overall, I think the risks from philosophical progress aren't overly serious while the opportunities are quite large, so the overall EV looks comfortably positive.

I think the EV is positive if done carefully, which I think I had previously been assuming, but I'm a bit worried now that most people I can attract to the field might not be as careful as I had assumed, so I've become less certain about this.

Wei Dai

Some potential risks stemming from trying to increase philosophical competence of humans and AIs, or doing metaphilosophy research. (1 and 2 seem almost too obvious to write down, but I think I should probably write them down anyway.)

  1. Philosophical competence is dual use, like much else in AI safety. It may for example allow a misaligned AI to make better decisions (by developing a better decision theory), and thereby take more power in this universe or cause greater harm in the multiverse.
  2. Some researchers/proponents may be overconfident, and cause flawed metaphilosophical solutions to be deployed or spread, which in turn derail our civilization's overall philosophical progress.
  3. Increased philosophical competence may cause many humans and AIs to realize that various socially useful beliefs have weak philosophical justifications (such as that all humans are created equal, have equal moral worth, or have natural inalienable rights; moral codes based on theism; etc.). In many cases the only justifiable philosophical positions in the short to medium run may be states of high uncertainty and confusion, and it seems unpredictable what effects will come from many people adopting such positions.
  4. Maybe the nature of philosophy is very different from my current guesses, such that greater philosophical competence or orientation is harmful even in aligned humans/AIs and even in the long run. For example maybe philosophical reflection, even if done right, causes a kind of value drift, and by the time you've clearly figured that out, it's too late because you've become a different person with different values.