All of RedFishBlueFish's Comments + Replies

I can think of two interpretations of consciousness being "causally connected" to physical systems:

1. Consciousness is the result of physical phenomena like brain states, but it does not cause any physical phenomena itself. So it has an in-edge coming from the physical world, but no out-edge back to the physical world. Again, this implies that consciousness cannot be what causes me to think about consciousness.

2. Consciousness causes things in the physical world, which, again, I believe necessitates a consciousness variable in the laws of the universe. (A toy sketch of both pictures follows below.)
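A minimal way to picture the two interpretations as directed causal graphs. This is purely illustrative; the node and edge names ("brain_state", "talk_about_consciousness", etc.) are my own labels, not anything from Eliezer or Tomasik.

```python
# Toy causal graphs for the two interpretations above (illustrative only).
# Edges point from cause to effect.

# Interpretation 1 (epiphenomenalism): physics -> consciousness, but no edge back.
epiphenomenal = {
    "physical_world": ["brain_state"],
    "brain_state": ["consciousness", "talk_about_consciousness"],
    "consciousness": [],  # no out-edge: it cannot cause the talking/thinking
}

# Interpretation 2 (interactionism): consciousness has an out-edge back into physics,
# which is why it would seem to need a variable in the physical laws.
interactionist = {
    "physical_world": ["brain_state"],
    "brain_state": ["consciousness"],
    "consciousness": ["talk_about_consciousness"],
}

def causes_of(graph, node):
    """Return every node with an edge into `node`."""
    return [src for src, dsts in graph.items() if node in dsts]

print(causes_of(epiphenomenal, "talk_about_consciousness"))   # ['brain_state']
print(causes_of(interactionist, "talk_about_consciousness"))  # ['consciousness']
```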

Note that I am not trying to get at what Eliezer was arguing; I am asking about the consequences of his arguments, even ones that he may not have intended.

RedFishBlueFish
Yeah so I think Tomasik has written basically what I’m saying: https://reducing-suffering.org/dissolving-confusion-about-consciousness/

So does this suppose that there is some “consciousness” variable in the laws of the universe? If consciousness causes me to think X, and thinking X can be traced back to a set of physical laws that govern the neurons in my brain, then there must be some consciousness variable somewhere in these physical laws, no? Otherwise it has to be that consciousness corresponds to some physical phenomenon, and it is that phenomenon - not the consciousness - that caused you to think about it. If there were no consciousness attached to that physical phenomenon, you would go along just the same way, thinking the exact same thing.

JBlack
Almost. The argument doesn't rule out substance dualism, in which consciousness may not be governed by physical laws, but in which it is at least causally connected to the physical processes of writing and talking and neural activity correlated with thinking about consciousness. It's only an argument against epiphenomenalism and related hypotheses in which the behaviour or existence of consciousness has no causal influence on the physical universe.

I was reading this old Eliezer piece arguing against the conceivability of p-zombies. https://www.lesswrong.com/posts/7DmA3yWwa6AT5jFXt/zombies-redacted

And to me this feels like a more general argument against the existence of consciousness itself, in any form similar to how we normally think about it, not just against p-zombies.

Eliezer says that consciousness itself cannot be what causes me to think about consciousness, or what causes philosophers to write papers about consciousness; thus it must be the physical system that such consciousness corresponds to ... (read more)

JBlack
No, Eliezer does not say that consciousness itself cannot be what causes you to think about consciousness. Eliezer says that if p-zombies can exist, then consciousness itself cannot be what causes you to think about consciousness. If p-zombies cannot exist, then consciousness can be a cause of you thinking about consciousness.

That's not a simple problem. First you have to specify "not killing everyone" robustly (outer alignment), and then you have to train the AI to have this goal and not an approximation of it (inner alignment).

See my other comment for the response.

Anyway, the rest of your response is spent talking about the case where the AI cares about its perception of the paperclips rather than the paperclips themselves. I'm not sure how severity level 1 would come about, given that the AI should only care about its reward score. Once you admit that the AI cares about worldly th... (read more)

We don't know how to represent "do not kill everyone"

I think this goes to Matthew Barnett’s recent article arguing that actually, yes, we do. And regardless, I don’t think this point is a big part of Eliezer’s argument. https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument

We don't know how to pick which quantity would be maximized by a would-be strong consequentialist maximizer

Yeah so I think this is the crux of it. My point is that if we find some training approach that leads to a model that cares about the ... (read more)

I also do not think the responses to this question are satisfying enough to be refuting. I don’t even think they are satisfying enough to make me confident I haven’t just found a hole in AI risk arguments. This is not a simple case of “you just misunderstand something simple”.

I don’t care that much, but if LessWrong is going to downvote sincere questions because it finds them dumb or whatever, this will make for a site that is very unwelcoming to newcomers.

I do indeed agree this is a major problem, even if I'm not sure I agree with the main claim. The rise of fascism in the last decade, and the expectation that it will continue, is extremely evident; its consequences for democracy are a lot less clear.

The major wrinkle in all of this is in assessing anti-democratic behavior. Democracy indices are not a great way of assessing democracy, for much the same reason that the Doomsday Clock is a bad way of assessing nuclear risk: they're subjective metrics by (probably increasingly) left-leaning academics and tend to measur... (read more)

I'm going to quote this from an EA Forum post I just made on why repeated exposure to AI Safety (through e.g. media coverage) alone will probably do a lot to persuade people:

[T]he more people hear about AI Safety, the more seriously they will take the issue. This seems to be true even if the coverage is purporting to debunk the issue (which, as I will discuss later, I think will be fairly rare) - a phenomenon called the illusory truth effect. I also think this effect will be especially strong for AI Safety. Right now, in EA-adjacent circles, the argument o

... (read more)
Seth Herd
This will definitely help. But any kind of dirty tricks could easily deepen the polarization with those opposed. On thinking about it more, I think this polarization is already in play. Interested intellectuals have already seen years of forceful AI doom arguments, and they dislike the whole concept on an emotional level. Similarly, those dismissals drive AGI x-risk believers (including myself) kind of nuts, and we tend to respond more forcefully, and the cycle continues. The problem with this is that, if the public perceives AGI as dangerous, but most of those actually working in the field do not, policy will tend to follow the experts and ignore the populace. They'll put in surface-level rules that sound like they'll do something to monitor AGI work, without actually doing much. At least that's my take on much of public policy that responds to public outcry.

No, it does not say that either. I’m assuming you’re referring to “choose our words carefully”, but stating something imprecisely is a far cry from not telling the truth.

Ben Pace
Pardon me. I wrote a one-line reply, and then immediately edited it to make sure I was saying true sentences, and it ended up being much longer. (Perhaps my mistake here is reflective of the disagreement in the post about speaking carelessly.)

Nowhere in that quote does it say we should not speak the truth.

Ben Pace
If someone is talking to you about current events and new information is being given to you, and you're being invited to comment on it, and there are strong pressures on you to say particular things that powerful forces want you to say, and to play certain social roles that the media wants you to play (e.g. "them vs us" to the AI labs, or to endorse radical and extreme versions of your position), then in my model of the world it takes a bunch of cognitive effort to not fall into the roles people want and say things you later regret or didn't even believe at the time, while still making important and valid points. The broad advice to care much less about choosing your words carefully sounded to me like it pushed against this, and against one of the "core competencies" of rationalists, so to speak.

My favorite world where there are rationalists in the eye of Sauron involves rationalists speaking like rationalists, making thoughtful arguments well, and very obviously not like political hacks with a single political goal who are just saying stuff to make drama. When your issue is "hot", it is not the time to change how you fundamentally talk about your issue! Eliezer's essay on the TIME site does not sound different from how Eliezer normally sounds, and that's IMO a key element of why people respect him; if he had spoken far more carelessly than usual, it would have been worse-written and people would have respected it less.

I disagree with the advice your post gives, and don't think that the advice is good; even worse, you didn't argue for your points much or acknowledge literally any counterarguments. I don't think attention has generically been good for lots of things — people get cancelled, global warming gets politicized, etc. You didn't mention the obvious considerations of "How would this get politicized if it gets politicized, and how could that be avoided?" or "What adversarial forces will try to co-opt the discussion of extinction risk?" or "How could this ba

Yeah so this seems like what I was missing.

But it seems to me that in these types of models, where the utility function is based on the state of the world rather than on the input to the AI, aligning the AI not to kill humanity is easier. Like if an AI gets a reward every time it sees a paperclip, then it seems hard to punish the AI for killing humans, because "human dies" is a hard thing for an AI with just sensory input to explicitly recognize. If, however, the AI is trained on a bunch of runs where the utility function is the number of paperclips actually created, then we can also penalize the model for the number of people who actually die.

I'm not very familiar with these forms of training so I could be off here.
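A minimal sketch of the contrast I have in mind, with made-up reward functions and a toy WorldState; nothing here corresponds to a real training setup, and the names and the penalty constant are purely illustrative.

```python
# Illustrative only: two made-up reward functions for the paperclip example.
from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips_created: int
    human_deaths: int

def perception_based_reward(perceived_paperclips_per_frame: list[int]) -> float:
    """Reward depends only on what the AI perceives. 'A human dies' may never
    show up cleanly in the sensory stream, so it is hard to penalize directly."""
    return float(sum(perceived_paperclips_per_frame))

def world_state_reward(state: WorldState, penalty_per_death: float = 1e9) -> float:
    """Reward depends on the actual end-of-run state of the world, so the number
    of people who actually die can be penalized directly."""
    return state.paperclips_created - penalty_per_death * state.human_deaths

print(perception_based_reward([3, 5, 2]))                                      # 10.0
print(world_state_reward(WorldState(paperclips_created=10, human_deaths=0)))   # 10.0
print(world_state_reward(WorldState(paperclips_created=10, human_deaths=1)))   # -999999990.0
```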

Steelmanning is useful as a technique because often the intuition behind somebody’s argument is true even if the precise argument they are using is not. If the other person is a rationalist, then you can point out the argument’s flaws and expect them to update the argument to more precisely explore their intuition. If not, you likely have to do some of the heavy lifting for them by steelmanning their argument and seeing where its underlying intuition might be correct.

This post seems only focused on the rationalist case.

As with most things in life: this seems like it could be a real improvement, and it's great that we're testing it and finding out!

Answer by RedFishBlueFish

For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is that it’s inherently unreliable - we don’t know how the inner algorithm works, so it just breaks at random points, especially because most of the tasks it handles are really hard (hence why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point whe... (read more)

Alex_Altair
For all the hypothetical products I listed, I think this level of unreliability is totally fine! Even self-driving cars only need to beat the reliability of human drivers, which I don't think is that far from achievable.

Yeah, maybe we could show the ratio of strong upvotes to upvotes.

Yitz

I did ask to be critiqued, so in some sense it's a totally fair response, imo. At the same time, though, Eliezer's response does feel rude, which is worthy of analysis, considering EY's outsized impact on the community.[1] So why does Yudkowsky come across as being rude here?

My first thought upon reading his comment (when scanning for tone) is that it opens with what feels like an assumption of inferiority, with the sense of "here, let me grant you a small parcel of my wisdom so that you can see just how wrong you are," rather than "let me shar... (read more)

Ben Pace
But true and important.