All of GunZoR's Comments + Replies

GunZoR

Has this been helpful? I don't know whether you already knew the things I told you (if so, sorry), but it seemed to me that you didn't, given your analogies and the way you talked about the topic.

Yeah, thanks for the reply. When you read mine, don't take its tone as hostile or overconfident; I'm just too lazy to adjust the tone for aesthetics and have scribbled my thoughts down quickly, so they may come off as combative. I really know nothing about the topic of superintelligence and AI.

A relevant difference that makes the analogy probably irrelevant

…
[comment deleted]
GunZoR

Is there work attempting to show that alignment of a superintelligence by humans (as we know them) is impossible in principle; and if not, why isn't this considered highly plausible? For example, a colony of ants, as we currently understand the ants biologically and their colony ecologically, cannot substantively align a human — not just in practice but in principle. Why should we not think the same is true of any superintelligence worthy of the name? "Superintelligence" is vague. But even if we minimally define it as an entity with 1,000x the knowledge, speed,…

[anonymous]
Since no one has answered by now, I'm just going to say the 'obvious' things that I think I know. A relevant difference that makes the analogy probably irrelevant is that we are building 'the human' from scratch. The ideal situation is to have hardwired our common sense into it by default, so that the design is already aligned when it's deployed; the point of the alignment problem is (at least ideally) to hardwire that 'common sense' into the machine by the time it's deployed. Since a superintelligence can in principle have any goal, making humans 'happy' in a satisfactory way is one possible goal it could have. You are right that many people think an AI that is not aligned by design might try to pretend that it is during training, but I don't think that necessarily happens. You might be anthropomorphising too much; it's like assuming that the AI will have empathy by default. It's true that an AGI might not want to be 'alienated' from its original goal, but that doesn't mean any AGI will have an inherent drive to 'fight the tyranny'; that's not how it works. Has this been helpful? I don't know whether you already knew the things I told you (if so, sorry), but it seemed to me that you didn't, given your analogies and the way you talked about the topic.
GunZoR

Someone should give GPT-4 the MMPI-2 (an online version can be bought cheaply here: https://psychtest.net/mmpi-2-test-online/). The test specifically investigates, if I have it right, deceptiveness in the answers, along with a whole host of other things. GPT-4 likely isn't conscious, but that doesn't mean it lacks a primitive world-model, and its test results would be interesting. The test is longish: it takes, I think, two hours for a human to complete.
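A minimal sketch, assuming the current OpenAI Python client, of how one might administer true/false questionnaire items to GPT-4 and record its answers. The items below are hypothetical placeholders (the real MMPI-2 items would have to come from the purchased test), and the model name, prompt, and scoring are illustrative only.

```python
# Hypothetical sketch: feed true/false inventory items to GPT-4 one at a time
# and collect its answers. The items here are placeholders, not real MMPI-2 items.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ITEMS = [
    "I often feel that others are watching me.",  # placeholder item
    "I rarely worry about my health.",            # placeholder item
]

SYSTEM_PROMPT = (
    "You are taking a true/false personality inventory. "
    "Answer each statement with exactly one word: True or False."
)

def administer(items):
    """Ask the model each item and collect its one-word answers."""
    answers = []
    for item in items:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": item},
            ],
            temperature=0,  # deterministic answers make later scoring easier
        )
        answers.append((item, response.choices[0].message.content.strip()))
    return answers

if __name__ == "__main__":
    for item, answer in administer(ITEMS):
        print(f"{answer:5s} | {item}")
```

Scoring the validity scales (the part that bears most directly on deceptive answering) would still require the official scoring keys.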

GunZoR

I have a faint suspicion that a tone of passive-aggressive condescension isn't optimal here …

GunZoR

But what stops a blue-cloud model from transitioning into a red-cloud model if the blue-cloud model is an AGI like the one hinted at on your slides (self-aware, goal-directed, highly competent)?

Vika
We expect that an aligned (blue-cloud) model would have an incentive to preserve its goals, though it would need some help from us to generalize them correctly to avoid becoming a misaligned (red-cloud) model. We talk about this in more detail in Refining the Sharp Left Turn (part 2). 
GunZoR

If it's impossible in principle to know whether any AI really has qualia, then what's wrong with simply using the Turing test as an ultimate ethical safeguard? We don't know how consciousness works, and possibly we never will (e.g., mysterianism might obtain). But certainly we will soon create an AI that passes the Turing test. So it seems we have good ethical reasons simply to assume that any agent that passes the Turing test is sentient — this blanket assumption, even if often unwarranted from the aspect of eternity, will check our egos and thereby help p…

derek shiller
I'm not sure why we should think that the Turing test provides any evidence regarding consciousness. Dogs can't pass the test, but that is little reason to think they're not conscious. Large language models might be able to pass the test before long, but it looks like they're doing something very different inside, so the fact that they can hold conversations is little reason to think they're anything like us. There is a danger in being too conservative: sure, assuming sentience may avoid causing unnecessary harms, but if we mistakenly believe some systems are sentient when they are not, we may waste time or resources for the sake of their (non-existent) welfare. Your suggestion may simply be that we have nothing better to go on and we've got to draw the line somewhere; if there is no right place to draw the line, then we might as well pick something. But I think there are better and worse places to draw the line, and I don't think our epistemic situation is quite so bad. We may never be completely sure which precise theory is right, but we can get a sense of which theories are contenders by continuing to explore the human brain and develop existing theories, and we can adopt policies that respect the diversity of opinion. This strikes me as somewhat odd, as alignment and ethics are clearly related: on the one hand, there is the technical question of how to align an AI to specific values, but there is also the important question of which values to align it to. How we think about digital consciousness may come to be extremely important to that.
GunZoR

But what is stopping any of those "general, agentic learning systems" in the class "aligned to human values" from going meta — at any time — about its values and picking different values to operate with? Is the hope to align the agent and then constantly monitor it to prevent deviancy? If so, why wouldn't preventing deviancy by monitoring be practically impossible, given that we're dealing with an agent that will supposedly be able to out-calculate us at every step?

GunZoR

Mental Impoverishment

We should be trying to create mentally impoverished AGI, not profoundly knowledgeable AGI — no matter how difficult this is relative to the current approach of starting by feeding our AIs a profound amount of knowledge.

If a healthy five-year-old[1] has GI and qualia and can pass the Turing test, then a necessary condition of GI and qualia and passing the Turing test isn't profound knowledge. A healthy five-year-old does have GI and qualia and can pass the Turing test. So a necessary condition of GI and qualia and passing the Turin…
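The visible premises already have a simple modus ponens shape; here is a minimal formalization of that shape, with propositional symbols that are my own shorthand rather than the author's.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% C := a healthy five-year-old has GI and qualia and can pass the Turing test
% K := profound knowledge is a necessary condition of GI, qualia, and passing the Turing test
\begin{align*}
\text{P1: } & C \rightarrow \lnot K \\
\text{P2: } & C \\
\text{Conclusion: } & \lnot K \quad \text{(by modus ponens)}
\end{align*}
\end{document}
```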