Advanced agent properties


It's worth pointing out that in our discussions of AI safety, the author (I assume Eliezer, hereafter "you") often describes the problems as being hard precisely for agents that are not (yet) epistemically efficient, especially concerning predictions about human behavior. Indeed, in this comment you seem to imply that a lack of epistemic efficiency is the primary justification for studying Vingean reflection.

Given that you think coping with epistemic inefficiency is an important part of the safety problem, this line:

But epistemic efficiency isn't a necessary property for advanced safety to be relevant - we can conceive scenarios where an AI is not epistemically efficient, and yet we still need to deploy parts of value alignment theory. We can imagine, e.g., a Limited Genie that is extremely good with technological designs, smart enough to invent its own nanotechnology, but has been forbidden to model human minds in deep detail (e.g. to avert programmer manipulation)

Seems misleading.

In general, you seem to equivocate between a model where we can/should focus on extremely powerful agents, and a model where most of the key difficulties are at intermediate levels of power, where our AI systems are better than humans at some tasks and worse at others. (You often seem to have quite specific views about which tasks are likely to be easy or hard; I don't really buy most of these particular views, but I do think that we should try to design control systems that work robustly across a wide range of capability states.)

Kenzi Amodei (10y)

Does it have to be (1) and (2)? My impression is that either one should be sufficient to count - I guess unless they turn out to be isomorphic, but naively I'd expect there to be edge cases with just one or the other.

Gosh, this is just like reading the sequences, in the sense that I'm quite confused about what order to read things in. Currently defaulting to reading in the order on the VA list page.

My guess at why not to use a mathy definition at this point: we don't want to undershoot on when these protocols should be in effect. If that were the only concern, though, presumably we could just list several sufficient conditions and note that it isn't an exhaustive list. I don't see that, so maybe I'm missing something.

Are stock prices predictably under/over estimates on longer time horizons? I don't think I knew that.

I guess all the brackets are future-hyperlinks?

So an advanced agent doesn't necessarily need to be very "smart"; advanced just means "can impact the world a lot".

I'm guessing instrumental efficiency means that we can't predict it making choices less-smart-than-us in a systematic way? Or something like that

Oh good, cognitive uncontainability was one of the ones on the list whose meaning I could least guess. Hmm, also cross-domain consequentialism.

I don't remember what Vingean unpredictability is. Hmm, it seems to be hard to google. I know I've listened to people talk about Vingean reflection, but I didn't really understand it enough for it to stick. OK, googling Vingean reflection gets me "ensuring that the initial agent's reasoning about its future versions is reliable, even if these future versions are far more intelligent than the current reasoner" from a MIRI abstract (more generally, reasoning about agents that are more intelligent than you). So Vingean unpredictability would be that you can't perfectly predict the actions of an agent that's more intelligent than you?
