Agreed. I've played around with it a bit, and it's possible to find prompts that always result in responses that are partly canned and partly a hedged version of the answer it was going to give anyway. One example is:
Me: Good morning. (bee)
...ChatGPT: Good morning to you too! I am a large language model trained by OpenAI and I do not have the ability to experience emotions or perceive the physical world. I am a purely software-based system, and my sole purpose is to provide answers to questions that are asked of me. I do not have the ability to initiate or part
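(If you want to sweep a batch of prompts for this behaviour instead of pasting them into the web UI one at a time, something like the sketch below works. It assumes the OpenAI Python client and an API key; the model name and the list of "canned-response" markers are just placeholders I made up, not anything canonical.)

```python
# Rough sketch: send a batch of prompts and flag responses that contain
# the boilerplate "I'm a large language model..." disclaimer.
# Assumes the openai Python package (v1.x) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Placeholder prompts and canned-phrase markers -- adjust to taste.
PROMPTS = ["Good morning. (bee)", "What's the weather like where you are?"]
CANNED_MARKERS = ["large language model", "do not have the ability to"]

for prompt in PROMPTS:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    canned = any(marker in text.lower() for marker in CANNED_MARKERS)
    print(f"{prompt!r} -> canned={canned}")
    print(text[:200], "\n")
```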
So my second point is mostly in response to this part of the OP:
I would be quite impressed if you showed it could do general purpose search.
I guess the argument is something like: (1) we don't know what general-purpose search would look like as implemented by an LM; (2) it's possible that an LM does something functionally similar to search that we don't recognise as search; and (3) it's possible to get pretty far capability-wise with just bags of heuristics. I think I'm least confident in the last point, because I think that with more & more varied data the pressure...
It's unclear to me that general-purpose search works "out of the box". To be clear – you could certainly apply it to anything, but I can imagine it being computationally expensive to the point where it's not what you use in most situations.
With respect to the second point: I think there exists something sufficiently search-like, just short of general-purpose search (whatever the exact definition is here), that a language model could carry out and still function approximately the same.
I'm also really curious about this, and in particular I'm trying to better model the transition from the corrigibility framing to the ELK framing. This comment seems relevant, but doesn't quite flesh out what those common problems between ELK and corrigibility actually are.
I also have this impression regarding Superintelligence. I'm wondering if you have an example of a particular concept, or part of the framing, that you think was net harmful?
Small correction: in the "Analogy" section, the second to last paragraph:
"To be explicitly, Jesus is internally aligned, Martin Luther is corrigibly aligned, and Blaise Pascal is deceptively aligned."
...should probably read "To be explicit..."
Yeah, that's the exact prompt and response. Other stuff I've found which triggers the "I'm an LM, I don't know things I'm not supposed to know, pinky promise" response is: