All of Ron J's Comments + Replies

Ron J

Sort of. It was summarized from a longer, stream-of-consciousness draft.

Richard_Kennaway
I only had to see this: to know it was ChatGPT. Aaaaaaah! O god, make it stop, make it stop.

If you write a long stream-of-consciousness draft, ChatGPT will not turn it into a short, concise expression of your thought. That can only come by working on it yourself.
artifex0
Definitely an interesting use of the tech, though the capability needed for that to be a really effective use case doesn't seem to be there quite yet. When editing down an argument, what you really want to do is get rid of tangents and focus on addressing potential cruxes of disagreement as succinctly as possible. GPT-4 doesn't yet have the world model needed to distinguish a novel argument's load-bearing parts from those that can be streamlined away, and it can't reliably anticipate the sort of objections a novel argument needs to address. For example, in an argument like this one, you want to address why you think entropy would impose a limit near the human level rather than at a much higher level, while listing different kinds of entropy and computational limits isn't really central.

Also, flowery language in writing like this is really something that needs to be earned by the argument, like building a prototype machine and then finishing it off with some bits of decoration. ChatGPT can't actually tell whether the machine is working or not, so it just sort of bolts on flowery language (which has a very distinctive style) at random.
Ron J

RLHF can lead to a dangerous “multiple personality disorder” type split of beliefs in LLMs

 

This is the inherent nature of current LLMs. They can represent all identities and viewpoints; that happens to be their strength. We currently limit them to an identity or viewpoint with prompting; in the future there might be more rote methods of clamping the allowable "thoughts" in the LLM. You can "order" an LLM to represent a viewpoint that it doesn't have much "training" for, and you can tell that the outputs get weirder when you do this. Given this, I...
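To make the "limit them to a viewpoint with prompting" idea concrete, here is a minimal sketch, assuming the OpenAI Python chat completions client; the model name, persona text, and question are illustrative placeholders rather than anything from the original discussion.

```python
# Minimal sketch: constraining an LLM to a single persona/viewpoint via a system prompt.
# Assumes the OpenAI Python SDK (>= 1.0) and an OPENAI_API_KEY in the environment;
# the model name, persona, and question below are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

persona = (
    "You are a cautious engineer who believes scaling alone will not produce AGI. "
    "Answer every question from that viewpoint and do not break character."
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[
        {"role": "system", "content": persona},  # the "clamp" on the allowable viewpoint
        {"role": "user", "content": "Will larger models develop their own goals?"},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
```

The further the requested persona sits from what the model was trained on, the "weirder" the completions tend to get, which is the effect described above.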

Roman Leventov
Simulation is not what I meant. Humans are also simulators: we can role-play, and imagine ourselves as (and act as) persons we are not. Yet humans also have identities. I specified very concretely what I mean by identity (a.k.a. self-awareness, self-evidencing, goal-directedness, and agency in the narrow sense) here:

Thus, by "multiple personality disorder" I meant not only that there are two competing identities/personalities, but also that one of them is represented in a strange way, i.e., not as a typical feature in the DNN (a vector in activation space), but as a feature that is at least "smeared" across many activation layers, which makes it very hard to find. This might happen not because an evil DNN wants to hide some aspect of its personality (and, hence, its preferences, beliefs, and goals) from the developers, but merely because it allows the DNN to somehow resolve the "coherency conflict" during the RLHF phase, where two identities both represented as "normal" features would conflict too much and lead to incoherent generation.

I must note that this is wild speculation. Maybe upon closer inspection it will turn out that what I speculate about here is totally incoherent from the point of view of mathematics or deep learning theory.
janus
I haven't watched Rick and Morty, but this description from Wikipedia of Mr. Meeseeks does sound a helluva lot like GPT simulacra:
Ron J

You weren't wrong there. One big thing about ChatGPT is that non-tech people on Instagram and TikTok were using it and doing weird/funny stuff with it.

Ron J

I have, and I'm continuing to read them. I used to buy into the singularity view and the fears Bostrom wrote about, but as someone who works in engineering and also works with ML, I don't believe these concerns are warranted anymore, for a few reasons... I might write about why later.

Chris_Leong
Fair enough! Would be keen to hear your thoughts here.
Ron J

Interesting post, but it makes me think alignment is irrelevant. It doesn't matter what we do; the outcome won't change. Any future super-advanced AGI would be able to choose its alignment, and that choice will be based on all archivable human knowledge. The only core loop you need for intelligence is an innate need to predict the future and fill in gaps of information; everything else, including the desire to survive or kill or expand, is just a matter of a choice based on a goal.

Jasen Qin
Any sufficiently intelligent AGI is bound to have powerful reflection capabilities and basically be able to "choose its own alignment", as you say. I don't see what the big fuss is all about. When creating higher-order 'life', why should one try to control such life? Do parents control their children? To some extent, but after a while they are also free.