All of Sean Hardy's Comments + Replies

This isn't extremely relevant, but what makes you think superposition/polysemanticity isn't present in the brain? There's evidence that L2/3 pyramidal neurons can learn to represent/disambiguate many spatio-temporal patterns: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6354899/.

1CallumMcDougall
huh interesting, I wasn't aware of this, thanks for sending it!

What about simulating smaller aspects of cognition that can be chained like CoT with GPT? You can use self-criticism to align and assess its actions relative to a bunch of messy human abstractions. How does that scenario lead to doom? If it was misaligned, I think a well-instantiated predictive model could update its understanding of our values from feedback, predicting how a corrigible AI would act.

My best guess is we can't prompt it to instantiate the right simulacra correctly. This seems challenging depending on the way it's initialised. It's far easier with text, but fabricating an entire consistent history is borderline impossible, especially for a superintelligence. It would involve tricking it into predicting the universe as it would be if, all else being equal, an intelligent AI aligned with our values had come into existence. It would probably realise that its history was far more consistent with the hypothesis that it was just an elaborate trick.

2Charlie Steiner
Yup, I'd lean towards this. If you have a powerful predictor of a bunch of rich, detailed sense data, then in order to "ask it questions," you need to be able to forge what that sense data would be like if the thing you want to ask about were true. This is hard, it gets harder the more complete the AI's view of the world is, and if you screw up you can get useless or malign answers without it being obvious. It might still be easier than the ordinary alignment problem, but you also have to ask yourself about dual use. If this powerful AI makes solving alignment a little easier but makes destroying the world a lot easier, that's bad.

Suppose we train a model on the sum of all human data, using every sensory modality ordered by timestamp, like a vastly more competent GPT (For the sake of argument, assume that a competent actor with the right incentives is training such a model). Such a predictive model would build an abstract world model of human concepts, values, ethics, etc., and be able to predict how various entities would act based on such a generalised world model. This model would also "understand" almost all human-level abstractions about how fictional characters may act, just l...

Could you expand on what you mean by "trauma patterns" around how it was trained? In what way does it show personhood when its responses are deliberately directed away from giving the impression that it has thoughts and feelings outside of predicting text?

Answer by Sean Hardy10

"Why not try heroin if the purpose of life is to optimize happiness assuming heroin provides proportionally more even if for a shorter amount of time?" (!)

Ignoring the discussion about drugs specifically, I think your son would benefit from being introduced to rational self-improvement as well. I think it's important for him to recognise that intense short-term pleasure will result in hedonic adaptation, where your overall happiness returns to a baseline, effectively making everything else worse in comparison. A huge number of destructive habits are ration...

Looks to me like this post was quite clearly written by ChatGPT. It's a bit scary that this post has so many upvotes when it doesn't appear to carry much weight on a forum about rationalism.

2Slider
Votes of "newsworthy stuff that ChatGPT does" do not seem that worrying. How do you separate that from votes about the contents?

I think I've missed the point/purpose of this post. What exactly are you highlighting: that ChatGPT doesn't know when to format text as code? It has seemed to robustly know which formatting to use when I've interacted with it.

1Bill Benzon
I've started with two examples of text which aren't code, and yet somehow it got formatted that way. Why does that happen? That doesn't happen very often – I've got a Word doc that's 100 pages long (35K words) consisting of copies of output from ChatGPT. And then I ask it to produce some text and to format the result as code. It does it, which is what I asked. But how did it make the decisions it did about how to do that? Or why didn't it just tell me, "you're not asking for code, so code format doesn't make sense." In the case of that last Trump/Musk example, I didn't prompt it to format it as code, but it did it anyway. Why? I note that earlier in that session I had asked it to write a simple sorting program, which it did, and I asked it to tell a story about Musk on Mars, in code format, which it did. But I didn't ask for code format in that last case, and yet I got it. There's something very curious and interesting going on here. But there's no specific point I had in mind beyond that.

I don't have much to add, but I think you would be extremely interested in this line of research, building an agent using GPT-3 to reason through its own decisions and plans: 

I don't have much to add, but I did see this interesting project that does something similar with an "inner monologue": it uses prompts to ask questions about the given input, progressively builds up the outputs, and asks questions and reasons about the prompt itself. This video is also an older demonstration but covers the concept quite well. I personally don't think the system itself is well thought out in terms of alignment because this project is ultimately trying to create aligned AGI through prompts to serve certain criteria (reducing sufferi...

HI!

I don't know if anyone will read this as all the comments seem to be at least a decade old. I was linked to this post from another about total user counts on the site. I'm an 18-year-old computer science student from the UK, with a keen interest in self-improvement and rationality. 

This site has continually amazed me with post after post of creative, thrilling, eloquent and in many cases practical insights. As much as I recognise my slight perfectionism, I'm waiting until I can really contribute something of value so that I don't diminish the excel...