
Comments

hmys30

I agree with this analysis. I mean, I'm not certain further optimization will erode the interpretability of the generated CoT. It's possible that the fact it's pretrained to use human natural language pushes it into a stable equilibrium, but I don't think so; there are ways the CoT can become less interpretable in a step-wise fashion.

But this is the way it's going, and it seems inevitable to me. Just scaling up models and then training them on English-language internet text is clearly less efficient (from a "build AGI" perspective, and from a profit perspective) than training them to do the specific tasks that the users of the technology want. So that's the way it's going.

And once you're training the models this way, the tether between human-understandable concepts and the CoT will be completely destroyed. If they stay together, it will just be because it's kind of a stable initial condition.

 

hmys45

I just meant not primarily motivated by truth.

hmys72

I think this is a really bad article. So bad that I can't see it not being written with ulterior motives.

1. Too many things are taken out of context, like the "the feminists are literally Voldemort" quote.

2. Too many things are paraphrased in dishonest and ridiculously over the top ways. Like saying Harris has "longstanding plans to sterilize people of color", before a quote that just says she wants to give birth control to people in Haiti.

3. Offering negative infinity charity in every single area. In the HBD email, Scott says he thinks neoreactionaries create endless streams of garbage, but with some tiny nuggets of gold. And that he can take the nuggets of gold and just tune out the rest. The article then goes on to list everything bad about neoreactionaries as if Scott's email is evidence he endorses all of neoreaction? What?

4. Overall no clear direct argument. The article spends half its words justifying the connection between Scott and EA, which I don't think anyone would deny. Then it puts up the email and instantly infers the worst possible intent behind it with little justification. Then it lists every single racist person Scott has ever said anything even lightly good about.

Overall, the article updates me in the direction of thinking Scott is less racist and less sympathetic to neoreactionary thinking. The article has clearly put in effort, and the author is clearly trying their very best to paint Scott in a bad light, and Scott has literally 20 years of constant blogging put out openly on the internet. But the article is not very convincing.

hmys50

What is the probability they intentionally fine-tuned to hide canary contamination?

Seems like an obviously very silly thing to do. But with things like the NDA, my prior on OAI being deceptive to their own detriment is not that low.

I'm pretty sure it wouldn't forget the string.

hmys32

In my experience, the results are quite quick and it's interesting to remember your dreams. The time it takes is ~10 minutes a day.

I'm not gonna say it doesn't take any effort. It can be hard to do it if you are tired in the morning, but I disagree with the characterization that it takes "a lot" of effort.

Outside of studying/work, I exercise every day, do Anki cards every day, and try to make a reasonably healthy dinner every day. Each of those activities individually takes ~10x the cognitive effort and willpower that dream journaling does (for me).

hmys246

Maybe I'm a unique example, but none of this matches my experience at all.

I was able to have lucid dreams relatively consistently just by dream journaling and doing reality checks. WILD was quite difficult to do, because you kind of have to walk a tightrope, keeping yourself in a half-asleep state while carrying out instructions that require a fair bit of metacognitive awareness. But once you get the hang of it, you can do that pretty consistently as well, without much time commitment.

That lucid dreams don't offer much more than traditional entertainment seems also (obviously?) false to me. People use VR to make traditional entertainment more immersive. And LDs are far more immersive than that, and less limited than video games are. 

They're also just a really interesting psychological phenomenon. The process is fun. If you find yourself in a lucid dream, it's a strange situation. Testing out things, like checking how well your internal physics simulation engine works, is really fun. Or just walking around and seeing what your subconscious generates is very fun, and very different from just imagining random stuff. Trying to meditate, and observing how your mind works differently in a dream compared with waking reality, is interesting. Seeing how extreme/vivid a sensation you can generate in a dream is fun. Like trying to see if you can get yourself to feel pain. Or how loud a sound you can make.

Galantamine and various supplements all did nothing for me. 

The only thing I agree with is the habituation effect. But like, that's how many things work. You eventually get bored of stuff / feel you've exhausted all the low-hanging fruit.

hmys60

Can't you just keep a dream journal? I find if I do that consistently right upon waking up, I'm able to remember dreams quite well.

hmys30

I've used SSRIs for maybe 5 years, and I think they've been really useful, with no negative effects, and more or less unwavering efficacy. The only exception is that they've non-negligibly lowered my libido. But to be honest, I don't mind it that much. 

Also, the few times I've had to go without them for a while (travelling, and was very stupid not to bring enough), the withdrawal effects were quite strange and somewhat scary.

I also feel they had some very strange positive effects. Like I think they made my reaction time improve by quite a bit. Although it could be something random coinciding with starting SSRIs. Or just me being confused. I haven't tested it. On humanbenchmark I score around the same now as I did in high school. But I feel like I can catch falling things with much better regularity, and this was an almost immediate effect after starting.

hmys20

I feel like the biggest issue with aligning powerful AI systems is that nearly all the features we'd like these systems to have, like being corrigible, not being deceptive, having values aligned with ours, etc., are properties we are currently unable to state formally. They are clearly real properties: humans can agree on examples of non-corrigibility, misalignment, and dishonesty when shown actions AIs could take. But we can't put them in code or a program specification, and consequently can't reason about them very precisely, test whether systems have them or not, etc.

One reason I'm very bullish on mechinterp is that it seems like the only natural pathway towards making progress on this. Transformers trained with RLHF do have "tendencies" and proto-values in a sense. Figuring out how those proto-desires are represented internally, and really understanding it, I believe will shed a lot of light on how values form in transformers, will necessarily entail getting a solid formal framework for reasoning about these processes, and will put the notions of alignment on much firmer ground. Same goes for the other features. Models already show deceptive tendencies. In the process of developing a deep mechinterp understanding of that, I believe we'd gain a better understanding of how deception in a neural net can be modeled formally, which would allow us to reason about it infinitely better.

(I mean, someone with 300 IQ might come along and just galaxy-brain all this from first principles, but quite galaxy-brained people have tried already. The point is that if mechinterp was developed to a sophisticated enough level, in addition to all the good things listed already, it would bring a lot of conceptual clarity to many of the key notions, which we are currently stuck reasoning about on an informal level. And I think we will get there through incremental progress, without having to hope someone just figures it out by thinking really hard and having an Einstein-tier insight.)
