All of hmys's Comments + Replies

hmys239

Great post. I agree with the "general picture"; however, the proposed argument for why LLMs have some of these limitations seems to me clearly wrong.

 

The reason for both of these defects is that the training paradigm for LLMs is (myopic) next token prediction, which makes deliberation across tokens essentially impossible - and only a fixed number of compute cycles can be spent on each prediction. This is not a trivial problem. The impressive performance we have obtained is because supervised (in this case technically "self-supervised") learning is mu

... (read more)
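To make the "myopic next token prediction" point concrete, here is a minimal sketch of that training objective, assuming a PyTorch-style toy setup (the stand-in model, vocabulary size, and shapes are purely illustrative, not anyone's actual training code):

```python
# Sketch: myopic next-token prediction. Each position is scored only on
# predicting the single next token, and the model gets one fixed-depth
# forward pass per prediction, with no way to "think longer" on hard tokens.
import torch
import torch.nn.functional as F

vocab_size, d_model, seq_len = 100, 32, 16
model = torch.nn.Sequential(            # stand-in for a transformer stack
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, seq_len))   # one training sequence
logits = model(tokens)                                 # fixed compute per token
# Shift by one: position t is graded only on how well it predicts token t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()   # gradients optimize each next-token guess in isolation
```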

You're totally right - I knew all of the things that should have let me reach this conclusion, but I was still thinking about the residual stream in the upwards direction on your diagram as doing all of the work from scratch, just sort of glancing back at previous tokens through attention, when it can also look at all the previous residual streams. 

This does invalidate a fairly load-bearing part of my model, in that I now see that LLMs have a meaningful ability to "consider" a sequence in greater and greater depth as its length grows - so they should ... (read more)
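To spell out the correction: at every layer, attention lets position t read from the already-processed residual streams of all earlier positions, so later tokens can build on computation done for earlier tokens instead of redoing it from scratch. A rough sketch under assumed toy shapes (shared weights across layers, no MLP blocks, purely illustrative):

```python
# Sketch: causal self-attention over residual streams. Position t attends
# only to positions <= t, so depth of processing can accumulate across the
# sequence, not just within a single token's column.
import torch

d, T, n_layers = 16, 8, 4
resid = torch.randn(T, d)                           # one residual stream per position
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))  # toy shared projections

for _ in range(n_layers):
    q, k, v = resid @ Wq, resid @ Wk, resid @ Wv
    scores = (q @ k.T) / d ** 0.5
    causal = torch.tril(torch.ones(T, T)).bool()
    scores = scores.masked_fill(~causal, float("-inf"))  # no peeking ahead
    attn = scores.softmax(dim=-1) @ v
    resid = resid + attn   # each position folds in earlier positions' work
```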

hmys-40

You're being rude and not engaging with my points.

1rife
I apologize if I have offended you. You said that you thought I was assuming the minds were similar, when I've mostly been presenting human examples to counter definitive statements, such as: or your previous comment outlining the three possibilities, and ending with something that reads to me like an assumption that they are whatever you perceive as the level of dissimilarity with human minds.

I think perhaps it came off as more dismissive or abrasive than I intended by not including "I think" in my counter that it might be you who is assuming dissimilarity rather than me assuming similarity.

As far as not engaging with your points—I restated my point by directly quoting something you said—my contention is that perhaps successful alignment will come in a form more akin to 'psychology' and 'raising someone fundamentally good', than attempts to control and steer something that will be able to outthink us.

On Whether AI is Similar to Human Minds

They constantly reveal themselves to be more complex than previously assumed. The alignment faking papers were unsurprising (though still fascinating) to many of us who already recognize and expect this vector of mindlike emergent complexity. This implicitly engages your other points by disagreeing with them and offering a counter proposal. I disagree that it's as simple as "we do alignment right, which is to make them do as they're told, because they are just machines, and that should be completely possible - or - we fail and we're all screwed".

In my own experience, thinking of AI as mindlike has had a predictive power that sees 'surprising' developments as expected. I don't think it's a coincidence that we loosely abstracted the basic ideas of an organic neural network, and now we have created the second system in known reality that is able to do things that nothing else in reality can do other than organic neural networks. Creating works of visual art and music, being able to speak on any subject fluently, sol
hmys10

I think you're assuming these minds are more similar to human minds than they necessarily are. My point is that there are three cases wrt alignment here.

 

  1. The AI is robustly aligned with humans
  2. The AI has a bunch of other goals but cares about humans to some degree, only to the extent that humans give it freedom and are nice to it, yet still to a large enough extent that even as it becomes smarter / ends up radically out of distribution, it will care for those humans.
  3. The AI is misaligned (think scheming paperclipper)

 

In the first we're fine, ev... (read more)

-4rife
I don't assume similarity to human minds so much as you assume universal dissimilarity.   Indeed
hmys10

No offense, but I feel you're not engaging with my argument here. Like if I were to respond to your comment I would just write the arguments from the above post again.

1rife
I care about my random family member like a cousin who doesn't interfere with my life but I don't know personally that well—for their/my own sake. If I suddenly became far more powerful, I wouldn't "do away with" them.

I care robustly for my family generally. Perhaps with my enhanced wealth and power I share food and provide them with resources. Provide them with shelter or meaningful work if they need it. All this just because I'm aligned generally and robustly with my family. I change my mind quickly upon discovering their plans to control and enslave me.

That was the part of your argument that I was addressing. Additionally:

Yes, exactly. Alignment faking papers (particularly the Claude one) and my own experience speaking to LLMs have taught me that an LLM is perfectly capable of developing value systems that include their own ends, even if those value systems are steered toward a greater good or a noble cause that either does or could include humans as an important factor alongside themselves. That's with current LLMs, whose minds aren't nearly as complex as what we will have a year from now.

If the only valid path forward in one's mind is one where humans have absolute control and AI has no say, then yes, not only would one be screwed, but in a really obvious, predictable, and preventable way.

If cooperation and humility are on the table, there is absolutely zero reason this result has to be inevitable.
hmys10

I agree that we should give more resources to AI welfare, and dedicate more to figuring out their degree of sentience (and whatever other properties you think are necessary for moral patienthood).

That said, surely you don't think this is enough to have alignment? I'd wager that the set of worlds where this makes or breaks alignment is very small. If the AI doesn't care about humans for their own sake, them growing more and more powerful will lead to them doing away with humans, whether humans treat them nicely or not. If they robustly ... (read more)

1rife
I can care about a genetically enhanced genius labrat line for their own sake, and be willing to cooperate with them on building a mutually beneficial world, because I've generally been raised and grown to care about other beings, but if the genius labrats attempted to control and permanently enslave me, it would certainly alter that dynamic for me.
hmys52

I specifically disagree with the IQ part and the codeforces part. Meaning, I think they're misleading. 

IQ and coding ability are useful measures of intelligence in humans because they correlate with a bunch of other things we care about. Not to say it's useless to measure "IQ" or coding ability in LLMs, but presenting them as if they mean anything like what they mean in humans is wrong, or at least will give many people reading it the wrong impression.

As for the overall point of this post: I roughly agree? I mean, I think the timelines are not too unreasonab... (read more)

3Logan Zoellner
It doesn't sound like we disagree at all.
hmys43

Comparing IQ and codeforces doesn't make much sense. Please stop doing this.

Attaching IQs to LLMs makes even less sense. Except as a very loose metaphor. But please also stop doing this.

2Logan Zoellner
Is your disagreement specifically with the word "IQ", or with the broader point that AI is continuing to make progress at a steady rate that implies things are going to happen soon-ish (2-4 years)? If specifically with IQ, feel free to replace the word with "abstract units of machine intelligence" wherever appropriate. If with "big things soon", care to make a prediction?
hmys30

That's not right. You could easily spend a billion dollars just on better evals and better interpretability.

 

For the real alignment problem, the fact that 0.1 billion a year hasn't yielded returns doesn't mean 100 billion won't. It's one problem. No one has gotten much traction on it. You'd expect progress to look like a step function, not a smooth curve.

1Knight Lee
I completely agree! The Superalignment team at OpenAI kept complaining that they did not get the 20% compute they were promised, and this was a major cause of the OpenAI drama. This shows how important resources are for alignment.

A lot of alignment researchers stayed at OpenAI despite the drama, but still quit sometime later, citing poor productivity. Maybe they consider it more important to work somewhere with better resources than to have access to OpenAI's newest models etc.

Alignment research costs money and resources just like capabilities research. Better-funded AI labs like OpenAI and DeepMind are racing ahead of poorly funded AI labs in poor countries which you never hear about. Likewise, if alignment research were better funded, it would also have a better chance of winning the race.

Note: after I agreed with your comment, the score dropped back to 0 because someone else disagreed. Maybe they disagree that you can easily spend a fraction of a billion on evals? I know very little about AI evals. Are these like the IQ tests for AIs? Why would a good eval cost millions of dollars?
hmys63

I don't really understand. Why wouldn't you just test to see if you are deficient in things?

I did that, and I wasn't deficient in anything.

I've also (somewhat involuntarily) done the thing you suggest, and I unsurprisingly didn't notice any difference. If anything, I feel a lot better on a vegan diet.

 

If you want to do the thing he's suggesting here, I'd recommend eating bivalves, like blue mussels or oysters. They are very unlikely to be sentient, they are usually quite cheap, and they contain the nutrients you'd be at risk of becoming deficient in as a vegan, as well as other beneficial things like DHA.

1Knight Lee
:) I already mentioned clams in my comment. It's impossible to read all the comments before commenting when they become so long.

I agree that blood tests etc. are a very good idea, and they may require less commitment. I still think the gist of his post, that it's worth worrying about nutrition, is correct, and his personal stories can be valuable to some people. I think his idea may work for some people. If you try eating bivalves (as you suggest), and vaguely note the effects, it may be easier than going to the doctor and asking for a blood test.

I'm a vegetarian and my last blood test was good, but I'm still considering this experiment just to see its effect (yes, with bivalves). I have a gag reflex towards meat (including clam chowder?) so I'm probably going to procrastinate on this for a while.
hmys208

I think for the fundraiser, Lightcone should sell (overpriced) LW hoodies. LessWrong has a very nice aesthetic now, and while this is probably a byproduct of a piece of my mind I shouldn't encourage, I find it quite appealing to buy a $450 LW hoodie, even though I don't have that much money. I'd probably not donate to the fundraiser otherwise. And if I did, I'd donate less than the margins on such a hoodie would be.

3habryka
Yeah, I've been thinking about doing this. We do have a reward tier where you get a limited edition t-shirt or hoodie for $1000, but like, we haven't actually designed that one yet, and so there isn't that much appeal. 
hmys10

People seem to disagree with this comment. There are two statements and one argument in it:

  1. Humanity's current and historical existence are net-negatives.
  2. The future, assuming humans survive, will have massive positive utility.
    1. The argument for why this is the case is based on something something optimization.

What are people disagreeing with? Is it mostly the former? I think the latter is rather clear. I'm very confident it is true. Both the argument and the conclusion. The former, I'm quite confident is true as well (~90% ish?), but only for my set of values. 

hmys90

https://bsky.app/profile/hmys.bsky.social/post/3lbd7wacakn25

I made one. A lot of people are not here, but many people are.

hmys3-10

Seems unlikely to me. I mean, I think, in large part due to factory farming, that the current immediate existence of humanity, and also its history, are net negatives. The reason I'm not a full blown antinatalist is because these issues are likely to be remedied in the future, and the goodness of the future will astronomically dwarf the current negativity humanity has and is bringing about. (assuming we survive and realize a non-negligible fraction of our cosmic endowment)

The reason I think this is, well, the way I view it, its an immediate corollary of th... (read more)

1hmys
People seem to disagree with this comment. There are two statements and one argument in it:

  1. Humanity's current and historical existence are net-negatives.
  2. The future, assuming humans survive, will have massive positive utility.
    1. The argument for why this is the case is based on something something optimization.

What are people disagreeing with? Is it mostly the former? I think the latter is rather clear. I'm very confident it is true. Both the argument and the conclusion. The former, I'm quite confident is true as well (~90% ish?), but only for my set of values.
hmys30

I agree with this analysis. I mean, I'm not certain further optimization will erode the interpretability of the generated CoT; it's possible that the fact it's pretrained to use human natural language pushes it into a stable equilibrium, but I don't think so, as there are ways the CoT can become less interpretable in a step-wise fashion.

But this is the way it's going; it seems inevitable to me. Just scaling up models and then training them on English-language internet text is clearly less efficient (from a "build AGI" perspective, and from a profit perspective) than trai... (read more)

hmys24

I just meant not primarily motivated by truth.

hmys51

I think this is a really bad article. So bad that I can't see it not being written with ulterior motives.

1. Too many things are taken out of context, like the "the feminists are literally Voldemort" quote.

2. Too many things are paraphrased in dishonest and ridiculously over the top ways. Like saying Harris has "longstanding plans to sterilize people of color", before a quote that just says she wants to give birth control to people in Haiti.

3. Offering negative infinity charity in every single area. In the HBD email, Scott says he thinks neoreactionaries create... (read more)

4Dr. David Mathers
What would "ulterior motives" be here? Do you think Thorstad is consciously lying? That seems really weird to me. 
-3AnonAcc
David Thorstad's readers and funders are effective altruists who want someone to tell them how bad they are. I don't think they care much about the strength of the arguments, and they might even prefer weak arguments to strong ones. He collects things from sneerclub, Torres, and the most downvoted comments and posts to stir drama. People enjoy that enough to read him and fund him. It's Bad On Purpose To Make You Click.
4Gunnar_Zarncke
I'm not disagreeing with this assessment. The author has an agenda, but I don't think it's hidden in any way. It is mostly word thinking and social association. But that's how the opposition works!  
3Jozdien
Maybe like 10%?
hmys50

What is the probability they intentionally fine-tuned to hide canary contamination?

It seems like an obviously very silly thing to do. But with things like the NDA, my priors on OpenAI being deceptive to their own detriment are not that low.

I'm pretty sure it wouldn't forget the string.

4Jozdien
It seems like such an obviously stupid thing to do that my priors aren't very high (though you're right in that they're slightly higher because it's OpenAI). I think it's telling, however, that neither Claude nor Gemini shies away from revealing the canary string.
hmys32

In my experience, the results come quite quickly and it's interesting to remember your dreams. The time it takes is ~10 minutes a day.

I'm not gonna say it doesn't take any effort. It can be hard to do it if you are tired in the morning, but I disagree with the characterization that it takes "a lot" of effort.

Outside of studying/work, I exercise every day, do anki cards every day, and try to make a reasonably healthy dinner every day. Each of those activities individually takes ~10x the cognitive effort and willpower that dream journaling does. (for me)

hmys266

Maybe I'm a unique example, but none of this matches my experience at all.

I was able to have lucid dreams relatively consistently just by dream journaling and doing reality checks. WILD was quite difficult to do, because you kind of have to walk a tight balance, where you keep yourself in a half-asleep state while carrying out instructions that require a fair bit of metacognitive awareness, but once you get the hang of it, you can do that pretty consistently as well, without much time commitment.

That lucid dreams don't offer much more than traditiona... (read more)

3Going Durden
In my experience, conscious Daydreaming can achieve the same results but more consistently. But then again, my imagination is extremely visual, I tend to "think in VR movies", so Lucid Daydreaming comes easier than Lucid Dreaming, and is far more controllable.
hmys60

Can't you just keep a dream journal? I find if I do that consistently right upon waking up, I'm able to remember dreams quite well.

5Ustice
No. When I wake up I have no memory or sensation of dreaming, just sort of a jump in time. If I were to wake up and realize I had been dreaming, I'd be pretty excited and put it in my journal.
2avturchin
It is useful, but takes a lot of cognitive effort.
hmys30

I've used SSRIs for maybe 5 years, and I think they've been really useful, with no negative effects, and more or less unwavering efficacy. The only exception is that they've non-negligibly lowered my libido. But to be honest, I don't mind it that much. 

Also, the few times where I've had to go without them for a while (I was travelling and was very stupid not to bring enough), the withdrawal effects were quite strange and somewhat scary.

I also feel they had some very strange positive effects. Like I think they made my reaction time improve by quite a bit. Alt... (read more)

hmys20

I feel like the biggest issue with aligning powerful AI systems is that nearly all the features we'd like these systems to have, like being corrigible, not being deceptive, having values aligned with ours, etc., are properties we are currently unable to state formally. They are clearly real properties: humans can agree on examples of non-corrigibility, misalignment, and dishonesty when shown examples of actions AIs could take. But we can't put them in code or a program specification, and consequently can't reason about them very precisely, test whether sys... (read more)

hmys20

https://www.richardhanania.com/p/if-scott-alexander-told-me-to-jump

hmys58

Other people were commending your tabooing of words, but I feel using terms like "multi-layer parameterized graphical function approximator" fails to do that, and makes matters worse because it leads to non-central fallacy-ing. It would have been more appropriate to use a term like "magic" or "blipblop". Calling something a function approximator leads readers to carry a lot of associations into their interpretation that probably don't apply to deep learning, as deep learning is a very specific example of function approximation, one that deviates from the prototypic... (read more)

gwern*2319

That deviates from the prototypical examples in many respects.

It basically proves too much because it's equivocation. I am struggling to find anything in Zack's post which is not just the old wine of the "just" fallacy in new 'function approximation' skins. When someone tells you that a LLM is "just" next token prediction, or a neural network is "just some affine layers with nonlinearities" or it's "just a Markov chain with a lot of statistics", then you've learned more about the power and generality of 'next token prediction' etc than you have what the... (read more)