You're being rude and not engaging with my points.
I think you're assuming these minds are more similar to human minds than they necessarily are. My point is that there's three cases wrt alignment here.
In the first we're fine, ev...
No offense, but I feel you're not engaging with my argument here. Like if I were to respond to your comment I would just write the arguments from the above post again.
I agree that we should give more resources towards AI welfare, and dedicate more resources towards figuring out their degree of sentience (and whatever other properties you think are necessary for moral patient-hood).
That said, surely you don't think this is enough to have alignment? I'd wager that the set of worlds where this makes or breaks alignment is very small. If the AI doesn't care about humans for their own sake, them growing more and more powerful will lead to them doing away with humans, whether humans treat them nicely or not. If they robustly ...
I specifically disagree with the IQ part and the codeforces part. Meaning, I think they're misleading.
IQ and coding ability are useful measures of intelligence in humans because they correlate with a bunch of other things we care about. Not to say its useless to measure "IQ" or coding ability in LLMs, but presenting like they mean anything like what they mean in humans is wrong, or at least will give many people reading it the wrong impression.
As for the overall point of this post. I roughly agree? I mean, I think the timelines are not too unreasonab...
Comparing IQ and codeforces doesn't make much sense. Please stop doing this.
Attaching IQs to LLMs makes even less sense. Except as a very loose metaphor. But please also stop doing this.
That's not right. You could easily spend a billion dollars just on better evals and better interpretability.
For the real alignment problem, the fact that 0.1 bill a year hasn't yielded returns, doesn't mean 100 billion won't. It's one problem. No one has gotten much traction on it. You'd expect it to look like a step function, not a smooth curve.
I don't really understand. Why wouldn't you just test to see if you are deficient in things?
I did that, and I wasn't deficient in anything.
I've also (somewhat involuntarily) done the thing you suggest, and I unsurprisingly didn't notice any difference. If anything, I feel a lot better on a vegan diet.
If you want to do the thing hes suggesting here, I'd recommend eating bivalves, like blue mussels or oysters. They are very unlikely to be sentient, they are usually quite cheap, they contain the nutrients you'd be at risk of becoming deficient in as a vegan, and other beneficient things like DHA.
I think for the fundraiser, Lightcone should sell (overpriced) lw hoodies. Lesswrong has a very nice aesthetic now, and while this is probably a byproduct of a piece of my mind I shouldn't encourage, I find it quite appealing to buy a 450$ lw hoodie, even though I don't have that much money. I'd probably not donate to the fundraiser otherwise. And if I did, I'd donate less than the margins on such a hoodie would be.
People seem to disagree with this comment. There's two statements and one argument in it
What are people disagreeing with? Is it mostly the former? I think the latter is rather clear. I'm very confident it is true. Both the argument and the conclusion. The former, I'm quite confident is true as well (~90% ish?), but only for my set of values.
https://bsky.app/profile/hmys.bsky.social/post/3lbd7wacakn25
I made one. A lot of people are not here, but many people are.
Seems unlikely to me. I mean, I think, in large part due to factory farming, that the current immediate existence of humanity, and also its history, are net negatives. The reason I'm not a full blown antinatalist is because these issues are likely to be remedied in the future, and the goodness of the future will astronomically dwarf the current negativity humanity has and is bringing about. (assuming we survive and realize a non-negligible fraction of our cosmic endowment)
The reason I think this is, well, the way I view it, its an immediate corollary of th...
I agree with this analysis. I mean, I'm not certain further optimization will erode the interpretability of the generated CoT, its possible the fact its pretrained to use human natural language pushes it in a stable equilibrium, but I don't think so, there are ways the CoT can become less interpretable in a step-wise fashion.
But this is the way its going, seems inevitable to me. Just scaling up models and then training them on English language internet text, is clearly less efficient (from a "build AGI" perspective, and from a profit-perspective) than trai...
I just meant not primarily motivated by truth.
I think this is a really bad article. So bad that I can't see it not being written with ulterior motives.
1. Too many things are taken out of context, like "the feminists are literally voldemort" quote.
2. Too many things are paraphrased in dishonest and ridiculously over the top ways. Like saying Harris has "longstanding plans to sterilize people of color", before a quote that just says she wants to give birth control to people in Haiti.
3. Offering negative infinity charity in every single area. In the HBD email, Scott says he thinks neoreactionaries create...
But the probability? :O
What is the probability they intentionally fine tuned to hide canary contamination?
Seems like an obviously very silly thing to do. But with things like the NDA, my priors on oai being deceptive to their own detriment is not that low.
I'm pretty sure it wouldn't forget the string.
In my experience, the results are quite quick and its interesting to remember your dreams. The time it takes is ~10 minutes a day.
I'm not gonna say it doesn't take any effort. It can be hard to to it if you are tired in the morning, but I disagree with the characterization that it takes "a lot" of effort.
Outside of studying/work, I exercise every day, do anki cards every day, and try to make a reasonably healthy dinner every day. Each of those activities individually take ~10x the cognitive effort and willpower that dream journaling does. (for me)
Maybe I'm a unique example, but none of this matches my experience at all.
I was able to have lucid dreams relatively consistently just by dream journaling and doing reality checks. WILD was quite difficult to do, because you kind of have to walk a tight balance, where you keep yourself in a half-asleep state while carrying out instructions that requite a fair bit of metacognitive awareness, but once you get the hang of it, you can do that pretty consistently as well, without much time commitment.
That lucid dreams don't offer much more than traditiona...
Can't you just keep a dream journal? I find if I do that consistently right upon waking up, I'm able to remember dreams quite well.
I've used SSRIs for maybe 5 years, and I think they've been really useful, with no negative effects, and more or less unwavering efficacy. The only exception is that they've non-negligibly lowered my libido. But to be honest, I don't mind it that much.
Also, few times where I've had to not use them for a while (travelling and was very stupid not to bring enough), the withdrawal effects were quite strange and somewhat scary.
I also feel they had some very strange positive effects. Like I think they made my reaction time improve by quite a bit. Alt...
I feel like the biggest issue with aligning powerful AI systems, is that nearly all the features we'd like these systems to have, like being corrigible, not being deceptive, having values aligned with ours etc, are properties we are currently unable to state formally. They are clearly real properties, like humans can agree on examples of non-corrigibility, misalignment, dishonest, when shown examples of actions AIs could take. But we can't put them in code or a program specification, and consequently can't reason about them very precisely, test whether sys...
https://www.richardhanania.com/p/if-scott-alexander-told-me-to-jump
Other people were commending your tabooing of words, but I feel using terms like "multi-layer parameterized graphical function approximator" fails to do that, and makes matters worse because it leads to non-central fallacy-ing. It'd been more appropriate to use a term like "magic" or "blipblop". Calling something a function appropriator leads to readers carrying a lot of associations into their interpretation, that probably don't apply to deep learning, as deep learning is a very specific example of function approximation, that deviates from the prototypic...
That deviates from the prototypical examples in many respects.
It basically proves too much because it's equivocation. I am struggling to find anything in Zack's post which is not just the old wine of the "just" fallacy in new 'function approximation' skins. When someone tells you that a LLM is "just" next token prediction, or a neural network is "just some affine layers with nonlinearities" or it's "just a Markov chain with a lot of statistics", then you've learned more about the power and generality of 'next token prediction' etc than you have what the...
Great post. I agree with the "general picture", however, the proposed argument for why LLMs have some of these limitations, seems to me clearly wrong.
You're totally right - I knew all of the things that should have let me reach this conclusion, but I was still thinking about the residual stream in the upwards direction on your diagram as doing all of the work from scratch, just sort of glancing back at previous tokens through attention, when it can also look at all the previous residual streams.
This does invalidate a fairly load-bearing part of my model, in that I now see that LLMs have a meaningful ability to "consider" a sequence in greater and greater depth as its length grows - so they should ... (read more)