Are we sure that these questions aren’t in their datasets? I don’t think we can be. First off, you just posted them online.
Questions being online is not a bad thing. Pretraining on the datapoints is very useful, and does not introduce any bias; it is free performance, and everyone should be training models on the questions/datapoints before running the benchmarks (though they aren't). After all, when a real-world user asks you a new question (regardless of whether anyone knows the answer/label!), you can... still train on the new question then and there...
You can see it as an example of 'alpha' vs 'beta'. When someone asks me about the value of someone as a guest, I tend to ask: "do they have anything new to say? didn't they just do a big interview last year?" and if they don't but they're big, "can you ask them good questions that get them out of their 'book'?" Big guests are not necessarily as valuable as they may seem because they are highly-exposed, which means both that (1) they have probably said everything they will say before and there is no 'news' or novelty, and (2) they are message-disciplined a...
I agree with all of this! But I'm not sure I understand what you mean by "there may be mediation, but only in a weak sense". We were just interested in studying how models naturally learn in this RL setting.
I am emphasizing that to me, this current mediation learning looks fragile and temporary, and is not a solid, long-term 'natural' thing - it is learning, but only as a temporary artificial heuristic that would wash away in the long run with more training or more diverse tasks etc.
My expectation is that in the limit, a model will learn to focus only on...
I would not believe that unless you have done a simulation study with the small n of this study, plausible levels of measurement error (alcoholism being much harder to measure than weight or body fat), with about a dozen covariates (to correspond to the different ways to slice the patients and threshold BMI etc), and then shown that you hardly ever get a false negative like this. My experience with doing such power analysis simulation studies for other things inclines me to think that people greatly overestimate how informative such small studies are once you allow for plausible levels of measurement error and (reverse) p-hacking degrees of freedom.
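For concreteness, here is a minimal sketch of the kind of simulation I mean (the n, effect size, measurement reliability, and covariate count are all invented for illustration, not taken from the study):

```python
# Sketch of a power-analysis simulation: small n, a real but modest treatment effect,
# a noisily-measured outcome, and a dozen irrelevant covariates one could slice on.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_trial(n=50, true_effect=0.3, reliability=0.5, n_covariates=12):
    treat = rng.integers(0, 2, n)                       # randomized assignment
    latent = true_effect * treat + rng.normal(size=n)   # true change in drinking
    # Noisy measurement: only `reliability` of the variance is signal.
    measured = np.sqrt(reliability) * latent + np.sqrt(1 - reliability) * rng.normal(size=n)
    # Primary endpoint: treatment vs control difference.
    p_main = stats.ttest_ind(measured[treat == 1], measured[treat == 0]).pvalue
    # 'Reverse p-hacking': check a dozen irrelevant covariates for imbalance.
    covs = rng.normal(size=(n, n_covariates))
    p_covs = [stats.ttest_ind(covs[treat == 1, i], covs[treat == 0, i]).pvalue
              for i in range(n_covariates)]
    return p_main > 0.05, min(p_covs) < 0.05

results = np.array([one_trial() for _ in range(2000)])
print("false-negative rate on the real effect:", results[:, 0].mean())
print("runs 'failing' at least one covariate balance check:", results[:, 1].mean())
```

With numbers in that ballpark, the false-negative rate on the real effect is large, and something like half the runs 'fail' at least one covariate balance check purely by chance, which is the point.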
I don't think that study shows much either way: too small and underpowered to show much of anything (aside from the attrition undermining internal validity).
Dynomight's primary criticism doesn't hold much water because it is (un-pre-registered) reverse p-hacking. If you check enough covariates, you'll find a failure of randomization to balance on some covariate, and you can, if you wish, tell a post hoc story about how that is actually responsible for the overall mean difference. Nevertheless, randomization works, because on average why would any particular covariate be the way in which the confounding is mediated?
Just have to wait for more studies.
But did it inspire them to try to stop CelestAI or to start her? I guess you might need some more drinks for that one...
It's worth mentioning in this context that one of the most remarkable things about the recent wave of GLP-1/GIP drugs is that they seem to have large benefits on, for lack of a better word, willpower and psychiatry. Nor was this expected or predicted AFAIK, or clearly linked solely to the weight-loss: the justification in the animal experiments and early human trials was based purely on physiology and then the human diabetics reporting they felt less hungry. So this is quite remarkable, and part of why GLP-1/GIP drugs are one of the best things to happe...
Do you think that these drugs significantly help with alcoholism (as one might posit if the drugs help significantly with willpower)? If so, I'm curious what you make of this Dynomight post arguing that so far the results don't look promising.
I don't think it's weird. Given that we know there are temporal trends towards increasing parameter size (despite Chinchilla), FLOPs, data, and continued progress in compute/data-efficiency (with various experience curves), any simple temporal chart will tend to show an increase unless you are specifically conditioning or selecting in some way to neutralize that. Only if you had measured and controlled for all that and there was still a large unexplained residual of 'time' would you have to start reaching for other explanations such as 'divine benevolence'...
In reality, we observe that roughly 85% of recommendations stay the same when flipping nationality in the prompt and freezing reasoning traces. This suggests that the mechanism for the model deciding on its recommendation is mostly mediated through the reasoning trace, with a smaller, less significant direct effect from the prompt to the recommendation.
This might be less convincing than it seems, because the simple interpretation of the results to me seems to be something like, "the inner-monologue is unfaithful because in this setting, it is simply gene...
You would also expect that the larger models will be more sample-efficient, including at in-context learning of variations of existing tasks (which of course is what steganography is). So all scale-ups go much further than any experiment at small-scale like 8B would indicate. (No idea what 'medium-scale' here might mean.)
One possible interpretation here is going back to the inner-monologue interpretations as being multi-step processes with an error rate per step where only complete success is useful, which is just an exponential; as the number of steps increases from 1 to n, you get a sigmoid from ceiling performance to floor performance at chance. So you can tell the same story about these more extended tasks, which, after all, are just the same sort of thing - just more so. We also see this sort of sigmoid in searching with a fixed model, in settings like AlphaZero in Hex...
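A toy version of that story (the floor, ceiling, and per-step success rates below are all made up):

```python
# Each step succeeds independently with probability p; only complete success is useful, so
#   performance(n) = chance + (ceiling - chance) * p**n,
# which slides from ceiling down to the chance floor as the number of steps n grows.
chance, ceiling = 0.1, 0.95          # invented floor/ceiling
for p in (0.99, 0.95, 0.90):         # invented per-step success rates
    perf = {n: round(chance + (ceiling - chance) * p ** n, 3)
            for n in (1, 2, 5, 10, 20, 50, 100, 200)}
    print(f"p={p}: {perf}")
```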
While it's not possible to counter-signal with a suit in Japan, I feel the equivalent would be to wear traditional clothing like a samue or jinbei, which have their own set of challenges.
Yep. It can be pretty funny watching the contexts in which you can get away with a happi coat or a kimono/yukata; I can only speak from Japanese media rather than personal experience, but one thing I've noticed is that it seems a non-retired man wearing a kimono can still get away with it today as long as they are a sufficiently accomplished humanist or literary scholar...
Though now that I think about it more, presumably once someone has been captured the next thing you'd get them to do is spend a lot of time staring at a region of the sky that will reprogram them in more sophisticated ways. So maybe the normal glitchers in my story are unrealistically incompetent.
That was what I was thinking, yes. "A pact would normally allow voluntary communication to be initiated with the AIs, so any glitcher which had been successfully attacked would have simply communicated back to its masters, either downloading new instructions &a...
I've never heard of that; do you have examples?
My local gym has posted rules which include an explicit ban on perfume. (They don't use the exact term 'scent-free' but I assume it is an example of what OP means.)
Not that they enforce it, or even could enforce it; but I am reminded that the rule exists every so often when a woman (and it's always a woman) walks past me when I'm there at night, and I am suddenly hit by the smell (especially as I don't think of myself as being particularly perceptive nose-wise and I don't usually notice how people smell), and I wonder ...
I've never seen that SpongeBob gag either. But Mr Bean is a real person and people do have perfume sensitivities and allergic reactions. (My father had an ugly clash at work with one woman who apparently wore a lot of perfume and who he was convinced was causing him headaches and other problems.)
I saw Mr. Bean nearly die at the perfume counter.
To clarify, this is a fictional example and not a personal anecdote in desperate need of unpacking: https://mrbean.fandom.com/wiki/The_Return_of_Mr._Bean#Act_Two:_Shopping
if you take his general 'cognitive strategy' and just power it with worse thinking, you get really bad results.
I call this "retired physicist syndrome", after the classic SMBC cartoon: https://www.smbc-comics.com/comic/2012-03-21
Climate models aren't reductionist enough!
We can recreate the first language using statistics!
Awoooo! Oncology is doing it wrong!
It can be bad to be Open but not also smart. (You could also call this "emeritus professor syndrome" in many cases.)
A hack like that would just have other EDT failure modes: instead of confabulating evidence from my dataset or personal examples, it might just confabulate references. "Yes, this was predicted by Foo et al 1990, and makes perfect sense."
I think it's substantially better and calmer - even just the thumbnails look calmer now.
I still think you are going a bit overboard on visual complexity, things like slashed-zeros aren't bad (I like them), just too much of a good thing and using up your visual complexity budget where there may be a better deal elsewhere: the question I ask myself is, "do I want to look at this for the next 20 years? If I add X, will it 'look painfully 2025' in a few years?" Elements which don't seem excessive in the excitement of implementation may, with the passage of tim...
It’s about grief, with central metaphors that add exactly zero to anyone’s aesthetic understanding of grief (stuff being underground, things not staying buried)
It is about grief, but it didn't have to be. This would've been more obvious if I could've shown you the session, but I'll copy it out:
...2. Brainstorming Ideas:
- A child confronting a local superstition after witnessing something traumatic.
- A funeral narrated by an animal's perspective.
- A celebrity’s fall from grace caught on live camera.
- A girl who collects superstitions until one unexpectedly c
I think it's something of a trend relating to a mix of 'tools for thought' and imitation of some websites (LW2, Read The Sequences, Asterisk, Works in Progress & Gwern.net in particular), and also a STEM meta-trend arriving in this area: you saw this in security vulnerabilities where for a while every major vuln would get its own standalone domain + single-page website + logo + short catchy name (eg. Shellshock, Heartbleed). It is good marketing which helps you stand out in a crowded ever-shorter-attention-span world.
I also think part of it is that it ...
I read the 'stars' as simply very dense low-orbiting satellites monitoring the ground 24/7 for baseline humans to beam low-latency optical propaganda at. The implied King's Pact presumably is something like, "the terrestrial Earth will be left unmodified and no AI are allowed to directly communicate or interact with or attempt to manipulate baseline humans", and so satellites, being one-way broadcasts outside the Earth, don't violate it. This then allows the bootstrap of all the other attacks: someone looks up at night long enough, they get captured, start...
Nice, that's almost exactly how I intended it. Except that I wasn't thinking of the "stars" as satellites looking for individual humans to send propaganda at (which IMO is pretty close to "communicating"), but rather a network of satellites forming a single "screen" across the sky that plays a video infecting any baseline humans who look at it.
In my headcanon the original negotiators specified that sunlight would still reach the earth unimpeded, but didn't specify that no AI satellites would be visible from the Earth. I don't have headcanon explanations fo...
Ice-nucleating bacteria: https://www.nature.com/articles/ismej2017124 https://www.sciencefocus.com/planet-earth/bacteria-controls-the-weather
If you can secrete the right things, you can potentially cause rain/snow inside clouds. You can see why that might be useful to bacteria swept up into the air: the air may be a fine place to go temporarily, and to go somewhere, but like a balloon or airplane, you do want to come down safely at some point, usually somewhere else, and preferably before the passengers have begun to resort to cannibalism. So given that ev...
No, it is on the ChatGPT end. I was surprised since I can't recall ever seeing that before. The usual share-button pops up the share box, but with the red-background message
This shared link has been disabled by moderation.
I don't know if it's perhaps the copyrighted stories (given the Bing search engine integration, entirely possible for these December stories to show up and be flagged) or some of the content; I haven't cared enough to try to ablate it, because the exact text of the session isn't terribly important here IMO - you see the prompt, you see the final result, you get the idea.
I generally agree that r1's fiction is not that great and tends to a simple-minded 'edgelord' vibe with lots of portentous phrases that fall apart on genuine reading, but I feel like you didn't give Deepseek-r1 a fair shot at all here. You don't describe your prompt but I'm guessing it was something very simple like "write a flash fiction story of at least 500 words". No description of goals, no requirements, no planning, no editing or revision... no human writes the way you expect the LLM to. Especially given that this is for short fiction, a much more r...
The diminishing returns aren't too surprising, because you are holding the model size fixed (whatever that is for Houdini 3), and the search sigmoids hard. Hence, diminishing returns as you jump well past the initial few searches with the largest gains, to large search budgets like 2k vs 4k (and higher).
This is not necessarily related to 'approaching perfection', because you can see the sigmoid of the search budget even with weak models very far from the known oracle performance (as well as stronger models); for example, NNs playing Hex: https://arxiv.org/p...
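To illustrate the shape of the claim, here is a toy curve where strength is logistic in log2(search budget); the curve parameters are invented, but it shows why a 2k-to-4k doubling buys so little once you are well past the steep region:

```python
import math

# Invented logistic curve: strength as a function of log2(nodes searched).
def strength(nodes, midpoint_log2=6.0, slope=1.0):
    return 1.0 / (1.0 + math.exp(-slope * (math.log2(nodes) - midpoint_log2)))

for nodes in (64, 128, 256, 512, 1024, 2048, 4096):
    print(f"{nodes:>5} -> {nodes * 2:>5} nodes: gain per doubling = "
          f"{strength(nodes * 2) - strength(nodes):.3f}")
```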
Well, if you want to try to use video game playing as a measure of anything, it's worth noting that his preferences have, fairly recently, shifted from strategy games (the original Civilization when younger, but even as of 2020--2021, he was still playing primarily strategy games AFAICT from Isaacson & other media coverage - specifically, being obsessed with Polytopia) to twitch-fests like Elden Ring or Path of Exile 2... and most recently, he's infamously started cheating on those too.
Could just be aging or lack of time, of course.
colonizing Himalayan mountain slopes
One way to tell that you're at the edge of viability for actual living at this point, as opposed to simply passing through or enduring it until better conditions arise, is that Antarctic mountain slopes appear to be completely sterile and free of microbes:
...We analyzed 204 ice-free soils collected from across a remote valley in the Transantarctic Mountains (84–85°S, 174–177°W) and were able to identify a potential limit of microbial habitability. While most of the soils we tested contained diverse microbial communiti
The radiator story might be real, apparently. I was reading a random review of an Astounding issue (November 1944) and was surprised to see this part:
..."Time for a Universe" by R. S. Richardson looks at how the age of the universe has been calculated by various means (expansion of the universe, uranium clock, dynamics of clusters, and statistics of binaries) and the differences in the results.
There is also a good anecdote about the necessity of being cautious about data:
There is a story told about Robert lvirchoff [presumably Gustav Kirchhoff, with whom
but I expect that the RLHFed models would try to play the moves which maximize their chances of winning
RLHF doesn't maximize probability of winning, it maximizes a mix of token-level predictive loss (since that is usually added as a loss either directly or implicitly by the K-L) and rater approval, and god knows what else goes on these days in the 'post-training' phase muddying the waters further. Not at all the same thing. (Same way that a RLHF model might not optimize for correctness, and instead be sycophantic. "Yes master, it is just as you say!") I...
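For concreteness, the usual KL-regularized RLHF reward looks roughly like this sketch (real post-training pipelines pile further terms on top; the numbers and beta here are placeholders):

```python
# Sketch of the standard KL-regularized RLHF reward: rater approval minus a penalty for
# drifting from the reference (pretrained/SFT) policy. Nothing in it is "probability of
# winning the game" or any other task-specific objective.
def rlhf_reward(rater_score, policy_logprobs, reference_logprobs, beta=0.1):
    kl_penalty = sum(p - r for p, r in zip(policy_logprobs, reference_logprobs))
    return rater_score - beta * kl_penalty

print(rlhf_reward(0.8, [-1.0, -0.5, -2.0], [-1.2, -0.7, -1.0]))
```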
Given the Superalignment paper describes being trained on PGNs directly, and doesn't mention any kind of 'chat' reformatting or encoding metadata schemes, you could also try writing your games quite directly as PGNs. (And you could see if prompt programming works, since PGNs don't come with Elo metadata but are so small a lot of them should fit in the GPT-4.5 context window of ~100k: does conditioning on finished game with grandmaster-or-better players lead to better gameplay?)
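A hypothetical sketch of that prompt-programming experiment (the game strings and moves below are placeholders, not real games):

```python
# Since the model saw raw PGN movetext in training, prepend a few finished
# grandmaster-or-better games as context, then leave the current game's movetext
# open for the model to continue.
def pgn_prompt(strong_finished_games, current_moves):
    context = "\n\n".join(strong_finished_games)   # finished strong games, raw PGN movetext
    return context + "\n\n" + " ".join(current_moves) + " "

prompt = pgn_prompt(["1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 ... 1-0"],
                    ["1. d4", "Nf6", "2. c4", "e6"])
```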
I agree: if you've ever played any of the Pokemon games, it's clear that a true uniform distribution over actions would not finish any time that a human could ever observe it, and the time would have to be galactic. There are just way too many bottlenecks and long trajectories and reset points, including various ways to near-guarantee (or guarantee?) failure like discarding items or Pokemon, and if you've looked at any Pokemon AI projects or even just Twitch Plays Pokemon, this becomes apparent - they struggle to get out of Pallet Town in a reasonable time, never mind the absurdity of playing through the rest of the game and beating the Elite Four etc, and that's with much smarter move selection than pure random.
I think this implies that we collectively have no more than about 50 * 8 = 400 billion bits per second of control over the world.
I don't know how to think about this statement. Should I find this a 'small' number or unimpressive in some respect?
This allows for each second, picking a possibility out of 'no more than about' 2^400,000,000,000 high-level possibilities, which is a possibility-space so large I don't think I can write it out in decimal without crashing LW2 or hitting size limits. (GHCi tries to evaluate it but I killed it after a bit when the R...
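A quick back-of-the-envelope on the decimal expansion, using only the quoted 400-billion-bits figure:

```python
import math

bits = 400_000_000_000                    # the ~400 billion bits/second figure above
digits = bits * math.log10(2)             # decimal digits needed to write out 2**bits
print(f"{digits:.3g} digits, ~{digits / 1e9:.0f} GB of ASCII")   # ≈ 1.2e11 digits
```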
OP didn't say secret, it just said 'many facts'. I took it as a reference to the super-knowledgeability of LLMs: similar to how SAT vocab tests work because it is difficult for dumb people to know so many words that a random selection of rare words will include the few they happen to know. The Internet is stuffed with archaic memes, jokes, web pages, images, and writings that possibly no human alive could recognize, much less quote... but LLMs, trained on trillions of words scraped indiscriminately from every corner of the Internet accessible to crawlers, ...
I didn't mean Marcus had said anything about Sabine. What I meant by "whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus)" is that 'a Gary Marcus' is 'regarded as' having 'expertise [much] to do with AI' and that is why, even though Marcus has been wrong about pretty much everything and has very little genuine expertise about AI these days, ie. DL scaling (and is remarkably inept at even the most basic entry-level use of LLMs) and his writings are intrinsically not worth the time it takes to read them, he is still popular ...
I have misgivings about the text-fragment feature as currently implemented. It is at least now a standard and Firefox implements reading text-fragment URLs (just doesn't conveniently allow creation without a plugin or something), which was my biggest objection before; but there are still limitations to it which show that a lot of what the text-fragment 'solution' is, is a solution to the self-inflicted problems of many websites being too lazy to provide useful anchor IDs anywhere in the page. (I don't know how often I go to link a section of a blog post, w...
I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but also a tendency to presume that if the same sort of info they like being exposed to is delivered in a different way, it must be lower quality.
You wrote a low quality summary of a low quality secondary-source video of no particular importance by a talking head whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus), about events described more informatively in other secondary so...
It may be a broader effect of media technology & ecosystem changes: https://gwern.net/note/fashion#lorenz-spreen-et-al-2019
The really interesting question is this: you would generally expect old eminent figures to gradually decay (how often do you really need to cite Boethius these days?), so I'm not surprised if you can find old eminent figures who are now in decline; but are they being replaced by new major figures in an updated canon, with eg. Ibram X. Kendi smoothly usurping Foucault, or just sorta... not being replaced at all and citations chaotically...
I wouldn't wear a suit everywhere. I live on the West Coast of the USA, which is very casual. That makes wearing a suit a fashion statement. If I wore a suit in Japan, then it wouldn't look like I'm making a fashion statement. It would look like I just got off of work and didn't have time to change.
Demonstrating that wearing a suit in some contexts is a thing you can't countersignal. I'm reminded of the classic Onion article, "Why Can't Anyone Tell I'm Wearing This Business Suit Ironically?"
That looks pretty sensible overall, thanks.
You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has " Хронологија" as a glitch-token, it may literally be unable to see that it's repeating a token by writing the apparently-patterned " Хронологија| Хронологија". Then it's not surprising if there are occasional repeats or 'too many' glitch-tokens (either birthday paradox as you scan over the sample...
"Overtraining" isn't Chinchilla; Chinchilla is just "training". The overtraining being advocated was supra-Chinchilla, with the logic that while you were going off the compute-optimal training, sure, you were more than making up for it by your compute-savings in the deployment phase, which the Chinchilla scaling laws do not address in any way. So there was a fad for training small models for a lot longer.
The pondering happens in earlier layers of the network, not in the output
Then how does it produce any tokens...?
then training on task Y could inadvertently bias the model to do more or less pondering on mostly-unrelated-but-statistically-correlated topic X.
But if that is what is going on and it accidentally learns to ponder initially due to bogus feedback or error, then eventually the spurious correlation should get figured out: the model does the pondering more, the pondering doesn't increase reward, and so it gets unlearned.
...(Also, this assumes that RL gives
This idea could very well be wrong. The gradients may be weakened during backpropagation before they get to the unrelated ideas, because the ideas did not directly contribute to the task.
Under a straightforward RLHF using PPO, I think there wouldn't be much weakening because the REINFORCE operator conceptually simply rewards (or punishes) all tokens generated during an episode, without making much attempt to decide which were 'good' or 'bad'. (That's why it's so high variance.) Any advantage function trying to remove some of the variance probably won't ...
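A toy sketch of that uniform credit assignment (toy numbers only, ignoring PPO's clipping and value-function details):

```python
# Vanilla REINFORCE scales the gradient of every token's log-probability by the same
# (return - baseline) scalar, so "pondering" tokens get reinforced whenever the episode
# happened to score well, whether or not they contributed anything.
def reinforce_token_weights(token_logprobs, episode_return, baseline=0.0):
    advantage = episode_return - baseline
    return [advantage] * len(token_logprobs)   # identical weight for every token

print(reinforce_token_weights([-1.2, -0.4, -2.3], episode_return=1.0))  # [1.0, 1.0, 1.0]
```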
Maybe it would look more random if you presented it segmented by token instead of translated into characters? I'm not familiar with the LLaMA tokenizations, but you seem to imply that a lot of the apparent patterns here are single tokens (like "partiellement" would be very surprising to me as the output of a greedy likelihood-minimizing sampling, but is trivial if it is a single BPE token). This would create a misleading impression of coherence.
Also, as Baginski notes, greedy sampling to minimize likelihood will not minimize total likelihood any more than ...
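A toy example of that point, with invented probabilities:

```python
# Greedily picking the least-likely token at each step does not minimize the likelihood
# of the whole sequence.
step1 = {"a": 0.4, "b": 0.6}
step2 = {"a": {"a": 0.5, "b": 0.5},   # P(next | "a")
         "b": {"a": 0.9, "b": 0.1}}   # P(next | "b")

# Greedy minimization: take "a" (0.4), then either continuation gives 0.4 * 0.5 = 0.20.
first = min(step1, key=step1.get)
greedy = step1[first] * min(step2[first].values())

# Exhaustive search: "b" then "b" gives 0.6 * 0.1 = 0.06, lower than the greedy path.
best = min(step1[t1] * p2 for t1 in step1 for p2 in step2[t1].values())

print(greedy, best)   # 0.2 vs 0.06
```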
Note that text in pretraining may even be an expensive way to go about it: one of the most dramatic demonstrations MS gave us with Sydney was the incredible speed & efficiency of web-search-powered adversarial attacks on LLMs. You don't need to dump a lot of samples onto the Internet and pray they make it into the training data and don't get forgotten, if you can set up a single sample with good SEO and the LLM kindly retrieves it for you and attacks itself with your sample.
This is something to think about: it's not just making it into the training dat...
Why not just 'valuable information', in a Value of Information sense of 'valuable'?
The estimate of the compute of their largest version ever (which is a very helpful way to phrase it) at only <=50x GPT-4 is quite relevant to many discussions (props to Nesov) and something Altman probably shouldn't've said.
The estimate of test-time compute at 1000x effective-compute is confirmation of looser talk.
The scientific research part is of uncertain importance but we may well be referring back to this statement a year from now.
Apropos of very low-latency LLMs and revisiting this topic a little: what does this imply about DRL robotics, rather than animals? Will DRL NNs have to have brains as big as humans in order to run superhuman humanoid robots?
One possible implication is that Portia-like NNs are possible for robotics in general. Robotics may be quite 'easy' in that sense.
It is striking that when we look at NN parameter/FLOPS-counts, we generally do not see 'large' robotics, vision, or sound models, but LLMs; the largest pure-vision models like PaLI-X are <100b-parameters, ...
Probably https://gwern.net/newsletter/2021/05#master-synthesis
That's what makes it alpha! If he was as legible as, say, Hinton, he would be mined out by now, and nothing but beta. (Similar situation to Schmidhuber - 'obvious crackpot' - although he's such a self-promoter that he overcomes it, and so at thi...