I didn't mean Marcus had said anything about Sabine. What I meant by "whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus)" is that 'a Gary Marcus' is 'regarded as' having 'expertise [much] to do with AI', and that is why, even though Marcus has been wrong about pretty much everything, has very little genuine expertise in AI as it is practiced these days (ie. DL scaling), is remarkably inept at even the most basic entry-level use of LLMs, and writes things intrinsically not worth the time it takes to read them, he is still popular and widely-regarded-as-an-expert, and so it is useful to keep tabs on 'oh great, what's Marcus saying now that everyone is going to repeat for years to come?' You can read someone because they are right & informative, or you can read someone because they are wrong & uninformative but everyone else reads them; but you shouldn't read someone who is neither right nor read. So, you grit your teeth and wade into the Marcus posts that go viral...
I have misgivings about the text-fragment feature as currently implemented. It is at least now a standard and Firefox implements reading text-fragment URLs (just doesn't conveniently allow creation without a plugin or something), which was my biggest objection before; but there are still limitations to it which show that a lot of what the text-fragment 'solution' is, is a solution to the self-inflicted problems of many websites being too lazy to provide useful anchor IDs anywhere in the page. (I don't know how often I go to link a section of a blog post, where the post is written in a completely standard hierarchical table-of-contents way, and the headers turn out to be... nothing but `<h2>`s with not an `id=` anywhere in sight.) We would be a lot better off if pages had more meaningful IDs and selecting text did something like, pick the nearest preceding ID. (This could be implemented in LW2 or Gwern.net right now, incidentally. If the user selects some text, just search through the tree to find the first previous ID, and update the current browser-bar URL to `URL#ID`.)
Hacking IDs onto an unwilling page, whose author neither knows nor cares nor can even find out what IDs are in use (or what they may be breaking by editing this or that word), is a recipe for long-term breakage: your archive.is example works simply because archive.is is an archive website, and the pages, in theory, never change (even though the original URLs certainly can, and often quite dramatically). That's less true for LW comments or articles. There are also downstream effects: text-fragments are long and verbose and can't be written by hand because they're trying to specify arbitrary ranges which are robust to corruption, and they are unwieldy to search. (How does a tool handle different hash-anchors in a URL? Most choose to define them as unique URLs different from each other... so what happens when two users selecting from the same section inevitably wind up selecting slightly different text ranges every time, and every user has a unique text-fragment anchor? Now suddenly every URL is unique - no more useful backlinks, no more consolidated discussions of the same URL, etc. And if the URL content changes, you don't get anything out of it. It's now just a bunch of trailing junk causing problems forever, like all that `?utm_foo_bar` junk.)
Somewhat like the fad for abusing `#` for the stupid `#!` JS thing (which pretty much everyone, Twitter included, came to regret), I worry that this is still a half-baked tech designed for a very narrow use case (Google's convenience in providing search results) where we don't know how well it will work in the wild long-term or what side-effects it will have. So I personally have been holding off on it and making a point of deleting those archive.is anchors.
> I suspect part of it might just be a latent preference on LessWrong for the sort of lengthy blog posts in a style they're accustomed to, which is valid, but a tendency to presume the same sort of info they like being exposed to but delivered in a different way means it must be lower quality
You wrote a low-quality summary of a low-quality secondary-source video of no particular importance by a talking head whose expertise has little to do with AI (nor is regarded as such like a Gary Marcus), about events described more informatively in other secondary sources like Zvi's newsletter, where you added no original information or thought, and failed to follow up on basic details, like failing to name or link the study in the final item, and even pointing out how low-quality your own summary is & how you added nothing (despite praising your own "effort" repeatedly):
> The linked video doesn't cover or summarize how 'confidence' or 'critical thinking' were operationalized in the study by Microsoft.
I do not think you really have to ask why your post is not being upvoted to the skies.
> I wouldn't mind as much except nobody has explained why when I bothered putting in the effort...I'm just hoping you can offer insight into whether I should keep bothering with the effort of posts like this because I'm the one who's off here.
If you are spending a lot of "effort" on posts like this and you are upset by the reception, I would suggest that this sort of tertiary source writing is not your forte, and you are better off finding something that plays to your strengths (or is at least more your comparative advantage).
To be honest, I was surprised to read your comment complaining about your human effort not being appreciated, because I had assumed, when reading it originally, that a post this derivative had to have been written by a low-end LLM whose use you had chosen not to disclose.
It may be a broader effect of media technology & ecosystem changes: https://gwern.net/note/fashion#lorenz-spreen-et-al-2019
The really interesting question is: while you would generally expect old eminent figures to gradually decay (how often do you really need to cite Boethius these days?), and so I'm not surprised if you can find old eminent figures who are now in decline, are they being replaced by new major figures in an updated canon (eg. Ibram X. Kendi smoothly usurping Foucault), or just sorta... not being replaced at all, with citations chaotically swirling around in fashions?
I've speculated that the effect of hyper-speed media like social media is to destroy the multi-level filtering of society, so that the different niches wind up separating and becoming self-contained hermetic ecosystems. (Have you ever used a powerful stand mixer to mix batter and set it too high? What happens? Well, if the contents aren't liquid enough to flow completely at high speed, they tend to separate and shear off into two or three different layers rotating inside each other, with the inner layer spinning ultra-rapidly while the outer layer becomes almost completely static, stuck to the sides of the mixing bowl. The inner layer is TikTok, and the stuck outer layer is places like academia. The big fads and discoveries and trends on TikTok spin around so rapidly and are forgotten so quickly that none of them ever 'make it out' to elsewhere.)
I wouldn't wear a suit everywhere. I live on the West Coast of the USA, which is very casual. That makes wearing a suit a fashion statement. If I wore a suit in Japan, then it wouldn't look like I'm making a fashion statement. It would look like I just got off of work and didn't have time to change.
Demonstrating that wearing a suit in some contexts is a thing you can't countersignal. I'm reminded of the classic Onion article, "Why Can't Anyone Tell I'm Wearing This Business Suit Ironically?"
That looks pretty sensible overall, thanks.
You can see what looks like a fairly clear anti-pattern of switching languages/scripts, and the glitch-tokens may help explain the apparent patternness of the repetition in the non-token-split visualization: if LLaMA has " Хронологија" as a glitch-token, it may literally be unable to see that it's repeating a token by writing the apparently-patterned " Хронологија| Хронологија". Then it's not surprising if there are occasional repeats or 'too many' glitch-tokens (either birthday paradox as you scan over the sample looking for any possible pattern, or the preceding context induces the same prediction as the LLM sort of 'skips over' the glitch-token as a blind spot and makes a similar prediction which results in the same glitch-token).
"Overtraining" isn't Chinchilla; Chinchilla is just "training". The overtraining being advocated was supra-Chinchilla, with the logic that while you were going off the compute-optimal training, sure, you were more than making up for it by your compute-savings in the deployment phase, which the Chinchilla scaling laws do not address in any way. So there was a fad for training small models for a lot longer.
> The pondering happens in earlier layers of the network, not in the output
Then how does it produce any tokens...?
> then training on task Y could inadvertently bias the model to do more or less pondering on mostly-unrelated-but-statistically-correlated topic X.
But if that is what is going on and it accidentally learns to ponder initially due to bogus feedback or error, eventually the spurious correlation should get figured out: the model does the pondering more, that fails to increase reward, and so the pondering gets unlearned.
(Also, this assumes that RL gives an average reward of 0.0, which I don't know if that's true in practice.)
I think the mean would be taken out by the advantage estimation, so RLHF continues to increase the probability of the tokens generated in episodes with above-average reward, and to decrease the probability of the tokens from the below-average-reward episodes. This is in effect as if the average reward were always 0.
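A minimal numerical sketch of that mean-centering, assuming the simplest possible baseline (the batch-mean reward rather than a learned value function):

```python
import numpy as np

# Per-episode rewards for one batch; adding any constant to all of them
# changes nothing once the baseline (here just the batch mean) is subtracted.
rewards = np.array([1.0, 3.0, 2.0, 6.0])

advantages = rewards - rewards.mean()                      # [-2.,  0., -1.,  3.]
shifted    = (rewards + 100.0) - (rewards + 100.0).mean()  # identical advantages

assert np.allclose(advantages, shifted)
# Above-average episodes get positive advantage (their tokens are pushed up),
# below-average get negative (pushed down) -- effectively as if mean reward were 0.
```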
> What would be the implications? The model could develop a political bias to think more deeply about topics related to party X, where X is whatever party has more users giving the model positive feedback. Even if the other topics on party X's agenda are never explicitly talked about (!)
That sounds like the pondering's conclusions are then related to the task.
> This idea could very well be wrong. The gradients may be weakened during backpropagation before they get to the unrelated ideas, because the ideas did not directly contribute to the task.
Under a straightforward RLHF using PPO, I think there wouldn't be much weakening because the REINFORCE operator conceptually simply rewards (or punishes) all tokens generated during an episode, without making much attempt to decide which were 'good' or 'bad'. (That's why it's so high variance.) Any advantage function trying to remove some of the variance probably won't do a good job.
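A toy sketch of that uniform credit assignment (illustrative PyTorch, not anyone's actual RLHF code), showing a single scalar advantage multiplying the log-probability of every token in the episode:

```python
import torch

# One generated "episode" of 10 tokens over a 50-token vocabulary.
logits = torch.randn(10, 50, requires_grad=True)
tokens = torch.randint(0, 50, (10,))
log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(10), tokens]

reward, baseline = 1.0, 0.3
advantage = reward - baseline      # same scalar for the whole episode

# REINFORCE-style loss: every token is weighted by the identical scalar, so
# 'pondering' tokens and task tokens receive exactly the same credit or blame.
loss = -(advantage * log_probs.sum())
loss.backward()
# No per-token attempt to decide which tokens were 'good' or 'bad' -- hence
# the notoriously high variance of the estimator.
```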
More problematically for your idea, if the conclusions are indeed 'unrelated to the task', then shouldn't they be just as likely to arise in every episode - including the ones where it got negative reward? That would seem like it ought to exactly cancel out any learning of 'pondering'.
You need some incentive somewhere to learn good 'pondering'. (I have an example proposal for 'free play' which tries to teach a sort of 'pondering', but by stopping gradients, so anything learned in the initial steps is 'free', and so it can meta-learn to screw around and get something useful for free.)
Sounds somewhat like a bucket brigade market economy.