Comments

gwern

There is also a considerable level of politics, if you read Egan's writings about Iran and Australia's handling of migrants - anything which distracts from the real important issues, like how Iran is mistreated by the world, is the enemy.

(Plus a certain degree of mathematician crankery: his page on Google Image Search, and how it disproves AI, or his complaints about people linking the wrong URLs due to his ISP host - because he is apparently unable to figure out 'website domain names' - were quite something even when they were written.)

gwern

I think it's applicable. It seems little different to me from other mode-collapse exercises like 'what is your name?' or 'imagine a random person; what is their surname?' or 'flip a fair coin; what did it come up as?' (Note that for none of the models, in either set of responses, is "the favorite color" actually a 'the': they do not deterministically pick the same color every time, so if there is supposed to be 'the' favorite color, you still have to explain how the LLM happens to pick this particular distribution of values.)

> According to these two (dubious) surveys I just found, 30% of humans pick Blue, 15% Purple, 10% Green. It's not particularly far from the human distribution (if you are guessing that they should just say the favorite colors humans are saying).

Seems quite different to me. Leaving aside that hex codes span a very large space of colors while the responses still look remarkably uniform, discretized to just a handful of values, just look at all the pink, white, red, 'none of these', etc. in the human colors. In the LLM colors, I see a tiny bit of pink/red, and definitely no 'white'. This is not the most mode-collapsed thing ever, but it still looks mode-collapsed.

(Mode-collapse is a spectrum. There's fully representing the distribution, then there are mode drops you can't see without statistical tests because they are subtle and distribution-wide or in places you don't normally look, then there's stuff you start to notice if you use it often enough like Claude apparently liking to name characters "Susan Chen" in stories, then there's stuff so blatant you can't help but notice with even a single sample - like how ChatGPT used to write rhyming poetry even if you asked "write a non-rhyming poem".)
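
To illustrate the kind of statistical check this points at, here is a minimal sketch comparing a human color-preference distribution against a model's sampled answers via total variation distance. All the numbers are hypothetical placeholders (loosely echoing the quoted survey shares), not measurements from the actual surveys or models discussed here.

```python
# Hedged sketch: quantify "mode-collapsed" as distance between two
# categorical distributions. Every number below is a made-up placeholder.
from collections import Counter

def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """TV distance = half the L1 distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical human survey shares.
human = {"blue": 0.30, "purple": 0.15, "green": 0.10, "red": 0.10,
         "pink": 0.08, "white": 0.05, "other": 0.22}

# Hypothetical LLM samples: heavily concentrated on a few modes.
llm_samples = ["blue"] * 70 + ["purple"] * 25 + ["green"] * 5
llm = {k: v / len(llm_samples) for k, v in Counter(llm_samples).items()}

print(f"TV(human, llm) = {total_variation(human, llm):.3f}")  # large => mode-collapsed
```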

gwern

Cross-model consistency or the exact color shades aside, would you consider this an example of mode-collapse? Those all look like tuned models, with no base models I recognize, while the chosen colors seem... extremely undiverse. I am sure that if we asked random humans, we would find that people like more colors than just 'green' and 'purple', roughly.

gwern

'You're prejudiced'? That's really the best defense of AI slop you can come up with? I should have to spend an hour reading 9k words and writing a detailed rebuttal or fisking before I downvote it?

Yes, I'm prejudiced. Or as I prefer to put it, 'using informative priors'. And I will continue to be so, if the best argument you can make is that I might be wrong and I shouldn't be 'prejudiced'. You didn't really paste in 9k words, expecting everyone to read it and have to engage with it, without even knowing whether it was any good - did you?

(I am also increasingly wondering if talking too much to LLMs is an infohazard akin to taking up psychedelics or Tiktok or meditation as a habit. No one I see who spends a lot of time talking to LLMs seems to get smarter because of it; instead, they start talking like the LLMs, or thinking that pasting in big blocks of LLM text is now a laudable intellectual activity rather than taking a big 💩 in public. Everyone in Cyborgism or AI Twitter or LW who talks a lot about talking a lot to LLMs for generic conversation, rather than specific tasks, seems to lose their edge and ability to think critically, even though they all claim the opposite. Like Janus complaining about that Claude comment being censored: the comment contained nothing of value that I could see from skimming it, just impossible-to-disprove confabulation about introspection with no evidential value, and it was certainly not the best comment on the page... When I think about how little it bothers people when ChatGPT or Claude blatantly manipulate them, or show sycophancy, or are mode-collapsed, I wonder to myself: does anyone Agreeable make it out of the near future intact? "If shoggoth not friend, why friend-shaped?")

> This was posted back in April, and it is still pulling people in who are responding to it, 8 months later, presumably because what they read, and what it meant to them, and what they could offer in response in comments, was something they thought had net positive value.

This is an argument against it, not for it. The sin of AI slop, like that of trolls or fabricators or activists, is that they draw in and waste high-quality human thinking time by presenting the superficial appearance of high-quality text worth engaging with, thereby burning the commons and defecting against important intellectual norms like 'not writing Frankfurtian bullshit'. See Brandolini's law: the amount of time and energy spent reading or looking at or criticizing AI slop is (several) orders of magnitude more than went into creating it. Downvoting without reading is the only defense.

> However, prejudicially downvoting very old things, written before any such policy entered common norms, violates a higher-order norm about ex post facto application of new laws.

No, it doesn't, and I would remind you that LW2 is not a court room, and legal norms are terrible ideas anywhere outside the legal contexts they are designed for.

Bad content should be downvoted no matter when it was written. And AI slop has always been AI slop: ChatGPTese has always been ChatGPTese, and bad, ever since davinci-003, when screwing around with it in the OA Playground gave me an increasingly disturbing sense of 'something has gone terribly wrong here' from the poetry... We have had problems from the start with people pasting ChatGPT spam into LW2 - often badly wrong and confabulated as well (even when the claims made no sense if you thought about them for a few seconds), not merely vomiting junk-food text into the comment form. The problem just wasn't bad enough at the time to need to enunciate a policy against it.

gwern

Prediction: the SAE results may be better for 'wider', but only if you control for something else, possibly perplexity, or regularize more heavily. The literature on wider vs deeper NNs has historically shown a stylized fact of wider NNs tending to 'memorize more, generalize less' (which you can interpret as the finegrained features being used mostly to memorize individual datapoints, perhaps exploiting dataset biases or nonrobust features), and so deeper NNs are better (if you can optimize them effectively without exploding/collapsing gradients), which would potentially more than offset any orthogonality gains from the greater width. Thus, you would either need to regularize more heavily ('over-regularizing' from the POV of the wide net, because it would achieve better performance if it could memorize more of the long tail, the way it 'wants' to) or otherwise adjust for performance (to disentangle the performance benefits of wideness from the distorting effect of achieving that performance via more memorization).
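
To make the proposed control concrete, here is a minimal sketch under assumed hyperparameters (the widths, L1 coefficients, and synthetic activations are all placeholders, not anything from the post): train SAEs at several widths over a sparsity-penalty sweep, and only compare runs that land at matched reconstruction loss, so a wider SAE cannot win simply by memorizing more.

```python
# Hedged sketch (PyTorch): sweep width x L1 penalty, then compare only runs
# matched on reconstruction loss (the "adjust for performance" option above).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))   # sparse feature activations
        return self.dec(f), f

def train_one(acts, d_model, width, l1_coef, steps=1_000, lr=1e-3):
    sae = SparseAutoencoder(d_model, width)
    opt = torch.optim.Adam(sae.parameters(), lr=lr)
    for _ in range(steps):
        x = acts[torch.randint(0, acts.shape[0], (256,))]    # random minibatch
        recon, f = sae(x)
        recon_loss = (recon - x).pow(2).mean()
        loss = recon_loss + l1_coef * f.abs().mean()          # sparsity penalty
        opt.zero_grad(); loss.backward(); opt.step()
    return sae, recon_loss.item()

# Sweep widths and penalties on stand-in activations; before comparing feature
# metrics, keep only runs whose final reconstruction loss falls in the same band.
acts = torch.randn(10_000, 512)
runs = [(w, l1, train_one(acts, 512, w, l1)[1])
        for w in (4_096, 16_384)
        for l1 in (1e-4, 3e-4, 1e-3)]
```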

gwern

Bryan Johnson is plugged into Bay Area stuff and Twitter and regularly goes to places where AI risk is discussed seriously, like certain conferences. If he is not already interested, it is almost certainly not due to ignorance of the sort some cold outreach could fix, but because he either doesn't believe it is a big deal or has chosen to prioritize his longevity/biohacking interests; lobbying him unsolicited would probably just annoy him, and I do not recommend trying. (Given the size of his exit many years ago, Johnson is probably not so wealthy that he can casually throw large sums at projects of minor personal priority.)

gwern

> But (as you're presumably thinking) training compute can be amortized across all occasions when the model is used, while inference compute cannot, which means it won't be worthwhile to go very far down the road of scaling inference compute.

Inference compute is amortized across future inference when its outputs are trained on, and the three-way scaling-law exchange rates between training compute, runtime compute, and model size are critical. See AlphaZero for a good example.

As always, if you can read only 1 thing about inference scaling, make it "Scaling Scaling Laws with Board Games", Jones 2021.
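
For a feel of the tradeoff, here is a hedged sketch in the spirit of Jones 2021: model performance as roughly linear in both log(training compute) and log(inference compute), so an iso-performance frontier lets you trade one for the other. The coefficients below are made-up illustrations, not the paper's fitted values.

```python
# Hedged sketch of the train-vs-inference compute exchange rate.
# A_TRAIN, A_TEST, BASELINE are hypothetical, for illustration only.
import math

A_TRAIN = 300.0   # hypothetical Elo gained per 10x of training compute
A_TEST = 250.0    # hypothetical Elo gained per 10x of inference/search compute
BASELINE = 1000.0 # hypothetical Elo at the reference compute budgets

def elo(train_mult: float, test_mult: float) -> float:
    """Predicted strength at multiples of the reference train/test budgets."""
    return BASELINE + A_TRAIN * math.log10(train_mult) + A_TEST * math.log10(test_mult)

# Trading along an iso-Elo frontier: cutting training compute 10x can be offset
# by scaling inference compute up by 10**(A_TRAIN / A_TEST) ~ 15.8x here.
print(elo(1.0, 1.0))                          # reference point
print(elo(0.1, 10 ** (A_TRAIN / A_TEST)))     # same Elo, shifted toward inference
```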

gwern

> An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions, he can search much further for a win, such that I have virtually no chance against him; the difference between me and an amateur chess player is similarly vast.

This is at best over-simplified as a way of thinking about 'search': Magnus Carlsen would also beat you or an amateur at bullet chess, or at any other time control:

> As of December 2024, Carlsen is also ranked No. 1 in the FIDE rapid rating list with a rating of 2838, and No. 1 in the FIDE blitz rating list with a rating of 2890.[495]

(See for example the forward-pass-only Elos of chess/Go agents; Jones 2021 includes scaling law work on predicting the zero-search strength of agents, with no apparent upper bound.)
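
As a small concrete footnote on what "virtually no chance" means, here is the standard Elo expected-score formula applied to the ratings quoted above (this is just the textbook logistic Elo model, nothing specific to the post or to these agents):

```python
# Worked example: expected score of a ~2000-rated player against Carlsen's
# quoted rapid rating, under the usual logistic Elo model.
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

print(f"{expected_score(2000, 2838):.4f}")  # ~0.008: a win/draw less than 1% of the time
print(f"{expected_score(2838, 2000):.4f}")  # ~0.992 for Carlsen
```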

gwern

Life is far too short to read either OP's dump of AI slop or this comment's dump, and this exemplifies why LW2 should have a policy against dumping in AI-written content without improvement. (And no, some vague comments tacked on at the end about penal slavery do not count, and 'there was so much slop even I can't be bothered to paste it all in' especially doesn't count.)

I've strong downvoted OP. Do better.
