All of gwern's Comments + Replies

gwern105

I claim that Said's post is bad because it can be rewritten into a post that fulfills the same function but doesn't feel as offensive.[1] Nothing analogous is true for the Scaling Hypothesis. And it's not just that you couldn't rewrite it to be less scary but convey the same ideas; rather the whole comparison is a non-starter because I don't think that your post on the scaling hypothesis has bad vibes, at all. If memory serves (I didn't read your post in its entirety back then, but I read some of it and I have some memory of how I reacted), it sparks a ki

... (read more)
3habryka
(To be clear, my take on all of this is that it is often appropriate to be rude and offensive, and often inappropriate. What has made these discussions so frustrating is that Said continues to insist that no rudeness or offensiveness is present in any of his writing, which makes it impossible to have a conversation about whether the rudeness or offensiveness is appropriate in the relevant context.  Like, yeah, LessWrong has a culture, a lot of which is determined by what things people are rude and offensive towards. One of my jobs as a moderator is to steer where that goes. If someone keeps being rude and offensive towards things I really want to cultivate on the site, I will tell them to stop, or at least provide arguments for why this thing that I do not think is worth scorn, deserves scorn.  But if that person then insists that no rudeness or offensiveness was present in any of their writing, despite an overwhelming fraction of readers reading it as such, then they are either a writer so bad at communication as to not belong on the site, or trying to avoid accountability for the content of their messages, both of which leave little room but to take moderation action that limits their contributions to the site)
gwern20

If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority?

If it's not obvious at this point why, I would prefer to not go into it here in a shallow superficial way, and refer you to the OA coup discussions.

gwern131

GPT-4.5 is roughly a 10x scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.

10x is what it was, but it wasn't what it was supposed to be. That's just what they finally killed it at, after the innumerable bugs and other issues that they alluded to during the livestream and elsewhere, which is expected given the 'wait equation' for large DL runs - after a certain point, no matter how much you have invested, it's a sunk cost and you're better off starting afresh, such ... (read more)

2Lukas Finnveden
If GPT-4.5 was supposed to be GPT-5, why would Sam Altman underdeliver on compute for it? Surely GPT-5 would have been a top priority? Maybe Sam Altman just hoped to get way more compute in total, and then this failed, and OpenAI simply didn't have enough compute to meet GPT-5's demands no matter how high of a priority they made it? If so, I would have thought that's a pretty different story from the situation with superalignment (where my impression was that the complaint was "OpenAI prioritized this too little" rather than "OpenAI overestimated the total compute it would have available, and this was one of many projects that suffered"). 
gwern100

at that time the median estimate for GPT5 release was at December 2024.

Which was correct ex ante, and mostly correct ex post - that's when OA had been dropping hints about releasing GPT-4.5, which was clearly supposed to have been GPT-5; they seemingly changed their mind near Dec 2024 and spiked it, before, it seems, the DeepSeek moment in Jan 2025 changed their minds back and they released it in February 2025. (And GPT-4.5 is indeed a lot better than GPT-4 across the board. Just not a reasoning model or dominant over the o1-series.)

which was clearly supposed to have been GPT-5

 

I have seen people say this many times, but I don't understand. What makes it so clear?

GPT-4.5 is roughly a 10x scale-up of GPT-4, right? And full number jumps in GPT have always been ~100x? So GPT-4.5 seems like the natural name for OpenAI to go with.

I do think it's clear that OpenAI viewed GPT-4.5 as something of a disappointment, I just haven't seen anything indicating that they at some point planned to break the naming convention in this way.

gwern227

GPT was $20/month in 2023 and it's still $20/month.

Those are buying wildly different things. (They are not even comparable in terms of real dollars. That's like a 10% difference, solely from inflation!)
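(A back-of-the-envelope sketch of that inflation adjustment, assuming something like 3-4% annual US inflation over 2023-2025 rather than exact CPI figures:)

```python
# Rough check: how much less is a nominal $20/month worth after ~2 years of
# inflation? The annual rates below are assumptions, not official CPI numbers.
nominal = 20.00
assumed_annual_inflation = [0.04, 0.03]

deflator = 1.0
for rate in assumed_annual_inflation:
    deflator *= 1 + rate

print(f"cumulative inflation: {deflator - 1:.1%}")                 # ~7%
print(f"$20 in 2025 is about ${nominal / deflator:.2f} in 2023 dollars")
```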

gwern*3224

It’s not my view at all. I think a community will achieve much better outcomes if being bothered by the example message is considered normal and acceptable, and writing the example message is considered bad.

That's a strange position to hold on LW, where it has long been a core tenet that one should not be bothered by messages like that. And that has always been the case, whether it was LW2, LW1 (remember, say, 'babyeaters'? or 'decoupling'? or Methods of Rationality), Overcoming Bias (Hanson, 'politics is the mindkiller'), SL4 ('Crocker's Rules') etc.

I ... (read more)

4Rafael Harth
This doesn't feel like it engages with anything I believe. None of the things you listed are things I object to. I don't object to how you wrote the Scaling Hypothesis post, I don't object to the Baby Eaters, I super don't object to decoupling, and I super extra don't object to 'politics is the mind-killer'. The only one I'd even have to think about is Crocker's Rules, but I don't think I have an issue with those, either. They're notably something you opt into. I claim that Said's post is bad because it can be rewritten into a post that fulfills the same function but doesn't feel as offensive.[1] Nothing analogous is true for the Scaling Hypothesis. And it's not just that you couldn't rewrite it to be less scary but convey the same ideas; rather the whole comparison is a non-starter because I don't think that your post on the scaling hypothesis has bad vibes, at all. If memory serves (I didn't read your post in its entirety back then, but I read some of it and I have some memory of how I reacted), it sparks a kind of "holy shit this is happening and extremely scary ---(.Ó﹏Ò.)" reaction. This is, like, actively good. It's not in the same category as Said's comment in any way whatsoever. I agree that it is better to not be bothered. My position is not "you should be more influenced by vibes", it's something like "in the real world vibes are about 80% of the causal factors behind most people's comments on LW and about 95% outside of LW, and considering this fact about how brains work in how you write is going to be good, not bad". In particular, as I described in my latest response to Zack, I claim that the comments that I actually end up leaving on this site are significantly less influenced by vibes than Said's because recognizing what my brain does allows me to reject it if I want to. Someone who earnestly believes to be vibe-blind while not being vibe-blind at all can't do that. This honestly just doesn't seem related, either. Status-blindness is more spe
4avturchin
Thanks! Fantastic read. It occurred to me that sending code or AI back in time, rather than a person, is more likely since sending data to the past could be done serially and probably requires less energy than sending a physical body. Some loops could be organized by sending a short list of instructions to the past to an appropriate actor – whether human or AI. Additionally, some loops might not require sending any data at all: Roko's Basilisk is an example of such acausal data transmission to the past. Could there be an outer loop for Roko's Basilisk? For example, a precommitment not to be acausally blackmailed. Also (though I'm not certain about this), loops like you described require that the non-cancellation principle is false – meaning that events which have happened can be turned into non-existence. To prevent this, we would need to travel to the past and compensate for any undesirable changes, thus creating loops. This assumption motivated the character in Timecrimes to try to recreate all events exactly as they happened. However, if the non-cancellation principle is false, we face a much more serious risk than nested loops (which are annoying, but most people would live normal lives, especially those who aren't looped and would continue through loops unaffected). The risk is that a one-time time machine could send a small probe into the remote past and prevent humanity from appearing at all. We can also hypothesize that an explosion of nested loops and time machines might be initiated by aliens somewhere in the multiverse – perhaps in the remote future or another galaxy. Moreover, what we observe as UAPs might be absurd artifacts of this time machine explosion.
gwern*61

I would also point out that, despite whatever she said in 1928 about her 1909 inheritance, Woolf committed suicide in 1941 after extensive mental health challenges which included "short periods in 1910, 1912, and 1913" in a kind of insane asylum, and then afterwards beginning her serious writing career (which WP describes as heavily motivated by her psychiatric problems as a refuge/self-therapy), so one can certainly question her own narrative of the benefits of her UBI or the reasons she began writing. (I will further note that the psychological & psy... (read more)

gwernΩ244813

Update: Bots are still beaten by human forecasting teams/superforecasters/centaurs on truly heldout Metaculus problems as of early 2025: https://www.metaculus.com/notebooks/38673/q1-ai-benchmarking-results/

A useful & readable discussion of various methodological problems (including the date-range search problems above) which render all forecasting backtesting dead on arrival (IMO) was recently compiled as "Pitfalls in Evaluating Language Model Forecasters", Paleka et al 2025, and is worth reading if you are at all interested in the topic.

gwern80

Personality traits are an especially nasty danger because, given the existence of stabilizing selection + non-additive variance + high social homogamy/assortative mating + many personality traits with substantial heritability, you can probably create extreme self-sustaining non-coercive population structure with a package of edits. I should probably write some more about this because I think that embryo selection doesn't create this danger (or in general result in the common fear of 'speciation'), but embryo editing/synthesis does.

2TsviBT
Interesting. (I don't immediately see where you're going with that, so sounds like I have something to learn!) In practical terms, it should be feasible sooner to do small amounts of personality nudging using what data we already have, operating on linear variance. Later on we'll have more data, better psychometrics, and better ways of modeling some of the nonlinear effects. My current take is that it's better to use the weaker versions while the strong ones are infeasible (https://www.lesswrong.com/posts/rdbqmyohYJwwxyeEt/genomic-emancipation#Genomic_engineering_overhang), but not sure.
gwern165

Key lesson:

One conclusion we have drawn from this is that the most important factor for good forecasting is the base model, and additional prompting and infrastructure on top of this provide marginal gains.

Scaling remains undefeated.

gwern1814

It'd be a lot easier to check claims here if you included the original hyperlinks (or in the handful of cases that a URL is provided, made it clickable).

7Said Achmiz
Note that if you view this post on GreaterWrong, all URLs are automatically clickable hyperlinks.
gwern*2218

If you know what to search for, you can dig out that old post. Of course, leaving memorable breadcrumbs you can search for three years later is, at best, an art

Yes, that has been my experience too. Sure, Discord (like Twitter) gives you fairly powerful search primitives, to a greater extent than most people ever notice. You can filter by user, date-ranges, that sort of thing... It was written by nerds for nerds, originally, and it shows. However, I have still struggled to find many older Discord comments by myself or others, because it is inherent to th... (read more)

2niplav
It seems worth noting that 𝕏 search has been broken for quite a while, and shows no sign of improvement.
gwern75

“buying vegetables they didn't need” doesn’t make any sense. Either nobody needs vegetables or everybody does; they’re healthy but not necessary to stay alive.

On Tuesday at Esmeralda in California, I watched a lot of people just like the protagonists at the farmer's market buying vegetables they didn't need. (I bought a sourdough loaf which I did need, and ate it.) At the house I'm staying at, I just got buzzed by the fly from the vegetables that the house renters bought which they didn't need. (Cherry tomatoes, if you were wondering.) It makes perfect sen... (read more)

4JustisMills
Sure; the more detailed version of my critique of that specific line is something like: "Ambiguity is a really powerful resource in extremely short fiction, such that pointless or unclear ambiguity is really bad. When I see 'buying vegetables they didn't need' I'm not sure what is meant; literally speaking, vegetables (potatoes notwithstanding) are often not that calorically dense, making them a healthy extra to add to a meal. Taken that way, "they didn't need" feels kind of redundant - you don't need a side salad, sure, but who cares? Nobody ever does. Or "they didn't need" can be taken as vaguely judgmental, like, vegetables that'll probably rot uneaten. But that's weird, since the rest of the piece is non-judgmental and in fact takes an over-the-shoulder-camera style perspective aligned with the protagonist's. So a single line tut-tutting their vegetable purchase feels weird. All of this being a sort of vague gesture at why I see that phrase and my nose wrinkles up, and I'm taken out of the story." I'm not sure I understand your last parenthetical; everybody definitely needs water to stay alive, and doesn't need veggies; veggies specifically are a pretty easy food group to forgo (maybe not literally, but you can certainly avoid eating the things people generally are thinking of when they say "eat your vegetables" and just be... slightly less healthy). I suppose my point wasn't clear, there.
2Chastity Ruth
You're right, but the better description of the phenomenon is probably something like: "Buying vegetables they didn't want" "Buying vegetables they'd never eat" "Buying vegetables they didn't plan to use" "Aimlessly buying vegetables" "Buying vegetables for the sake of it" "Buying vegetables because there were vegetables to buy" Because you don't really "need" any grocery shop, so long as you have access to other food. It's imprecise language that annoys some readers, though I don't think it's the biggest deal
gwern125

What splits do you have in mind which are so much more often happening than mergers? We just saw Scale merge into FAIR, and not terribly long before that, Character.ai returned to the mothership, while Tesla AI de facto merged into xAI and before that Adept merged into Amazon and Inflection into Microsoft etc, in addition to the de facto 'merges' which occur when an AI lab quietly drops out and concedes the frontier (eg Mistral) or where they pivot to opensource as a spoiler or commoditize your complement play. So I see plenty of merging, consistent with t... (read more)

gwern*184

I would also point out a perverse consequence of applying the rate limiter to old high-karma determined commenters: because it takes two to tango, a rate-limiter necessarily applies to the non-rate-limited person almost as much as the rate-limited person...

You know what's even more annoying than spending some time debating Said Achmiz? Spending time debating him when he pings you exactly once a day for the indefinite future as you both are forced to conduct it in slow-motion molasses. (I expect it is also quite annoying for anyone looking at that page, or the site comments in general.)

1ProgramCrafter
Are the dialogues rate limited too? If not, they might be a more suitable medium. They are admittedly harder to branch, but the object-level point of Said vs GSW case has been lost already.
5habryka
I actually happen to prefer it in once a day spurts, and think this generalizes some to others. I don’t think it’s obvious in general which way is better on this dimension though. 
gwern7330

And yet... lukeprog hasn't been seriously active on this site for 7 years, Wei Dai hasn't written a post in over a year (even as he engages in productive discussions here occasionally), Turntrout mostly spends his time away from LW, Quintin Pope spends all his time away from LW, Roko comments much less than he used to more than a decade ago, Eliezer and Scott write occasional comments once every 3 months or so, Richard Ngo has slowed down his pace of posting considerably, gwern posts here very infrequently (and when he does, it's usually just linking to o

... (read more)
3Ruby
I wouldn't say the scope was narrowed, in fact the admin team took a lot of actions to preserve the scope, but a lot of people have shown up for AI or are now heavily interested in AI, simply making that the dominant topic. But, I like to think that people don't think of LW as merely an "AI website".
3dbohdan
The YouTube channel Rational Animations seems pretty successful in terms of sheer numbers: 385K subscribers, which is comparable to YouTubers who talk about media and technology. Their videos "The True Story of How GPT-2 Became Maximally Lewd" and "The Goddess of Everything Else" have over two million views. Qualitatively, I have seen their biggest videos mentioned a few times where a LW post wouldn't be. However, the channel principally adapts existing rationalist and AI-safety content. (Sort the videos by popular to see.) I think they're good at it. Through their competence, new incisive rationalist-related videos exist—as adaptations of older incisive rationalist-related writing. I don't know of another channel like it, even though popular YouTube channels attract imitators, and it is hard to imagine them switching to new ideas. Part of it is the resources involved in producing animation compared to writing. With animation so labor-intensive, it makes sense to try out and refine ideas in text and only then adapt them to video. Posters on video-LW with original high-effort content would come to resent how much each mistake cost them compared to a textual post or comment. AI video generation will make it easier to create videos, but precise control over content and style will still demand significantly more effort than text.
1Three-Monkey Mind
I generally agree with any and all criticisms of Discord, but its search is pretty good. If you know what to search for, you can dig out that old post. Of course, leaving memorable breadcrumbs you can search for three years later is, at best, an art, and in my case seems like something that’s purely luck-of-the-draw when it comes to improbable phrases that you’ve mentioned only once or a handful of times. On the other hand, Discord users do tend to have a lower threshold of reading stamina; “I ain’t reading all that — I’m happy for you, or sorry that happened” seems to happen more often in Discord unless you’re in a Discord guild that’s pre-selected for people who can read long things — a Gaming Lawyers guild, perhaps.
gwern30

Wait I don't think @gwern literally pastes this into the LLM? "Third parties like LLMs" sounds like "I'm writing for the training data".

That actually is the idea for the final version: it should be a complete, total guide to 'writing a gwernnet essay' written in a way comprehensible to LLMs, which they can read in a system prompt, a regular prompt, or retrieve from the Internet & inject into their inner-monologue etc. It should define all of the choices about how to markup stuff like unique syntax (eg. the LLMs keep flagging the interwiki links as s... (read more)

gwern*143

use a trick discovered by Janus to get Claude Opus 4 to act more like a base model and drop its “assistant” persona

Have you or Janus done anything more rigorous to check to what extent you are getting 'the base model', rather than 'the assistant persona pretending to be a base model'? This is something I've noticed with jailbreaks or other tweaks: you may think you've changed the bot persona, but it's really just playing along with you, and will not be as good as a true base model (even if it's at least stylistically superior to the regular non-roleplay... (read more)

gwern146

These ancient F1 drivers sound like a good contrast to the NBA stats presented: if it was simply wealth/success, shouldn't there be a 'pyjama effect' there too? F1 drivers get paid pretty well too.

Most players don’t survive very long. Over a third don’t make it past two years. The average person lasts five years. The odds of making it to the ten year mark is less than 25%. This curve is starkly contrasted to most modern jobs, with the median teacher career for example lasting over 25 years.

...Once the wealth and fame pile up, the grind feels optional. Yo

... (read more)
2CstineSublime
Sometimes more than NBA players, but what's perhaps more interesting is the source of those earnings: For basketballers at least, most of their earnings come from extracurricular activities, and retirement often doesn't curtail their ability to make huge earnings. 1969/71/73 World Drivers Champion Jackie Stewart was still doing endorsements for Heineken at least 8 years ago from a TVC where they Forest Gump'd in their beers, despite his retiring way back in 1973. Pro-Golfer Greg Norman has earned many times what he did when he was playing at his peak. I mentioned outlier Kimi Raikkonen; in 2009 Forbes tied him for #2 among the world's highest-earning athletes. Higher than Lebron James and David Beckham (I have recently seen David Beckham's likeness being used to sell mattresses of all things). What is important to note is one of the athletes Raikkonen was tied with was a retired Michael Jordan. Not only is an athlete's earning potential not tied to continuing to compete, but much of their earnings even when at the peak of their careers comes from extracurricular activities like endorsements. A long career might be lucrative, but sometimes you can outearn in retirement. Returning to current players: In Forbes' most recent global rankings, the only basketballers who out-earn F1's top earners are Stephen Curry, Lebron James, Kevin Durant, and Giannis Antetokounmpo.  Forbes claims Stephen Curry is the second highest paid athlete in the world, and estimates he's pulling in $156 million, $56 million of which is "on-field" (on-court?) as they call it. Lewis Hamilton, who is one of the outliers I mentioned and is currently not performing at his peak, is currently ranked all the way down, tied for 22nd with boxer Canelo Alvarez, but both Alvarez and Hamilton are earning even more on-field (in the ring and on-track?) than Curry. Hamilton earns $60 million from a lucrative contract with Ferrari. Right below him is Max Verstappen in 24 who earns $78 million on the track from his c
5thiccythot
Personally I think careers getting derailed by non-contact accidents, creeping athletic decline, and coach/system variance aren't exogenous to the silk pajama thesis, because those areas are exactly where complacency, if it exists, manifests. 1. Non-contact injuries can be minimized with year round strength and mobility work, recovery tech, dedicated staff, prehab, weight/diet control, etc... Lebron famously spends 2m a year on his body and takes his regimen very seriously and has never suffered a serious injury. 2. Athletic decline can be addressed by evolving the mechanics of your game to rely more on craft. Vince Carter pivoted from an athletic dunker to a 40% shooter and played until he was 43.  3. Same goes with coach/system: if you don't adjust your game it increases the probability you will get cut.  In my head it's something like this:  Base hazard (pure bad luck) is X%.  Hazard given “kept the hunger” is X – δ.  Hazard given “cashed out & coasted” is X + δ. I agree with you that a simple one-line survival curve is too coarse to reveal what δ is, if it even exists at all. To show it statistically, you would likely need to stratify the data more.  There are a lot of early washouts that cause the first few years to be steep, and contract timing isn't uniform, which keeps the aggregate survival line exponential, is my guess.  Maybe if you showed the survival curves of only top 5-10 draft picks it could better show us what we wanted?
gwern31

I wonder why no-one has just directly tried to do turing debate, where the debaters submit ~2000 words that explain their views to each other beforehand, then the actual debate is them taking on the position of the other side and trying to debate that.

One idea might be to pair debates with Delphi panels: do the usual Delphi method to get a consensus report beforehand, and then have them explain & debate what is left over as non-consensus (or possibly, if there are some experts who disagree hotly with the consensus report, bring them on for a debate with the original panel).

gwern164

First, I didn't say it wasn't communicating anything. But since you bring it up: it communicated exactly what jefftk said in the post already describing the scene. And what it did communicate that he didn't say cannot be trusted at all. As jefftk notes, 4o in doing style transfer makes many large, heavily biased, changes to the scene, going beyond even just mere artifacts like fingers. If you don't believe that people in that room had 3 arms or that the room looked totally different (I will safely assume that the room was not, in fact, lit up in tastefully... (read more)

1drossbucket
+1 for "what has been seen cannot be unseen", wow I'm seeing a lot of cat-urine yellow around now
9habryka
Gwern, look, my drawing skills are pretty terrible. We've had sequences posts with literal pictures of napkins where Eliezer drew bad and ugly diagrams up here for years. Yes, not everything in the image can be trusted, but surely I have learned many real and relevant things about the atmosphere and vibe from the image that I would not from a literal description (and at the very least it is much faster for me to parse than a literal description).  I know the kinds of errors that image models make, and so I can adjust for them. They overall make many fewer errors than jefftk would make if he were to draw some stick figures himself, which would still be useful.  The image is clearly working at achieving its intended effect, and I think the handwringing about it being unaesthetic is overblown compared to all realistic alternatives. Yes, it would be cool if jeff prompted more times, but why bother, it's getting the job done fine, and that's what the whole post is about.
gwern50

Seems similar to the "anti-examples" prompting trick I've been trying: taking the edits elicited from a chatbot, and reversing them to serve as few-shot anti-examples of what not to do. (This would tend to pick up X-isms.)
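A minimal sketch of what that trick might look like in practice; `chat_edit` below is a hypothetical stand-in for however the chatbot's edits are elicited, not a real API:

```python
# Sketch of the "anti-examples" trick: take the edits a chatbot made to your prose
# and present them reversed (its version as BAD, yours as GOOD) as few-shot
# examples of what not to do.

def build_anti_example_prompt(passages, chat_edit):
    shots = []
    for original in passages:
        rewritten = chat_edit(original)          # the LLM's edited version
        shots.append(
            "BAD (avoid this style):\n" + rewritten
            + "\nGOOD (preferred style):\n" + original
        )
    return (
        "When editing, do NOT introduce the habits shown in the BAD versions below; "
        "preserve the style of the GOOD versions.\n\n" + "\n\n".join(shots)
    )
```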

2Croissanthology
Any specifics about system prompts you use in general? Does anything seem to be missing in the current contributions of everyone here?
gwern26

One obvious reason to get upset is how low the standards of people posting them are. Let's take jefftk's post. It takes less than 5 seconds to spot how lazy, sloppy, and bad the hands and arms are, and how the picture is incoherent and uninformative. (Look at the fiddler's arms, or the woman going under 2 arms that make zero sense, or the weird doors, or the table which seems to be somehow floating, or the dubious overall composition - where are the yellow fairy and non-fairy going, exactly?, or the fact that the image is the stereotypical cat-urine yellow of all 4o images.) Why should you not feel disrespected and insulted that he was so careless and lazy to put in such a lousy, generic image?

4jefftk
The left arm is holding the fiddle and is not visible behind my body, while the right arm has the sleeve rolled up above the elbow and you can see a tiny piece of the back of my right hand poking out above my forearm. The angle of the bow is slightly wrong for the hand position, but only by a little since there is significant space between the back of the hand and the fingertips holding the bow. (Of course, as I write in my post, it certainly gets a lot of other things wrong. Which is useful to me from a privacy perspective, though probably not the most efficient way to anonymize.)
5habryka
I was in this case assuming it was a ghiblified version of a photo, illustrating the very core point of this post. Via this mechanism it communicated a lot! Like how many people were in the room, how old they were, a lot about their emotional affect, how big the room was, and lots of other small details.
gwern50

I think that's exactly how it goes, yeah. Just free association: what token arbitrarily comes to mind? Like if you stare at some static noise, you will see some sort of lumpiness or pattern, which won't be the same as what someone else sees. There's no explaining that at the conscious level. It's closer to a hash function than any kind of 'thinking'. You don't ask what SHA is 'thinking' when you put in some text and it spits out some random numbers & letters. (You would see the same thing if you did a MLP or CNN on MNIST, say. The randomly initialized ... (read more)
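(To make the hash analogy concrete, a trivial example:)

```python
# A hash "free-associates" deterministically: tiny input changes give unrelated
# outputs, and there is no conscious-level reasoning about the result to explain.
import hashlib

for text in ["static noise", "static noise."]:
    print(text, "->", hashlib.sha256(text.encode()).hexdigest()[:16])
```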

gwern*70

It is not clear how the models are able to self-coordinate. It seems likely that they are simply giving what they believe would be the most common answer the same way a group of humans might. However, it is possible the models are engaging in more sophisticated introspection focussing on how they specifically would answer. Follow-up investigations could capture models’ chain of thought as well as tweak the prompt to indicate that the model should strive to be consistent with an answer a human might give or another company’s AI model might give. Circuit-tr

... (read more)
1Avi Brach-Neufeld
Do you have ideas about the mechanism by which models might be exploiting these spurious correlations in their weights? I can imagine this would be analogous to a human “going with their first thought” or “going with their gut”, but I have a hard time conceptualizing what that would look like for an LLM . If there is any existing research/writing on this, I’d love to check it out
gwern30

Yes, a NN can definitely do something like know if it recognizes a datapoint, but it has no access to the backwards step per se. Like take my crashing example: how, while thinking in the forward pass, can it 'know' there will be a backward pass when there might be no backward pass (eg because there was a hardware fault)? The forward pass would appear to be identical in every way between the forward pass that happens when there is a backward pass, and when the backward pass doesn't happen because it crashed. At best, it seems like a NN cannot do more than s... (read more)

1Florian_Dietz
It can't tell for sure if there will be a backward pass, but it doesn't need to. Just being able to tell probabilistically that it is currently in a situation that looks like it has recently been trained on implies pretty strongly that it should alter its behavior to look for things that might be training related.
gwern2616

You can look this up on knowyourmeme and confirm it, and I've done an interview on the topic as well. Now I don't know much about "improving public discourse" but I have a long string of related celebrity hoaxes and other such nonsense which often crosses over into a "War of the Worlds" effect in which it is taken quite seriously...I have had some people tell me that I'm doing what you're calling "degrading the public discourse," but that couldn't be farther from the truth. It's literature of a very particular kind, in fact. Are these stories misinterpret

... (read more)
gwern60

There are some use-cases where quick and precise inference is vital: for example, many agentic tasks (like playing most MOBAs or solving a physical Rubik's cube; debatably most non-trivial physical tasks) require quick, effective, and multi-step reasoning.

Yeah, diffusion LLMs could be important not for being better at predicting what action to take, but for hitting real-time latency constraints, because they intrinsically amortize their computation more cleanly over steps. This is part of why people were exploring diffusion models in RL: a regular bidir... (read more)

gwern93

This post is an example of my method. Over the last 1-2 years, I’ve made heavy use of AIs, lately DeepSeek and Claude. I do the same with them: present my ideas, deal with their criticisms and objections—whether to correct them or take correction myself—until we’re agreed or the AI starts looping or hallucinating. So, when I say I have yet to hear, after all this time, credible, convincing arguments to the contrary, it’s after having spent the time and done the work that most people don’t even attempt.

Or, to put it less flatteringly, "I harangue the mos... (read more)

-7MillardJMelnyk
gwern*87

I think there are many ways that a LLM could have situated awareness about what phase it is in, but I'm not sure if the gradient descent itself is a possibility?

While a NN is running the forward pass without any backprop, it is computing exactly the same thing (usually) that it would be computing if it was running a forward pass before a backwards pass to do a backprop. Otherwise, the backprop can't really work - if it doesn't see the 'real' forward pass, how does it 'know' how to adjust the model parameters to make the model compute a better forward pass ... (read more)
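A minimal sketch of that point (using PyTorch here purely for illustration): the activations are identical whether or not a backward pass ever follows, so they cannot by themselves encode "I am being trained right now".

```python
# The forward computation is the same with or without a subsequent backward pass,
# so the activations alone carry no signal about "currently being trained".
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)

with torch.no_grad():              # inference-only forward pass
    y_inference = model(x)

y_training = model(x)              # forward pass that will be followed by backprop
y_training.sum().backward()        # the backward pass happens only after the fact

print(torch.allclose(y_inference, y_training))   # True: identical activations
```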

0Florian_Dietz
Those sound like good counterarguments, but I still think there could be enough information there for the LLM to pick it up: It seems plausible to me that a set of weights that is being updated often is different in some measurable way than a set of weights that has already converged. I don't have proof for this, only intuition. It feels similar to how I can tell if my own movement is well-practiced or not, or if my intuition about a topic is well-founded or not, even without consciously thinking about how confident I should be based on objective measures.
gwern161

I agree it is poorly written, but I don't think it is, strictly speaking, 'LLM slop'. Or if it is, it's not an LLM I am familiar with, or is an unusual usage pattern in some way... It's just not written with the usual stylistic tics of ChatGPT (4o or o3), Claude-3/4, Gemini-2.5, or DeepSeek-r1.

For example, he uses a space after EM DASH but not before; no LLM does that (they either use no space or both before-after); he also uses '1) ' number formatting, where LLMs invariably use '1. ' or '#. ' proper Markdown (and generally won't add in stylistic redundanc... (read more)

2Jiro
He probably used a LLM and lightly edited it. The non-LLM punctuation and references would come from the editing.
gwern*2510

It also sounds like a piece of paper, or a map, or a person having vivid hallucinations before falling asleep. But unless you have a whiteboard which can be copied among several hundred people and teleport and be rolled up and fit in a jean pocket, which lets you timetravel so you can look at what used to be on the whiteboard or look at what people might write on it in the future, or 'a whiteboard' which is neither white (because there's a colored map printed on it) nor 'a board' (because it's arbitrarily many), which has a ledgerbook next to itself which writes itself, and so on, I would suggest that this does not 'sound like a whiteboard' to most people. (No, not even a Biblically-accurate whiteboard.)

gwern30

Yes, there's a lot of computer-related ones depending on how finegrained you get. (There's a similar issue with my "Ordinary Life Improvements": depending on how you do it, you could come up with a bazillion tiny computer-related 'improvements' which sort of just degenerates into 'enumerating every thing ever involving a transistor in any way' and is not enlightening the same way that, say, 'no indoors smoking' or 'fresh mango' is.) So I would just lump that one under 'Machine Configuration/Administration § Software' as one of the too-obvious-to-be-worth-mentioning hacks.

gwern50

How did you check Claude's claims here?

2Alexander Gietelink Oldenziel
I spot-checked the first claim about Eratosthenes. The second part on Eratosthenes is directly from Wikipedia.
gwern437

Idea: "Conferences as D&D tabletops": you may be able to better organize a conference or convention by borrowing a tool from tabletop roleplaying games - players collaborate by directly manipulating or modifying a 2D map. It seems to me like this could be low-friction and flexibly handles a lot of things that existing 'conware' design patterns don't handle well.

3bohaska
This sounds like a whiteboard to me
gwern460

I have not done any work directly on it. The LLMs have kept improving so rapidly since then, especially at coding, that it has not seemed like a good idea to work on it.

Instead, I've been thinking more about how to use LLMs for creative writing or personalization (cf. my Dwarkesh Patel interview, "You should write more online"). To review the past year or two of my writings:

  • So for example, my meta-learning LLM interviewing proposal is about how to teach a LLM to ask you useful questions about your psychology so it can better understand & personalize

... (read more)
gwern50

I was trying out a hierarchical approach when I stopped, because I wasn't sure if I could trust a LLM to rewrite a whole input without dropping any characters or doing unintended rewrites; aside from being theoretically more scalable and potentially better by making each step easier and propagating the sorting top-down, if you explicitly turn it into a tree, you can easily check that you get back an exact permutation of the list each time, and so confirm that the rewrite was safe. I think that might be unnecessary at this point, given the steady improvement in prompt adherence, so maybe the task is now trivial.

There are no explicit distances calculated: just asking the LLM to sort the list meaningfully.
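One way the tree version might look as code; `llm_sort` is a hypothetical stand-in for the prompt call, and the multiset check is the "safe rewrite" verification described above:

```python
# Each LLM result is only accepted if it is an exact permutation (same multiset)
# of the input, which is the cheap safety check against dropped, duplicated, or
# silently rewritten items.
from collections import Counter

def safe_seriate(items, llm_sort, chunk=20):
    if len(items) <= chunk:
        reordered = llm_sort(items)
        return reordered if Counter(reordered) == Counter(items) else items
    mid = len(items) // 2
    left = safe_seriate(items[:mid], llm_sort, chunk)
    right = safe_seriate(items[mid:], llm_sort, chunk)
    merged = llm_sort(left + right)               # let the LLM interleave the halves
    return merged if Counter(merged) == Counter(left + right) else left + right
```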

gwern*150

Very funny, but the OA embeddings were always bad at sentence embedding, specifically, compared to other NN sentence-specialized embeddings; and as the original OA embedding paper somewhat defensively argues, it's not even clear a priori what a sentence embedding should do because a sentence is such a cut-down piece of text, and doing well at a sentence embedding task may only be overfitting or come at the cost of performance on more meaningful text embedding tasks. (Similar to a word embedding: they are so poly-semantic or context-dependent that it seems ... (read more)

1ArthurB
Do you prompt the LLM to do the whole rewrite or call it n(n-1)/2 times to get the distances?
gwern176

Yeah, it's limited by what kind of structure you have. It did seriate your list successfully, sounds like, it's just you have a lot of structure in the list that you don't care about, and so no embedding is going to prioritize the other stuff and the distances aren't useful to you in general. This will hurt any embedding-related use-case, not just seriation - presumably your k-NN lookups aren't terribly useful either and they mostly just pull up hits which have superficial syntactic similarities.

This is probably less of a problem with my annotations becaus... (read more)
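For contrast, a sketch of what embedding-distance seriation could look like (greedy nearest-neighbor ordering, one simple heuristic); `embed` is a hypothetical single-text embedding call:

```python
# Greedily order items so each sits next to its nearest remaining neighbor in
# embedding space. If the embedding mostly captures superficial syntactic
# similarity, the resulting ordering (and any k-NN lookup) inherits that bias.
import numpy as np

def greedy_seriate(items, embed):
    vecs = np.array([embed(t) for t in items], dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)      # cosine similarity
    order, remaining = [0], set(range(1, len(items)))
    while remaining:
        nxt = max(remaining, key=lambda i: float(vecs[order[-1]] @ vecs[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [items[i] for i in order]
```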

2Algon
Good point, and thanks for the suggestions. 
gwern70

As I've said before, I think you greatly overrate the difficulty of putting search into neural nets, and this is an example of it. It seems to me like it is entirely possible to make a generic LLM implement an equivalent to AlphaZero and be capable of expert iteration, without an elaborate tree scaffolding. A tree search is just another algorithm which can be reified as a sequence, like all algorithms (because they are implemented on a computer).

All AlphaZero is, is a way of doing policy iteration/Newton updates by running a game state forward for a few pl... (read more)
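A rough sketch of the generic expert-iteration loop being gestured at, with `policy_sample`, `evaluate`, and `finetune_on` as hypothetical placeholders rather than any real API:

```python
# The "search" is just best-of-n rollouts serialized as sequences, and the improved
# choices are distilled back into the policy's weights each round.

def expert_iteration(policy, states, policy_sample, evaluate, finetune_on,
                     n_rollouts=8, iterations=3):
    for _ in range(iterations):
        improved = []
        for s in states:
            candidates = [policy_sample(policy, s) for _ in range(n_rollouts)]
            best = max(candidates, key=evaluate)      # pick the best rollout found
            improved.append((s, best))
        policy = finetune_on(policy, improved)        # amortize the search into weights
    return policy
```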

2Steven Byrnes
Hmm, I don’t particularly disagree with anything you wrote. I think you’re misunderstanding the context of this conversation. I wasn’t bringing up tree search because I think tree search is required for AGI. (I don’t think that.) Rather, I was making a point that there will need to be some system that updates the weights (not activations) of an AGI as it runs, just as adult humans learn and figure out new things over time as they work on a project. What is this system that will update the weights? I have opinions, but in general, there are lots of possible approaches. Self-play-RL with tree search is one possibility. RL without tree search is another possibility. The system you described in your comment is yet a third possibility. Whatever! I don’t care, that’s not my point here. What is my point? How did this come up? Well, Cole’s OP is relying on the fact that “[pure] imitation learning is probably existentially safe”. And I was saying that pure imitation learning imposes a horrific capability tax that destroys his whole plan, because a human has open-ended autonomous learning, whereas a model trained by pure imitation learning (on that same human) does not. So you cannot simply swap out the former for the latter. In Cole’s most recent reply, it appears that what he has in mind is actually a system that’s initialized by being trained to imitate humans, but then it also has some system for open-ended continuous learning from that starting point. And then I replied that this would solve the capability issue, but only by creating a new problem that “[pure] imitation learning is probably existentially safe” can no longer function as part of his safety argument, because the continuous learning may affect alignment. For example, if you initialize a PacMan RL agent on human imitation (where the humans were all very nice to the ghosts during play), and then you set up that agent to continuously improve by RL policy optimization, using the score as the reward functi
6Mo Putera
Have you by any chance gotten further along on your Nenex idea, or know of anyone online who's gone somewhat in that direction far enough to be interesting? To be fair the Nenex features you listed are pretty extensive so I doubt anyone's gone all that far, which is a bummer since it is a seductive vision that feels like it should be a lot closer today than it actually is.
gwern*90

for text, you might realize that different parts of the text refer to each other, so need a way to effectively pass information around, and hence you end up with something like the attention mechanism

If you are trying to convince yourself that a Transformer could work and to make it 'obvious' to yourself that you can model sequences usefully that way, it might be a better starting point to begin with Bengio's simple 2003 LM and MLP-Mixer. Then Transformers may just look like a fancier MLP which happens to implement a complicated way of doing token-mixin... (read more)
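A minimal MLP-Mixer-style block (PyTorch, purely illustrative) makes the point: one MLP mixes information across tokens, another across channels, and attention can then be read as a fancier, input-dependent version of the token-mixing step.

```python
# Token-mixing MLP acts across the sequence dimension; channel-mixing MLP acts
# across the feature dimension. No attention required to pass information around.
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, seq_len, dim, hidden=256):
        super().__init__()
        self.token_mix = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, seq_len))
        self.channel_mix = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        y = self.norm1(x).transpose(1, 2)        # (batch, dim, seq_len)
        x = x + self.token_mix(y).transpose(1, 2)
        return x + self.channel_mix(self.norm2(x))

out = MixerBlock(seq_len=16, dim=32)(torch.randn(2, 16, 32))   # (2, 16, 32)
```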

gwern12

Or just clipped out. It takes 2 seconds to clip it out and you're done. Or you just fast forward, assuming you saw the intro at all and didn't simply skip the first few minutes. Especially as 'incest' becomes universal and viewers just roll their eyes and ignore it. This is something that is not true of all fetishes: there is generally no way to take furry porn, for example, and strategically clip out a few pixels or frames and make it non-furry. You can't easily take a video of an Asian porn star and make them white or black. And so on and so forth.

gwern92

But if a metric is trivially gameable, surely that makes it sus and less impressive, even if someone is not trivially, or even at all gaming it.

Why would you think that? Surely the reason that a metric being gameable matters is if... someone is or might be gaming it?

Plenty of metrics are gameable in theory, but are still important and valid given that you usually can tell if they are. Apply this to any of the countless measurements you take for granted. Someone comes to you and says 'by dint of diet, hard work (and a bit of semaglutide), my bathroom scal... (read more)

1Warty
Hmm yea gameability might not be so interesting of a property of metrics as I've expressed. (though I still feel there is something in there. Fixing your calibration chart after the fact by predicting one-sided coins/dice is maybe a lot like taking a foot off the bathroom scale. But, for example, predicting every event as a constant p%, is that even cheating in the calibration game? Though neither of these directly applies to the case of prediction market platforms)
gwern176

Good calibration is impressive and an interesting property because many prediction sources manage to not clear even that minimal bar (almost every human who has not undergone extensive calibration training, for example, regardless of how much domain expertise they have).

Further, you say one shouldn't be impressed by those sources because they could be flipping a coin, but then you refuse to give any examples of 'impressive' sources which are doing just the coin-flip thing or an iota of evidence for this bold claim, or to say what they are unimpressive compared to.

3Warty
Yea I would be impressed if a human showed me they have a good calibration chart. (though part of it is that humans usually put few questions in their calibration charts. It would be nice to look at people's performance in a range of improving calibration exercises) I don't think anyone is brute-forcing calibration with fake predictions, it would be easy to see if the predictions are public. But if a metric is trivially gameable, surely that makes it sus and less impressive, even if someone is not trivially, or even at all gaming it. I don't claim that any entity is not impressive, just that we shouldn't be impressed by calibration (humans get a pass, it takes so much effort for us to do anything). There is probably some bravery debate aspect here, if you look at my linked tweets, it's like in my world people are just going around saying good calibration implies good predictions, which is false. (edit 1: for human calibration exercises, note that with a stream of questions where p% resolve true, it's perfectly calibrated to always predict p%. Humans who do calibration exercises have other goals than calibration. Maybe I should pivot to activism in favor of prediction scores)
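A quick simulation of the point at issue here: both forecasters below are perfectly calibrated, but only one is informative, which shows up in a proper scoring rule like the Brier score rather than in a calibration chart.

```python
# Always predicting the base rate is perfectly calibrated but says nothing about
# individual questions; a discriminating forecaster is also calibrated but scores
# far better on the Brier score.
import random

random.seed(0)
latent = [random.choice([0.1, 0.5, 0.9]) for _ in range(100_000)]  # true per-question odds
outcomes = [random.random() < p for p in latent]
base_rate = sum(outcomes) / len(outcomes)                          # ~0.5

constant_preds = [base_rate] * len(outcomes)   # calibrated, but uninformative
informed_preds = latent                        # calibrated *and* discriminating

def brier(preds, outs):
    return sum((p - o) ** 2 for p, o in zip(preds, outs)) / len(outs)

print("constant Brier:", round(brier(constant_preds, outcomes), 3))  # ~0.25
print("informed Brier:", round(brier(informed_preds, outcomes), 3))  # ~0.14
```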
gwernΩ8243

I think I would have predicted that Tesla self-driving would be the slowest

For graphs like these, it obviously isn't important how the worst or mediocre competitors are doing, but the best one. It doesn't matter who's #5. Tesla self-driving is a longstanding, notorious failure. (And apparently is continuing to be a failure, as they continue to walk back the much-touted Cybertaxi launch, which keeps shrinking like a snowman in hell, now down to a few invited users in a heavily-mapped area with teleop.)

I'd be much more interested in Waymo numbers, as that... (read more)

5Thomas Kwa
I would love to have Waymo data. It looks like it's only available since September 2024 so I'll still need to use Tesla for the earlier period. More critically they don't publish disengagement data, only crash/injury. There are Waymo claims of things like 1 disengagement every 17,000 miles but I don't believe them without a precise definition for what this number represents.
gwern*716

The trends reflect the increasingly intense tastes of the highest spending, most engaged consumers.

https://logicmag.io/play/my-stepdad's-huge-data-set/

While a lot of people (most likely you and everyone you know) are consumers of internet porn (i.e., they watch it but don’t pay for it), a tiny fraction of those people are customers. Customers pay for porn, typically by clicking an ad on a tube site, going to a specific content site (often owned by MindGeek), and entering their credit card information.

This “consumer” vs. “customer” division is key to

... (read more)
Elizabeth*104

This theory feels insufficient to me, or like it's missing a step. It makes sense to me for people to pay when their preferred porn is undersupplied, but incest porn is now abundant. You need a more specific reason incest fans will pay even when they don't have to. 

Additionally, "but you're my stepdad" isn't equivalent to a couple of foot shots. Lots of people are (or at least were) turned off by incest. 

1future_detective
Hard to parse the reasons for the big clusters with complete certainty but this is a basically plausible story. Other macro factors I have mulled include FOSTA/SESTA - I find the timing interesting, given that it was one of the only major pieces of porn-centric legislation in the last 10 years and it took place right around the time of the big jump - and Nick Kristof's 2020 investigation, which clearly shows up in the data but did not dislodge the main trend.
gwern124

I think aside from the general implausibility of the effect sizes and the claimed AI tech (GANs?) delivering those effect sizes across so many areas of materials, one of the odder claims which people highlighted at the time was that supposedly the best users got a lot more productivity enhancement than the worst ones. This is pretty unusual: usually low performers get a lot more out of AI assistance, for obvious reasons. And this lines up with what I see anecdotally for LLMs: until very recently, possibly, they were just a lot more useful for people not very good at writing or other stuff, than for people like me who are.
