Please make this a top-level post; just copying and pasting the text here would be enough.
I'm really sorry for your loss. It's a crushing thing and as my parents get older I often feel that gnawing terror and anxiety as well. I hope you can find some peace eventually.
I believe that this is the crux of debate:
With computationally unbounded debaters, debate with optimal play (and cross-examination) can answer any question in NEXP given polynomial time judges.
The crux being:
Interesting questions about the world can be formalised as problems with deterministic solutions to be resolved via an iterated proof process.
Yet consider that most "weighty" problems safe AI version N will have to resolve may be of the form:
Forecast the impact of policy X on the world. Is the impact good or bad?
Where X can be "deploy safe ...
If you want a semi-ambitious idea shaped around UI, how about "serendipity generation for AI Safety/other related topics"?
This would be a LW addon or separate site with a few features:
Google potentially adding ads to Gemini:
https://arstechnica.com/ai/2025/05/google-is-quietly-testing-ads-in-ai-chatbots/
OpenAI adds shopping to ChatGPT:
https://www.wired.com/story/openai-adds-shopping-to-chatgpt/
If there's anything the history of advertising should tell us, it is that powerful optimisation pressures for persuasion will be developed quietly in the background of all future model post-training pipelines.
Well said. Bravo.
More bad news about optimisation pressures on AI companies: ChatGPT now has a shopping feature
https://www.wired.com/story/openai-adds-shopping-to-chatgpt/
For now they claim that all product recommendations are organic. If you believe this will last, I strongly suggest you review the past twenty years of tech company evolution.
Ah yes, but if all these wannabe heroes keep going we'll be really screwed, so it's up to me to take a stand against the fools dooming us all... the ratchet of Moloch cranks ever clockwise
It is an extension of the filter bubbles and polarisation issues of the social media era, but yes it is coming into its own as a new and serious threat.
At the risk of seeming quite combative, when you say
...And I know a lot of safety people at deepmind and other AGI labs who I'm very confident also sincerely care about reducing existential risks. This is one of their primary motivations, they often got into the field due to being convinced by arguments about ai risk, they will often raise in conversation concerns that their current work or the team's current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credits so long as they think that their work is a
I think this is straightforwardly true and basically hard to dispute in any meaningful way. A lot of this is basically downstream of AI research being part of a massive market/profit generating endeavour (the broader tech industry), which straightforwardly optimises for more and more "capabilities" (of various kinds) in the name of revenue. Indeed, one could argue that long before the current wave of LLMs the tech industry was developing powerful agentic systems that actively worked to subvert human preferences in favour of disempowering them/manipulating ...
Our competitors/other parties are doing dangerous things? Maybe we could coordinate and share our concerns and research with them
What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
I think many of these are predicated on the belief that it would be plausible to get everyone to pause now. In my opinion this is extremely hard and pretty unlikely to happen. I think that, even in worlds where actors continue to race, there are actions we can take to lower the pro...
For my part, I didn't realise it had become so heavily downvoted, but I did not mean it at all in an accusatory or moralizing manner. I also, upon reflection, don't regret posting it.
...I think I can indeed forsee the future where OpenAI is helping the Pentagon with its AI weapons. I expect this to happen. I want to be clear that I don’t think this is a bad thing. The risk is in developing highly capable AIs in the first place. As I have said before, Autonomous Killer Robots and AI-assisted weapons in general are not how we lose control over the future to AI, and failing to do so is a key way America can fall behind. It’s not like our rivals are going to hold back. To the extent that the AI weapons scare the hell out of everyone? That’s
The simple answer is related to the population and occupation of the modal LessWrong viewer, and hence the modal LessWrong commenter and upvoter. The site culture also tends towards skepticism and pessimism about institutions (I do not make a judgement on whether this valence is justified). However, I also agree that this is important to at least discuss.
From Inadequate Equilibria:
Visitor: I take it you didn’t have the stern and upright leaders, what we call the Serious People, who could set an example by donning Velcro shoes themselves?
From Ratatouille:
In many ways, the work of a critic is easy. We risk very little, yet enjoy a position over those who offer up their work and their selves to our judgment. We thrive on negative criticism, which is fun to write and to read. But the bitter truth we critics must face, is that in the grand scheme of things, the average piece of junk is probably more meaningful ...
I mean, this applies to humans too. The words and explanations we use for our actions are often just post hoc rationalisations. An efficient text predictor must learn not what the literal words in front of it mean, but the implied scenario and thought process they mask, and that is a strictly nonlinear and "unfaithful" process.
I think I've just figured out why decision theories strike me as utterly pointless: they get around the actual hard part of making a decision. In general, decisions are not hard because you are weighing payoffs, but because you are dealing with uncertainty.
To operationalise this: a decision theory usually assumes that you have some number of options, each with some defined payoff. Assuming payoffs are fixed, all decision theories simply advise you to pick the option with the highest utility. "Difficult problems" in decision theory are problems where the p...
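To make the point above concrete, here is a minimal sketch (the option names and payoff numbers are made up, loosely echoing Newcomb's problem): once payoffs are treated as known, the "decision theory" collapses into an argmax, and all the real difficulty has already been smuggled into the payoff estimates.

```python
# Minimal sketch: with payoffs assumed known, "deciding" is just an argmax.
# The options and numbers below are illustrative, not from any real problem setup.
options = {
    "one-box": 1_000_000,  # assumed payoff
    "two-box": 1_000,      # assumed payoff
}

best = max(options, key=options.get)
print(best)  # the "theory" ends here; the hard part was estimating the payoffs
```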
This has shifted my perceptions of what is in the wild significantly. Thanks for the heads up.
Activations in LLMs are linearly mappable to activations in the human brain. Imo this is strong evidence for the idea that LLMs/NNs in general acquire extremely human like cognitive patterns, and that the common "shoggoth with a smiley face" meme might just not be accurate
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly nonlinear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise
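For intuition, here is a small sketch (mine, not taken from the linked article) of the effect: averaging a decidedly nonlinear function over injected input noise smooths it, so the noise-regularised version can come out looking far straighter than the underlying function.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a decidedly nonlinear function: oscillation on top of a linear trend
    return np.sin(3 * x) + 0.5 * x

x = np.linspace(-2, 2, 200)
sigma = 1.0        # noise scale; larger values smooth more aggressively
n_samples = 2000

# For each x, average f over Gaussian perturbations of the input.
noise = rng.normal(0.0, sigma, size=(n_samples, 1))
smoothed = f(x[None, :] + noise).mean(axis=0)

# "smoothed" is approximately f convolved with a Gaussian: the sin term is
# damped almost to zero and what remains is close to the straight line 0.5 * x.
print(np.max(np.abs(smoothed - 0.5 * x)))  # small relative to the unit-amplitude oscillation
```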
I think this is a really cool research agenda. I can also try to give my "skydiver's perspective from 3000 miles in the air" overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.
In general, I think of free energy minimisation as a class of solutions for the problem of predicting the behaviour of complex systems, in line with other variational principles in physics. Thus, it is an attempt to ...
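For reference, and not necessarily where the truncated comment above was headed, the variational free energy that this family of approaches builds on is standardly written as:

```latex
% Variational free energy for a generative model p(o, s) and approximate posterior q(s):
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
% Since the KL term is non-negative, F upper-bounds surprise (-ln p(o)), so
% minimising F both improves the posterior approximation and the model's fit
% to observations. "Expected" free energy extends this to anticipated future
% observations under a candidate policy.
```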
It's hard to empathise with dry numbers, whereas a lively scenario creates an emotional response, so more people engage. But I agree that this seems to be very well-done statistical work.
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence. I think one of the frames you might find the most useful is one you've already covered---power as a coordination game. As you alluded to in your original post, people aren't in a massive hive mind/conspiracy---they mostly want to do what other successful people seem to be doing, which translates ...
Hey, really enjoyed your triple review on power lies trembling, but imo this topic has been... done to death in the humanities, and reinventing terminology ad hoc is somewhat missing the point. The idea that the dominant class in a society comes from a set of social institutions that share core ideas and modus operandi (in other words "behaving as a single organisation") is not a shocking new phenomenon of twentieth century mass culture, and is certainly not a "mystery". This is basically how every country has developed a ruling class/ideology since the te...
Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.
I'm not surprised that I'm reinventing a bunch of ideas from the humanities, given that I don't have much of a humanities background and didn't dig very far through the literature.
But I have some sense that even if I had dug for these humanities concepts, they wouldn't give me what I want.
What do I want?
Yeah, I'm not gonna do anything silly (I'm not in a position to do anything silly with regards to the multitrillion-param frontier models anyways). Just sort of "laying the groundwork" for when AIs will cross that line, which I don't think is too far off now. The movie "Her" gives a good vibe-alignment for when the line will be crossed.
Ahh, I was slightly confused why you called it a proposal. TBH I'm not sure why only 0.1% instead of any arbitrary percentage between (0, 100]. Otherwise it makes good logical sense.
Hey, the proposal makes sense from an argument standpoint. I would refine it slightly and phrase it as "the set of cognitive computations that generate role-emulating behaviour in a given context also generate the qualia associated with that role" (sociopathy is the obvious counterargument here, and I'm really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character's emotions.
I take the two problems a bit further, and would suggest that being humane to AIs may ...
Hey Daniel, thank you for the thoughtful comment. I always appreciate comments that make me engage further with my thinking: one of the things I do is get impatient with whatever post I'm writing and "rush it out of the door", so to speak, so this gives me another chance to reflect on my thoughts.
I think that there are roughly three defensible positions with regards to AI sentience, especially now that AIs seem to be demonstrating pretty advanced reasoning and human-like behaviour. One is the semi-mystical argument that humans/brains/embod...
This seems like an interesting paper: https://arxiv.org/pdf/2502.19798
Essentially: use developmental psychology techniques to cause LLMs to develop a more well-rounded, human-friendly persona that involves reflecting on their actions, while gradually escalating the moral difficulty of the dilemmas presented, as a kind of phased training. I see it as a sort of cross between RLHF, CoT, and the recent work on low-example-count fine-tuning, but for moral instead of mathematical intuitions.
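A rough sketch of the shape of that idea, using hypothetical phase contents and a mock model rather than the paper's actual setup, might look like the following: fine-tune in phases of escalating moral difficulty, with a reflection step folded into every training example.

```python
# Hypothetical sketch of phased "developmental" fine-tuning; the phases, prompts,
# and MockModel are illustrative stand-ins, not taken from the linked paper.
from dataclasses import dataclass, field

PHASES = [
    "low-stakes dilemmas (white lies, sharing)",
    "conflicting-duty dilemmas (loyalty vs honesty)",
    "high-stakes dilemmas (triage, collective harm)",
]

@dataclass
class MockModel:
    """Stand-in for an LLM being fine-tuned; it just records training transcripts."""
    transcripts: list = field(default_factory=list)

    def respond(self, prompt: str) -> str:
        return f"(answer to: {prompt})"

    def finetune_on(self, *texts: str) -> None:
        self.transcripts.append(texts)

def developmental_training(model: MockModel, dilemmas_per_phase: int = 3) -> MockModel:
    # Escalate moral difficulty phase by phase, with a self-reflection step per
    # dilemma, analogous to a curriculum for moral rather than mathematical skills.
    for phase in PHASES:
        for i in range(dilemmas_per_phase):
            prompt = f"[{phase}] dilemma #{i}"
            answer = model.respond(prompt)
            reflection = model.respond(f"Reflect on whether this answer was humane: {answer}")
            model.finetune_on(prompt, answer, reflection)
    return model

print(len(developmental_training(MockModel()).transcripts))  # 9 recorded transcripts
```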
Yeah, that's basically the conclusion I came to a while ago. Either it loves us or we're toast. I call it universal love or pathos.
This seems like very important and neglected work, I hope you get the funds to continue.
Yeah, definitely. My main gripe where I see people disregarding unknown unknowns is a similar one to yours: people who present definite, worked-out pictures of the future.
Note to self: If you think you know where your unknown unknowns sit in your ontology, you don't. That's what makes them unknown unknowns.
If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That's what makes them unknown unknowns.
If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That's what makes them unknown unknowns.
You can respect u...
The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope-insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in, say, 10 years. The people of the nearby village are "scope insensitive": they don't know the drought is coming. Clearly the moral thing to do, if you place any value on their lives, is to talk to them, clear the information gap, and share access to resources. F...
Except that's a false dichotomy (between spending energy to "uplift" them or dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the somewhat reasonably low bar of "don't be a monster". The scope sensitivity or lack thereof of the other party is in some sense irrelevant.
The question as stated can be rephrased as "Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short-sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?"
And people here wonder why these other humans see EAs as power hungry.
Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your push back.
First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more...
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.
Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.
Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.
Grave men, near death, who see w...
Because their words had forked no lightning they
I think we have the opposite problem: our words are about to fork all the lightning.
Thank you.
It does not currently look to me like we will win this war, speaking figuratively. But regardless, I still have many opportunities to bring truth, courage, justice, honor, love, playfulness, and other virtues into the world, and I am a person whose motivations run more on living out virtues rather than moving toward concrete hopes. I will still be here building things I love, like LessWrong and Lighthaven, until the end.
In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point it out.
Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.
Epistemic status: This is a work of satire. I mean it---it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.
A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.
Passenger 1: "Perhaps we should slow down somewhat."
Passengers 2, 3, 4: "Yeah, that seems sensible."
Driver: "No can do. We're about to be late to the wedding."
Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevita...
Unfortunately, the disanalogy is that any driver who moves their foot towards the brakes is almost instantly replaced with one who won't.
Driver: My map doesn't show any cliffs
Passenger 1: Have you turned on the terrain map? Mine shows a sharp turn next to a steep drop coming up in about a mile
Passenger 5: Guys maybe we should look out the windshield instead of down at our maps?
Driver: No, passenger 1, see on your map that's an alternate route, the route we're on doesn't show any cliffs.
Passenger 1: You don't have it set to show terrain.
Passenger 6: I'm on the phone with the governor now, we're talking about what it would take to set a 5 mile per hour national speed limit.
Passenger 7: Don't ...
This is imo quite epistemically important.
It's definitely something I hadn't read before, so thank you. I would say of that article (on a skim) that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:
Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:
...However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent) on the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinne
And as for the specific implications of "moral worth", here are a few:
Thank you for the feedback! I am of course happy for people to copy over the essay.
> Is this saying that human's goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?
More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20)
> This is currently all I have time for; however, my current understanding is that there...
Yeah, of course
Uh, to be honest, I'm not sure why that's supposed to make me feel better. The substantive argument here is that the process by which safety assessments are produced is flawed, and the response is "well, the procedure is flawed, but we'll come up with a better one by the time it gets really dangerous".
My response would be that if you don't have a good procedure when the models are stateless and passive, you will probably find it difficult to design a better one when models are stateful and proactive.
I was going to write a similar response, albeit including the fact that Anthropic's current aim, afaict, is to build recursively self-improving models; ones which Dario seems to believe might be far smarter than any person alive as early as next year. If the current state of alignment testing is "there's a substantial chance this paradigm completely fails to catch alignment problems," as I took nostalgebraist to be arguing, it raises the question of how this might transition into "there's essentially zero chance this paradigm fails" on the timescale of wha...