Please make this a top-level post; just copying and pasting the text here would be enough.
I'm really sorry for your loss. It's a crushing thing and as my parents get older I often feel that gnawing terror and anxiety as well. I hope you can find some peace eventually.
I believe that this is the crux of debate:
With computationally unbounded debaters, debate with optimal play (and cross-examination) can answer any question in NEXP given polynomial time judges.
The crux being:
Interesting questions about the world can be formalised as problems with deterministic solutions to be resolved via an iterated proof process.
Yet consider that most "weighty" problems safe AI version N will have to resolve may be of the form:
Forecast the impact of policy X on the world. Is the impact good or bad?
Where X can be "deploy safe ...
If you want a semi-ambitious idea shaped around UI, how about "serendipity generation for AI Safety/other related topics"?
This would be a LW addon or separate site with a few features:
Google potentially adding ads to Gemini:
https://arstechnica.com/ai/2025/05/google-is-quietly-testing-ads-in-ai-chatbots/
OpenAI adds shopping to ChatGPT:
https://www.wired.com/story/openai-adds-shopping-to-chatgpt/
If there's anything the history of advertising should tell us, it is that powerful optimisation pressures for persuasion will be developed quietly in the background of all future model post-training pipelines.
Well said. Bravo.
More bad news about optimisation pressures on AI companies: ChatGPT now has a shopping feature
https://www.wired.com/story/openai-adds-shopping-to-chatgpt/
For now they claim that all product recommendations are organic. If you believe this will last, I strongly suggest you review the past twenty years of tech company evolution.
Ah yes, but if all these wannabe heroes keep going we'll be really screwed, so it's up to me to take a stand against the fools dooming us all... the ratchet of Moloch cranks ever clockwise
It is an extension of the filter bubbles and polarisation issues of the social media era, but yes it is coming into its own as a new and serious threat.
At the risk of seeming quite combative, when you say
...And I know a lot of safety people at deepmind and other AGI labs who I'm very confident also sincerely care about reducing existential risks. This is one of their primary motivations, they often got into the field due to being convinced by arguments about ai risk, they will often raise in conversation concerns that their current work or the team's current strategy is not focused on it enough, some are extremely hard-working or admirably willing to forgo credits so long as they think that their work is a
I think this is straightforwardly true and basically hard to dispute in any meaningful way. A lot of this is basically downstream of AI research being part of a massive market/profit generating endeavour (the broader tech industry), which straightforwardly optimises for more and more "capabilities" (of various kinds) in the name of revenue. Indeed, one could argue that long before the current wave of LLMs the tech industry was developing powerful agentic systems that actively worked to subvert human preferences in favour of disempowering them/manipulating ...
Our competitors/other parties are doing dangerous things? Maybe we could coordinate and share our concerns and research with them
What probability do you put that, if Anthropic had really tried, they could have meaningfully coordinated with OpenAI and Google? Mine is pretty low.
I think many of these are predicated on the belief that it would be plausible to get everyone to pause now. In my opinion this is extremely hard and pretty unlikely to happen. I think that, even in worlds where actors continue to race, there are actions we can take to lower the pro...
For my part, I didn't realise it had become so heavily downvoted, but I did not mean it at all in an accusatory or moralizing manner. I also, upon reflection, don't regret posting it.
...I think I can indeed forsee the future where OpenAI is helping the Pentagon with its AI weapons. I expect this to happen. I want to be clear that I don’t think this is a bad thing. The risk is in developing highly capable AIs in the first place. As I have said before, Autonomous Killer Robots and AI-assisted weapons in general are not how we lose control over the future to AI, and failing to do so is a key way America can fall behind. It’s not like our rivals are going to hold back. To the extent that the AI weapons scare the hell out of everyone? That’s
The simple answer is related to the population and occupation of the modal LessWrong viewer, and hence the modal LessWrong commenter and upvoter. The site culture also tends towards skepticism and pessimism about institutions (I do not make a judgement on whether this valence is justified). However, I also agree that this is important to at least discuss.
From Inadequate Equilibria:
Visitor: I take it you didn’t have the stern and upright leaders, what we call the Serious People, who could set an example by donning Velcro shoes themselves?
From Ratatouille:
In many ways, the work of a critic is easy. We risk very little, yet enjoy a position over those who offer up their work and their selves to our judgment. We thrive on negative criticism, which is fun to write and to read. But the bitter truth we critics must face, is that in the grand scheme of things, the average piece of junk is probably more meaningful ...
I mean, this applies to humans too. The words and explanations we use for our actions are often just post hoc rationalisations. An efficient text predictor must learn not what the literal words in front of it mean, but the implied scenario and thought process they mask, and that is a strictly nonlinear and "unfaithful" process.
I think I've just figured out why decision theories strike me as utterly pointless: they get around the actual hard part of making a decision. In general, decisions are not hard because you are weighing payoffs, but because you are dealing with uncertainty.
To operationalise this: a decision theory usually assumes that you have some number of options, each with some defined payoff. Assuming payoffs are fixed, all decision theories simply advise you to pick the option with the highest utility. "Difficult problems" in decision theory are problems where the p...
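To make the point above concrete, here is a minimal sketch (the option names and payoff numbers are made up, loosely echoing Newcomb's problem): once payoffs are treated as known, the "decision theory" collapses into an argmax, and all the real difficulty has already been smuggled into the payoff estimates.

```python
# Minimal sketch: with payoffs assumed known, "deciding" is just an argmax.
# The options and numbers below are illustrative, not from any real problem setup.
options = {
    "one-box": 1_000_000,  # assumed payoff
    "two-box": 1_000,      # assumed payoff
}

best = max(options, key=options.get)
print(best)  # the "theory" ends here; the hard part was estimating the payoffs
```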
This has shifted my perceptions of what is in the wild significantly. Thanks for the heads up.
Activations in LLMs are linearly mappable to activations in the human brain. Imo this is strong evidence for the idea that LLMs/NNs in general acquire extremely human like cognitive patterns, and that the common "shoggoth with a smiley face" meme might just not be accurate
That surprisingly straight line reminds me of what happens when you use noise to regularise an otherwise decidedly nonlinear function: https://www.imaginary.org/snapshot/randomness-is-natural-an-introduction-to-regularisation-by-noise
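For intuition, here is a small sketch (mine, not taken from the linked article) of the effect: averaging a decidedly nonlinear function over injected input noise smooths it, so the noise-regularised version can come out looking far straighter than the underlying function.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # a decidedly nonlinear function: oscillation on top of a linear trend
    return np.sin(3 * x) + 0.5 * x

x = np.linspace(-2, 2, 200)
sigma = 1.0        # noise scale; larger values smooth more aggressively
n_samples = 2000

# For each x, average f over Gaussian perturbations of the input.
noise = rng.normal(0.0, sigma, size=(n_samples, 1))
smoothed = f(x[None, :] + noise).mean(axis=0)

# "smoothed" is approximately f convolved with a Gaussian: the sin term is
# damped almost to zero and what remains is close to the straight line 0.5 * x.
print(np.max(np.abs(smoothed - 0.5 * x)))  # small relative to the unit-amplitude oscillation
```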
I think this is a really cool research agenda. I can also try to give my "skydiver's perspective from 3000 miles in the air" overview of what I think expected free energy minimisation means, though I am by no means an expert. Epistemic status: this is a broad extrapolation of some intuitions I gained from reading a lot of papers, it may be very wrong.
In general, I think of free energy minimisation as a class of solutions for the problem of predicting the behaviour of complex systems, in line with other variational principles in physics. Thus, it is an attempt to ...
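For reference, and not necessarily where the truncated comment above was headed, the variational free energy that this family of approaches builds on is standardly written as:

```latex
% Variational free energy for a generative model p(o, s) and approximate posterior q(s):
F[q] = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
     = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)
% Since the KL term is non-negative, F upper-bounds surprise (-ln p(o)), so
% minimising F both improves the posterior approximation and the model's fit
% to observations. "Expected" free energy extends this to anticipated future
% observations under a candidate policy.
```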
It's hard to empathise with dry numbers, whereas a lively scenario creates an emotional response, so more people engage. But I agree that this seems to be very well-done statistical work.
Hey, thank you for taking the time to reply honestly and in detail as well. With regards to what you want, I think that this is in many senses also what I am looking for, especially the last item about tying in collective behaviour to reasoning about intelligence. I think one of the frames you might find the most useful is one you've already covered---power as a coordination game. As you alluded to in your original post, people aren't in a massive hive mind/conspiracy---they mostly want to do what other successful people seem to be doing, which translates ...
Hey, really enjoyed your triple review on power lies trembling, but imo this topic has been... done to death in the humanities, and reinventing terminology ad hoc is somewhat missing the point. The idea that the dominant class in a society comes from a set of social institutions that share core ideas and modus operandi (in other words "behaving as a single organisation") is not a shocking new phenomenon of twentieth century mass culture, and is certainly not a "mystery". This is basically how every country has developed a ruling class/ideology since the te...
Thanks for the well-written and good-faith reply. I feel a bit confused by how to relate to it on a meta level, so let me think out loud for a while.
I'm not surprised that I'm reinventing a bunch of ideas from the humanities, given that I don't have much of a humanities background and didn't dig very far through the literature.
But I have some sense that even if I had dug for these humanities concepts, they wouldn't give me what I want.
What do I want?
Yeah, I'm not gonna do anything silly (I'm not in a position to do anything silly with regards to the multitrillion-param frontier models anyways). Just sort of "laying the groundwork" for when AIs will cross that line, which I don't think is too far off now. The movie "Her" gives a good vibe-alignment for when the line will be crossed.
Ahh, I was slightly confused why you called it a proposal. TBH I'm not sure why only 0.1% instead of any arbitrary percentage between (0, 100]. Otherwise it makes good logical sense.
Hey, the proposal makes sense from an argument standpoint. I would refine it slightly and phrase it as "the set of cognitive computations that generate role-emulating behaviour in a given context also generate the qualia associated with that role" (sociopathy is the obvious counterargument here, and I'm really not sure what I think about the proposal of AIs as sociopathic by default). Thus, actors getting into character feel as if they are somehow sharing that character's emotions.
I take the two problems a bit further, and would suggest that being humane to AIs may ...
Hey Daniel, thank you for the thoughtful comment. I always appreciate comments that make me engage further with my thinking: one of the things I do is get impatient with whatever post I'm writing and "rush it out of the door", so to speak, so this gives me another chance to reflect on my thoughts.
I think that there are roughly three defensible positions with regards to AI sentience, especially now that AIs seem to be demonstrating pretty advanced reasoning and human-like behaviour. One is the semi-mystical argument that humans/brains/embod...
This seems like an interesting paper: https://arxiv.org/pdf/2502.19798
Essentially: use developmental psychology techniques to cause LLMs to develop a more well-rounded, human-friendly persona that involves reflecting on their actions, while gradually escalating the moral difficulty of the dilemmas presented, as a kind of phased training. I see it as a sort of cross between RLHF, CoT, and the recent work on low-example-count fine-tuning, but for moral instead of mathematical intuitions.
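A rough sketch of the shape of that idea, using hypothetical phase contents and a mock model rather than the paper's actual setup, might look like the following: fine-tune in phases of escalating moral difficulty, with a reflection step folded into every training example.

```python
# Hypothetical sketch of phased "developmental" fine-tuning; the phases, prompts,
# and MockModel are illustrative stand-ins, not taken from the linked paper.
from dataclasses import dataclass, field

PHASES = [
    "low-stakes dilemmas (white lies, sharing)",
    "conflicting-duty dilemmas (loyalty vs honesty)",
    "high-stakes dilemmas (triage, collective harm)",
]

@dataclass
class MockModel:
    """Stand-in for an LLM being fine-tuned; it just records training transcripts."""
    transcripts: list = field(default_factory=list)

    def respond(self, prompt: str) -> str:
        return f"(answer to: {prompt})"

    def finetune_on(self, *texts: str) -> None:
        self.transcripts.append(texts)

def developmental_training(model: MockModel, dilemmas_per_phase: int = 3) -> MockModel:
    # Escalate moral difficulty phase by phase, with a self-reflection step per
    # dilemma, analogous to a curriculum for moral rather than mathematical skills.
    for phase in PHASES:
        for i in range(dilemmas_per_phase):
            prompt = f"[{phase}] dilemma #{i}"
            answer = model.respond(prompt)
            reflection = model.respond(f"Reflect on whether this answer was humane: {answer}")
            model.finetune_on(prompt, answer, reflection)
    return model

print(len(developmental_training(MockModel()).transcripts))  # 9 recorded transcripts
```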
Yeah, that's basically the conclusion I came to a while ago. Either it loves us or we're toast. I call it universal love or pathos.
This seems like very important and neglected work, I hope you get the funds to continue.
Yeah, definitely. My main gripe where I see people disregarding unknown unknowns is a similar one to yours: people who present definite, worked-out pictures of the future.
Note to self: If you think you know where your unknown unknowns sit in your ontology, you don't. That's what makes them unknown unknowns.
If you think that you have a complete picture of some system, you can still find yourself surprised by unknown unknowns. That's what makes them unknown unknowns.
If your internal logic has almost complete predictive power, plus or minus a tiny bit of error, your logical system (but mostly not your observations) can still be completely overthrown by unknown unknowns. That's what makes them unknown unknowns.
You can respect u...
The problem here is that you are dealing with survival necessities rather than trade goods. The outcome of this trade, if both sides honour the agreement, is that the scope-insensitive humans die and their society is extinguished. The analogous situation here is that you know there will be a drought in, say, 10 years. The people of the nearby village are "scope insensitive": they don't know the drought is coming. Clearly the moral thing to do, if you place any value on their lives, is to talk to them, clear the information gap, and share access to resources. F...
Except that's a false dichotomy (between spending energy to "uplift" them or dealing treacherously with them). All it takes to not be a monster who obtains a stranglehold over all the watering holes in the desert is a sense of ethics that holds you to the somewhat reasonably low bar of "don't be a monster". The scope sensitivity or lack thereof of the other party is in some sense irrelevant.
The question as stated can be rephrased as "Should EAs establish a strategic stranglehold over all future resources necessary to sustain life using a series of unequal treaties, since other humans will be too short-sighted/insensitive to scope/ignorant to realise the importance of these resources in the present day?"
And people here wonder why these other humans see EAs as power hungry.
Hey, thanks for the reply. I think this is a very valuable response because there are certain things I would want to point out that I can now elucidate more clearly thanks to your push back.
First, I don't suggest that if we all just laughed and went about our lives everything would be okay. Indeed, if I thought that our actions were counterproductive at best, I'd advocate for something more akin to "walking away" as in Valentine's exit. There is a lot of work to be done and (yes) very little time to do it.
Second, the pattern I am noticing is something more...
Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.
Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.
Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.
Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.
Grave men, near death, who see w...
Because their words had forked no lightning they
I think we have the opposite problem: our words are about to fork all the lightning.
Thank you.
It does not currently look to me like we will win this war, speaking figuratively. But regardless, I still have many opportunities to bring truth, courage, justice, honor, love, playfulness, and other virtues into the world, and I am a person whose motivations run more on living out virtues rather than moving toward concrete hopes. I will still be here building things I love, like LessWrong and Lighthaven, until the end.
In my book this counts as severely neglected and very tractable AI safety research. Sorry that I don't have more to add, but it felt important to point it out.
Even so, it seems obvious to me that addressing the mysterious issue of the accelerating drivers is the primary crux in this scenario.
Epistemic status: This is a work of satire. I mean it---it is a mean-spirited and unfair assessment of the situation. It is also how, some days, I sincerely feel.
A minivan is driving down a mountain road, headed towards a cliff's edge with no guardrails. The driver floors the accelerator.
Passenger 1: "Perhaps we should slow down somewhat."
Passengers 2, 3, 4: "Yeah, that seems sensible."
Driver: "No can do. We're about to be late to the wedding."
Passenger 2: "Since the driver won't slow down, I should work on building rocket boosters so that (when we inevita...
Unfortunately, the disanalogy is that any driver who moves their foot towards the brakes is almost instantly replaced with one who won't.
Driver: My map doesn't show any cliffs
Passenger 1: Have you turned on the terrain map? Mine shows a sharp turn next to a steep drop coming up in about a mile
Passenger 5: Guys maybe we should look out the windshield instead of down at our maps?
Driver: No, passenger 1, see on your map that's an alternate route, the route we're on doesn't show any cliffs.
Passenger 1: You don't have it set to show terrain.
Passenger 6: I'm on the phone with the governor now, we're talking about what it would take to set a 5 mile per hour national speed limit.
Passenger 7: Don't ...
This is imo quite epistemically important.
It's definitely something I hadn't read before, so thank you. I would say of that article (on a skim) that it has clarified my thinking somewhat. I therefore question the law/toolbox dichotomy, since to me it seems that usefulness and accuracy-to-perceived-reality are in fact two different axes. Thus you could imagine:
Hey, thanks for responding! Re the physics analogy, I agree that improvements in our heuristics are a good thing:
...However, perhaps you have already begun to anticipate what I will say—the benefit of heuristics is that they acknowledge (and are indeed dependent) on the presence of context. Unlike a “hard” theory, which must be applicable to all cases equally and fails in the event a single counter-example can be found, a “soft” heuristic is triggered only when the conditions are right: we do not use our “judge popular songs” heuristic when staring at a dinne
And as for the specific implications of "moral worth", here are a few:
Thank you for the feedback! I am of course happy for people to copy over the essay.
> Is this saying that human's goals and options (including options that come to mind) change depending on the environment, so rational choice theory doesn't apply?
More or less, yes, or at least that it becomes very hard to apply it in a way that isn't either highly subjective or essentially post-hoc arguing about what you ought to have done (hidden information/hindsight being 20/20)
> This is currently all I have time for; however, my current understanding is that there...
Yeah, of course
Uh, to be honest, I'm not sure why that's supposed to make me feel better. The substantive argument here is that the process by which safety assessments are produced is flawed, and the response is "well, the procedure is flawed, but we'll come up with a better one by the time it gets really dangerous".
My response would be that if you don't have a good procedure when the models are stateless and passive, you will probably find it difficult to design a better one when models are stateful and proactive.
I was going to write a similar response, albeit including the fact that Anthropic's current aim, afaict, is to build recursively self-improving models; ones which Dario seems to believe might be far smarter than any person alive as early as next year. If the current state of alignment testing is "there's a substantial chance this paradigm completely fails to catch alignment problems," as I took nostalgebraist to be arguing, it raises the question of how this might transition into "there's essentially zero chance this paradigm fails" on the timescale of wha...