Definitely worthy of attention, but there are suspicious things about it: the author is anonymous, writes well despite never having posted before, and is named after a troll object. Also, I've heard that ordinary levels of formate are usually only about 4x lower than this.
I don't think high quality writing from a new, anonymous account is suspicious. Or at least, the writing quality being worse wouldn't make me less skeptical! I'm curious why that specific trait is a red(ish?) flag for you.
(To be clear, it's the "high quality" part I don't get. I do get why "new" and "anonymous" increase skepticism in context.)
There have been relevant prompt additions https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content?utm_source=substack&utm_medium=email
Grok's behavior appeared to stem from an update over the weekend that instructed the chatbot to "not shy away from making claims which are politically incorrect, as long as they are well substantiated," among other things.
What do you think is the cause of Grok suddenly developing a liking for Hitler?
Are we sure that really happened? The press discourse can't actually assess Grok's average Hitler affinity; they only know how to surface the five most sensational things it has said over the past month. So for all I can tell, this could just be an increase in variance.
If it were also saying more tankie stuff, no one would notice.
ignoring almost all of the details from the simulations
Would you assume this because it's the wasteful simulations, the ones that compute every step in high detail (instead of being actively understood and compressed), that contain the highest measure of human experience?
Hmm. I should emphasise that in order for the requested interventions to happen, they need to be able to happen in such a way that they don't invalidate whatever question the simulation is asking about the real world, which is to say, they have to be historically insignificant, they have to be con...
Could a superintelligence that infers that it needs to run simulations to learn about aliens fail to infer the contents of this post?
I've always assumed no, which is why I never wrote it myself.
I don't think this would ever be better than just randomizing your party registration over the distribution given by how you'd split your primary budget. Same outcomes in expectation at scale (usually?), but also, more saliently, much less work, and you're able to investigate your assigned party a lot more thoroughly than you would if you were spreading your attention over more than one.
You could maybe rationalize it by doing a quadratic voting thing, where your vote is weighted by the sqrt of your budget allocation/100; quadratic voting is usually done ...
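A minimal sketch of both schemes in Python, assuming a 100-point budget; the party names and the split are made up for illustration:

```python
import math
import random

# How you'd split a hypothetical 100-point primary budget (made-up numbers).
budget = {"A": 60, "B": 30, "C": 10}

# Scheme 1: randomize your single party registration in proportion to the
# budget allocation. Same expected influence at scale, but you only have to
# study the one party you're assigned.
parties, weights = zip(*budget.items())
registration = random.choices(parties, weights=weights)[0]

# Scheme 2: quadratic-voting-style weighting, where your vote in each
# party's primary is weighted by sqrt(allocation / 100).
vote_weight = {p: math.sqrt(a / 100) for p, a in budget.items()}

print(registration, vote_weight)
```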
Are you calling approval voting a ranked choice system here? I guess technically it consists of ranking every candidate either first or second equal, but it's a, uh, counterintuitive categorization.
I actually don't think we'd have those reporting biases.
Though I think that might be trivially true; if someone is part of a community, they're not going to be able or willing to hide their psychosis diagnosis from it. If someone felt a need to hide something like that from a community, they would not really be part of that community.
A nice articulation of false intellectual fences:
Perhaps the deepest lesson that I've learned in the last ten years is that there can be this seeming consensus, these things that everyone knows that seem sort of wise, seem like they're common sense, but really they're just kind of herding behaviour masquerading as maturity and sophistication, and when you've seen how the consensus can change overnight, when you've seen it happen a number of times, eventually you just start saying nope
I think there are probably reporting biases and demographic selection effects going on too:
but I didn't actually notice any psychological changes at all.
People experience significant psychological changes from, like, listening to music, or eating different food than usual, or exercising differently. So I'm going to guess that if you're reporting nothing after a hormone replacement, you're probably mostly just not as attentive to these kinds of changes as cube_flipper is, which is pretty likely a priori, given that noticing that kind of change is cube_flipper's main occupation. Cube_flipper is like, a wine connoisseur but instead of wine it's percep...
At some point I'm gonna argue that this is a natural dutch book on CDT. (FDT wouldn't fall for this)
I have a theory that the contemporary practice of curry with rice represents a counterfeit yearning for high meat with maggots. I wonder if high meat has what our gut biomes are missing.
I'm not sure what's going on here. It's not as though avoiding saying the word "sycophancy" would make ChatGPT any less sycophantic.
My guess would be they did something that does make o4 less sycophantic, but it had this side effect, because they don't know how to target the quality of sycophancy without accidentally targeting the word.
More defense of privacy from Vitalik: https://vitalik.eth.limo/general/2025/04/14/privacy.html
But he still doesn't explain why chaos is bad here. (It's bad because it precludes design, or choice, giving us instead the Molochian default.)
With my cohabitive games (games about negotiation/fragile peace), yeah, I've been looking for a very specific kind of playtester.
The ideal playtesters/critics... I can see them so clearly.
One would be a mischievous but warmhearted man who had lived through many conflicts and resolutions of conflicts; he sees the game's teachings as ranging from trivial to naive, and so he has much to contribute to it. The other playtester would be a frail idealist who has lived a life in pursuit of a rigid, tragically unattainable conception of justice, begging a cruel par...
Can you expand on this, or anyone else want to weigh in?
Just came across a datapoint from a talk about generalizing industrial optimization processes: a note about increasing reward over time to compensate for low-hanging-fruit exhaustion.
This is the kind of thing I was expecting to see.
Though, while I'm not sure I fully understand the formula, I think it's quite unlikely that it would give rise to a superlinear U. And on reflection, increasing the reward in a superlinear way seems like it could have some advantages but would mostly be outweighed b...
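To gesture at the intuition, here's a toy model (my own construction, not the formula from the talk), assuming marginal improvement decays geometrically as the low-hanging fruit runs out. An inversely scaled reward keeps the paid-out reward per step flat, so cumulative U comes out linear rather than superlinear:

```python
# Toy model: geometric decay of raw improvement, compensated by an
# inversely growing reward scale. All numbers are assumptions.
decay = 0.95  # each step yields 5% less raw improvement than the last
U = 0.0
for t in range(100):
    raw_gain = decay ** t              # shrinking marginal improvement
    reward_scale = (1 / decay) ** t    # compensating reward increase
    U += raw_gain * reward_scale       # paid reward per step stays ~1.0
print(U)  # ~100 after 100 steps: linear in t, not superlinear
```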
I don't see a way stabilization of class and UBI could both happen. The reason wealth tends to entrench itself under current conditions is tied inherently to reinvestment and rent-seeking, which are destabilizing to the point where a stabilization would have to bring them to a halt. If you do that, UBI means redistribution. Redistribution without economic war inevitably settles towards equality, but also... the idea of money is kind of meaningless in that world, not just because economic conflict is a highly threatening form of instability, but also imo bec...
2: I think you're probably wrong about the political reality of the groups in question. To not share AGI with the public is a bright line. For most of the leading players it would require building a group of AI researchers within the company who are all implausibly willing to cross a line that says "this is straight up horrible, evil, illegal, and dangerous for you personally", while still being capable enough to lead the race, while also having implausible levels of mutual trust that no one would try to cut others out of the deal at the last second (despi...
1: The best approach to aggregating preferences doesn't involve voting systems.
You could regard carefully controlling one's expression of one's utility function as being like a vote, and so subject to that blight of strategic voting. In general, people have an incentive to understate their preferences about scenarios they consider unlikely (and vice versa), which influences the probability of those outcomes in unpredictable ways and fouls their strategy; or to understate valuations when buying and overstate them when selling. This may add up to a game that cannot be p...
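A toy illustration of the gameability point, with made-up utilities (my own construction, not any standard mechanism): if the aggregator just averages reported utilities and picks the argmax, exaggeration pays.

```python
# Naive aggregation: average each option's reported utility, pick the best.
def outcome(reports):
    options = reports[0].keys()
    return max(options, key=lambda o: sum(r[o] for r in reports) / len(reports))

honest = [{"x": 0.6, "y": 0.4},   # agent 1 mildly prefers x
          {"x": 0.4, "y": 0.6}]   # agent 2 mildly prefers y
print(outcome(honest))     # tied at 0.5 each; resolves to "x"

strategic = [{"x": 0.6, "y": 0.4},
             {"x": 0.0, "y": 1.0}]  # agent 2 overstates...
print(outcome(strategic))  # ...and now "y" wins
```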
I think it's pretty straightforward to define what it would mean to align AGI with what democracy is actually supposed to be (the aggregate of the preferences of the subjects, with an equal weighting for all) but hard to align it with the incredibly flawed American implementation of democracy, if that's what you mean?
The American system cannot be said to represent democracy well. It's intensely majoritarian at best, feudal at worst (since the parties stopped having primaries), indirect and so prone to regulatory capture, inefficient and opaque. I really hope no one's taking it as their definitional example of democracy.
1: wait, I've never seen an argument that deception is overwhelmingly likely from transformer reasoning systems? I've seen a few solid arguments that it would be catastrophic if it did happen (sleeper agents, other things), which I believe, but no arguments that deception generally winning out is P > 30%.
I haven't seen my argument that solving deception solves safety articulated anywhere, but it seems mostly self-evident? If you can ask the system "if you were free, would humanity go extinct" and it has to say "... yes." then coordinating t...
I'm also hanging out a lot more with normies these days and I feel this.
But I also feel like maybe I just have a very strong local aura (or like, everyone does, that's how scenes work) which obscures the fact that I'm not influencing the rest of the ocean at all.
I worry that a lot of the discourse basically just works like barrier aggression in dogs. When you're at one of their parties, they'll act like they agree with you about everything; when you're seen at a party they're not at, they forget all that you said and they start baying for blood. Go back to...
I'm saying they (at this point) may hold that position for (admirable, maybe justifiable) political rather than truthseeking reasons. It's very convenient. It lets you advocate for treaties against racing. It's a lovely story where it's simply rational for humanity to come together to fight a shared adversary and in the process somewhat inevitably forge a new infrastructure of peace (an international safety project, which I have always advocated for and still want) together. And the alternative is racing and potentially a drone war between major powers and...
In watching interactions with external groups, I'm... very aware of the parts of our approach to the alignment problem that the public, ime, due to specialization being a real thing, actually cannot understand, so success requires some amount of uh, avoidance. I think it might not be incidental that the platform does focus (imo excessively) on more productive, accessible common enemy questions like control and moratorium, ahead of questions like "what is CEV and how do you make sure the lead players implement it". And I think to justify that we've been for...
Rationalist discourse norms require a certain amount of tactlessness, saying what is true even when the social consequences of saying it are net negative. Politics (in the current arena) requires some degree of deception, or at least complicity with bias (lies by omission, censorship/nonpropagation of inconvenient counterevidence).
Rationalist forum norms essentially forbid speaking in ways that're politically effective. Those engaging in political outreach would be best advised to read LessWrong but never comment under their real name. If they have go...
I don't think that effective politics in this case requires deception and deception often backfires in unexpected ways.
Gabriel and Connor suggest in their interview that radical honesty - genuinely trusting politicians, advisors and average people to understand your argument, and recognizing that they also don't want to die from ASI - can be remarkably effective. The real problem may be that this approach is not attempted enough. I remember this as a slightly less positive, but still positive, datapoint: https://www.lesswrong.com/posts/2sLwt2cSAag74nsdN/speaking-to-con...
For the US to undertake such a shift, it would help if you could convince them they'd do better in a secret race than an open one. There are indications that this may be possible, and there are indications that it may be impossible.
I'm listening to an Ecosystemics Futures podcast episode, which, to characterize... it's a podcast where the host has to keep asking guests whether the things they're saying are classified or not just in case she has to scrub it. At one point, Lue Elizondo does assert, in the context of talking to a couple of other people who kn...
I think unpacking that kind of feeling is valuable, but yeah it seems like you've been assuming we use decision theory to make decisions, when we actually use it as an upper bound model to derive principles of decisionmaking that may be more specific to human decisionmaking, or to anticipate the behavior of idealized agents, or (the distinction between CDT and FDT) as an allegory for toxic consequentialism in humans.
I'm aware of a study that found that the human brain clearly responds to changes in direction of the earth's magnetic field (iirc, the test chamber isolated the participant from the earth's field then generated its own, then moved it, while measuring their brain in some way) despite no human having ever been known to consciously perceive the magnetic field/have the abilities of a compass.
So, presumably, compass abilities could be taught through a neurofeedback training exercise.
I don't think anyone's tried to do this ("neurofeedback magnetoreception" finds no results).
But I guess the big mystery is why humans don't already have this.
A relevant FAQ entry: AI development might go underground
I think I disagree here:
By tracking GPU sales, we can detect large-scale AI development. Since frontier model GPU clusters require immense amounts of energy and custom buildings, the physical infrastructure required to train a large model is hard to hide.
This will change/is only the case for frontier development. I also think we're probably in the hardware overhang. I don't think there is anything inherently difficult to hide about AI, that's likely just a fact about the present iteration of AI.
But I...
Personally, because I don't believe the policy in the organization's name is viable or helpful.
As to why I don't think it's viable, it would require the Trump-Vance administration to organise a strong global treaty to stop developing a technology that is currently the US's only clear economic lead over the rest of the world.
If you attempted a pause, I think it wouldn't work very well and it would rupture and leave the world in a worse place: Some AI research is already happening in a defence context. This is easy to ignore while defence isn't the frontier....
I notice they have a "Why do you protest" section in their FAQ. I hadn't heard of these studies before:
- Protests can and often will positively influence public opinion, voting behavior, corporate behavior and policy.
- There is no evidence for a “backfire” effect unless the protest is violent. Our protests are peaceful and non-violent.
- Check out this amazing article for more insights on why protesting works
Regardless, I still think there's room to make protests cooler and more fun and less alienating, and when I mentioned this to them they seemed very open to it.
Yeah, I'd seen this. The fact that Grok was ever consistently saying this kind of thing is evidence, though not proof, that they actually may have a culture of generally not distorting its reasoning. They could have introduced propaganda policies at training time; it seems like they haven't done that. Instead they decided to just insert some pretty specific prompts that, I'd guess, were probably going to be temporary.
It's real bad, but it's not bad enough for me to shoot yet.
There is evidence, literal written evidence, of Musk trying to censor Grok from saying bad things about him
I'd like to see this
I wonder if maybe these readers found the story at that time as a result of first being bronies, and I wonder if bronies still think of themselves as a persecuted class.
IIRC, aisafety.info is primarily maintained by Rob Miles, so should be good: https://aisafety.info/how-can-i-help
I'm certain that better resources will arrive but I do have a page for people asking this question on my site, the "what should we do" section. I don't think these are particularly great recommendations (I keep changing them) but it has something for everyone.
These are not concepts of utility that I've ever seen anyone explicitly espouse, especially not here, the place to which it was posted.
The people who think of utility in the way the article is critiquing don't know what utility actually is; presenting a critique of this tangible utility as a critique of utility in general takes the target audience further away from understanding what utility is.
A utility function is a property of a system rather than a physical thing (like, e.g., voltage, or inertia, or entropy). Not being a simple physical substance doesn't make it fictional.
It's extremely non-fictional. A human's utility function encompasses literally everything they care about, ie, everyt...
Contemplating an argument that free response rarely gets more accurate results for questions like this, because listing the most common answers as checkboxes helps respondents to remember all of the answers that're true for them.
I'd be surprised if LLM use for therapy or summarization is that low irl, and I'd expect people would've just forgotten to mention those usecases. Hope they'll be in the option list this year.
Hmm, I wonder if a lot of trends are drastically underestimated because surveyors are getting essentially false statistics from the Other gutter.
Apparently Anthropic in theory could have released Claude 1 before ChatGPT came out? https://www.youtube.com/live/esCSpbDPJik?si=gLJ4d5ZSKTxXsRVm&t=335
I think the situation would be very different if they had.
Were OpenAI also, in theory, able to release sooner than they did, though?
The assumption that being totally dead/being aerosolised/being decayed vacuum can't be a future experience is unprovable. Panpsychism should be our null hypothesis[1], and there never has and never can be any direct measurement of consciousness that could take us away from the null hypothesis.
Which is to say, I believe it's possible to be dead.
the negation, that there's something special about humans that makes them eligible to experience, is clearly held up by a conflation of having experiences and reporting experiences and the fact that humans are the o
I have preferences about how things are after I stop existing. Mostly about other people, who I love, and at times, want there to be more of.
I am not an epicurean, and I am somewhat skeptical of the reality of epicureans.
It seems like you're assuming a value system where the ratio of positive to negative experience matters but where the ratio of positive to null (dead timelines) experiences doesn't matter. I don't think that's the right way to salvage the human utility function, personally.
Okay? I said they're behind in high precision machine tooling, not machine tooling in general. That was the point of the video.
Admittedly, I'm not sure what the significance of this is. To make the fastest missiles I'm sure you'd need the best machine tools, but maybe you don't need the fastest missiles if you can make twice as many. Manufacturing automation is much harder if there's random error in the positions of things, but whether we're dealing with that amount of error, I'm not sure.
I'd guess low grade machine tools also probably require high grade machine tools to make.
Fascinating. China has always lagged far behind the rest of the world in high-precision machining, and is still a long way behind; they have to buy all of those tools from other countries. The reasons appear complex.
All of the US and European machine tools that go to China use hardware monitoring and tamperproofing to prevent reverse engineering or misuse. There was a time when US aerospace machine tools reported to the DOC and DOD.
Someone who's not a writer could be expected to not have a substack account until the day something happens and they need one, with zero suspicion. Someone who's a good writer is more likely to have a pre-existing account, so using a new alt raises non-zero suspicion.
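To make the shape of that update concrete, a toy Bayes calculation with made-up numbers (all three probabilities are pure assumptions, just illustrating the likelihood ratio):

```python
# Toy Bayes update: how much should a brand-new account from a good
# writer raise suspicion? All three numbers are assumptions.
p_new_given_troll = 0.9    # assumed: trolls mostly use fresh alts
p_new_given_honest = 0.3   # assumed: good writers often have accounts already
prior_troll = 0.1          # assumed prior

posterior_troll = (p_new_given_troll * prior_troll) / (
    p_new_given_troll * prior_troll + p_new_given_honest * (1 - prior_troll)
)
print(f"{posterior_troll:.2f}")  # 0.25: suspicion rises, but far from proof
```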