All of Trevor Hill-Hand's Comments + Replies

Breaking Books: A tool to bring books to the social sphere

I love this idea, it feels like it would also work for a lot of non-fiction, and I could see this being a part of a traditional book club too.

To improve Rationality, create Situations

Trevor Hill-Hand6d10

Ooh nice! You reminded me of A Hand With Many Fingers, as well.

To improve Rationality, create Situations

Trevor Hill-Hand7d30

Every so often I come back to a daydream about an investigation game (think Her Story, Immortality, Gone Home, Digital: A Love Story, Return of the Obra Dinn, etc.) but with a premise like you're in a small town and have access to a bunch of messy data from all over the place, mostly in Excel files but also receipts, security cam footage, maps, stuff like that.

Maybe you're a hacker or an AI that was able to gain illicit access to everything, but now you have to clean up and correlate everything without outside help (i.e. no ability to ask things like "Wh...

1dirk7d

Another Night at the Archive has some of those elements; I haven't beaten it myself but you definitely have to e.g. match security camera transcripts to timelines.

Anthropic's Pilot Sabotage Risk Report

Trevor Hill-Hand11d90

Man, I would really like the news to stop feeling like it's coming from the prologue of A Fire Upon the Deep.

Brightline is Actually Pretty Dangerous

Trevor Hill-Hand15d30

Amtrak also travels fairly slowly.

Why is OpenAI releasing products like Sora and Atlas?

Answer by Trevor Hill-HandOct 26, 20251-2

I agree with Hank Green that it sure seems like it's so they can start selling ads like a traditional social media company, and furthermore that that sort of behavior doesn't feel like what one would expect from a company that thought they were building an AGI.

AI #138 Part 1: The People Demand Erotic Sycophants

Trevor Hill-Hand24d30

What would have presumably given much different results would be Claude Sonnet 4.5, which is actually a lot less sycophantic by all reports (I’m a little worried it agrees with me so often, but hey, maybe I’m just always right, that’s gotta be it.)

Now you've got me wondering if I'm being reverse-sycophantic, and have been trained to say things Claude would agree with?

Thinking Partners: Building AI-Powered Knowledge Management Systems

Trevor Hill-Hand1mo21

I agree with the idea of looking at customer relationship management (CRM) systems for ideas. This talk feels like a pretty good overview of that idea: https://www.youtube.com/watch?v=8jwiABwGC6c

Thinking Partners: Building AI-Powered Knowledge Management Systems

Trevor Hill-Hand1mo10

Because you hopefully may enjoy the ideas, I've been kind of tackling this from a hobbyist perspective:

Lately I'm drawing inspiration from this article, and imagining that I'm "building myself a skrode": https://medium.com/@greyboi/building-a-skrode-initial-thoughts-a195c4a0663d - and also the original story, A Fire Upon the Deep, where skroderiders are introduced. Chiefly, the idea that your own skrode is something that is DEEPLY personalized and customizable, including p...

How do we know when something is deserving of welfare?

Trevor Hill-Hand1mo10

I think I feel the same sort of 'What if we just said EVERYTHING deserves welfare?' thought. I care for my birds, but I also care for my plants, and care for my books, each in their own way.

Like, if someone built this small skin-device-creature, and then someone else came along and smashed it then burned the pieces, I think I would be a little sad for the universe to have 'lost' that object. So there's SOMETHING there that is unrelated to "can it experience pain?", for me.

The Most Common Bad Argument In These Parts

Trevor Hill-Hand1mo211

This comment feels like you want to say something different than what you wrote.

Thane Ruthenis1mo3110

Explanation

(The post describes a fallacy where you rule out a few specific members of a set using properties specific to those members, and proceed to conclude that you've ruled out that entire set, having failed to consider that it may have other members which don't share those properties. My comment takes specific examples of people falling into this fallacy that happened to be mentioned in the post, rules out that those specific examples apply to me, and proceeds to conclude that I'm invulnerable to this whole fallacy, thus committing this fallacy.

(Unle...

0Raemon1mo

yep that's correct

2025 State of AI Report and Predictions

Trevor Hill-Hand1mo10

There was a little Animal Crossing mod that made the rounds a little more 'gently' than I expected.

I think the trick here might be a game that runs a local, small, known-ethically-sourced model, but even if we had more than the one (Comma) that's still a lot of ire to overcome before you can even get to the elevator pitch for the game.

Heaven, Hell, and Mechanics

Trevor Hill-Hand1mo70

This three-factor framework reminds me of an idea from this: https://www.shamusyoung.com/twentysidedtale/?p=12768

If I had only attended one of these places, I probably would have concluded, “This place is what being a Christian is all about”. But these three points form a plane, and by moving around on that plane I can view Christianity from a lot of different angles and extrapolate a lot of other kinds of churches.

This idea of "third options which break overfit 1D mental models" has stuck with me for a big portion of my life now.

Alignment Faking Demo for Congressional Staffers

Trevor Hill-Hand1mo20

Hrm, what would a general purpose app (which lets one interact with LLMs/APIs while presenting what's happening to an audience) include, based on your experience here?

1Alice Blair1mo

I don't think it needs to be that complicated, it just needs to not show the messy xml tags in the system prompt (since that's code and is thus scary), emphasize what the model is outputting, what the input was, and whether the system is in eval/no-eval/normal mode. I think there isn't that much to showing LLMs doing most schemey text stuff, but there are probably some more weird considerations when doing something more agentic like audio or images or whatever, and indeed the other demos with those elements often had more complex UIs. I would check out CivAI's stuff if you're interested in this in more detail, this is more their thing than mine.

Where does Sonnet 4.5's desire to "not get too comfortable" come from?

Trevor Hill-Hand1mo50

It's something introduced by the more agentic coding capabilities.

I think it might be mostly this, like learning to avoid ladders in Go. "Crystalizing" tends to not be very useful for anything except not doing things.

Recent AI Experiences

Trevor Hill-Hand1mo30

I'd like to relate this old blog post I found recently after my love affair with A Fire Upon the Deep: https://medium.com/@greyboi/building-a-skrode-initial-thoughts-a195c4a0663d

I’ve been wondering about addressing this [working memory] slippage with tech. For instance, as a youngster I used to code on tiny screens, but nowadays I use as many large monitors as I can get; they help me retain context, taking the load off my working memory. So tech can definitely help if you use it well.

One of the things I’m thinking about primarily is adopting a prosthetic memory.

4abramdemski1mo

A skrode does seem like a good analogy, complete with the (spoiler)

Our Experience Running Independent Evaluations on LLMs: What Have We Learned?

Trevor Hill-Hand1mo10

Could you share how many bulk-sum tokens were needed for various parts of the project, to help get a sense for what different types of work might look like across different models?

Prompting Myself: Maybe it's not a damn platitude?

Trevor Hill-Hand1mo11

When you work with an LLM, it's definitely doing a lot of the labor, and even some of the reasoning, but you're still doing a lot of the thought yourself; before, after, and during.

2CstineSublime1mo

I'm coming around to a similar view of reading and even conversation with people. A book is only as good as its reader; a conversation can only be transformative or illuminating for both participants if at least one of them puts in a lot of "work" in the form of both interpretation and how they present their replies to the other party. There is this intuition that good conversation just "flows" effortlessly, but that's not necessarily the same as an "important" or "illuminating" conversation, which would also be considered a good conversation.

In which the author is struck by an electric couplet

Trevor Hill-Hand1mo*30

I think you would enjoy this video from Alan Moore, where he starts off with a similarly fascinating word-by-word analysis: https://www.youtube.com/watch?v=ft8eO67auCs

2Algon1mo

Thanks for the rec! I find it amusing because I stumbled onto this series the other day. Nothing is ever a coincidence, so surely the Algorithm is trying to tell us something. But what?

Open Thread Autumn 2025

Trevor Hill-Hand1mo20

Ah, I think perhaps I was misreading the title as August instead of Autumn. If that is the case, I prefer 'Autumn' :)

Open Thread Autumn 2025

Trevor Hill-Hand1mo20

The title of this thread breaks the open thread naming pattern; should it be Fall 2025, or should we be in an October 2025 thread by now? Moving to monthly might be nice for the more frequent reminder.

3kave1mo

It looks like last year it was Fall, and the year before it was Autumn.

Scheming Toy Environment: "Incompetent Client"

Trevor Hill-Hand2mo20

I love the idea of sharing "toy research", and it's encouraging me to share more of my projects.

1Ariel_2mo

Thanks! Tbh, I never would have posted it if not for encouragement from a friend

A Thoughtful Defense of AI Writing

Trevor Hill-Hand2mo20

I changed my caps lock key to a — key nearly a decade ago, and have done it on most of my keyboards/PCs ever since.

So seeing all the advice lately like "em dashes are a sign of AI writing" is a funny feeling to experience.

Toggle Hero Worship

Trevor Hill-Hand2mo32

Though I appreciate the reference at the end, I think an important part of this is that it's also so that when you meet a hero you can do more with this skill. You can "see" and engage with the real person. A person as real and mundane as every other person, with all the good and bad that implies.

I actually think "see" is too limited an analogy, because this really involves all your senses and reasoning, but it's also true that I feel it has a close connection to what artists call "learning to see", like maybe it's using the same mental circuits.

You can le...

2Algon2mo

Agreed. I meant that you can kill the far-mode caricature of them you have in your head, if you so wish.

Category-Theoretic Wanderings into Interpretability

Trevor Hill-Hand2mo32

I enjoyed reading the paper but did not find the screenshots here in the post a helpful addition; I think I would have just quoted the introduction, if converting it into a full article was infeasible.

It's also fun seeing other Eugenia Cheng fans!

1unruly abstractions2mo

Good feedback! I am still trying to figure out my workflow. I like writing on Typst, but I realized it's not very easy to go from Typst -> Less Wrong. Also, a lot of my writing is sorta experimental. I'm trying to determine which parts of my writing should be directed to which platforms/audiences. I will make this a linkpost. And yes, Eugenia Cheng is amazing :)

The Cats are On To Something

Trevor Hill-Hand2mo110

Discovering that an alien species has bred a group of humans into what a pug is to a wolf would be absolutely horrific.

Makes me think of All Tomorrows: https://www.youtube.com/watch?v=imNtSPM3-r4

Help me understand: how do multiverse acausal trades work?

Trevor Hill-Hand2mo10

There's no Darwinian selective pressure to favor agents who engage in acausal trades.

I think I would make this more specific- there's no external pressure from that other universe, sort of by definition. So for acausal trade to still work you're left only with internal pressure.

The question becomes, "Do one's own thoughts provide this pressure in a usefully predictable way?"

Presumably it would have to happen necessarily, or be optimized away. Perhaps as a natural side effect of having intelligence at all, for example. Which I think would be a similar argument to, "Do natural categories exist?"

The trouble with "enlightenment"

Trevor Hill-Hand3mo132

I sat down one morning and for fun started trying to translate The Art of War from scratch, by simply going character by character and looking up the etymology and historical usage of each. Took me about two hours to get through the first page that way, and that was enough to be entertaining so I stopped there.

But, I noticed something, which reminds me of this "bodhi" vs. "enlightenment" contrast.

The text starts out by explaining five core concepts that make up "the art of war". The first one, I've normally seen written in English like this, from the trans...

3Gordon Seidoh Worley3mo

Trying to translate from pre-modern Chinese is fascinating. I've made some amateur attempts at it myself, using your same method, and it showed me just how much room there is for interpretation. The reality is that the original text also had some bias towards a particular interpretation, and we don't share the ontology of the original readers, so almost every translation ends up importing our own worldview in some way because that's the only way to make sense of it.

Trevor Hill-Hand3mo20

This may seem like it's coming out of left field, but reading A Fire Upon the Deep a few weeks ago helped me find a calm perspective on this idea. In-universe the characters straight up have some of these discussions over the course of the book, and there's so much of all this stuff happening "just off screen".

The story is in part about the folly and impossibility of something as easy and comforting as "trust" between agents and systems in radically different scales and realities. Yet they are forced to coexist and interact regardless.

I haven't read his ot...

The parable of the underdog

Trevor Hill-Hand3mo52

Sports betting is different and worse. Rather than attempting to fix outcomes, it relies on designing bets to exploit the customer.

This happens in boring and mundane ways; the same way any casino in Vegas does it, just with less regulation.

Plan E for AI Doom

Trevor Hill-Hand3mo30

https://www.youtube.com/watch?v=_OpxrtUwjNw - This is a little fan project I did of that short story, as a sort of a radio play. I've never had it be relevant to a conversation before!

Is there a safe version of the common crawl?

Trevor Hill-Hand3mo32

Sort of semi-related, there is the "Common Pile", a successor to "The Pile". It was not focused on "safe" data, but rather "public domain" data. But, maybe that excludes at least some dangerous data, and could make further filtering easier?

My Empathy Is Rarely Kind

Trevor Hill-Hand3mo*20

I was simply trying to decorate a compliment, so I suppose I will stop doing that 🤔 (EDIT: from a later vantage point, I think I now see it's better to say "sorry for adding a distraction" rather than passively projecting blame.)

johnswentworth3mo150

(I for one quite enjoyed the koan, even if it is not drawing quite the same distinction that dweomite was drawing. That is ok. And hey, it triggered further clarification from dweomite, which is a fine outcome.)

My Empathy Is Rarely Kind

Trevor Hill-Hand3mo54

To be clearer, the koan is meant to be related only to a sub-item of a sub-item of a comment: "you are simulating their emotions", rather than the original post or to any entire comment.

My Empathy Is Rarely Kind

Trevor Hill-Hand3mo*150

I think you make an important point in this context- understanding that all the emotions you're "feeling" are still coming from you, not from them.

"A monk rowed out to the middle of a calm lake to meditate. A while later, they were bumped into and interrupted by another boat! The monk opened their eyes in anger, ready to chide the other monk for being so careless and making them so angry... to find the other boat empty. The anger was inside them, not from another monk."

3Dweomite3mo

I don't think that koan is drawing the same distinction that I was drawing (and therefore suspect you may have misinterpreted me). I was contrasting a scenario where you feel emotions (inside the sandbox) that are shaped by the empathy-subject's desires and principles, and then feel different emotions (outside the sandbox) shaped by your own desires and principles. I agree in a technical sense that all the emotions you feel are coming from you (including the ones inside the sandbox), although I also think that emotions are usually a response to your circumstances (and the relation between you and those circumstances) and that they can be appropriate or inappropriate responses to those circumstances. I think it (usually) doesn't make sense to try to understand emotions by considering only the person and ignoring their circumstances. Thus, the koan seems wrong-headed to me. (The koan's analysis of its own scenario also seems very shallow--the fact that no one is inside the boat does not mean that no one is at fault! Why wasn't the boat properly secured to the dock? This doesn't particularly matter if the koan is just trying to point to a concept so that you know what the speaker is even referring to, but it's a weakness if the koan is trying to be persuasive.)

-9Said Achmiz3mo

Someone should fund an AGI Blockbuster

Trevor Hill-Hand3mo10

Probably need a different name for goodharting.

This is honestly the biggest concept I struggle with trying to share and teach and raise familiarity with at work, in many contexts beyond just AI safety. There are some adjacent concepts like the cobra effect that are close, but are also just close enough to be distracting.

AI #126: Go Fund Yourself

Trevor Hill-Hand4mo10

Definitely interesting. Any word on whether this is getting more vs less true as model capabilities improve?

I'm curious too! I appreciate when experiments like this also go back and test older models, and weirder models, and wish that were more common.

On "ChatGPT Psychosis" and LLM Sycophancy

Trevor Hill-Hand4mo30

I appreciated this perspective from a prominent SCP author (Sam Hughes, who wrote and established the 'Antimemetics' stories and "sub-genre" on SCP): https://qntm.org/chatscp

"But doesn't this whole scenario sound like an SCP?"
A couple of people suggested this. An LLM which bamboozles certain types of user with paranoid fantasy until they spiral into psychosis? That sounds like science fiction! It sounds like something out of the SCP project!
Okay, so, no? Because an SCP has to be anomalous in some way and this is clearly actually happening. Four years ago...

Love stays loved (formerly "Skin")

Trevor Hill-Hand4mo61

This story captures a lot of bundled feelings I have which I want to try to put into words, though words are imperfect. This is the part of the story which I feel mirrored in my own:

My mom was fighting something, and she was fighting (and teaching me to fight) harder and earlier than I knew at the time, giving me tools and perspectives that most people don't get until they are adults (if they get them at all). Now I also see that a very few people even saw that she was fighting anything, much less what it was, and few see it now, and we just can't seem to ...

Don't fight your LLM, redirect it!

Trevor Hill-Hand4mo52

I notice I apply this lesson to the design of data entry forms/surveys in general as well: you need an 'other' option much more often than you would think, or one ends up with messy survey data: extra comments and thoughts crammed into the wrong questions wherever users can find an opening. EDIT: Upon further reflection, I also remember that I've had conversations during the rollout of surveys which included the solution "let's ask them to self-classify in another question" multiple times in unrelated projects over the years.

The Fear

Trevor Hill-Hand4mo10

❤️

My take on AI Alignment: Corporate misalignment and DAOs

Trevor Hill-Hand4mo10

I have had this same feeling, in these same words, for many years now.

On music and language

Trevor Hill-Hand4mo10

The clouds are sort of permanent- as they are filled in, whatever degree of detail it's at is where it stays in my head, wherever they're "stored", and it just sort of sticks in there. It feels a lot like just putting energy into some sort of "progress bar", my best thinking I really do in just a meditation sort of behavior- closed eyes, relaxed pose, slow breathing [I'll shift back and forth into this as I'm reading challenging books for instance, or when working on a project there's a lot of "lean back and sit quietly for a moment, staring into the middl...

On music and language

Trevor Hill-Hand4mo10

I'm with you on "music feels language-like", I think even just looking at spectrograms of music and speech, and comparing those to the spectrograms of random soundscapes, makes it visible that music at least plays with the same types of rhythmic, pitch, and formant patterns in our "sound view", but they have a difference similar to the difference between how a textbook "plays with the patterns of letters and punctuation" to convey an idea, and a Celtic knot "plays with patterns" to create a pattern that is sort of just... intrinsically nice to have around?...

1Joey Marcellino4mo

Thanks for reading! Could you realize the same thought-cloud twice using, for example, language and music? And if so, do you think the end results would count as "translations" of each other in some sense? If the answer is yes I'd be very curious to see/hear an example.

Dialects for Humans: Sounding Distinct from LLMs

Trevor Hill-Hand4mo10

This gives me a feeling I would like to express via reference:

At school, Doug finds that everyone there is dressed as him and becomes weirded out by the fact. The others also tell Doug that he is rocking the "Dylan Farnum" look. But Doug tells them that he always dresses like that. Doug has his mind stuck to the new fashion trend all day and he finally becomes fed up with the others saying that he is copying Dylan Farnum. So, he invites them all into his room and shows them his closet of clothes to prove that he is not copying Dylan Farnum. This, however,...

Embedded Altruism [slides]

Trevor Hill-Hand4mo43

I think it's OK to simultaneously believe one should "care about as much as possible" AND acknowledge that "as much as possible" is finite in an infinite universe.

Hiring* an AI** Artist for LessWrong/Lightcone

Trevor Hill-Hand4mo2-5

I'd like to apply because I experiment a lot specifically on variation within an area of concept space, though I'll have to assemble a portfolio. Either way, I'd also like to throw this idea out there for consideration:

It would be nice for this to be paired with a commitment to use only ethically sourced image models, and to use that to help explain what that even means.

"It isn't magic"

Trevor Hill-Hand5mo22

These two posts pair well together.

Ghiblification for Privacy

Trevor Hill-Hand5mo*4-8

I don't know why everyone is making this so complicated when there's a clear disqualifying factor for me: Miyazaki himself has said that they did not consent to be trained on, would not have consented to being trained on, and do not want anyone making Ghibli art, and all of this was known before Sam Altman started pushing Ghiblification. There are other factors too, but this one by itself is already sufficient for me.

EDIT: I see a lot of upvotes and disagreement on this comment, which I think I agree with. I should have clarified, this is personally disqu...

Ghiblification for Privacy

Trevor Hill-Hand5mo10

For sure! Much like the AI safety scorecard, no one is out of the red, but it seems like some of the older publishing house type companies are trying to respect existing content licensing institutions. However, I've seen many creators and artists complain that it doesn't matter; it's already too overshadowed by the actions of OpenAI et al.