All of hath's Comments + Replies

I might start a newsletter on the economics of individual small businesses. Does anyone know anyone who owns or manages e.g. a restaurant, or a cafe, or a law firm, or a bookstore, or literally any kind of small business? Would love intros to such people so that I can ask them a bunch of questions about e.g. their main sources of revenue and costs or how they make pricing decisions.

1benjaminikuta1y

Can I subscribe to your newsletter?

Announcing Dialogues

hath1y40

I'd be interested in doing something resembling an interview/podcast, where my main role is to facilitate someone else talking about their models (and maybe asking questions that get them to look closer at blurry parts of their own models). If you have something you want to talk/write about, but don't feel like making a whole post about it, consider asking me to do a dialogue with you as a relatively low-effort way to publish your takes.

Some potential topics:

What's your organization trying to do? What are some details about the world that are informing y

...

The Apprentice Thread 2

hath2y30

Took APCS (Java 101-102) in high school (culminating with coding Tetris in Java), read through Diveintopython3.net, done a bunch of miscellaneous programs in Python, lots of experience in Linux.

2Viliam2y

Hm. My background is computer science at university, about 20 years of Java, plus some Python in my free time. Chances are, you know more about Python than I do, so the useful advice I could provide would be in the "programming in general" area, which of course also applies to Python code. Either something theoretical (e.g. finite-state machines), or something from practice (e.g. test-driven development); actually those are the only two examples that come to my mind right now, but maybe if I think longer about it, I will find something more. Also, you could show me your code for a review. If you want a project, I think a fun exercise could be to design your own toy language and write its interpreter in Python. If you are interested, send me a private message and let's figure out how to communicate.

Thinking about math, the part relevant for alignment seems to be logic. Could give you a short introduction, but nothing too advanced. Seems like in both programming and math, I can help you make a step or two forward, but it will still be far from where you want to get. Just saying it in advance, to keep your expectations appropriately low.

The Apprentice Thread 2

hath2y*40

[APPRENTICE]:

For a bunch of these, the minimum viable product for mentoring me is a combination of pointing me to books/textbooks and checking in on me to make sure I actually do it.

Some things I'd like mentorship on:

People willing to review my writing, and accountability on spewing out a bunch of blog posts. (and maybe on starting an actual novel!)
Operations. I've run a couple large projects in the past, including a group house, and there's a lot I can do better. Would love to hear from people who have run group houses or organizations in the past.
Economi

...

1ProgramCrafter2y

Do you have a plot for your novel, or just an idea? I have an idea for an HP:MoR-like novel with different negative forces implying a different setting (though still a magical world and Hogwarts). I've also written a plot for it (with the help of ChatGPT); however, the plot doesn't actually matter much to me - I just want to read an exciting story! (It also seems the most realistic way to write a novel, because the finish is not fixed in real life.)

3Viliam2y

What are your current skills in Python and programming in general? (Just approximately, like "never programmed at all", "never wrote a program longer than one screen", "senior developer in C++ but never used Python" etc.)

hath2y62

Some notes on the Dialogue format:

Seems like a less effortful/more social way of writing--like glowfic, but for nonfiction!

Probably better at conveying more implicit knowledge, like interviews do (maybe?)

Just because it's public doesn't mean we'll stop adding pieces to the dialogue; Elizabeth and I still have a lot to say.

hath's Shortform

hath2y20

Day 1, adding ~500 words of nuance.

A lot of these models are “this was my lived experience, it seems to generalize a fair bit”. I sent out an interest form to see how much demand there was for something like this, as a way to test whether it did in fact generalize a bunch to other people, and it got a lot of responses.
Default BATNA to high school is “live by yourself, maybe on a grant, while you self-teach or work on a project”. I did this! It sucked!
- Solo productivity is hard. Creating systems that help you get work/studying done every day, without e

...

hath's Shortform

hath2y10

Super rough expansion of the first couple bullet points, Day 0:

Intro:

Why write this?
- I’m writing this post because I have a bunch of models about group houses, minors, and the combination of the two that I think other people might be interested in. I also want to have some publicly available thing I can point to that says what this whole thing is about.
Short version of what this was.
- Ascension Beta was a month-long experimental group house I ran in October 2022, with participants aged 16-22. It was intended primarily as a test of the below models (to see if

...

hath's Shortform

hath2y40

I'm writing up my models on why my pet project, Ascension, is a good idea. This is the outline. As I expand the post, I'll add the incremental bits as comments.

Intro:

Why write this?
Short version of what this was.
Why run Ascension (short version)

Models:

Most important model here: it worked. Everything below is mostly informed by that, and the beta was a really good way to develop those models.
High school, as an institution, is absolutely dog shit.
- Signaling race to the bottom that sucks up all of your time.
Default BATNA to high school is “live by yourself, ma

...

2hath2y

Day 1, adding ~500 words of nuance.

* A lot of these models are “this was my lived experience, it seems to generalize a fair bit”. I sent out an interest form to see how much demand there was for something like this, as a way to test whether it did in fact generalize a bunch to other people, and it got a lot of responses.
* Default BATNA to high school is “live by yourself, maybe on a grant, while you self-teach or work on a project”. I did this! It sucked!
* Solo productivity is hard. Creating systems that help you get work/studying done every day, without external deadlines and check-ins, is really difficult. Also, I have pretty bad ADHD, which means that my default for extended periods of working alone involves forgetting to eat, take my ADHD medication, or do anything productive whatsoever during the day.
* I care a lot about seeing friends, and don’t really have a lot of ways to do that, especially because most of my really good friends are scattered across the US and Europe.
* Being stuck at home is corrosive for a bunch of reasons that aren’t always immediately apparent. Some of this is due to the loss of the counterfactual environment, and some of this is due to specific details about people’s home lives.
* Agency is a pretty important thing. By default, it gets crushed. Giving people power over their own lives, and encouraging them when they do weird things, helps turn them into the kind-of-person who comes up with weird new things to do that would help their lives, and overall makes the world a better place.
* A lot of people who have the potential to do a lot of great things have their creativity and agency crushed by The System and their parents. The K-12 education system isn’t centrally designed to do any one thing, but the result of the system is that your creativity and independence is crushed. The parents of really smart kids can be slightly obtuse and limiting at best and controlling and manipulative at worst. I know people who we

1hath2y

Super rough expansion of the first couple bullet points, Day 0:

Intro:

* Why write this?
* I’m writing this post because I have a bunch of models about group houses, minors, and the combination of the two that I think other people might be interested in. I also want to have some publicly available thing I can point to that says what this whole thing is about.
* Short version of what this was.
* Ascension Beta was a month-long experimental group house I ran in October 2022, with participants aged 16-22. It was intended primarily as a test of the below models (to see if a larger, longer-running version was worthwhile) and a chance to practice running a group house of this type, working out the major kinks before running a longer version.
* The major goals of Ascension were to give residents social accountability, agency over their environment, and community.
* Why run Ascension (short version)
* Because I wanted it to exist (so I could live there), other people wanted it to exist for the same reason, and nobody else was going to step up and make it happen. I had a lot of models about agency, environment, and productivity, and in particular a specific kind of environment I wanted to live in. However, it didn’t exist, especially not for minors. I also hypothesized that the people I had met who were similar to me would also want this to exist, and that was borne out by the evidence. There are a bunch of reasons for why Ascension provides value to these people, and that’s what most of this post is about.

Models:

* Most important model here: it worked. Everything below is mostly informed by that, and the beta was a really good way to develop those models.
* Before I ran the beta, I was pretty uncertain about some of these models. My models on high school and agency were fairly strong, but everything about how something like Ascension would actually function in practice was fairly blurry. However, the beta, while janky, proved that something like Asc

You Don't Exist, Duncan

hath2y111

I'm reminded of Falsehoods Programmers Believe About Names, an essay on the problems with handling "weird" data inputs that are normal for the people involved.

How to Convince my Son that Drugs are Bad

Answer by hathDec 19, 202250

Not sure if this would help, but I'm also a 16 year old^[1] who's been reading LW for a bit over two years, and who doesn't think that taking most drugs is a great idea (and has chosen not to e.g. drink alcohol when I've had the opportunity to). I don't think all drugs are bad (I have an Adderall prescription for my ADHD) but the things your son mentioned seem likely to harm him. If he wanted to talk to me about it, he can PM me on LW or message me on Discord @ sammy!#0521.

As someone who often has... disagreements with their parents, sometimes it's ea...

hath's Shortform

hath3y90

Meritxell has made the serious error of mentioning that she didn't fully grasp some of what Keltham said earlier about stock companies.
Keltham is currently explaining how a Lawful corporation has an internal prediction market, which forecasts the observable results on running various possible projects that company could be trying, which in turn is used to generate an estimate of marginal returns on marginal internal investment; this prevents a corporation from engaging in obvious madness like accepting an internal project with 6% returns while turning

...

hath's Shortform

hath3y10

Some quotes from Planecrash that I might collect into a full post:

9hath3y

hath's Shortform

hath3y60

Upcoming Posts

Now that I'm back from [Atlas Fellowship+SPARC+EAG+Future Forum], I have some post ideas to write up. A brief summary:

Agency and Authority, an actual in-depth, gears-level explanation of agency, parenting, the two kinds of respect, moral conflation with that respect, the fact that those in power are incentivized to make their underlings more legible and predictable to them, arbitrarily high punishments and outcome matrices, absolute control and concessions, incentives for those not in power and how those incentives turn you into less of an a...

Cultivating And Destroying Agency

hath3y70

I’m really sorry to hear that, man. It’s honestly a horrible thing that this is what happens to so many people; it’s another sign of a generally inadequate civilization.

For what it’s worth, the first chapter of Smarter Faster Better is explicitly on motivation, and how to build it from nothing. It mentions multiple patients with brain injuries who were able to take back control over their own lives because someone else wanted to help them become agentic. I think reading that might help.

On another note, thank you for being open about this. I appreciate all ...

Godzilla Strategies

hath3y62

Not only is this post great, but it led me to read more James Mickens. Thank you for that! (His writings can be found here).

LessWrong Now Has Dark Mode

hath3y30

Intercom doesn't change in Dark Mode. Also, the boxes around the comment section are faded, and the logo in the top left looks slightly off. Good job implementing it, though, and I'm extremely happy that LW has this feature.

2jimrandomh3y

Comment-section borders: I agree that they're currently on the faint side, and will darken them.

Intercom: Darkening the button should be easy enough, I'll do that. The box that appears when you click it has enough custom styling from the library that it's probably not worth the trouble, though it looks like people have made extensions for this (eg this one which I have not vetted or tested at all).

[$20K in Prizes] AI Safety Arguments Competition

hath3y10

>If you are going to downvote this, at least argue why.

Fair. Should've started with that.

>To the extent that rationality has a purpose, I would argue that it is to do what it takes to achieve our goals,

I think there's a difference between "rationality is systematized winning" and "rationality is doing whatever it takes to achieve our goals". That difference requires more time to explain than I have right now.

>if that includes creating "propaganda", so be it.
>I think that if this works like they expect, it truly is a net positive.

I think that the whole AI...

[$20K in Prizes] AI Safety Arguments Competition

hath3y210

You didn't refute his argument at all, you just said that other movements do the same thing. Isn't the entire point of rationality that we're meant to be truth-focused, and winning-focused, in ways that don't manipulate others? Are we not meant to hold ourselves to the standard of "Aim to explain, not persuade"? Just because others in the reference class of "movements" do something doesn't mean it's immediately something we should replicate! Is that not the obvious, immediate response? Your comment proves too much; it could be used to argue for literally a...

-1P.3y

To the extent that rationality has a purpose, I would argue that it is to do what it takes to achieve our goals, if that includes creating "propaganda", so be it. And the rules explicitly ask for submissions not to be deceiving, so if we use them to convince people it will be a pure epistemic gain. Edit: If you are going to downvote this, at least argue why. I think that if this works like they expect, it truly is a net positive.

hath3y60

Can confirm that this is all accurate. Some of it is much less weird in context. Some of it is much, much weirder in context.

9Vanilla_cabs3y

Ok but what's the takeaway for us who do not know the context?

hath3y160

Yeah, my reaction to this was "you could have done a much better job of explaining the context" but:

"Your writing would be easier to understand if you explained things," the student said.

That was me, so I guess my opinion hasn't changed.

Feature proposal: Close comment as resolved

hath3y10

I'd like to have the ability to leave Google-Doc style suggestions on normal posts about typos; seems like something that might be superior to our current system of doing it through the comments? Removing the trivial inconvenience might go a long way.

3Pattern3y

It's still a trivial inconvenience sometimes, but:

Two tabs:
- one for writing the response comment while reading
- one for the reading

Note, sometimes people downvote typo comments. Doesn't happen often, but sometimes it seems like it happens when the author fixes it?

Refine: An Incubator for Conceptual Alignment Research Bets

hath3y40

Are you accepting minors for this program?

9adamShimi3y

I think this is something we will have to address on a case by case basis. By default I would say probably no, but for really brilliant minors, there might be an option. Not promising anything, but if you know anyone in this situation they should apply, it's not long at all.

Editing Advice for LessWrong Users

hath3y40

Thank you for the post, and thank you for all the editing you've done!

2JustisMills3y

My pleasure!

hath3y20

I'm an idiot; Blue Bottle is closed. Maybe the park next to it?

hath3y10

The park next to there works as well.

hath3y10

I've heard good things about Blue Bottle Coffee. It's also next to Lightcone.

2lsusr3y

Blue Bottle Coffee it is.

1hath3y

The park next to there works as well.

20 Modern Heresies

hath3y10

I second this, I sincerely thought these were thoughts you held.

Two Forms of Moral Judgment

hath3y10

Yeah, you're right. Oops.

3Pattern3y

I mean, you basically stated that*: *unless you edited it.

MIRI announces new "Death With Dignity" strategy

hath3y40

>Do you have any experience in programming or AI?

Programming yes, and I'd say I'm a skilled amateur, though I need to just do more programming. AI experience, not so much, other than reading (a large amount of) LW.

>Let's suppose you were organising a conference on AI safety. Can you name 5 or 6 ways that the conference could end up being net-negative?

The conference involves someone talking about an extremely taboo topic (eugenics, say) as part of their plan to save the world from AI; the conference is covered in major news outlets as "AI Safety has a

...

5Chris_Leong3y

Cool, so I'd suggest looking into movement-building (obviously take with a grain of salt given how little we've talked). It's probably good to try to develop some AI knowledge as well so that people will take you more seriously, but it's not like you'd need that before you start. You did pretty well in terms of generating ways it could be net-negative. That makes me more confident that you would be able to have a net-positive impact. I guess it'd also be nice to have some degree of organisational skills, but honestly, if there isn't anyone else doing AI safety movement-building in your area all you have to be is not completely terrible so long as you are aware of your limits and avoid organising anything that would go beyond them.

High schoolers can apply to the Atlas Fellowship: $50k scholarship + summer program

hath3y50

As far as I know, the purpose of the nomination is "provide an incentive for you to share the Atlas Fellowship with those you think might be interested" not "help make our admissions decisions". I agree that, if the nomination form was weighted heavily in the admissions decisions, we would be incentivized to speak highly of those who don't deserve it to get $500.

MIRI announces new "Death With Dignity" strategy

hath3y20

High charisma/extroversion, not much else I can think of that's relevant there. (Other than generally being a fast learner at that type of thing.)
Not something I've done before.

8Chris_Leong3y

1. High charisma/extroversion seems useful for movement building. Do you have any experience in programming or AI?
2. Do you want to give it a go? Let's suppose you were organising a conference on AI safety. Can you name 5 or 6 ways that the conference could end up being net-negative?

Vaniver's Shortform

hath3y10

Enjoy it while it lasts. /s

Good Heart Week: Extending the Experiment

hath3y10

Are we changing from "payment sent every day at midnight" to "payment sent at end of week"?

6AprilSR3y

Yes

MIRI announces new "Death With Dignity" strategy

hath3y100

Also this comment:

Eliezer, do you have any advice for someone wanting to enter this research space at (from your perspective) the eleventh hour?

I don't have any such advice at the moment. It's not clear to me what makes a difference at this point.

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y20

If you didn't already try, I bet Lightcone would let you post more if you asked over Intercom.

6Vaniver3y

I'm guessing it would just take us editing the relevant posts, and so would be technically easy; I think it might be a bad idea to do it at this point, since there wouldn't be that long for the posts to be read (and it'd dilute attention between them).

hath3y10

Thank you so much! Fixed.

MIRI announces new "Death With Dignity" strategy

hath3y90

(although, measuring impact on alignment to that degree might be of a similar difficulty as actually solving alignment).

9AprilSR3y

Only if you need to be really accurate, which I don't think you necessarily do.

MIRI announces new "Death With Dignity" strategy

hath3y40

Sure, but it's dignity in the specific realm of "facing unaligned AGI knowing we did everything we could", not dignity in general.

jbash3y240

... but it discards all concerns outside of that. "If I regret my planet's death then I regret it, and it's beneath my dignity to pretend otherwise" does not imply that there might not be other values you could achieve during the time available.

Another way to put that, perhaps, is that "knowing we did everything we could" doesn't seem particularly dignified. Not if you had no meaningful expectation it could work. Extracting whatever other, potentially completely unrelated, value you could from the remaining available time would seem a lot more dignified to me than continuing on something you truly think is futile.

MIRI announces new "Death With Dignity" strategy

hath3y50

Do you have any ideas for how to go about measuring dignity?

9hath3y

(although, measuring impact on alignment to that degree might be of a similar difficulty as actually solving alignment).

MIRI announces new "Death With Dignity" strategy

hath3y440

I mean this completely seriously: now that MIRI has changed to the Death With Dignity strategy, is there anything that I or anyone on LW can do to help with said strategy, other than pursue independent alignment research? Not that pursuing alignment research is the wrong thing to do, just that you might have better ideas.

7Chris_Leong3y

Two questions:
1. Do you have skills relevant to building websites, marketing, running events, movement building or ops?
2. How good are you at generating potential downsides for any given project?

Adam Zerner3y200

I've always thought that something in the context of mental health would be nice.

The idea that humanity is doomed is pretty psychologically hard to deal with. Well, it seems that there is a pretty wide range in how people respond psychologically to it, from what I can tell. Some seem to do just fine. But others seem to be pretty harmed (including myself, not that this is about me; ie. this post literally brought me to tears). So, yeah, some sort of guidance for how to deal with it would be nice.

Plus it'd serve the purpose of increasing the productivity of ...

4rank-biserial3y

What about Hail Mary strategies that were previously discarded due to being too risky? I can think of a couple off the top of my head. A cornered rat should always fight.

Eliezer Yudkowsky3y100

I mean, I'd like to see a market in dignity certificates, to take care of generating additional dignity in a distributed and market-oriented fashion?

Two Forms of Moral Judgment

hath3y30

My inner Professor Quirrell is currently saying that if someone did have a moral policy in which animals had little-to-no value, they probably wouldn't abuse their pets where we could see; it'd be as if someone had read Snuff and thought "That man was a fool. He shouldn't have done that in public, because look what happened to him." Someone who really didn't care about animals in the slightest would still probably act like a normal member of society and just avoid interacting with animals whenever possible, because seeming like a stereotypical villain is g...

3Pattern3y

would not have a pet.

hath's Shortform

hath3y10

(there's also a level here of "i have no idea how to handle this situation/dynamic", and if you think I did something wrong either in the events described in these posts or by posting this, feel free to tell me i'm an idiot and that I should've done something different)

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y10

...I forgot about the annual review. I think I'll just say that doesn't count, and also commit to no more changes of the conditions.

EDIT: actually, just going to kill the market.

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y10

Created a market on Manifold to see if either today's GoodHeart system will last past today, or else if LW will try financial rewards for posting in 2022.

[This comment is no longer endorsed by its author]

5Vaniver3y

If you're counting the book review, are you also counting the annual review? [We don't commit to running it each year, but Laplace's Rule of Succession says the odds of doing it in 2022 are pretty high.]

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y130

It's really interesting seeing the change in attitude toward low-effort asking-for-money posts. Earlier, people upvoted/put up with them; now people are actively punishing bullshit with strong downvotes. This is good for LW implementing monetary incentives in the future; we can punish Goodharters ourselves.

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y60

I've been working on setting up a TED talk at my high school, and since the beginning have been planning on asking for speakers through a post here. However, the day that we finally finished the website, and I can finally post here about it, is... when we're doing this whole GoodHeart thing. Not sure whether I should publish it today or tomorrow. (Pros: money. Cons: possibly fewer views because of everything else posted today.) What do you all think?

4lsusr3y

Pro: There's not just more posts on LW than usual. There's also way more eyeballs on LW than usual. Con: Readers might think your TED talk is an April Fool's Day prank.

hath's Shortform

hath3y50

This book occupies the same genre as The Theory And Practice of Oligarchical Collectivism, though I'm not sure what to call that genre. Thank you so much. Would you recommend the longer book?

Replacing Karma with Good Heart Tokens (Worth $1!)

hath3y100

I think that was part of the whole "haha goodhart's law doesn't exist, making value is really easy" joke. However, it's also possible that that's... actually one of the hard-to-fake things they're looking for (along with actual competence/intelligence). See PG's Mean People Fail or Earnestness. I agree that "just give good money to good people" is a terrible idea, but there's a steelman of that which is "along with intelligence, originality, and domain expertise, being a Good Person (whatever that means) and being earnest is a really good trait in EA/LW an...

hath's Shortform

hath3y20

1hath3y

hath's Shortform

hath3y40

(dialogue reconstructed as well as I can remember it)

For once, I actually cared about what we were doing in English. For our final essay on Macbeth I wrote 1260 words on Duncan's choices through the play, analyzing if he could have made better decisions given the information that he had, and trying to see whether his decisions would have worked out well if not for the supernatural occurrences of the play. This was a couple weeks after my English teacher had talked to me and told me that I wasn't putting enough effort into her class, and that I was doing si...

7Vaniver3y

You might find The Seven-Lesson Schoolteacher by John Gatto interesting; here it is in pdf form (plus the foreword to the longer book that he wrote about it, you can skip ahead to 'chapter 1').

2hath3y

As a follow up: There have been a couple incidents with said teacher trying to assert authority and win debates over, like, actually listening to her students.

Today, we had a quiz on 1984. When, during the allotted study time beforehand, students started to go over the material with each other, the teacher told everyone that this was a silent study time; after the quiz, she expanded on this, mentioning a story she had told earlier in the year. It was a story of how a student who had helped their friend on a quiz was rejected by a college the friend was accepted to; the moral from this that she repeated throughout the year was "Your peers are your enemies. You should not help them, because that just actively hurts you in college admissions. Also, let's be real, helping them in this way before the quiz, telling them the answers, is cheating. So, don't help your fellow students; it's cheating, and it only hurts you."

I pointed out that a former teacher of mine had lamented grading on a curve strictly because it makes them see their fellow students as competitors instead of friends and allies, and that her argument proved too much; under that, helping other students study in any way counte--she interrupted me, saying that I was equivocating between helping and cheating; when I tried to explain myself she shut me down, saying "You don't want to argue with me about this." (in an earlier conversation, she attributed her aptitude in this to doing debate.)

Another relevant time was when, after I misspoke at one point during a debate, she repeatedly said "But you said X!" in response to me. "I don't believe that, either you misheard me or I misspoke." "You said X!" "You are purposefully misinterpreting my words." "I'm just saying back what you said!" "You aren't being at all charitable." "I'm just saying what you said!"

The point here is that, repeatedly, she's only cared about asserting authority rather than listening or being a charitable debate partner. It's not fun to be ef