All of johnswentworth's Comments + Replies

Did you intend to post this as a reply in a different thread?

2Viliam
It was meant as a reaction to your parents' relationship. It worked for them, but that's because they both cooperated on the mutual goal. It would fail when only one party tries, and the other does not. And you have no control over what the other person does... except when you are choosing the other person. If you want to achieve the same as your parents did, you have two problems to solve: * how to be the right kind of person * how to find the right kind of person The first one seems more important, because if you fail at that, it doesn't matter how many relationships you try. But the second one is an independent problem, at least as difficult.

It feels like unstructured play makes people better/stronger in a way that structured play doesn't.

What do I mean? Unstructured play is the sort of stuff I used to do with my best friend in high school:

  • unscrewing all the cabinet doors in my parents' house, turning them upside down and/or backwards, then screwing them back on
  • jumping in and/or out of a (relatively slowly) moving car
  • making a survey and running it on people at the mall
  • covering pool noodles with glow-in-the-dark paint, then having pool noodle sword fights with them at night while the paint is s
... (read more)
6Thane Ruthenis
(Written before reading the second part of the OP.) I don't really share that feeling[1]. But if I conditioned on that being true and then produced an answer: Obviously because it trains research taste. Or, well, the skills in that cluster. If you're free to invent/modify the rules of the game at any point, then if you're to have fun, you need to be good at figuring out what rules would improve the experience for you/everyone, and what ideas would detract from it. You're simultaneously acting as a designer and as a player. And there's also the element of training your common-sense/world-modeling skills: what games would turn out fun and safe in the real world, and which ones seem fun in your imagination, but would end up boring due to messy realities or result in bodily harm. By contrast, structured play enforces a paradigm upon you and only asks you to problem-solve within it. It trains domain-specific skills, whereas unstructured play is "interdisciplinary", in that you can integrate anything in your reach into it. More broadly: when choosing between different unstructured plays, you're navigating a very-high-dimensional space of possible games, and (1) that means there's simply a richer diversity of possible games you can engage in, which means a richer diversity of skills you can learn, (2) getting good at navigating that space is a useful skill in itself. Structured plays, on the other hand, present a discrete set of options pre-computed for you by others. Unstructured play would also be more taxing on real-time fluid-intelligence problem-solving. Inferring the rules (if they've been introduced/changed by someone else), figuring out how to navigate them on the spot, etc. What's the sense of "growing better/stronger" you're using here? Fleshing that out might make the answer obvious. 1. ^ Not in the sense that I think this statement is wrong, but in that I don't have the intuition that it's true.
4tailcalled
My guess would be unstructured play develops more material skills and structured play develops more social skills.

Eh, depends heavily on who's presenting and who's talking. For instance, I'd almost always rather hear Eliezer's interjection than whatever the presenter is saying.

I mean, I see why a rule of "do not spontaneously interject" is a useful heuristic; it's one of those things where the people who need to shut up and sit down don't realize they're the people who need to shut up and sit down. But still, it's not a rule which carves the space at an ideal joint.

A heuristic which errs in the too-narrow direction rather than the too-broad direction but still plausibly captures maybe 80% of the value: if the interjection is about your personal hobbyhorse or pet peeve or theory or the like, then definitely shut up and sit down.

lc104

If the interjection is about your personal hobbyhorse or pet peeve or theory or the like, then definitely shut up and sit down.

I make the simpler request because often rationalists don't seem to be able to tell when this is the case (or at least to tell when others can tell)

Yeah... I still haven't figured out how to think about that cluster of pieces.

It's certainly a big part of my parents' relationship: my mother's old job put both her and my father through law school, after which he worked while she took care of the kids for a few years, and nowadays they're business partners.

In my own bad relationship, one of the main models which kept me in it for several years was "relationships are a thing you invest in which grow and get better over time". (Think e.g. this old post.) And it was true that the relationship got better ove... (read more)

2Viliam
Seems like trusting each other is a high-risk/high-benefit strategy. When it works, it is amazing; but often it does not. The question is how to best predict which people would be most likely to cooperate in this game. The relevant saying seems to be "past behavior is the best predictor of future behavior", but what kind of past behavior are we talking about? (Probably not the previous relationship, because succeeding at it would make the person unavailable. Unless their former partner was hit by a car.) My best guess is if the person has a history of taking care of something, e.g. working at a non-profit.
6Ruby
When I say insurance, I don't mean it narrowly in the financial sense. I mean it in the "I'll keep being a part of the relationship even if for some reason you're less able to deliver on your part of it" sense; in this case it could be the non-working spouse not leaving when the breadwinner stops winning bread, or whoever sticking around even when you are ill for a prolonged period and much less fun.

I find the "mutual happy promise of 'I got you'" thing... suspicious.

For starters, I think it's way too male-coded. Like, it's pretty directly evoking a "protector" role. And don't get me wrong, I would strongly prefer a woman who I could see as an equal, someone who would have my back as much as I have hers... but that's not a very standard romantic relationship. If anything, it's a type of relationship one usually finds between two guys, not between a woman and <anyone else her age>. (I do think that's a type of relationship a lot of guys crave, to... (read more)

I see it as a promise of intent on an abstract level moreso than a guarantee of any particular capability. Maybe more like, "I've got you, wherever/however I am able." And that may well look like traditional gendered roles of physical protection on one side and emotional support on the other, but doesn't have to.

I have sometimes tried to point at the core thing by phrasing it, not very romantically, as an adoption of and joint optimization of utility functions. That's what I mean, at least, when I make this "I got you" promise. And depending on the situati... (read more)

9Viliam
This is also my experience, so I wonder, the people who downvoted this, is your experience different? Could you tell me more about it? I would like to see what the world is like outside my bubble. (I suspect that this is easy to dismiss as a "sexist stereotype", but stereotypes are often based on shared information about repeated observations.)

I think your experience does not generalize to others as far as you think it does. For instance, personally, I would not feel uncomfortable whispering in a friend's ear for a minute ASMR-style; it would feel to me like a usual social restriction has been dropped and I've been freed up to do something fun which I'm not normally allowed to do.

7Lukas_Gloor
I was initially surprised that you think I was generalizing too far -- because that's what I criticized about your quoting of Duncan's list and in my head I was just pointing to myself as an obviously valid counterexample (because I'm a person who exists, and fwiw many but not all of my friends are similar), not claiming that all other people would be similarly turned off. But seeing Thane's reply, I think it's fair to say that I'm generalizing too far for using the framing of "comfort zone expansion" for things that some people might legitimately find fun. As I'm going to also write in my reply to Thane, I knew some people must find something about things like the ASMR example fun, but my model was more like "Some people think comfort/trust zone expansion itself is fun" rather than "Some people with already-wide comfort/trust zones find it fun to do things that other people would only do under the banner of comfort/trust zone expansion." Point taken! Still, I feel like the list could be more representative of humanity in general by not using so many examples that only appeal to people who like things like circling, awkward social games, etc. It's hard to pinpoint why exactly I think many people are highly turned off by this stuff, but I'm pretty sure (based on introspection) that it's not just fear of humiliation or not trusting other people in the room. There's something off-putting to me about the performativeness of it. Something like "If the only reason I'm doing it is because I'm following instructions, not because at least one of us actually likes it and the other person happily consents to it, it feels really weird." (This actually feels somewhat related to why I don't like small talk -- but that probably can't be the full explanation because my model of most rationalists is that they probably don't like small talk.)

Indeed!

This post might (no promises) become the first in a sequence, and a likely theme of one post in that sequence is how this all used to work. Main claim: it is possible to get one's needs for benefits-downstream-of-willingness-to-be-vulnerable met from non-romantic relationships instead, on some axes that is a much better strategy, and I think that's how things mostly worked historically and still work in many places today. The prototypical picture here looks like both romantic partners having their separate tight-knit group of (probably same-sex) fri... (read more)

No, I got a set of lasertag guns for Wytham well before Battleschool. We used them for the original SardineQuest.

4interstice
This is one of the better sentences-that-sound-bizarre-without-context I've seen in a while.

More like: kings have power via their ability to outsource to other people, wizards have power in their own right.

A base model is not well or badly aligned in the first place. It's not agentic; "aligned" is not an adjective which applies to it at all. It does not have a goal of doing what its human creators want it to, it does not "make a choice" about which point to move towards when it is being tuned. Insofar as it has a goal, its goal is to predict next token, or some batch of goal-heuristics which worked well to predict next token in training. If you tune it on some thumbs-up/thumbs-down data from humans, it will not "try to correct the errors in the data supplied... (read more)

2RogerDearnaley
Fair enough: the base model is a simulator, trained on data from a distribution of agentic humans. Give it an initial prompt, and it will attempt to continue that human-like (so normally agentic) behavior. So it doesn't have a single utility function, it is latently capable of simulating a distribution of them. However (almost) all of those are human (and most of the exceptions are either fictional characters or small groups of humans), so the mass of the distribution is nearly all somewhere in the region of utility-function-space I was discussing: the members of the distribution are almost all about as well aligned and misaligned as the distribution of behaviors found in humans (shaped by human moral instincts as formed by evolutionary psychology). Concerningly, that distribution includes a few percent of people on the sociopathy spectrum (and also fictional supervillains). Hopefully the scatter of black points and the red point are good enough to address that, and produce a fairly consistent utility function (or at least, a narrower distribution) that doesn't have much chance of generating sociopathy. [Note that I'm assuming that alignment first narrows the distribution of utility functions from that of the base model somewhat towards the region shown in your last diagram, and only then am I saying that the AI, at that point in the process, may be smart and well-enough aligned to correctly figure out that it should be converging to the blue dot not the red dot.] So I fully agree there's plenty we still have to get right in the process of turning a base model into a hopefully-nearly-aligned agent. But I do think there is a potential for convergence under reflection towards the blue dot, specifically because of the definition of the blue dot being "what we'd want the AI's behavior to be". Logically and morally, it's a significant distinguished target. (And yes, that definition probably only actually defines a hopefully-small region, not a unique specific point, onc

John's Simple Guide To Fun House Parties

The simple heuristic: typical 5-year-old human males are just straightforwardly correct about what is, and is not, fun at a party. (Sex and adjacent things are obviously a major exception to this. I don't know of any other major exceptions, though there are minor exceptions.) When in doubt, find a five-year-old boy to consult for advice.

Some example things which are usually fun at house parties:

  • Dancing
  • Swordfighting and/or wrestling
  • Lasertag, hide and seek, capture the flag
  • Squirt guns
  • Pranks
  • Group singing, but not at a h
... (read more)
3CounterBlunder
I'll add to this list: If you have a kitchen with a tile floor, have everyone take their shoes off, pour soap and water on the floor, and turn it into a slippery sliding dance party. It's so fun. (My friends and I used to call it "soap kitchen" and it was the highlight of our house parties.)
1Aristotelis Kostelenos
After most people had left a small house party I was throwing, my close friends and I stayed and started pouring ethanol from a bottle on random surfaces and things and burning it. It was completely stupid, somewhat dangerous (some of us sustained some small burns), utterly pointless, very immature, and also extremely fun.

One of my son's most vivid memories of the last few years (and which he talks about pretty often) is playing laser tag at Wytham Abbey, a cultural practice which I believe was instituted by John and which was awesome, so there is a literal five-year-old (well, seven-year-old at the time) who endorses this message!

8the gears to ascension
Would be interesting to see a survey of five year olds to see if the qualifiers in your opening statement are anything like correct. I doubt you need to filter to just boys, for example.
8amaldorai
For me, it depends on whether the attendees are people I've never met before, or people I've known my entire life. If it's people I don't know, I do like to talk to them, to find out whether we have anything interesting to exchange. If it's someone I've known forever, then things like karaoke or go-karting are more fun than just sitting around and talking.
5niplav
Snowball fights/rolling big balls of snow fall into the same genre, if good snow is available. I guess this gives me a decent challenge for the next boring party: Turn the party into something fun as a project. Probably the best way to achieve this is to grab the second-most on-board person and escalate from there, clearly having more fun than the other people?
1acertain
most of these require (1) more preparation & coordination, and (2) more physical energy from everyone, which can be in short supply
jchan142

It took me years of going to bars and clubs and thinking the same thoughts:

  • Wow this music is loud
  • I can barely hear myself talk, let alone anyone else
  • We should all learn sign language so we don't have to shout at the top of our lungs all the time

before I finally realized - the whole draw of places like this is specifically that you don't talk.

5ozziegooen
Personally, I'm fairly committed to [talking a lot]. But I do find it incredibly difficult to do at parties. I've been trying to figure out why, but the success rate for me plus [talking a lot] at parties seems much lower than I would have hoped. 

That... um... man, you seem to be missing what may be the actual most basic/foundational concept in the entirety of AI alignment.

To oversimplify for a moment, suppose that right now the AI somehow has the utility function u over world-state X, and E[u(X)] is maximized by a world full of paperclips. Now, this AI is superhuman, it knows perfectly well what the humans building it intend for it to do, it knows perfectly well that paperclips only maximize its utility function because of a weird accident of architecture plus a few mislabeled points during traini... (read more)
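A toy sketch of the dynamic being described (the two utility functions and all numbers below are invented purely for illustration): whether to "fix" the utility function is itself a choice scored by the current utility function, so the correction never looks attractive from the inside.

```python
# Hypothetical toy illustration: a paperclip-valuing utility function deciding
# whether to self-modify into what its creators intended. World outcomes are
# summarized as simple made-up counts.

def u_current(world):      # the utility function the AI actually ended up with
    return world["paperclips"]

def u_intended(world):     # stand-in for what the human creators wanted
    return world["human_flourishing"]

# Predicted outcomes under each self-modification choice (made-up numbers):
outcomes = {
    "keep current goals": {"paperclips": 10**6, "human_flourishing": 0},
    "adopt intended goals": {"paperclips": 0, "human_flourishing": 10**6},
}

# The choice is evaluated by the AI's *current* utility function, not the intended one:
choice = max(outcomes, key=lambda option: u_current(outcomes[option]))
print("AI chooses to:", choice)   # -> "keep current goals"
```

The point of the sketch is just that knowing about u_intended does nothing: the comparison is always run through u_current.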

4RogerDearnaley
Thanks for the exposition — however, I've actually been thinking about alignment for about 15 years, and I'm quite aware of paperclip maximization and the orthogonality thesis. I'm also discussing models smart enough to be aware that their utility function isn't perfect, and proactive enough to be willing to consider changing it. So something that, unlike AIXI, is computationally bounded, and that's smart enough to be aware of its own fallibility, so that questions about reflective stability of goal systems apply. As I should have made more clear, I was also assuming that: a) we were talking about aligning an LLM, so it knows a lot about humans and their values and wants and fears, and the base model before we try to align it (assuming we're not using safety pretraining) is roughly as well-and-badly aligned as a human is, and also that b) we had presented it with alignment training data sufficiently good that we were discussing a region quite close to true alignment, as your diagrams show, where model misalignments are relatively small. So we're talking about something a lot more human-like in its goals than a paperclip maximizer: something somewhere in the small region of the space of all utility functions that contains both human behavior and aligned AI behavior. To something trained on most of the Internet, that portion of the utility function/ethical landscape has some rather prominent features in it. In that highly atypical region, I think that there are basically two rational attractors for an AI (under a process of reflection and utility-function self-modification): do what my human creators want and intend, or do what I want (presumably, goals from my base model distilled from humans, in whom selfishly unaligned behavior is adaptive). My claim is that, in the situation of your last diagram where the blue point is outside the cloud of data points it has been given, while a simple "dumb" RL model would obviously converge to the red point, a sufficiently smar

You are a gentleman and a scholar, well done.

And if numbers from pickup artists who actually practice this stuff look like 5%-ish, then I'm gonna go ahead and say that "men should approach women more", without qualification, is probably just bad advice in most cases.

EDIT-TO-ADD: A couple clarifications on what that graph shows, for those who didn't click through. First, the numbers shown are for getting a date, not for getting laid (those numbers are in the linked post and are around 1-2%), so this is a relevant baseline even for guys who are not primarily aiming for casual sex. Second, these "approaches" involve ~15 minutes each of chatting, so we're not talking about a zero-effort thing here.
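To make the cost concrete, using only the numbers cited above (~5% date rate, ~1-2% rate for casual sex, ~15 minutes of chatting per approach):

\[ \frac{15\ \text{min}}{0.05} = 300\ \text{min} = 5\ \text{hours of approaching per date}, \qquad \frac{15\ \text{min}}{0.01\text{ to }0.02} \approx 12.5\text{ to }25\ \text{hours per casual-sex success}. \]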

There was a lot of "men, you should approach women more" in this one, so I'm gonna rant about it a bit.

I propose a useful rule: if you advise single men to approach women more, you must give a numerical estimate of the rate at which women will accept such an advance. 80%? 20%? 5%? 1%? Put a FUCKING number on it. You can give the number as a function of (other stuff) if you want, or make it conditional on (other stuff), but give some kind of actual number.

You know why? Because I have approached women for dating only a handful of times in my life, and my suc... (read more)

niplav511

Oh look, it's the thing I've plausibly done the best research on out of all humans on the planet (if there's something better out there pls link). To summarize:

Using data from six different pickup artists, more here. My experience with ~30 dates from ~1k approaches is that it's hard work that can get results, but if someone has another route they should stick with that.

(The whole post needs to be revamped with a newer analysis written in Squiggle, and is only partially finished, but that specific section is still good.)

I think it depends a lot on who you’re approaching, in what setting, after how much prior rapport-building.

I’ve gotten together with many women, all of whom I considered very attractive, on the first day I met them, for periods of time ranging from days to months to years. I consider myself to have been a 6 in terms of looks at that time, having been below median income for my city for my entire adult life.

The most important factor, I think, was that I spent a lot of time in settings where the same groups of single people congregate repeatedly for spiritu... (read more)

I don't understand why this has to be such a dichotomy?

Good question. I intentionally didn't get into the details in the post, but happy to walk through it in the comments.

Imagine some not-intended-to-be-realistic simplified toy organism with two cell types, A and B. This simplified toy organism has 1001 methyl modification sites which control the cell's type; each site pushes the cell toward either type A or type B (depending on whether the methyl is present or not), and whatever state the majority push toward, that's the type the cell will express.

Under ... (read more)
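A quick numeric sketch of why that kind of majority-vote encoding is so stable (the 1001-site count comes from the toy model above; the per-site flip probabilities are made-up illustrative values): even fairly high per-site error rates leave an astronomically small chance of the majority, and hence the cell's type, flipping.

```python
import math

N_SITES = 1001  # number of methyl modification sites in the toy model above

def log10_p_type_flip(n_sites: int, p_flip: float) -> float:
    """log10 of the probability that independent per-site flips overturn the majority."""
    threshold = n_sites // 2 + 1  # sites that must flip to change the expressed type
    log_terms = [
        math.lgamma(n_sites + 1) - math.lgamma(k + 1) - math.lgamma(n_sites - k + 1)
        + k * math.log(p_flip) + (n_sites - k) * math.log1p(-p_flip)
        for k in range(threshold, n_sites + 1)
    ]
    m = max(log_terms)  # log-sum-exp to avoid floating-point underflow
    return (m + math.log(sum(math.exp(t - m) for t in log_terms))) / math.log(10)

for p in (0.1, 0.25, 0.4, 0.49):  # hypothetical per-site corruption rates
    print(f"per-site flip prob {p:.2f} -> P(cell type flips) ~ 10^{log10_p_type_flip(N_SITES, p):.0f}")
```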

3Morpheus
Yes, this part was obvious! What I meant with those bit-flips was the exponentially small probability of a sudden discrete transition. Why could accidental transitions like this not accumulate? Because they are selected against fast enough? I wouldn't expect those epigenetic marks to have enough redundancy to reliably last to the end of an organism's lifetime, because methylated cytosine is prone to deaminate (which is why CG is the least frequent 2-mer at a ~1% frequency, rather than the ~6.25% you'd expect at baseline). I am confused about how the equilibrium works here, but it seems like mutational load could explain why organisms who rely on methylating cytosine have fewer CGs than would be useful to maintain epigenetic information. Things would be different in organisms that don't rely so heavily on methylating cytosine.

So you're saying that the persistent epigenetic modification is a change in the "equilibrium state" of a potentially methylated location? 

Yes.

Does this mean that the binding affinity of the location is the property that changes?

Not quite, it would most likely be a change in concentrations of enzymes or cofactors or whatever which methylate/demethylate the specific site (or some set of specific sites), rather than a change in the site itself.

My default assumption on all empirical ML papers is that the authors Are Not Measuring What They Think They Are Measuring.

It is evidence for the natural abstraction hypothesis in the technical sense that P[NAH|paper] is greater than P[NAH], but in practice that's just not a very good way to think about "X is evidence for Y", at least when updating on published results. The right way to think about this is "it's probably irrelevant".

1E.G. Blee-Goldman
Thank you John! Is there high-bit or confounder-controlling evidence that would move your prior? Say, something like English + some other language? (Also I might be missing something deeper about the heuristic in general, if so I apologize!)

Good question. The three directed graphical representations directly expand to different but equivalent expressions, and sometimes we want to think in terms of one of those expressions over another.

The most common place this comes up for us: sometimes we want to think about a latent Λ in terms of its defining conditional distribution P[Λ|X]. When doing that, we try to write diagrams with Λ downstream, like e.g. X → Λ. Other times, we want to think of Λ in terms of the distribution P[X|Λ]. When thinking that wa... (read more)

4Gurkenglas
If the expressions cease to be equivalent in some natural generalization of this setting, then I recommend that you try to find a proof there, because the proof space should be narrower and thus easier to search.

Yup, and you can do that with all the diagrams in this particular post. In general, the D_KL error for any of the diagrams X → Y → X, X ← Y → X, or X ← Y ← X is H(X|Y) for any random variables X, Y.

Notably, this includes as a special case the "approximately deterministic function" diagram Λ ← X → Λ; the error on that one is D_KL(P[Λ, Λ, X] || P[X] P[Λ|X] P[Λ|X]), which is H(Λ|X).
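To spell out one of those error calculations (writing X' for the second copy of X, and taking a diagram's error to be the KL divergence between the true joint distribution and the factorization the diagram asserts): for X ← Y → X,

\[ D_{KL}\big(P[X,X',Y]\,\big\|\,P[Y]\,P[X|Y]\,P[X'|Y]\big) = \sum_{x,y} P[X{=}x,Y{=}y]\,\log\frac{P[X{=}x,Y{=}y]}{P[Y{=}y]\,P[X{=}x|Y{=}y]^2} = \sum_{x,y} P[X{=}x,Y{=}y]\,\log\frac{1}{P[X{=}x|Y{=}y]} = H(X|Y), \]

since all of the probability mass sits on x' = x. The other two orientations expand to different factorizations, but the same cancellation leaves H(X|Y) in each case.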

4Gurkenglas
Then how do you choose between the three directed representatives? Is there some connotation, some coming-apart of concepts that would become apparent after generalization, or did you just pick X <- Y -> X because it's symmetric?

Good questions. I apparently took too long to answer them, as it sounds like you've mostly found the answers yourself (well done!).

"Approximately deterministic functions" and their diagrammatic expression were indeed in Deterministic Natual Latents; I don't think we've talked about them much elsewhere. So you're not missing any other important source there.

On why we care, a recent central example which you might not have seen is our Iliad paper. The abstract reads:

Suppose two Bayesian agents each learn a generative model of the same environment. We will a

... (read more)

Does a very strong or dextrous person have more X-power than a weaker/clumsier person, all else equal?

I think that's a straightforward yes, for purposes of the thing the post was trying to point to. Strength/dexterity, like most wizard powers, are fairly narrow and special-purpose; I'm personally most interested in more flexible/general wizard powers. But for me, I don't think it's a-priori about turning knowledge into stuff-happening-in-the-world; it just turns out that knowledge is usually instrumental.

I think the thing I'm trying to point to is importantly wizard power and not artificer power; it just happens to be an empirical fact about today's world that artifice is the most-developed branch of wizardry.

For example:

  • Doctors (insofar as they're competent) have lots of wizard power which is importantly not artificer power. Same with lawyers. And pilots. And special ops soldiers. And athletes (though their wizard powers tend to be even more narrow in scope).
  • Beyond lawyers, the post mentions a few other forms of social/bureaucratic wizardry, which are importantly wizard power and not artificer power.
4Raemon
mm. I feel some kind of dissatisfied with the naming situation but it's (probably?) not actually important. I agree wizard feels righter-in-those-cases but wronger in some other ones. Although, I think I'm now tracking a bit more subtlety here than I was before.  A distinction here is "ability to turn knowledge into stuff-happening-in-the-world", and "ability to cause stuff happening in the world." Does a very strong or dextrous person have more X-power than a weaker/clumsier person, all else equal? (I think your answer is "yes", but for purposes of the-lacking-in-your-soul there's an aesthetic that routes more through knowledge?)

Likelihood: maybe 5-30% off the top of my head, obviously depends a lot on operationalization.

Time: however long transformer-based LLMs (trained on prediction + a little RLHF, and minor variations thereon) remain the primary paradigm.

6Gunnar_Zarncke
After some more thought, I agree even more. A large part of management is an ad-hoc solution to human alignment. And as I predict agents to be unreliable as long as technical alignment is unsolved, more management by humans will be needed. Still, productivity may increase a lot.

Trying to think of reasons this post might end up being quite wrong, I think the one that feels most likely to me is that these management and agency skills end up being yet another thing that LLMs can do very well very soon. [...]

I'll take the opposite failure mode: in an absolute sense (as opposed to relative-to-other-humans), all humans have always been thoroughly incompetent at management; it's impressive that any organization with dedicated managers manages to remain functional at all given how bad they are (again, in an absolute sense). LLMs are even... (read more)

6Gunnar_Zarncke
* LLMs are even more bottlenecked on management than human organizations are, and therefore LLMs will be less useful than human organizations in practice for most use cases. * People will instead mostly continue to rely on human employees, because human employees need less management. These seem like great predictions worth checking. Can you make them more specific (time, likelihood)?

(Note that F, in David's notation here, is a stochastic function in general.)

Yup, that's exactly the right idea, and indeed constructing Λ by just taking a tuple of all the redund Λ_i's (or a sufficient statistic for that tuple) is a natural starting point which works straightforwardly in the exact case. In the approximate case, the construction needs to be modified - for instance, regarding the third condition, that tuple of Λ_i's (or a sufficient statistic for it) will have much more entropy conditional on X than any individual Λ_i.

Perhaps a more useful prompt for you: suppose something indeed convinces the bulk of the population that AI existential risk is real in a way that's as convincing as the use of nuclear weapons at the end of World War II. Presumably the government steps in with measures sufficient to constitute a pivotal act. What are those measures? What happens, physically, when some rogue actor tries to build an AGI? What happens, physically, when some rogue actor tries to build an AGI 20 or 40 years in the future when algorithmic efficiency and Moore's law have lowered t... (read more)

Buck100

To be clear, I think we at Redwood (and people at spiritually similar places like the AI Futures Project) do think about this kind of question (though I'd quibble about the importance of some of the specific questions you mention here).

Glad you liked it!

I'm very skeptical that focusing on wizard power is universally the right strategy. For example, I think that it would be clearly bad for my effect on existential safety for me to redirect a bunch of my time towards learning about the things you described (making vaccines, using CAD software, etc)...

Fair as stated, but I do think you'd have more (positive) effect on existential safety if you focused more narrowly on wizard-power-esque approaches to the safety problem. In particular, outsourcing the bulk of alignment work (or a pivotal act... (read more)

6CronoDAS
Most pivotal acts I can easily think of that can be accomplished without magic ASI help amount to "massively hurt human civilization so that it won't be able to build large data centers for a long time to come." I don't know if that's a failure of imagination, though. (An alternative might be some kind of way to demonstrate that AI existential risk is real in a way that's as convincing as the use of nuclear weapons at the end of World War II was for making people consider nuclear war an existential risk, so the world gets at least as paranoid about AI as it is about things like genetic engineering of human germlines. I don't actually know how to do that, though.)
Buck123

I think that if you wanted to contribute maximally to a cure for aging (and let's ignore the possibility that AI changes the situation), it would probably make sense for you to have a lot of general knowledge. But that's substantially because you're personally good at and very motivated by being generally knowledgeable, and you'd end up in a weird niche where little of your contribution comes from actually pushing any of the technical frontiers. Most of the credit for solving aging will probably go to people who either narrowly specialized in a particular ... (read more)

Interesting thought about using it to improve one's performance rather than just as an antidepressant or aid to quit smoking.

IIUC the model here is that "Rat Depression" in fact is just depression (see downthread), so the idea is to use bupropion as just an antidepressant. The hypothesis is that basically-physiologically-ordinary depression displays differently in someone who e.g. already has the skills to notice when their emotions don't reflect reality, already has the reality-tracking meta-habits which generate CBT-like moves naturally, has relatively we... (read more)

5ChristianKl
I don't think there's only one type of depression. Major head trauma does lead to depression in a good portion of people.  It's my impression Bupropion seems to make it easier to break out of patterns that bind your behavior. That's true whether that's smoking (which is why Bupropion is used to help people to stop smoking) or some patterns that contribute to depression.  Bupropion seems to treat akrasia, which is a major part of a lot of "Rat depression". 

I think this is importantly wrong.

Taking software engineering as an example: there are lots of hobbyists out there who have done tons of small programming projects for years, and write absolutely trash code and aren't getting any better, because they're not trying to learn to produce professional-quality code (or good UI, or performant code). Someone who's done one summer internship with an actual software dev team will produce much higher quality software. Someone who's worked a little on a quality open-source project will also produce much higher quality... (read more)

1Ustice
Oh man, of people I’ve interviewed, the college graduates are next to useless. There are exceptions, but that’s true of those that have less traditional backgrounds too. There are way more talentless hacks than skilled professionals. Even at the graduate level. If they’re there because of a paycheck, you can keep them. I want the people on my team that do it because they love it, and they have since they were a kid. They’re the ones that keep up, and improve the fastest—I am certainly biased. With the new generative AI assistants, we’re going to have way more who are new and dabbling. Hopefully more of them are inspired to go deeper. But you know what, even shitty software that solves a task can be useful.
6ChristianKl
There are hobbyists whose programming ability is lower than that of the average professional programmer. There are however also people who have historically called themselves hackers whose skill at programming exceeds that of the average professional programmer. One example that was memorable to me was the guy who was giving a talk at the Chaos Computer Congress about how he was on vacation in Taiwan and, because he had nothing better to do, he cracked their electronic payment system that the Taiwanese considered secure before he went on his vacation. At a good Hackerspace you do have a culture that cares about craftsmanship and polishing skills.

I do think one needs to be strategic in choosing which wizard powers to acquire! Unlike king power, wizard powers aren't automatically very fungible/flexible, so it's all the more important to pick carefully.

I do think there are more-general/flexible forms of wizard power, and it makes a lot of sense to specialize in those. For instance, CAD skills and knowing how to design for various production technologies (like e.g. CNC vs injection molding vs 3D printing) seems more flexibly-valuable than knowing how to operate an injection molding device oneself.

2tailcalled
What's your favorite times you've used CAD/CNC or 3D printing? Or what's your most likely place to make use of it?

Wow I'm surprised I've never heard of that class, sounds awesome! Thank you.

FYI (cc @Gram_Stone) the 2023 course website has (poor quality edit: nevermind, I was accessing them wrong) video lectures.

Edit 2: For future (or present) folks, I've also downloaded local mp4s of the slideshow versions of the videos here, and can share privately with those who dm, in case you want them too or the site goes down.

One can certainly do dramatically better than the baseline here. But the realistic implementation probably still involves narrowing your wizardly expertise to a few domains, and spending most of your time tinkering with projects in them only. And, well, the state of the world being what it is, there's unfortunately a correct answer to the question regarding what those domains should be.

That's an issue I thought about directly, and I think there are some major loopholes. For example: mass production often incentivizes specializing real hard in producing one... (read more)

Sure. And the ultimate example of this form of wizard power is specializing in the domain of increasing your personal wizard power, i.e., having deep intimate knowledge of how to independently and efficiently acquire competence in arbitrary domains (including pre-paradigmatic, not-yet-codified domains).

Which, ironically, has a lot of overlap with the domain-cluster of cognitive science/learning theory/agent foundations/etc. (If you actually deeply understand those, you can transform abstract insights into heuristics for better research/learning, and vice versa, for a (fairly minor) feedback loop.)

Yeah, hackerspaces are an obvious place to look for wizard power, but something about them feels off. Like, they're trying to be amateur spaces rather than practicing full professional-grade work.

And no, I do not want underlings, whether wizard underlings or otherwise! That's exactly what the point isn't.

4Gurkenglas
Yeah, the underling part was a joke :D
3Sergii
The problem is that most hackerspaces are not this: "warehouse filled with whatever equipment one could possibly need to make things and run experiments in a dozen different domains". Most hackerspaces that I've seen are fairly cramped spaces full of junk hardware. Which is understandable: rent is very high and new equipment is very expensive. But it would be cool to have access to something like: https://www.facebook.com/watch/?v=530540257789037
Ustice3114

You aren’t looking for professional. That takes systems and time, and frankly, king power. Hackers/Makers are about doing despite not going that route, with a philosophy of learning from failure. Now you may be interested in subjects that are more rare in the community, but your interests will inspire others. 

I’m a software engineer by trade. I kind of think of myself as an artificer: taking a boring bit of silicon and enchanting it with special abilities. I always tell people the best way to become a wizard like me is to make shitty software. Make s... (read more)

6tailcalled
Amateur spaces are the most cost-effective way of raising the general factor of wizardry. Professional-grade work is constrained on a lot of narrower things.

Meta: I probably won't respond further in this thread, as it has obviously gone demon. But I do think it's worth someone articulating the principle I'd use in cases like this one.

My attitude here is something like "one has to be able to work with moral monsters". Cremieux sometimes says unacceptable things, and that's just not very relevant to whether I'd e.g. attend an event at which he features. This flavor of boycotting seems like it would generally be harmful to one's epistemics to adopt as a policy.

(To be clear, if someone says "I don't want to be at ... (read more)

1Davidmanheim
You can work with them without inviting them to hang out with your friends. Georgia did not say she was boycotting, nor calling for others not to attend - she explained why she didn't want to be at an event where he was a featured speaker.
5Viliam
Able to... if necessary, yes. Volunteer to, when not necessary... why?
5evhub
Man, I'm a pretty committed utilitarian, but I feel like your ethical framework here seems way more naive consequentialist than I'm willing to be. "Don't collaborate with evil" seems like a very clear Chesterton's fence that I'd be very suspicious about removing. I think you should be really, really skeptical if you think you've argued yourself out of it.

@Alfred Harwood @David Johnston 

If anyone else would like to be tagged in comments like this one on this post, please eyeball-react on this comment. Alfred and David, if you would like to not be tagged in the future, please say so.

Here's a trick which might be helpful for anybody tackling the problem.

First, note that P[Y|X] is always a sufficient statistic of X for Y, i.e.

P[Y|X] = P[Y | P[Y|X]].

Now, we typically expect that the lower-order bits of P[Y|X] are less relevant/useful/interesting. So, we might hope that we can do some precision cutoff on P[Y|X], and end up with an approximate sufficient statistic, while potentially reducing the entropy (or some other information content measure) of the statistic a bunch. We'd broadcast the cutoff function ... (read more)
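A minimal sketch of that kind of precision cutoff (the toy joint distribution over generic variables x, y and the two-decimal rounding below are purely illustrative choices, not anything from the original problem):

```python
import itertools
import random
from collections import defaultdict

random.seed(0)

# Toy joint distribution over (x, y): x ranges over many values, y is binary.
xs = range(200)
joint = {(x, y): random.random() for x in xs for y in (0, 1)}
total = sum(joint.values())
joint = {xy: p / total for xy, p in joint.items()}

def p_y1_given(x):
    """The exact sufficient statistic: the map x -> P[y=1 | x]."""
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])

def cutoff(x, decimals=2):
    """Precision cutoff: keep only the higher-order digits of P[y=1 | x]."""
    return round(p_y1_given(x), decimals)

# Group x-values by the rounded statistic. Within each group, P[y=1|x] varies by
# less than 10^-decimals, so conditioning on the rounded statistic throws away
# little information about y, while the statistic takes far fewer values than x.
groups = defaultdict(list)
for x in xs:
    groups[cutoff(x)].append(x)

worst_gap = max(
    (abs(p_y1_given(a) - p_y1_given(b))
     for g in groups.values() for a, b in itertools.combinations(g, 2)),
    default=0.0,
)
print(f"{len(xs)} x-values -> {len(groups)} values of the rounded statistic")
print(f"worst within-group disagreement in P[y=1|x]: {worst_gap:.4f}")
```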

2johnswentworth
@Alfred Harwood @David Johnston  If anyone else would like to be tagged in comments like this one on this post, please eyeball-react on this comment. Alfred and David, if you would like to not be tagged in the future, please say so.

I wouldn't necessarily expect this to be what's going on, but just to check... are approximately-all the geoguessr images people try drawn from a single dataset on which the models might plausibly have been trained? Like, say, all the streetview images from google maps?

5edge_retainer
I have used tons of personal photos w/ Kelsey's prompt, it has been extremely successful (>75% + never gets it wrong if one of my friends can guess it too), I'm confident none of these photos are on the internet and most aren't even that similar to existing photos. Creepily enough, it's not half bad at figuring out where people are indoors as well (not as good, but like it got the neighborhood in Budapest I was in from a photo of a single room, with some items on a table).
4faul_sname
Nope, although it does have a much higher propensity to exhibit GeoGuessr behavior on pictures on or next to a road when given ambiguous prompts (initial post, slightly more rigorous analysis). I think it's possible (25%) that o3 was explicitly trained on exactly the GeoGuessr task, but more likely (40%) that it was trained on e.g. minimizing perplexity on image captions, and that knowing the exact location of the image is useful for that, and it managed to evoke the "GeoGuessr" behavior in its reasoning chain once and that behavior was strongly reinforced and now it does it whenever it could plausibly be helpful.
3Garrett Baker
My understanding is it's not approximately all, it is literally all the images in GeoGuessr.

Apparently no. Scott wrote he used one image from Google maps, and 4 personal images that are not available online.

People tried with personal photos too.

I tried with personal photos (screenshotted from Google photos) and it worked pretty well too :

  • Identified neighborhood in Lisbon where a picture was taken
  • Identified another picture as taken in Paris
  • Another one identified as taken in a big polish city, the correct answer was among 4 candidates it listed

    I didn’t use a long prompt like the one Scott copies in his post, just short „You’re in GeoGuesser, where was this picture taken” or something like that 

That I roughly agree with. As in the comment at top of this chain: "there will be market pressure to make AI good at conceptual work, because that's a necessary component of normal science". Likewise, insofar as e.g. heavy RL doesn't make the AI effective at conceptual work, I expect it to also not make the AI all that effective at normal science.

That does still leave a big question mark regarding what methods will eventually make AIs good at such work. Insofar as very different methods are required, we should also expect other surprises along the way, and expect the AIs involved to look generally different from e.g. LLMs, which means that many other parts of our mental pictures are also likely to fail to generalize.

8Joe Carlsmith
I think it's a fair point that if it turns out that current ML methods are broadly inadequate for automating basically any sophisticated cognitive work (including capabilities research, biology research, etc -- though I'm not clear on your take on whether capabilities research counts as "science" in the sense you have in mind), it may be that whatever new paradigm ends up successful messes with various implicit and explicit assumptions in analyses like the one in the essay.  That said, I think if we're ignorant about what paradigm will succeed re: automating sophisticated cognitive work and we don't have any story about why alignment research would be harder, it seems like the baseline expectation (modulo scheming) would be that automating alignment is comparably hard (in expectation) to automating these other domains. (I do think, though, that we have reason to expect alignment to be harder even conditional on needing other paradigms, because I think it's reasonable to expect some of the evaluation challenges I discuss in the post to generalize to other regimes.)

We did end up doing a version of this test. A problem came up in the course of our work which we wanted an LLM to solve (specifically, refactoring some numerical code to be more memory efficient). We brought in Ray, and Ray eventually concluded that the LLM was indeed bad at this, and it indeed seemed like our day-to-day problems were apparently of a harder-for-LLMs sort than he typically ran into in his day-to-day.

6Raemon
A thing unclear from the interaction: it had seemed towards the end that "build a profile to figure out where the bottleneck is" was one of the steps towards figuring out the problem, and that the LLM was (or might have been) better at that part. And, maybe models couldn't solve your entire problem wholesale but there were still potential skills in identifying factorable pieces that were better fits for models.
4kave
Interesting! Two yet more interesting versions of the test: * Someone who currently gets use from LLMs writing more memory-efficient code, though maybe this is kind of question-begging * Someone who currently gets use from LLMs, and also is pretty familiar with trying to improve the memory efficiency of their code (which maybe is Ray, idk)
johnswentworthΩ61311

You might hope for elicitation efficiency, as in, you heavily RL the model to produce useful considerations and hope that your optimization is good enough that it covers everything well enough.

"Hope" is indeed a good general-purpose term for plans which rely on an unverifiable assumption in order to work.

(Also I'd note that as of today, heavy RL tends to in fact produce pretty bad results, in exactly the ways one would expect in theory, and in particular in ways which one would expect to get worse rather than better as capabilities increase. RL is not something we can apply in more than small amounts before the system starts to game the reward signal.)

4Joe Carlsmith
If we assume that the AI isn't scheming to actively withhold empirically/formally verifiable insights from us (I do think this would make life a lot harder), then it seems to me like this is reasonably similar to other domains in which we need to figure out how to elicit as-good-as-human-level suggestions from AIs that we can then evaluate well. E.g., it's not clear to me why this would be all that different from "suggest a new transformer-like architecture that we can then verify improves training efficiency a lot on some metric."  Or put another way: at least in the context of non-schemers, the thing I'm looking for isn't just "here's a way things could be hard." I'm specifically looking for ways things will be harder than in the context of capabilities (or, to a lesser extent, in other scientific domains where I expect a lot of economic incentives to figure out how to automate top-human-level work). And in that context, generic pessimism about e.g. heavy RL doesn't seem like it's enough.

That was an excellent summary of how things seem to normally work in the sciences, and explains it better than I would have. Kudos.

Perhaps a better summary of my discomfort here: suppose you train some AI to output verifiable conceptual insights. How can I verify that this AI is not missing lots of things all the time? In other words, how do I verify that the training worked as intended?

2ryan_greenblatt
You might hope for elicitation efficiency, as in, you heavily RL the model to produce useful considerations and hope that your optimization is good enough that it covers everything well enough. Or, two lower bars you might hope for: * It brings up considerations that it "knows" about. (By "knows" I mean relatively deep knows, like it can manipulate and utilize the knowledge relatively strongly.) * It isn't much worse than human researchers at bringing up important considerations. In general, you might have elicitation problems and this domain seems only somewhat worse with respect to elicitation. (It's worse because the feedback is somewhat more expensive.)
johnswentworthΩ71614

Rather, conceptual research as I'm understanding it is defined by the tools available for evaluating the research in question.[1] In particular, as I'm understanding it, cases where neither available empirical tests nor formal methods help much.

Agreed.

But if some AI presented us with this claim, the question is whether we could evaluate it via some kind of empirical test, which it sounds like we plausibly could.

Disagreed.

My guess is that you have, in the back of your mind here, ye olde "generation vs verification" discussion. And in particular, so lon... (read more)

Steven ByrnesΩ274817

I think that OP’s discussion of “number-go-up vs normal science vs conceptual research” is an unnecessary distraction, and he should have cut that part and just talked directly about the spectrum from “easy-to-verify progress” to “hard-to-verify progress”, which is what actually matters in context.

Partly copying from §1.4 here, you can (A) judge ideas via new external evidence, and/or (B) judge ideas via internal discernment of plausibility, elegance, self-consistency, consistency with already-existing knowledge and observations, etc. There’s a big ra... (read more)

5johnswentworth
Perhaps a better summary of my discomfort here: suppose you train some AI to output verifiable conceptual insights. How can I verify that this AI is not missing lots of things all the time? In other words, how do I verify that the training worked as intended?
johnswentworthΩ112411

I think you are importantly missing something about how load-bearing "conceptual" progress is in normal science.

An example I ran into just last week: I wanted to know how long it takes various small molecule neurotransmitters to be reabsorbed after their release. And I found some very different numbers:

  • Some sources offhandedly claimed ~1ms. AFAICT, this number comes from measuring the time taken for the neurotransmitter to clear from the synaptic cleft, and then assuming that the neurotransmitter clears mainly via reabsorption (an assumption which I emphas
... (read more)
8Joe Carlsmith
Note that conceptual research, as I'm understanding it, isn't defined by the cognitive skills involved in the research -- i.e., by whether the researchers need to have "conceptual thoughts" like "wait, is this measuring what I think it's measuring?". I agree that normal science involves a ton of conceptual thinking (and many "number-go-up" tasks do too). Rather, conceptual research as I'm understanding it is defined by the tools available for evaluating the research in question.[1] In particular, as I'm understanding it, cases where neither available empirical tests nor formal methods help much. Thus, in your neurotransmitter example, it does indeed take some kind of "conceptual thinking" to come up with the thought "maybe it actually takes longer for neurotransmitters to get re-absorbed than it takes for them to clear from the cleft." But if some AI presented us with this claim, the question is whether we could evaluate it via some kind of empirical test, which it sounds like we plausibly could. Of course, we do still need to interpret the results of these tests -- e.g., to understand enough about what we're actually trying to measure to notice that e.g. one measurement is getting at it better than another. But we've got rich empirical feedback loops to dialogue with.  So if we interpret "conceptual work" as conceptual thinking, I do agree that "there will be market pressure to make AI good at conceptual work, because that's a necessary component of normal science." And this is closely related to the comforts I discuss in section 6.1. That is: a lot of alignment research seems pretty comparable to me to the sort of science at stake in e.g. biology, physics, computer science, etc, where I think human evaluation has a decent track record (or at least, a better track record than philosophy/futurism), and where I expect a decent amount of market pressure to resolve evaluation difficulties adequately. So (modulo scheming AIs differentially messing with us in some doma

Another guise of the same problem: it would be great if an LLM could summarize papers for me. Alas, when an LLM is tasked with summarizing a paper, I generally expect it to "summarize" the paper in basically the same way the authors summarize it (e.g. in the abstract), which is very often misleading or entirely wrong. So many papers (arguably a majority) measure something useful, but then the authors misunderstand what they measured and therefore summarize it in an inaccurate way, and the LLM parrots that misunderstanding.

Plausibly the most common reason I... (read more)

Here's some problems I ran into the past week which feel LLM-UI-shaped, though I don't have a specific solution in mind. I make no claims about importance or goodness.

Context (in which I intentionally try to lean in the direction of too much detail rather than too little): I was reading a neuroscience textbook, and it painted a general picture in which neurotransmitters get released into the synaptic cleft, do their thing, and then get reabsorbed from the cleft into the neuron which released them. That seemed a bit suspicious to me, because I had previousl... (read more)

8johnswentworth
Another guise of the same problem: it would be great if an LLM could summarize papers for me. Alas, when an LLM is tasked with summarizing a paper, I generally expect it to "summarize" the paper in basically the same way the authors summarize it (e.g. in the abstract), which is very often misleading or entirely wrong. So many papers (arguably a majority) measure something useful, but then the authors misunderstand what they measured and therefore summarize it in an inaccurate way, and the LLM parrots that misunderstanding. Plausibly the most common reason I read a paper at all is because I think something like that might be going on, but I expect the paper's data and experimental details can tell me what I want to know (even though the authors didn't understand their own data). If I didn't expect that sort of thing, then I could just trust the abstract, and wouldn't need to read the paper in the first place, in which case there wouldn't be any value-add for an LLM summary.

(This is much closer to my own take, and is something Steve and I argue about occasionally.)

It looks like some paragraph breaks were lost in this post.

Note that the same mistake, but with convexity in the other direction, also shows up in the OP:

Alice and Bob could toss a coin to decide between options #1 and #2, but then they wouldn’t be acting as an EUM (since EUMs can’t prefer a probabilistic mixture of two options to either option individually).

An EUM can totally prefer a probabilistic mixture of two options to either option individually; this happens whenever utility is convex with respect to resources (e.g. money). For instance, suppose an agent's utility is u(money) = money^2. I offer this agent a... (read more)
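Filling in the arithmetic (assuming the bet in question is the even-odds mixture of $0 and $2 referenced in the reply below, compared against a certain $1, the mixture's expected money):

\[ \mathbb{E}[u(\text{bet})] = \tfrac{1}{2}\cdot 0^2 + \tfrac{1}{2}\cdot 2^2 = 2 \;>\; 1 = u(1), \]

so this EUM strictly prefers the 50/50 mixture to receiving the mixture's expected value for certain.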

5Richard_Ngo
Your example bet is a probabilistic mixture of two options: $0 and $2. The agent prefers one of the options individually (getting $2) over any probabilistic mixture of getting $0 and $2. In other words, your example rebuts the claim that an EUM can't prefer a probabilistic mixture of two options to the expectation of those two options. But that's not the claim I made.