All of Stag's Comments + Replies

Very fair observation; my take is that a relevant continuation is occurring under OpenAI Alignment Science, but I would be interested in counterpoints - the main claim I am gesturing towards here is that the agenda is alive in other parts of the community, despite the previous flagship (and the specific team) going down.

eggsyntax
Oh, fair enough. Yeah, definitely that agenda is still very much alive! Never mind, then, carry on :)

As far as I understand, the banner is distinct - the team members are not the same, though there is meaningful overlap with the continuation of the agenda. I believe the most likely source of error here is whether work is actually continuing in what could be called this direction. Do you believe the representation should be changed?

eggsyntax
My impression from coverage in e.g. Wired and Future Perfect was that the team was fully dissolved, the central people behind it left (Leike, Sutskever, others), and Leike claimed OpenAI wasn't meeting its publicly announced compute commitments even before the team dissolved. I haven't personally seen new work coming out of OpenAI trying to 'build a roughly human-level automated alignment researcher' (the stated goal of that team). I don't have any insight beyond the media coverage, though; if you've looked more deeply into it than that, your knowledge is greater than mine. (Fairly minor point either way; I was just surprised to see it expressed that way.)

I think your comment adds a relevant critique of the criticism, but given that it comes from someone contributing to the project, I don't believe it's worth leaving out altogether. I added a short summary and a hyperlink to a footnote.

Satron
Sounds good to me!

Good point imo, expanded and added a hyperlink!

Would you agree that the entire agenda of collective intelligence is aimed at addressing 11. Someone else will deploy unsafe superintelligence first and 13. Fair, sane pivotal processes, or does that cut off nuance?

Ivan Vendrov
Cuts off some nuance; I would call this the projection of the collective intelligence agenda onto the AI safety frame of "eliminate the risk of very bad things happening", which I think is an incomplete way of looking at how to impact the future. In particular, I tend to spend more time thinking about future worlds that are more like the current one in that they are messy and confusing and have very terrible and very good things happening simultaneously, and a lot of the impact of collective intelligence tech (for good or ill) will determine the parameters of that world.

I really like the artistry of post-writing here; the introduction to and transition between the three videos felt especially great.

I've been internally using the term elemental for something in this neighborhood - Frame-Breaker elemental, Incentive-Slope elemental, etc. The term feels more totalizing (having two cup-stacking skills is easy to envision; being a several-thing elemental points in the direction of you being some mix of those things, and only those things), but some other connotations feel more on-target (like the difficulty of not doing the thing). I also like the term's aesthetics, but I could well be alone in that.

Duncan Sabien (Deactivated)
I often use "_____-type Pokémon" as a shorthand in casual conversation and it usually parses immediately.

I'm not sure I understand the cryptographer's constraint very well, especially with regard to language: individual words seem to have different meanings ("awesome", "literally", "love"). It's generally possible to infer which decryption was intended from the wider context, but sometimes the context itself will have different and mutually exclusive decryptions, such as in cases of real or perceived dogwhistling.

One way I could see this specific issue being resolved is by looking at what the intent of the original communication was - this would make it so th... (read more)

Nora_Ammann
As far as I can tell, I agree with what you say - this seems like a good account of how the cryptographer's constraint cashes out in language. To your confusion: I think Dennett would agree that it is Darwinian all the way down, and that their disagreement lies elsewhere. Dennett's account of how "reasons turn into causes" is made on Darwinian grounds, and it compels Dennett (but not Rosenberg) to conclude that purposes deserve to be treated as real, because (compressing the argument a lot) they have the capacity to affect the causal world. Not sure this is useful?

I might be missing the forest for the trees, but all of those still feel like they end up making some kinds of predictions based on the model, even if they're not trivial to test. Something like:

If Alice were informed by some neutral party that she took Bob's apple, Charlie would predict that she would not show meaningful remorse or try to make up for the damage done beyond trivial gestures like an off-hand "sorry" as well as claiming that some other minor extraction of resources is likely to follow, while Diana would predict that Alice... (read more)

I am not one of the Old Guard, but I have an uneasy feeling about something related to the Chakra phenomenon.

It feels like there's a lot of hidden value clustered around woo-y topics like Chakras and Tulpas, and the right orientation towards these topics seems fairly straightforward: if it calls out to you, investigate and, if you please, report. What feels less clear to me is how I, as an individual or as a member of some broader rat community, should respond when, according to me, people do not pass certain forms of bullshit tests.

This comes from someone wi... (read more)

The complete unrolling of 2.5 (and thus 2.6) feels off if they are placed in the same chain of meta-reasoning. Specifically, Charlie doesn't seem like she's reacting to any chains at all, just to the object-level aspect of Alex pegging Bailey as a downer. I can see how more layers of meta can arise in general, but situations like these, where a third person arrives after some events have already unfolded, don't feel like they fit that model very well - is the claim that Charlie does a subconscious tree search for various values of X that might... (read more)

ESRogs
I sort of agree. But in the cases where Charlie knows what happened, you might expect their evaluation of whether Alex was right to conclude that Bailey is a downer to depend on the full chain of events.

A South Korean show by the name of "The Genius" is basically a case study in adaptive memes in a competitive environment, which might serve as an even better example. There are copycats, innovators, and bystanders, and they all have varying levels of ingenuity and honor.

It seems to me that for any given {B}, the vast majority of Adams would deny {B} having this property, or at the very least deny that they are Adams in the given case. I think that's what it feels like from the inside, too - recognizing Adamness in oneself feels difficult, but it seems like a higher waterline in that regard is necessary to stop the phenomenon of useless or net-negative advice among other downstream consequences.

In this vein, I would be very interested in hearing anecdotes about how easy mode events feel different from hard mode events. I don't think I've ever participated in an easy mode event that did not feel like a poor use of time, but that might be due to the environments where those happened (schools and universities).