Zvi
Sequences

COVID-19 Updates and Analysis
Immoral Mazes
Slack and the Sabbath
The Darwin Game

Comments
More Reactions to If Anyone Builds It, Everyone Dies
Zvi · 2d · 20

My guess is that, on many AI questions, more time on the margin should be spent improving the core messaging rather than saturating the dialogue tree, if you combine effort across everyone.

More Reactions to If Anyone Builds It, Everyone Dies
Zvi · 5d · 50

Lots of 'welcome to my world' vibes reading your self-reports here, especially the '50 different people have 75 different objections for a mix of good, bad and deeply stupid reasons, and require 100 different responses, some of which are very long, and it takes a back-and-forth to figure out which one, and you can't possibly just list everything' and so on, and that's without getting into actually interesting branches and the places where you might be wrong or learn something, etc.

So to take your example, which seems like a good one:

Humans don't generalize their values out of distribution. I affirm this not as strictly fully true, but on the level of 'this is far closer to true and generative of a superior world model than its negation' and 'if you meditate on this sentence you may become [more] enlightened.'

I too have noticed that people seem to think that they do so generalize in ways they very much don't, and this leads to a lot of rather false conclusions.

I also notice that I'm not convinced we are thinking about the sentence all that similarly, in ways that could end up being pretty load-bearing. Stuff gets complicated.

I think that when you say the statement is 'trivially' true you are wrong about that, or at least holding people to unrealistic standards of epistemics? And that a version of this mistake is part of the problem. At least from me (I presume from others too) you get a very different reaction from saying each of:

  1. Humans don't generalize their values out of distribution. (Let this be [X].)
  2. Statement treating [X] as in-context common knowledge.
  3. It is trivially true that [X] (said explicitly), or 'obviously' [X], or similar.
  4. I believe that [X] or am very confident that [X]. (without explaining why you believe this)
  5. I believe that [X] or am very confident that [X], but it is difficult for me to explain/justify. 

And so on. I am very deliberate, or try to be, about which one I say in any given spot, even at the cost of a bunch of additional words.

Another note: I think in spots like this you basically do have to say the thing even if the subject already knows it, both to establish common knowledge and to make clear that you are basing your argument on it, even if only to orient them to where you are reasoning from. So it was a helpful statement to say and a good use of a sentence.

I see that you get disagreement votes when you say this on LW, but the comments don't end up with negative karma or anything. I can see how that can be read as 'punishment,' but I think that's the system working as intended, and I don't know what a better one would be?

In general, I think if you have a bunch of load-bearing statements where you are very confident they are true but people typically think the statement is false, and you can't make an explicit case for them (either because you don't have that kind of time/space, or because you don't know how), then the most helpful thing to do is to tell the other person the thing is load-bearing, and gesture towards it and why you believe it, but be clear you can't justify it. You can also look for arguments that reach the same conclusion without it - often true things are highly overdetermined, so you can get a bunch of your evidence 'thrown out of court' and still be fine, even if that sucks.

More Reactions to If Anyone Builds It, Everyone Dies
Zvi · 5d · 30

On Janus comparisons: I do model you as pretty distinct from them in underlying beliefs, although I don't pretend to have a great model of either belief set. Reaction expectations are similarly correlated but distinct. I imagine they'd say that they answer good-faith questions too, and often that's true (e.g. when I do ask Janus a question I have a ~100% helpful answer rate, but that's with me having a very high bar for asking).

More Reactions to If Anyone Builds It, Everyone Dies
Zvi · 6d · 111

If that's your reaction to my reaction, then it was a miss in at least some ways, which is on me. 

I did not feel angry (more like frustrated?) when I wrote it, nor did I intend to express anger, but I did read your review itself as expressing anger and hostility in various forms - you're doing your best to fight through that and play fair with the ideas as you see them, which is appreciated - and I have generally read your statements about Yudkowsky and related issues consistently as being something in the vicinity of angry, also as part of a consistent campaign, and perhaps some of this was reflected in my response. It's also true that I have a cached memory of you often responding as if things said were more hostile than I felt they were or were intended, although I do not recall examples at this point.

And I hereby report that, despite at points in the past putting in considerable effort trying to parse your statements, at some point I found it too difficult, frustrating, and aversive in some combination, and mostly stopped attempting to do so when my initial attempt on a given statement bounced (which sometimes it doesn't).

(Part of what is 'esoteric' is perhaps that the perfect-enemy-of-good thing means a lot of load-bearing stuff is probably unsaid by you, and you may not realize that you haven't said it?)

But also, frankly, when people write much dumber reviews with much dumber things in them, I mostly can't even bring myself to be mad, because I mean what else can one expect from such sources - there's only one such review that actually did make me angry, because it was someone where I expected better. It's something I've worked a lot on, and I think made progress on - I don't actually e.g. get mad at David Sacks anymore as a person, although I still sometimes get mad that I have to once again write about David Sacks. 

To the extent I was actually having a reaction to you here it was a sign that I respect you enough to care, that I sense opportunity in some form, and that you're saying actual things that matter rather than just spouting gibberish or standard nonsense. 

Similarly, with the one exception, if those people had complained about my reaction to their reaction in the ways I'd expect them to do so, I would have ignored them.

Versus your summary of your review, I would say I read it more as:

  1. We are currently in an alignment winter. (This is bad). This is asserted as 'obvious' and then causes are cited, all in what I read as a hostile manner, and an assertion of 'facts not in evidence' that I indeed disagree with, including various forms of derision that read in-context as status attacks and accusations of bad epistemic action, and the claim that the value loading problem has been solved, which is all offered in a fashion that implies you think this is all clearly true if not rather obvious, and this is all loaded up front despite it not being especially relevant to the book, and echoing things you talk about a lot. This sets the whole thing up as an adversarial exercise. You can notice that in my reaction I treated these details as central, in a way you don't seem to think they are, or at least I think the central thing boils down to this?
  2. Alignment is not solved yet but people widely believe it is. (This is bad). It's weird because you say 'we solved [X] and people think [X] solves alignment but it doesn't' where I don't think it's true we solved [X].
  3. I was expecting to hate the book, but it actually retreats on most of the rhetoric I blame for contributing to the alignment winter. (This is good.) Yes.
  4. The style of the book is bad, but I won't dwell on it and in fact spend a paragraph on the issue and then move on. 'Truly appalling' editorial choices, weird and often condescending, etc. Yes it's condensed but you come on very strong here (which is fine, you clearly believe it, but I wouldn't minimize its role). Also your summary skips over the 'contempt for LLMs' paragraph.
  5. I actually disagree with the overall thesis, but think it's virtuous to focus on the points of agreement when someone points out an important issue, so I don't dwell on that either, and instead:
  6. "Emphatically agree" (literal words) that AI labs are not serious about the alignment problem.
  7. State a short version of what the alignment problem actually is. (Important because it's usually conflated with or confused with simpler problems that sound a lot easier to solve.)
  8. I signal boost Eliezer's other and better writing because I think my audience is disproportionately made up of people who might be able to contribute to the alignment problem if they're not deeply confused about it and I think Eliezer's earlier work is under-read.
  9. I reiterate that I think the book is kinda bad, since I need a concluding paragraph.

I read 'ok' in this context as better than 'kinda bad' fwiw. 

As for 'I should just ask you,' I notice this instinctively feels aversive, as likely opening up a very painful, time-consuming, and highly frustrating interaction or set of interactions, and I notice I have a strong urge not to do it. I forget the details of the interactions, with you in particular or with others close to you, that caused this instinct, and it could be a mistake. I could be persuaded to try again.

I do know that when I see the interactions of the entire Janus-style crowd on almost anything, I have the same feeling I had with early LW, where I expect to get lectured to and yelled at and essentially downvoted a lot, including in 'get a load of this idiot' style ways, if I engage directly in most ways and it puts me off interacting. Essentially it doesn't feel like a safe space for views outside a certain window. This makes me sad because I have a lot of curiosity there, and it is entirely possible this is deeply stupid and if either side braved mild social awkwardness we'd all get big gains from trade and sharing info. I don't know.

I realize it is frustrating to report things in my head where I can't recall many of the sources of the things, but I am guessing that you would want me to do that given that this is the situation.

I dunno, man, this is definitely a 'write the long letter' situation and I'm calling it here. 

(If you want to engage further, my reading of LW comments even on my own posts is highly unreliable, but I would get a PM or Twitter DM or email etc pretty reliably). 

I enjoyed most of IABIED
Zvi · 11d · 90

Question for Buck: What changes do you anticipate happening between now and the world where we create ASI, that you believe matter for the prognosis here? 

OpenAI’s GPT-OSS Is Already Old News
Zvi · 2mo · 20

Thanks, very helpful!

And yes, I noticed most of the glaring errors you pointed out in o3-pro and Claude's analyses; I interpreted them essentially as a strong message of 'if the GPT-OSS models are improvements they will matter, but I shouldn't assume they are improvements, and if not they won't matter.'

Spilling the Tea
Zvi · 2mo · 62

ok wow, yeah, that one's a lot worse, and basically marks the app as 'don't write anything here you wouldn't want on the front page of the NYT.'

America’s AI Action Plan Is Pretty Good
Zvi · 2mo · 142

This was very helpful to me and we had a good talk about things. 

I do think it is a correct criticism of my post to say that I should have emphasized more that I think the rhetoric used here, and the administration's overall policy path, is terrible. After seeing everyone else's responses be so positive, and after seeing Oliver put so much emphasis on the rhetoric versus the proposals, I'm sad about that, and plan to address it going forward, likely in the weekly (given reading patterns, it would not do much to try to edit the post now).

America’s AI Action Plan Is Pretty Good
Zvi · 2mo · 7-5

I believe that this is a competently executed plan from the perspective of those executing the plan, which is different from the entire policy of the White House being generally competent; it falls short in areas that those in charge of the plan lacked the power to do anything about (e.g. immigration, attacks on solar power, trade and alliances in general...)

America’s AI Action Plan Is Pretty Good
Zvi · 2mo · 61

As I say up top, one must distinguish the rhetoric from the substance. The rhetoric is terrible, although not as terrible as my median expectation for it, because of what is not said. On international treaties, I fail to see how anything here, including the rhetoric, makes that situation any worse than baseline, given what has already been said and done; you shouldn't be crying about that more than you were last week.

On the substance, this was much better than expectations, except that (we agree) it had unexpectedly competent execution. And I don't think this is anything like the level of 'full economic mobilization.' Setting aside competence level, it is hard to think of how this report could have been better, given who was in charge of directing and approving it.

If you think things are so bad that the primary thing you want on realistic margins from America's AI policy is incompetent execution, if you want to say reality does not grade on a curve, then okay. I mean, I get it.

Posts
55 · Claude Sonnet 4.5: System Card and Alignment · 10h · 3 comments
53 · On Dwarkesh Patel’s Podcast With Richard Sutton · 1d · 6 comments
19 · Economics Roundup #6 · 5d · 5 comments
23 · AI #135: OpenAI Shows Us The Money · 6d · 2 comments
40 · OpenAI Shows Us The Money · 7d · 8 comments
33 · More Reactions to If Anyone Builds It, Everyone Dies · 8d · 20 comments
30 · H1-B And The $100k Fee · 9d · 1 comment
61 · Book Review: If Anyone Builds It, Everyone Dies · 12d · 3 comments
34 · AI #134: If Anyone Reads It · 13d · 8 comments
59 · Reactions to If Anyone Builds It, Everyone Dies · 13d · 1 comment