All of Chi Nguyen's Comments + Replies

My current guess is that occasional volunteers are totally fine! There's some onboarding cost but mostly, the cost on our side scales with the number of argument-critique pairs we get. Since the whole point is to have critiques of a wide variety of quality, I don't expect the nth argument-critique pair we get to be much more usable than the 1st one. I might be wrong about this one and change my mind as we try this out with people though!

(Btw I didn't get a notification for your comment, so maybe better to dm if you're interested.)

Looking for help with an acausal safety project. If you’re interested or know someone who might be, it would be really great if you let me know or share this with them.

  1. Help with acausal research and get mentoring to learn about decision theory
  • Motivation: Caspar Oesterheld (inventor/discoverer of ECL/MSR), Emery Cooper and I are doing a project where we try to get LLMs to help us with our acausal research.
    • Our research is ultimately aimed at making future AIs acausally safe.
  • Project: As a first step, we are trying to train an LLM classifier that evaluates critiques of arguments
... (read more)
Rauno Arike
What's the minimum capacity in which you're expecting people to contribute? Are you looking for a few serious long-term contributors or are you also looking for volunteers who offer occasional help without a fixed weekly commitment?
Chi Nguyen

I don't trust Ilya Sutskever to be the final arbiter of whether a Superintelligent AI design is safe and aligned. We shouldn't trust any individual,

I'm not sure how I feel about the whole idea of this endeavour in the abstract - but as someone who doesn't know Ilya Sutskever and has only followed the public stuff, I'm pretty worried about him in particular running it, if decision-making is at the "by an individual" level, and even if it isn't. Running this safely will likely require lots of moral integrity and courage. The board drama made it look to me like Ilya disquali... (read more)

Chi Nguyen

Greg Brockman and Sam Altman (cosigned):
[...]
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We’ve repeatedly demonstrated the incredible possibilities from scaling up deep learning

chokes on coffee

This also stood out to me as a truly insane quote. He's almost but not quite saying "we have raised awareness that this bad thing can happen by doing the bad thing"

Chi Nguyen

From my point of view, of course profit maximizing companies will…maximize profit. It never was even imaginable that these kinds of entities could shoulder such a huge risk responsibly.

Correct me if I'm wrong but isn't Conjecture legally a company? Maybe their profit model isn't actually foundation models? Not actually trying to imply things, just thought the wording was weird in that context and was wondering whether Conjecture has a different legal structure than I thought.

It’s a funny comment because legally Conjecture is for-profit and OpenAI is not. It just goes to show that the various internal and external pressures and incentives on an organization and its staff are not encapsulated by glancing at their legal status—see also my comment here.

Anyway, I don’t think Connor is being disingenuous in this particular comment, because he has always been an outspoken advocate for government regulation of all AGI-related companies including his own.

I don’t think it’s crazy or disingenuous in general to say “This is a terrible sys... (read more)

minus Cullen O’Keefe who worked on policy and legal (so was not a clear cut case of working on safety),

 

I think Cullen was on the same team as Daniel (might be misremembering), so if you count Daniel, I'd also count Cullen. (Unless you wanna count Daniel because he previously was more directly part of technical AI safety research at OAI.)

The "entity giving the payout" in practice for ECL would be just the world states you end up in and requires you to care about the environment of the person you're playing the PD with.

So, defecting might just be optimising my local environment for my own values, and cooperating would be optimising my local environment for some aggregate of my own values and the values of the person I'm playing with. This only works if there are positive-sum aggregates and if each player cares about what the other does to their local environment.
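To make that last condition concrete, here is a toy numerical sketch (my own illustration with made-up numbers, not something from the original comment). It just checks when "optimise my local environment for an aggregate of our values" beats "optimise it for my values only":

```python
# Toy ECL/PD payoff sketch. Assumptions (mine, not from the comment):
# each player only controls their own local environment, and a player's utility is
# the value of their own environment plus `care` times the value (to them) of the
# other player's environment.

# What a local environment is worth, depending on what its owner optimises for:
#   "own"       (defect):    great for the owner, worthless to the other player
#   "aggregate" (cooperate): slightly worse for the owner, much better for the other
VALUE = {
    "own":       {"to_owner": 10, "to_other": 0},
    "aggregate": {"to_owner": 8,  "to_other": 6},
}

def utility(my_choice: str, other_choice: str, care: float) -> float:
    """My payoff given both policies and how much I care about the other's environment."""
    return VALUE[my_choice]["to_owner"] + care * VALUE[other_choice]["to_other"]

for care in (0.0, 1.0):
    both_defect = utility("own", "own", care)
    both_cooperate = utility("aggregate", "aggregate", care)
    print(f"care={care}: defect/defect={both_defect}, cooperate/cooperate={both_cooperate}")

# care=0.0 -> cooperate/cooperate = 8.0 < 10.0: if I don't care about the other's
#             environment, the ECL-style cooperative policy just loses me value.
# care=1.0 -> cooperate/cooperate = 14.0 > 10.0: with a positive-sum aggregate and
#             mutual caring, both players do better by optimising the aggregate.
```

(Whether the numbers actually look like this for any given pair of agents is exactly the "positive-sum aggregates" question.)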

Answer by Chi Nguyen

I watched and read a ton of Lab Muffin Beauty Science when I got into skincare. Apart from sunscreen, I think a lot of it is trial and error with what has good short-term effects. I'm not sure about long-term effects at all tbh. Lab Muffin Beauty Science is helpful for figuring out your skin type, getting leads on which products to try first, and learning how to use them. (There's a fair number of products you wanna ramp up slowly and even by the end only use on some days.)

Chi Nguyen

Are there types of published alignment research that you think were (more likely to be) good to publish? If so, I'd be curious to see a list.

Morphism

Some off the top of my head:

  • Outer Alignment Research (e.g. analytic moral philosophy in an attempt to extrapolate CEV) seems to be totally useless to capabilities, so we should almost definitely publish that.
  • Evals for Governance? Not sure about this since a lot of eval research helps capabilities, but if it leads to regulation that lengthens timelines, it could be net positive.

Edit: oops, I didn't see Tammy's comment

Tamsin Leake
I think research that is mostly about outer alignment (what to point the AI to) rather than inner alignment (how to point the AI to it) tends to be good — quantilizers, corrigibility, QACI, decision theory, embedded agency, indirect normativity, infra bayesianism, things like that. Though I could see some of those backfiring the way RLHF did — in the hands of a very irresponsible org, even not very capabilities-related research can be used to accelerate timelines and increase race dynamics if the org doing it thinks it can get a quick buck out of it.

Agree-vote: I generally tend to choose work over sleep when I feel particularly inspired to work.

Disagree-vote: I generally tend to choose sleep over work even when I feel particularly inspired to work.

Any other reaction, new answer or comment, or no reaction of any kind: Neither of the two descriptions above fit.

I considered making four options to capture the dimension of whether you endorse your behaviour or not but decided against it. Feel free to supplement this information.

Interesting. The main thing that pops out for me is that it feels like your story is descriptive while we try to be normative? I.e. it's not clear to me from what you say whether you would recommend that humans act in this cooperative way towards distant aliens, but you seem to expect that they will do/are doing so. Meanwhile, I would claim that we should act cooperatively in this way but make no claims about whether humans actually do so.

Does that seem right to you or am I misunderstanding your point?

Dan.Oblinger
Chi, I think that is correct. My argument attempts to provide a descriptive explanation of why all evolved intelligences do have a tendency towards ECL, but it provides no basis to argue that such intelligences should have such a tendency in a normative sense. Still, somehow as an individual (with such tendencies), I find that the idea that other distant intelligences will also have a tendency towards ECL does provide some personal motivation. I don't feel like such a "sucker" if I spend energy on an activity like this, since I know others will too, and it is only "fair" that I contribute my share. Notice, I still have a suspicion that this way of thinking in myself is a product of my descriptive explanation. But that does not diminish the personal motivation it provides me. In the end, this is still not really a normative explanation. At best it could be a MOTIVATING explanation for the normative behavior you are hoping for. ~ For me, however, the main reason I like such a descriptive explanation is that it feels like it could one day be proved true. We could potentially verify that ECL follows from evolution as a statement about the inherent and objective nature of the universe. Such objective statements are of great interest to me, as they feel like I am understanding a part of reality itself. Interesting topic!

Letting onlookers know that I responded in this comment thread.

I'm not sure I understand exactly what you're saying, so I'm just gonna write some things vaguely related to classic acausal trade + ECL:

 

I'm actually really confused about the exact relationship between "classic" prediction-based acausal trade and ECL. And I think I tend to think about them as less crisply different than others do. I tried to unconfuse myself about that for a few hours some months ago and just ended up with a mess of a document. One intuitive way to differentiate them:

  • ECL leverages the correlation between you and the other agent "di
... (read more)

Thanks! I actually agree with a lot of what you say. Lack of excitement about existing intervention ideas is part of the reason why I'm not all in on this agenda at the moment. Although in part I'm just bottlenecked by lack of technical expertise (and it's not like people had great ideas for how to align AIs at the beginning of the field...), so I don't want people to overupdate from "Chi doesn't have great ideas."

With that out of the way, here are some of my thoughts:

  • We can try to prevent silly path-dependencies in (controlled or uncontrolled i.e. misalig
... (read more)

Yeah, you're right that we assume that you care about what's going on outside the lightcone! If that's not the case (or only a little bit the case), that would limit the action-relevance of ECL.

(That said, there might be some weird simulation shenanigans or cooperation with a future Earth AI that would still make you care about ECL to some extent, although my best guess is that they shouldn't move you too much. This is not really my focus though, and I haven't properly thought through ECL for people with indexical values.)

Whoa, I didn't know about this survey, pretty cool! Interesting results overall.

It's notable that 6% of people also report they'd prefer absolute certainty of hell over not existing, which seems totally insane from the point of view of my preferences. The 11% that prefer a trillion miserable sentient beings over a million happy sentient beings also seems wild to me. (Those two questions are also relatively more correlated than the other questions.)

Thanks, I hadn't actually heard of this one before!

edit: Any takes on addictiveness/other potential side effects so far?

Dr. Birdbrain
A Google search suggests Desoxyn might just be a brand of pharmaceutical-grade meth.
Ben Pace
Not noticed anything. Once I got quite sad in the evening as I came down from it.

First of all: Thanks for asking. I was being lazy with this, and your questions forced me to come up with a response, which in turn forced me to actually think about my plan.

Concrete changes

1) I'm currently doing in-person Pomodoro co-working with a friend every weekday, but I had planned that before this post IIRC, and have definitely known for a while that that's a huge boost for me.

In-person co-working and the type of work I do seem somewhat situational/hard to sustain/hard to quickly change sometimes. For some reason (perhaps because I feel a bit meh about virtual... (read more)

Tristan Williams
Thanks for such an in-depth and wonderful response, I have a couple of questions. On 1. Perhaps the biggest reason I've stayed away from Pomodoros is the question of how much time for breaks you can take before you need to start logging it as a reduction in time worked. Where have you come out on that debate? I.e. maybe you've found increased productivity makes the breaks totally worth it and this hasn't really been an issue for you. On 3. How are you strict with your weekends? The vibe I get from the rest is that normally you make sure what you're doing is restful? On 3.5. Adding to the anecdata, I keep a fairly sporadic schedule that often extends past normal hours, and I've found that it works pretty well for me. I do find that when I'm feeling a bit down that switching back to normal hours is better for me though, because I'm apt to start playing video games in the middle of the day because I think "ah, I'm remote and have a flexible schedule, so I can do what I want!" when in reality playing video games during the day is usually just me doing a poor job of dealing with something that then ends up not resolved later and leaves me in a tricky spot to get work done. On 4, I'd love to hear more about your targets: are they just more concrete than goals? Do you have some sort of accountability system that you keep yourself from overriding? I think I'm coming to realize I work better with deadlines, but I'm still really unsure how to implement them in a way that forces me to stick to them but also allows me to override them in circumstances where I'd be better off if I could push something back a bit.
Firinn
I'm surprised you decided not to prioritise exercise! I realised reading this comment that when I ask myself, "How have I become more hardworking?" I don't think about exercise at all. But if I asked myself the mirror question - "How have I become less hardworking?" - I think about the time when I accidentally stopped exercising (because I moved further away from a dojo and couldn't handle the public transit - and then, some years later, because of confining myself to my apartment during the pandemic) and it was basically like taking a sledgehammer to my mental health. I can't recommend exercise strongly enough; it helps with sleep, mood, motivation, energy, everything. (Not everyone experiences this, but enough people do that it seems very much worth trying!)

Thank you. This answer was both insightful and felt like a warm hug somehow.

Firinn
it appears there is no heart react on LessWrong, which is sad because I want to give this comment a lil <3

Thanks for posting this! I really enjoyed the read.

 

Feedback on the accompanying poll: I was going to fill it out. Then I saw that I have to look up and list the titles I can (not) relate to instead of just being able to click "(strongly) relate/don't relate" on a long list of titles. (I think the relevant function for this in forms is "Matrix" or something.) And my reaction was "ugh, work". I think I might still fill it in but I'm much less likely to. If others feel the same, maybe you wanna change the poll?

Yulia
Thank you so much for the feedback! I think you're totally right. Here's the updated poll. 

I find this comment super interesting because

a) before, I would have expected many more people to be scared of being eaten by piranhas on LessWrong than on the EA Forum, not vice versa. In fact, I didn't even consider that people could find the EA Forum scarier than LessWrong. (Well, before FTX anyway.)

b) my current read of the EA Forum (and this has been the case for a while) is that forum people like it when you say something like "People should value things other than impact (more)" and that you're more likely to be eaten by piranhas for saying "People s... (read more)

Duncan Sabien (Deactivated)
The specific things you said about the EA forum seem true but it also seems to me to be a hellscape of vicious social punishment and conformity and suspicion. The existence of a number of people pushing back against that doesn't quite suffice for feelings of safety, at least according to my own intuitions.

edit: We're sorted :)

 

Hello, I'm Chi, the friend, in case you wanna check out my LessWrong, although my EA forum account probably says more. Also, £50 referral bonus if you refer a person we end up moving in with!

Also, we don't really know whether the Warren Street place will work out but are looking for flatmates either way. Potential other accommodation would likely be in N1, NW1, W1, or WC1.

Hi, thanks for this comment and the links.

I agree that it's a pretty vast topic. I agree that the questions are personalized in the sense that there are many different personal factors to this question, although the bullets I listed weren't actually really personalized to me. One hope I had with posting to LessWrong was that I trust people here to be able to do some of the "what's most relevant to include" thinking (e.g. everything that affects ≥10% of women between 20 and 40 + everything that's of more interest on LessWrong than elsewhere (e.g. irrevers... (read more)

Sorry for replying so late! I was quite busy this week.

  • I initially wanted to commission someone and expected that I'd have to pay 4 digits. Someone suggested I put down a bounty. I'm not familiar with putting bounties on things and I wanted to avoid getting myself in a situation where I feel like I have to pay the full amount for
    • work that's poor
    • work that's decent but much less detailed than I had envisioned
    • multiple reports each
  • I think I'm happy to pay the full amount for a report that is
    • transparent in its reasoning, so I can trust it,
    • tells me how much to t
... (read more)

Thanks! I felt kind of sheepish about making a top-level post/question out of this but will do so now. Feel free to delete my comment here if you think that makes sense.

I would like it if there were a well-researched LessWrong post on the pros and cons of different contraceptives. Same deal with a good post on how to treat or prevent urinary tract infections, although I'm less excited about that.

  • I'd be willing to pay some of my private money for this to get done. Maybe up to £1000? Open to considering higher amounts.
  • It would mostly be a public service as I'm kind of fine with my current contraception. So, I'm also looking for people to chip in (either to offer more money or just to take some of the monetary burden off me!)
... (read more)
habryka
I think this would be really valuable and would be happy to pay $500 for a post that is good here.
Ruby
This is a public service. I think you could write this up as a post/question for more visibility.

Thanks! I already mention this in the post, but just wanted to clarify that Paul only read the first third/half (wherever his last comment is) in case people missed that and mistakenly take the second half at face value.

Edit: Just went back to the post and noticed I don't really clearly say that.

Hey, thanks for the question! And I'm glad you liked the part about AGZ. (I also found this video by Robert Miles extremely helpful and accessible for understanding AGZ.)

 

This seems speculative. How do you know that a hypothetical infinite HCH tree does not depend on the capabilities of the human?

Hm, I wouldn't say that it doesn't depend on the capabilities of the human. I think it does, but it depends on the type of reasoning they employ and not e.g. their working memory (to the extent that the general hypothesis of factored cognition holds that we can succe... (read more)

Chi Nguyen

Thanks for the comment and I'm glad you like the post :)

On the other topic: I'm sorry, I'm afraid I can't be very helpful here. I'd be somewhat surprised if I'd have had a good answer to this a year ago and certainly don't have one now.

Some cop-out answers:

  • I often found reading his comments/remarks (and discussions with others in them) about corrigibility in posts focused on something else more useful for finding out whether his thinking changed on this than his blog posts that were obviously concentrating on corrigibility
  • You might have some luck reading through some of his
... (read more)
algon33
Fair enough. Thanks for the recommendations. :)

Copied from my comment on this on the EA forum:

Yeah, that's a bit confusing. I think technically, yes, IDA is iterated distillation and amplification and that Iterated Amplification is just IA. However, IIRC many people referred to Paul Christiano's research agenda as IDA even though his sequence is called Iterated amplification, so I stuck to the abbreviation that I saw more often while also sticking to the 'official' name. (I also buried a comment on this in footnote 6)

I think lately, I've mostly seen people refer to the agenda and ideas as Iterated Am

... (read more)
ESRogs
Ah that sounds reasonable. As a stickler about abbreviations, I'll just add that if you do want to stick with "IDA" and "Iterated Amplification" then it seems a little odd to use the phrase "stands for" to connect the two. (Since the D in "IDA" does not stand for anything in "Iterated Amplification".)   EDIT: My inner pedant would be satisfied if this was simply worded as something like, "IDA stands for Iterated Distillation and Amplification, which we will refer to as Iterated Amplification for short." (I'll leave it to you to decide whether my inner pedant is your target audience. ;-) )