Looking for help with an acausal safety project. If you’re interested or know someone who might be, it would be really great if you let me know/share
I don't trust Ilya Sutskever to be the final arbiter of whether a Superintelligent AI design is safe and aligned. We shouldn't trust any individual,
I'm not sure how I feel about the whole idea of this endeavour in the abstract - but as someone who doesn't know Ilya Sutskever and has only followed the public stuff, I'm pretty worried about him in particular running it, if decision-making happens at the "by an individual" level, and even if not. Running this safely will likely require lots of moral integrity and courage. The board drama made it look to me like Ilya disquali...
Greg Brockman and Sam Altman (cosigned):
[...]
First, we have raised awareness of the risks and opportunities of AGI so that the world can better prepare for it. We’ve repeatedly demonstrated the incredible possibilities from scaling up deep learning
chokes on coffee
This also stood out to me as a truly insane quote. He's almost but not quite saying "we have raised awareness that this bad thing can happen by doing the bad thing"
From my point of view, of course profit-maximizing companies will…maximize profit. It was never even imaginable that these kinds of entities could shoulder such a huge risk responsibly.
Correct me if I'm wrong but isn't Conjecture legally a company? Maybe their profit model isn't actually foundation models? Not actually trying to imply things, just thought the wording was weird in that context and was wondering whether Conjecture has a different legal structure than I thought.
It’s a funny comment because legally Conjecture is for-profit and OpenAI is not. It just goes to show that the various internal and external pressures and incentives on an organization and its staff are not encapsulated by glancing at their legal status—see also my comment here.
Anyway, I don’t think Connor is being disingenuous in this particular comment, because he has always been an outspoken advocate for government regulation of all AGI-related companies including his own.
I don’t think it’s crazy or disingenuous in general to say “This is a terrible sys...
minus Cullen O'Keefe, who worked on policy and legal (so was not a clear-cut case of working on safety),
I think Cullen was on the same team as Daniel (might be misremembering), so if you count Daniel, I'd also count Cullen. (Unless you wanna count Daniel because he previously was more directly part of technical AI safety research at OAI.)
Yes! Edited the main text to make it clear
The "entity giving the payout" in practice for ECL would be just the world states you end up in and requires you to care about the environment of the person you're playing the PD with.
So defecting might just be optimising my local environment for my own values, and cooperating would be optimising my local environment for some aggregate of my own values and the values of the person I'm playing with. It only works if there are positive-sum aggregates and if each player cares about what the other does to their local environment.
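To make the positive-sum condition concrete, here's a minimal toy sketch (with made-up numbers, nothing from the original comment) of how mutual cooperation can beat mutual defection once each player puts weight on the other's local environment:

```python
# Toy illustration only: each player shapes their own local environment.
# "Defect" optimises it purely for the actor's own values; "cooperate" optimises it
# for a positive-sum aggregate of both players' values. Each player also cares,
# with some weight, about what happens in the other player's environment.

# Value an action produces in the actor's local environment:
# (value to the actor, value to the other player) -- hypothetical numbers.
DEFECT = (10, 0)      # fully optimised for the actor's own values
COOPERATE = (7, 7)    # optimised for a positive-sum aggregate of both value systems

CARE_ABOUT_OTHER_ENV = 1.0  # how much each player cares about the other's environment


def payoff(my_action, their_action):
    """Total value to me: my own environment plus (weighted) the other's environment."""
    my_env_value_to_me, _ = my_action
    _, their_env_value_to_me = their_action
    return my_env_value_to_me + CARE_ABOUT_OTHER_ENV * their_env_value_to_me


print("both defect:   ", payoff(DEFECT, DEFECT))        # 10.0
print("both cooperate:", payoff(COOPERATE, COOPERATE))  # 14.0
```

With these numbers, mutual cooperation yields 14 vs. 10 for mutual defection; shrink CARE_ABOUT_OTHER_ENV towards 0, or make COOPERATE no longer positive-sum, and the ordering flips, matching the two conditions above.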
I watched and read a ton of Lab Muffin Beauty Science when I got into skincare. Apart from sunscreen, I think a lot of it is trial and error with what has good short-term effects. I'm not sure about long-term effects at all, tbh. Lab Muffin Beauty Science is helpful for figuring out your skin type, leads for which products to try first, and how to use them. (There's a fair number of products you wanna ramp up slowly and, even by the end, only use on some days.)
Are there types of published alignment research that you think were (more likely to be) good to publish? If so, I'd be curious to see a list.
Some off the top of my head:
Edit: oops, I didn't see tammy's comment
Agree-vote: I generally tend to choose work over sleep when I feel particularly inspired to work.
Disagree-vote: I generally tend to choose sleep over work even when I feel particularly inspired to work.
Any other reaction, new answer or comment, or no reaction of any kind: Neither of the two descriptions above fit.
I considered making four options to capture the dimension of whether you endorse your behaviour or not but decided against it. Feel free to supplement this information.
Interesting. The main thing that pops out for me is that it feels like your story is descriptive while we try to be normative? I.e., it's not clear to me from what you say whether you would recommend that humans act in this cooperative way towards distant aliens, but you seem to expect that they will do/are doing so. Meanwhile, I would claim that we should act cooperatively in this way but make no claims about whether humans actually do so.
Does that seem right to you or am I misunderstanding your point?
I'm not sure I understand exactly what you're saying, so I'm just gonna write some things vaguely related to classic acausal trade + ECL:
I'm actually really confused about the exact relationship between "classic" prediction-based acausal trade and ECL, and I think I tend to think of them as less crisply different than others do. I tried to unconfuse myself about that for a few hours some months ago and just ended up with a mess of a document. Some intuitive ways to differentiate them:
Thanks! I actually agree with a lot of what you say. Lack of excitement about existing intervention ideas is part of the reason why I'm not all in on this agenda at the moment. Although in part I'm just bottlenecked by lack of technical expertise (and it's not like people had great ideas for how to align AIs at the beginning of the field...), so I don't want people to overupdate from "Chi doesn't have great ideas."
With that out of the way, here are some of my thoughts:
Yeah, you're right that we assume that you care about what's going on outside the lightcone! If that's not the case (or only a little bit the case), that would limit the action-relevance of ECL.
(That said, there might be some weird simulations-shenanigans or cooperating with future earth-AI that would still make you care about ECL to some extent although my best guess is that they shouldn't move you too much. This is not really my focus though and I haven't properly thought through ECL for people with indexical values.)
Whoa, I didn't know about this survey, pretty cool! Interesting results overall.
It's notable that 6% of people also report they'd prefer absolute certainty of hell over not existing, which seems totally insane from the point of view of my preferences. The 11% that prefer a trillion miserable sentient beings over a million happy sentient beings also seems wild to me. (Those two questions are also relatively more correlated than the other questions.)
Thanks, I hadn't actually heard of this one before!
edit: Any takes on addictiveness/other potential side effects so far?
First of all: Thanks for asking. I was being lazy with this and your questions forced me to come up with a response which forced me to actually think about my plan.
Concrete changes
1) I'm currently doing in-person Pomodoro co-working with a friend every weekday, but I had planned that before this post IIRC, and have definitely known for a while that that's a huge boost for me.
In-person co-working and the type of work I do seem somewhat situational/hard to sustain/hard to quickly change sometimes. For some reason (perhaps because I feel a bit meh about virtual...
Thank you. This answer was both insightful and felt like a warm hug somehow.
Thanks for posting this! I really enjoyed the read.
Feedback on the accompanying poll: I was going to fill it out. Then I saw that I have to look up and list the titles I can (not) relate to instead of just being able to click "(strongly) relate/don't relate" on a long list of titles. (I think the relevant function for this in forms is "Matrix" or something.) And my reaction was "ugh, work". I think I might still fill it in, but I'm much less likely to. If others feel the same, maybe you wanna change the poll?
I find this comment super interesting because
a) before, I would have expected many more people to be scared of being eaten by piranhas on LessWrong than on the EA Forum, not vice versa. In fact, I didn't even consider that people could find the EA Forum scarier than LessWrong. (Well, before FTX anyway.)
b) my current read of the EA Forum (and this has been the case for a while) is that forum people like it when you say something like "People should value things other than impact (more)" and that you're more likely to be eaten by piranhas for saying "People s...
edit: We're sorted :)
Hello, I'm Chi, the friend, in case you wanna check out my LessWrong, although my EA forum account probably says more. Also, £50 referral bonus if you refer a person we end up moving in with!
Also, we don't really know whether the Warren Street place will work out, but we're looking for flatmates either way. Other potential accommodation would likely be in N1, NW1, W1, or WC1.
Hi, thanks for this comment and the links.
I agree that it's a pretty vast topic. I agree that the questions are personalized in the sense that there are many different personal factors to this question, although the bullets I listed weren't actually really personalized to me. One hope I had with posting to LessWrong was that I trust people here to be able to do some of the "what's most relevant to include" thinking (e.g. everything that affects ≥10% of women between 20 and 40 + everything that's of more interest on LessWrong than elsewhere (e.g. irrevers...
Sorry for replying so late! I was quite busy this week.
Thanks! I felt kind of sheepish about making a top-level post/question out of this but will do so now. Feel free to delete my comment here if you think that makes sense.
I would like it if there were a well-researched LessWrong post on the pros and cons of different contraceptives. Same deal with a good post on how to treat or prevent urinary tract infections, although I'm less excited about that.
Thanks! I already mention this in the post, but just wanted to clarify that Paul only read the first third/half (wherever his last comment is) in case people missed that and mistakenly take the second half at face value.
Edit: Just went back to the post and noticed I don't really clearly say that.
Hey, thanks for the question! And I'm glad you liked the part about AGZ. (I also found this video by Robert Miles extremely helpful and accessible for understanding AGZ.)
This seems speculative. How do you know that a hypothetical infinite HCH tree does not depend on the capabilities of the human?
Hm, I wouldn't say that it doesn't depend on the capabilities of the human. I think it does, but it depends on the type of reasoning they employ and not e.g. their working memory (to the extent that the general hypothesis of factored cognition holds that we can succe...
Thanks for the comment and I'm glad you like the post :)
On the other topic: I'm sorry, I'm afraid I can't be very helpful here. I'd be somewhat surprised if I'd have had a good answer to this a year ago and certainly don't have one now.
Some cop-out answers:
Copied from my comment on this from the EA forum:
...Yeah, that's a bit confusing. I think technically, yes, IDA is iterated distillation and amplification, and Iterated Amplification is just IA. However, IIRC many people referred to Paul Christiano's research agenda as IDA even though his sequence is called Iterated Amplification, so I stuck to the abbreviation that I saw more often while also sticking to the 'official' name. (I also buried a comment on this in footnote 6.)
I think lately, I've mostly seen people refer to the agenda and ideas as Iterated Am
My current guess is that occasional volunteers are totally fine! There's some onboarding cost, but mostly the cost on our side scales with the number of argument-critique pairs we get. Since the whole point is to have critiques of a large variety of quality, I don't expect the nth argument-critique pair we get to be much more usable than the 1st one. I might be wrong about this one and change my mind as we try this out with people though!
(Btw I didn't get a notification for your comment, so maybe better to dm if you're interested.)