All of Quadratic Reciprocity's Comments + Replies

Cullen O'Keefe is also no longer at OpenAI (as of last month).

From the comment thread:


I'm not a fan of *generic* regulation-boosting. Like, if I just had a megaphone to shout to the world, "More regulation of AI!" I would not use it. I want to do more targeted advocacy of regulation that I think is more likely to be good and less likely to result in regulatory capture.

What are specific regulations / existing proposals that you think are likely to be good? When people are protesting to pause AI, what do you want them to be speaking into a megaphone (if you think those kinds of protests could be helpful at all right now)? 

Reporting requirements, especially requirements to report to the public what your internal system capabilities are, so that it's impossible to have a secret AGI project. Also reporting requirements of the form "write a document explaining what capabilities, goals/values, constraints, etc. your AIs are supposed to have, and justifying those claims, and submit it to public scrutiny. So e.g. if your argument is 'we RLHF'd it to have those goals and constraints, and that probably works because there's No Evidence of deceptive alignment or other speculative fai... (read more)

This is so much fun! I wish I could download them!

habryka
You can! Currently just one by one in the desktop audio player. Press this button: 

I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made. 

After some introspection, I think these are the mechanisms that made me feel that way:

  • They were very confident about their claim. Partly I felt annoyance because I didn’t feel like there was anything that would change their mind, partly I felt annoyance because it felt like they didn’t have enough status to m
... (read more)
sweenesm
Thanks for the post. I don't know if you saw this one: "Thank you for triggering me", but it might be of interest. Cheers!

Advice of this specific form has been helpful for me in the past. Sometimes I don't notice immediately when the actions I'm taking are not ones I would endorse after a bit of thinking (particularly when they're fun and good for me in the short term but bad for others or for me in the longer term). This is also why having rules to follow for myself is helpful (e.g. never lying or breaking promises).

the gears to ascension
hmm, fair. I guess it does help if the person is doing something bad by accident, rather than because they intend to. just, don't underestimate how often the latter happens either, or something. or overestimate it, would be your point in reply, I suppose!

women more often these days choose not to make this easy, ramping up the fear and cost of rejection by choosing to deliberately inflict social or emotional costs as part of the rejection

I'm curious about how common this is, and what sort of social or emotional costs are being referred to. 

Sure feels like it would be a tiny minority of women doing it but maybe I'm underestimating how often men experience something like this. 

My goals for money, social status, and even how much I care about my family don't seem all that stable and have changed a bunch over time. They seem to be arising from some deeper combination of desires to be accepted, to have security, to feel good about myself, to avoid effortful work etc. interacting with my environment. Yet I wouldn't think of myself as primarily pursuing those deeper desires, and during various periods would have self-modified if given the option to more aggressively pursue the goals that I (the "I" that was steering things) thought I cared about (like doing really well at a specific skill, which turned out to be a fleeting goal with time).   

quetzal_rainbow
What about things like fun, happiness, eudaimonia, meaning? I certainly think that, excluding brain damage/very advanced brainwashing, you are not going to eat babies or turn planets into paperclips.

Current AI safety university groups are overall a good idea and helpful, in expectation, for reducing AI existential risk 


Things will basically be fine regarding job loss and unemployment due to AI in the next several years and those worries are overstated 


It is very unlikely AI causes an existential catastrophe (Bostrom or Ord definition) but doesn't result in human extinction. (That is, non-extinction AI x-risk scenarios are unlikely)


EAs and rationalists should strongly consider having lots more children than they currently do


In my head, I've sort of just been simplifying to two ways the future could go: human extinction within a relatively short time period after powerful AI is developed or a pretty good utopian world. The non-extinction outcomes are not ones I worry about at the moment, though I'm very curious about how things will play out. I'm very excited about the future conditional on us figuring out how to align AI. 

I'm curious about, for people who think similarly to Katja, what kind of story are you imagining that leads to that? Does the story involve authoritari... (read more)

Topics I would be excited to have a dialogue about [will add to this list as I think of more]:

  • I want to talk to someone who thinks p(human extinction | superhuman AGI developed in next 50 years) < 50% and understand why they think that 
  • I want to talk to someone who thinks the probability of existential risk from AI is much higher than the probability of human extinction due to AI (i.e. most x-risk from AI isn't scenarios where all humans end up dead soon after)
  • I want to talk to someone who has thoughts on university AI safety groups (are they harmful
... (read more)

I attended an AI pause protest recently and thought I’d write up what my experience was like for people considering going to future ones. 

I hadn’t been to a protest ever before and didn’t know what to expect. I will probably attend more in the future.

Some things that happened:

  • There were about 20ish people protesting. I arrived a bit after the protest had begun and it was very easy and quick to get oriented. It wasn’t awkward at all (and I’m normally pretty socially anxious and awkward). The organisers had flyers printed out to give away and there were
... (read more)

Are there specific non-obvious prompts or custom instructions you use for this that you've found helpful? 

Answer by Quadratic Reciprocity

There are physical paperback copies of the first two books in Rationality A-Z: Map and Territory and How to Actually Change Your Mind. They show up on Amazon for me. 

E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it's what's available

That feels concerning. Are there any obvious things that would help with this situation, e.g. better career planning and reflection resources for people in this situation, or AI safety folks being more clear about what they see as the value/disvalue of working in those types of capability roles?

Seems weird for someone to explicitly want a "safety-adjacent" job unless there are weird social dynamics encouraging people to do that even when there isn't positive impact to be had from such a job. 

Most people still have the Bostromian "paperclipping" analogy for AI risk in their head. In this story, we give the AI some utility function, and the problem is that the AI will naively optimize the utility function (in the Bostromian example, a company wanting to make more paperclips results in an AI turning the entire world into a paperclip factory).

That is how Bostrom brought up the paperclipping example in Superintelligence but my impression was that the paperclipping example originally conceived by Eliezer prior to the Superintelligence book was NOT a... (read more)

Visiting London and kinda surprised by how there isn't much of a rationality community there relative to the Bay Area (despite there being enough people in the city who read LessWrong, are aware of the online community, etc.?) Especially because the EA community seems pretty active there. The rationality meetups that do happen seem to have a different vibe. In the Bay, it is easy to just get invited to interesting rationalist-adjacent events every week by just showing up. Not so in London.

Not sure how much credit to give to each of these explanations... (read more)

see the current plan here: "The current alignment plan, and how we might improve it" (EAG 2023 Bay Area)

Link to talk above doesn't seem to work for me.

Outside view: The proportion of junior researchers doing interp rather than other technical work is too high

Quite tangential[1] to your post but if true, I'm curious about what this suggests about the dynamics of field-building in AI safety.

Seems to me like certain organisations and individuals have an outsized influence in funneling new entrants into specific areas, and because the field is small (and ... (read more)

Other podcasts that have at least some relevant episodes: Hear This Idea, Towards Data Science, The Lunar Society, The Inside View, Machine Learning Street Talk

Here are some Twitter accounts I've found useful to follow (in no particular order): Quintin Pope, Janus @repligate, Neel Nanda, Chris Olah, Jack Clark, Yo Shavit @yonashav, Oliver Habryka, Eliezer Yudkowsky, alex lawsen, David Krueger, Stella Rose Biderman, Michael Nielsen, Ajeya Cotra, Joshua Achiam, Séb Krier, Ian Hogarth, Alex Turner, Nora Belrose, Dan Hendrycks, Daniel Paleka, Lauro Langosco, Epoch AI Research, davidad, Zvi Mowshowitz, Rob Miles

If some of the project ideas are smaller, is it easier for you to handle if they're added on to just one larger application as extras that might be worth additional funding?

Is your "alignment research experiments I wish someone would run" list shareable? :)

Paul Graham's essay "What You Can't Say" is very practical. The tests/exercises he recommends for learning true, controversial things were useful to me.

Even if trying the following tests yields statements that aren't immediately useful, I think the act of noticing where you disagree with someone or something more powerful is good practice. I think similar mental muscles get used when noticing when you disagree or are confused about a commonly-held assumption in a research field or when noticing important ideas that others are neglecting.

The different exer... (read more)

Is there an organisation that can hire independent alignment researchers who already have funding, in order to help with visas for a place that has other researchers, perhaps somewhere in the UK? Is there a need for such an organisation? 

Milli | Martin
EA Germany has an "Employer of Record" program. Your funding gets put into their account, and they pay your salary from it, formally becoming your employer. This is probably what you want to google or mention to an organization in the UK. :) Details (EAD): https://docs.google.com/document/d/1EePELRNTrZGHgeJa3oeRdF_rDsN7LesYppQah_zE7g4

What are the most promising plans for automating alignment research, as mentioned in, for example, OpenAI's approach to alignment and by others?

JakubK
The cyborgism post might be relevant:

I think there will probably be even more discussion of AI x-risk in the media in the near future. My own media consumption is quite filtered, but, for example, the last time I was in an Uber, the news channel on the radio mentioned Geoffrey Hinton thinking AI might kill us all. And it isn't a distant problem for my parents the way climate change is, because they use ChatGPT and are both impressed and concerned by it. They'll probably form thoughts on it anyway, and I'd prefer if I can be around to respond to their confusion and concerns.

It also seems p... (read more)

I think it's plausible that too much effort is going to interp at the margin

What's the counterfactual? Do you think newer people interested in AI safety should be doing other things instead of for example attempting one of the 200+ MI problems suggested by Neel Nanda? What other things?

I'm curious about whether I should change my shortform posting behaviour in response to higher site quality standards. I currently perceive it to be an alright place to post things that are quick and not aiming to be well-written or particularly useful for others to read because it doesn't clutter up the website the way a post or comment on other people's posts would. 

Ruby
I'm still interested in Shortform being the place for established users to freely post their rough thoughts. I'd say you're good and shouldn't worry. The rough thoughts of a person who has the site culture and knows all the basics have a decent chance of being valuable enough that it's good for people to post them. On the other hand, some people's high-effort stuff is not very interesting, e.g. from new users. I wouldn't want Shortform to become super low SNR because we send low-quality users there indiscriminately and, by extension, have that content all over the site (Recent Discussion, All Posts, etc.), which it would be right now unless we changed some things. [Raemon for a while was thinking to send users not quite up to par to Shortform, but I've told him I don't endorse that use case currently. We'll have to think about it more/perhaps change various parts of the site, e.g. what goes in Recent Discussion, before endorsing that.]

Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself? 

Edit: Ah, there are more aliens around so even if the average alien doesn't care about us, it's plausible that some of them would?

MinusGix
https://www.lesswrong.com/posts/HoQ5Rp7Gs6rebusNP/superintelligent-ai-is-necessary-for-an-amazing-future-but-1#How_likely_are_extremely_good_and_extremely_bad_outcomes_

And the biggest question for me is not, is AI going to doom the world? Can I work on this in order to save the world? A lot of people expect that would be the question. That’s not at all the question. The question for me is, is there a concrete problem that I can make progress on? Because in science, it’s not sufficient for a problem to be enormously important. It has to be tractable. There has to be a way to make progress. And this was why I kept it at arm’s length for as long as I did.

I thought this was interesting. But it does feel like with this AI thi... (read more)

One way people can help is by stating their beliefs on AI and the confidence in those beliefs to their friends, family members, and acquaintances who they talk to.

Currently, a bunch of people are coming across things in the news talking about humanity going extinct if AI progress continues as it has and no more alignment research happens. I would expect many of them to not think seriously about it because it's really hard to shake out of the "business as usual" frame. Most of your friends and family members probably know you're a reasonable, thoughtful per... (read more)

[anonymous]
Do you think it's worth doing it if you will cause them distress? I find that hard to decide
Vladimir_Nesov
I wasn't expecting the development endgame to be much different, though it's a bit early. At least it's LLMs and not Atari-playing RL agents. Also, I'm much less certain about inevitability of boundary-norm-ignoring optimizers now, in a world that's not too dog eat dog at the top. This makes precise value targeting less crucial for mere survival, though most of the Future is still lost without it. So the news is good. I'm personally down to 70% probability of extinction, mostly first AGIs failing to prevent the world from getting destroyed by their research output, since it isn't looking like they are going to be superintelligent out of the box. I'm no longer expecting the first AGIs to intentionally destroy the world, unless users are allowed to explicitly and successfully wish for it to be destroyed, which bizarrely seems like a significant portion of the risk.

Hopefully this isn't too rude to say, but: I am indeed confused how you could be confused

Fwiw, I was also confused and your comment makes a lot more sense now. I think it's just difficult to convert text into meaning sometimes. 

TekhneMakre
Ok, thanks for the data, updating some.

Thanks for posting this. It's insightful reading other people thinking through career/life planning of this type.

Am curious how you feel about the general state of the alignment community going into the midgame. Are there things you hoped you or the alignment community would have more of by the time the early game ended, or achievable things that it would have been nice to have turn out differently?

"I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment"

Cool!!

Wow, the quoted text feels scary to read. 

I have met people within effective altruism who seem to be trying to do scary, dark things to their beliefs/motivations, which feels in the same category: trying to convince themselves they don't care about anything besides maximising impact or reducing x-risk. The latter, in at least one case, by thinking a lot about dying due to AI in order to start caring about it more, which, the way they described it, can't be good for thinking clearly.

From Ray Kurzweil's predictions for 2019 (written in 1999):

On Politics and Society

People are beginning to have relationships with automated personalities as companions, teachers, caretakers, and lovers. Automated personalities are superior to humans in some ways, such as having very reliable memories and, if desired, predictable (and programmable) personalities. They are not yet regarded as equal to humans in the subtlety of their personalities, although there is disagreement on this point. 

An undercurrent of concern is developing with regard to the i

... (read more)

There are too many books I want to read but probably won't get around to reading any time soon. I'm more likely to read them if there's someone else who's also reading it at a similar pace and I can talk to them about the book. If anyone's interested in going through any of the following books in June and discussing it together, message me. We can decide on the format later, it could just be reading the book and collaborating on a blog post about it together, or for more textbook-like things, reading a couple of selected chapters a week and going over the ... (read more)

I don't remember if I put down "inside view" on the form when filling it out but that does sound like the type of thing I may have done. I think I might have been overly eager at the time to say I had an "inside view" when what I really had was: confusion and disagreements with others' methods for forecasting, weighing others' forecasts in a mostly non-principled way, intuitions about AI progress that were maybe overly strong and as much or more based on hanging around a group of people and picking up their beliefs instead of evaluating evidence for myself... (read more)

How do we get LLM human imitations? 

Vladimir_Nesov
I meant the same thing as masks/simulacra. Though currently I'm more bullish about the shoggoths, because masks probably fail alignment security, even though their alignment might be quite robust despite the eldritch substrate.

The answers I got for your examples using ChatGPT-4:

Q: Could you get drunk from drinking a drunk person's blood?

I am not a medical professional, but I can provide some general information on the topic. It is highly unlikely that you would get drunk from drinking a drunk person's blood. When a person consumes alcohol, it is absorbed into their bloodstream, and their blood alcohol content (BAC) rises. However, the concentration of alcohol in their blood is still relatively low compared to the amount you would need to consume to feel intoxicated.

Drinking some

... (read more)

Reflections on Bay Area visit

GPT-4 generated TL;DR (mostly endorsed but eh):

  1. The beliefs of prominent AI safety researchers may not be as well-founded as expected, and people should be cautious about taking their beliefs too seriously.
  2. There is a tendency for people to overestimate their own knowledge and confidence in their expertise.
  3. Social status plays a significant role in the community, with some individuals treated like "popular kids."
  4. Important decisions are often made in casual social settings, such as lunches and parties.
  5. Geographical separation of com
... (read more)

The really cool bit was when he had a very quick mockup of a web app drawn on a piece of paper and uploaded a photo of it and GPT-4 then used just that to write the HTML and JavaScript for the app based on the drawing. 

I would be appreciative if you do end up writing such a post.

Sad that sometimes the things that seem good for creating a better, more honest, more accountable community for the people in it also give outsiders ammunition. My intuitions point strongly in the direction of doing things in this category anyway. 

I can see how the article might be frustrating for people who know the additional context that the article leaves out (where some of the additional context is simply having been in this community for a long time and having more insight into how it deals with abuse). From the outside though, it does feel like some factors would make abuse more likely in this community: how salient "status" feels, mixing of social and professional lives, gender ratios, conflicts of interests everywhere due to the community being small, sex positivity and acceptance of weirdn... (read more)

habryka

Yeah, I might want to write a post that tries to actually outline the history of abuse that I am aware of, without doing weird rhetorical tricks or omitting information. I've recently been on a bit of a "let's just put everything out there in public" spree, and I would definitely much prefer for new people to be able to get an accurate sense of the risk of abuse and harm, which, to be clear, is definitely not zero and feels substantial enough that people should care about it.

I do think the primary reason why people haven't written up stuff in the past is e... (read more)

This was a somewhat emotional read for me.

When I was between the ages of 11-14, I remember being pretty intensely curious about lots of stuff. I learned a bunch of programming and took online courses on special relativity, songwriting, computer science, and lots of other things. I liked thinking about maths puzzles that were a bit too difficult for me to solve. I had weird and wild takes on things I learned in history class that I wanted to share with others. I liked looking at ants and doing experiments on their behaviour. 

And then I started to feel ... (read more)

TsviBT
<3

I agree that these are pretty malleable. For example, about a year ago, I was probably two standard deviations less relentless and motivated in research topics, and probably a standard deviation lower on hustle/resourcefulness.

Interesting! Would be very curious to hear if there were specific things you think caused the change. 

Fair. I am also liking the concept of "sanity" and notice people use that word more now. To me, it points at some of the psychological stuff and also the vibe in the "What should you change in response to an 'emergency'? And AI risk" post.

I like "improving log odds of survival" as a handle. I don't like catchy concept names in this domain because they catch on more than understanding of the concept they refer to. 
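
A minimal worked illustration of what that handle points at (my own sketch, not from the original exchange), assuming the standard definition of log odds:

```latex
% Log odds of survival at probability p:
\[ \ell(p) = \ln\frac{p}{1-p} \]
% Moving survival from 1% to 2% and from 50% to 66.7% are nearly the same gain in log odds:
\[ \ell(0.02) - \ell(0.01) = \ln\frac{0.02/0.98}{0.01/0.99} \approx 0.70,
   \qquad \ell(2/3) - \ell(1/2) = \ln 2 \approx 0.69 \]
```

On this scale, anything that doubles the odds of survival counts the same whether survival starts out very unlikely or roughly even, which is the sense in which "improving log odds" treats both kinds of progress comparably.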

Raemon
I... don't know whether it actually accomplishes the right things, but I am embarrassed at how long it took for "improve log-odds of survival" to get promoted to my hypothesis space as the best handle for the concept of "improving log odds of survival [+ psychological stuff]"

I thought it was interesting when Oli said that there are so many good ideas in mechanism design and that the central bottleneck is that nobody understands UI design well enough to take advantage of them. Would be very interested if other folks have takes, links to good mechanism design ideas that are neglected or haven't been properly tried, or people/blogs that talk about stuff like that.
