From the comment thread:
I'm not a fan of *generic* regulation-boosting. Like, if I just had a megaphone to shout to the world, "More regulation of AI!" I would not use it. I want to do more targeted advocacy of regulation that I think is more likely to be good and less likely to result in regulatory capture.
What are specific regulations / existing proposals that you think are likely to be good? When people are protesting to pause AI, what do you want them to be speaking into a megaphone (if you think those kinds of protests could be helpful at all right now)?
Reporting requirements, especially requirements to report to the public what your internal system capabilities are, so that it's impossible to have a secret AGI project. Also reporting requirements of the form "write a document explaining what capabilities, goals/values, constraints, etc. your AIs are supposed to have, and justifying those claims, and submit it to public scrutiny. So e.g. if your argument is 'we RLHF'd it to have those goals and constraints, and that probably works because there's No Evidence of deceptive alignment or other speculative fai...
This is so much fun! I wish I could download them!
I thought I didn’t get angry much in response to people making specific claims. I did some introspection about times in the recent past when I got angry, defensive, or withdrew from a conversation in response to claims that the other person made.
I think these are the mechanisms that made me feel that way:
Advice of this specific form has been helpful for me in the past. Sometimes I don't notice immediately when the actions I'm taking are not ones I would endorse after a bit of thinking (particularly when they're fun and good for me in the short term but bad for others or for me longer-term). This is also why having rules to follow for myself is helpful (e.g. never lying or breaking promises).
women more often these days choose not to make this easy, ramping up the fear and cost of rejection by choosing to deliberately inflict social or emotional costs as part of the rejection
I'm curious about how common this is, and what sort of social or emotional costs are being referred to.
Sure feels like it would be a tiny minority of women doing it but maybe I'm underestimating how often men experience something like this.
My goals for money, social status, and even how much I care about my family don't seem all that stable and have changed a bunch over time. They seem to arise from some deeper combination of desires (to be accepted, to have security, to feel good about myself, to avoid effortful work, etc.) interacting with my environment. Yet I wouldn't think of myself as primarily pursuing those deeper desires, and during various periods I would have self-modified, if given the option, to more aggressively pursue the goals that I (the "I" that was steering things) thought I cared about (like doing really well at a specific skill, which turned out to be a fleeting goal with time).
Current AI safety university groups are overall a good idea and helpful, in expectation, for reducing AI existential risk
Things will basically be fine regarding job loss and unemployment due to AI in the next several years and those worries are overstated
It is very unlikely that AI causes an existential catastrophe (in Bostrom's or Ord's sense) but doesn't result in human extinction. (That is, non-extinction AI x-risk scenarios are unlikely.)
EAs and rationalists should strongly consider having lots more children than they currently do
In my head, I've sort of just been simplifying to two ways the future could go: human extinction within a relatively short time period after powerful AI is developed or a pretty good utopian world. The non-extinction outcomes are not ones I worry about at the moment, though I'm very curious about how things will play out. I'm very excited about the future conditional on us figuring out how to align AI.
For people who think similarly to Katja, I'm curious: what kind of story are you imagining that leads to that? Does the story involve authoritari...
Topics I would be excited to have a dialogue about [will add to this list as I think of more]:
I attended an AI pause protest recently and thought I’d write up what my experience was like for people considering going to future ones.
I hadn’t been to a protest ever before and didn’t know what to expect. I will probably attend more in the future.
Some things that happened:
Are there specific non-obvious prompts or custom instructions you use for this that you've found helpful?
There are physical paperback copies of the first two books in Rationality: A-Z (Map and Territory and How to Actually Change Your Mind). They show up on Amazon for me.
E.g. I know of people who are interviewing for Anthropic capability teams because idk man, they just want a safety-adjacent job with a minimal amount of security, and it's what's available
That feels concerning. Are there any obvious things that would help, e.g. better career planning and reflection resources for people in this situation, or AI safety folks being clearer about what they see as the value/disvalue of working in those types of capability roles?
Seems weird for someone to explicitly want a "safety-adjacent" job unless there are odd social dynamics encouraging people to do that even when there isn't positive impact to be had from such a job.
Most people still have the Bostromian "paperclipping" analogy for AI risk in their head. In this story, we give the AI some utility function, and the problem is that the AI will naively optimize the utility function (in the Bostromian example, a company wanting to make more paperclips results in an AI turning the entire world into a paperclip factory).
That is how Bostrom brought up the paperclipping example in Superintelligence, but my impression was that the paperclipping example as originally conceived by Eliezer, prior to the Superintelligence book, was NOT a...
Visiting London and kinda surprised that there isn't much of a rationality community there relative to the Bay Area (despite there being enough people in the city who read LessWrong, are aware of the online community, etc.?). Especially because the EA community seems pretty active there. The rationality meetups that do happen seem to have a different vibe. In the Bay, it is easy to get invited to interesting rationalist-adjacent events every week just by showing up. Not so in London.
Not sure how much credit to give to each of these explanations...
see the current plan here: "EAG 2023 Bay Area: The current alignment plan, and how we might improve it"
Link to talk above doesn't seem to work for me.
Outside view: The proportion of junior researchers doing interp rather than other technical work is too high
Quite tangential[1] to your post, but if true, I'm curious what this suggests about the dynamics of field-building in AI safety.
Seems to me like certain organisations and individuals have an outsized influence in funneling new entrants into specific areas, and because the field is small (and ...
Other podcasts that have at least some relevant episodes: Hear This Idea, Towards Data Science, The Lunar Society, The Inside View, Machine Learning Street Talk
Here are some Twitter accounts I've found useful to follow (in no particular order): Quintin Pope, Janus @repligate, Neel Nanda, Chris Olah, Jack Clark, Yo Shavit @yonashav, Oliver Habryka, Eliezer Yudkowsky, alex lawsen, David Krueger, Stella Rose Biderman, Michael Nielsen, Ajeya Cotra, Joshua Achiam, Séb Krier, Ian Hogarth, Alex Turner, Nora Belrose, Dan Hendrycks, Daniel Paleka, Lauro Langosco, Epoch AI Research, davidad, Zvi Mowshowitz, Rob Miles
If some of the project ideas are smaller, is it easier for you to handle if they're added on to just one larger application as extras that might be worth additional funding?
Is your "alignment research experiments I wish someone would run" list shareable :)
Paul Graham's essay "What You Can't Say" is very practical. The tests/exercises he recommends for learning true, controversial things were useful to me.
Even if trying the following tests yields statements that aren't immediately useful, I think the act of noticing where you disagree with someone or something more powerful is good practice. I think similar mental muscles get used when noticing that you disagree with or are confused about a commonly held assumption in a research field, or when noticing important ideas that others are neglecting.
The different exer...
Is there an organisation that can hire independent alignment researchers who already have funding, in order to help with visas for a place that has other researchers, perhaps somewhere in the UK? Is there a need for such an organisation?
What are the most promising plans for automating alignment research, as mentioned, for example, in OpenAI's approach to alignment and by others?
I think there will probably be even more discussion of AI x-risk in the media in the near future. My own media consumption is quite filtered, but, for example, the last time I was in an Uber, the news channel on the radio mentioned Geoffrey Hinton thinking AI might kill us all. And it isn't a distant problem for my parents the way climate change is, because they use ChatGPT and are both impressed and concerned by it. They'll probably form thoughts on it anyway, and I'd prefer to be around to respond to their confusion and concerns.
It also seems p...
I think it's plausible that too much effort is going to interp at the margin
What's the counterfactual? Do you think newer people interested in AI safety should be doing other things instead of, for example, attempting one of the 200+ MI problems suggested by Neel Nanda? What other things?
I'm curious about whether I should change my shortform posting behaviour in response to higher site quality standards. I currently perceive shortform to be an alright place to post things that are quick and not aiming to be well-written or particularly useful for others to read, since it doesn't clutter up the website the way a post or a comment on other people's posts would.
Why is aliens wanting to put us in a zoo more plausible than the AI wanting to put us in a zoo itself?
Edit: Ah, there are more aliens around, so even if the average alien doesn't care about us, it's plausible that some of them would?
And the biggest question for me is not, is AI going to doom the world? Can I work on this in order to save the world? A lot of people expect that would be the question. That’s not at all the question. The question for me is, is there a concrete problem that I can make progress on? Because in science, it’s not sufficient for a problem to be enormously important. It has to be tractable. There has to be a way to make progress. And this was why I kept it at arm’s length for as long as I did.
I thought this was interesting. But it does feel like with this AI thi...
One way people can help is by stating their beliefs about AI, and their confidence in those beliefs, to the friends, family members, and acquaintances they talk to.
Currently, a bunch of people are coming across things in the news talking about humanity going extinct if AI progress continues as it has and no more alignment research happens. I would expect many of them to not think seriously about it because it's really hard to shake out of the "business as usual" frame. Most of your friends and family members probably know you're a reasonable, thoughtful per...
Hopefully this isn't too rude to say, but: I am indeed confused how you could be confused
Fwiw, I was also confused and your comment makes a lot more sense now. I think it's just difficult to convert text into meaning sometimes.
Thanks for posting this. It's insightful reading other people thinking through career/life planning of this type.
Am curious how you feel about the general state of the alignment community going into the midgame. Are there things you hoped you or the alignment community had more of, or achievable things that would have been nice to have in place by the time the early game ended?
"I have a crazy take that the kind of reasoning that is done in generative modeling has a bunch of things in common with the kind of reasoning that is valuable when developing algorithms for AI alignment"
Cool!!
Wow, the quoted text feels scary to read.
I have met people within effective altruism who seem to be trying to do scary, dark things to their beliefs/motivations, which feels like it's in the same category: for example, trying to convince themselves they don't care about anything besides maximising impact or reducing x-risk. In at least one case, the latter was done by thinking a lot about dying due to AI in order to start caring about it more, which, in the way they described it, can't be good for thinking clearly.
From Ray Kurzweil's predictions for 2019 (written in 1999):
On Politics and Society
...People are beginning to have relationships with automated personalities as companions, teachers, caretakers, and lovers. Automated personalities are superior to humans in some ways, such as having very reliable memories and, if desired, predictable (and programmable) personalities. They are not yet regarded as equal to humans in the subtlety of their personalities, although there is disagreement on this point.
An undercurrent of concern is developing with regard to the i
There are too many books I want to read but probably won't get around to reading any time soon. I'm more likely to read a book if there's someone else who's also reading it at a similar pace and I can talk to them about it. If anyone's interested in going through any of the following books in June and discussing them together, message me. We can decide on the format later; it could just be reading the book and collaborating on a blog post about it together, or, for more textbook-like things, reading a couple of selected chapters a week and going over the ...
I don't remember if I put down "inside view" on the form when filling it out, but that does sound like the type of thing I may have done. I think I might have been overly eager at the time to say I had an "inside view" when what I really had was: confusion and disagreements with others' methods for forecasting, weighing others' forecasts in a mostly non-principled way, and intuitions about AI progress that were maybe overly strong and based as much or more on hanging around a group of people and picking up their beliefs as on evaluating evidence for myself...
How do we get LLM human imitations?
The answers I got for your examples using ChatGPT-4:
Q: Could you get drunk from drinking a drunk person's blood?
...I am not a medical professional, but I can provide some general information on the topic. It is highly unlikely that you would get drunk from drinking a drunk person's blood. When a person consumes alcohol, it is absorbed into their bloodstream, and their blood alcohol content (BAC) rises. However, the concentration of alcohol in their blood is still relatively low compared to the amount you would need to consume to feel intoxicated.
Drinking some
GPT-4-generated TL;DR (mostly endorsed but eh):
The really cool bit was when he took a very quick mockup of a web app drawn on a piece of paper, uploaded a photo of it, and GPT-4 then used just that drawing to write the HTML and JavaScript for the app.
I would appreciate it if you do end up writing such a post.
Sad that sometimes the things that seem good for creating a better, more honest, more accountable community for the people in it also give outsiders ammunition. My intuitions point strongly in the direction of doing things in this category anyway.
I can see how the article might be frustrating for people who know the additional context that the article leaves out (where some of the additional context is simply having been in this community for a long time and having more insight into how it deals with abuse). From the outside though, it does feel like some factors would make abuse more likely in this community: how salient "status" feels, mixing of social and professional lives, gender ratios, conflicts of interests everywhere due to the community being small, sex positivity and acceptance of weirdn...
Yeah, I might want to write a post that tries to actually outline the history of abuse that I am aware of, without doing weird rhetorical tricks or omitting information. I've recently been on a bit of a "let's just put everything out there in public" spree, and I would definitely much prefer for new people to be able to get an accurate sense of the risk of abuse and harm, which, to be clear, is definitely not zero and feels substantial enough that people should care about it.
I do think the primary reason why people haven't written up stuff in the past is e...
This was a somewhat emotional read for me.
When I was between the ages of 11-14, I remember being pretty intensely curious about lots of stuff. I learned a bunch of programming and took online courses on special relativity, songwriting, computer science, and lots of other things. I liked thinking about maths puzzles that were a bit too difficult for me to solve. I had weird and wild takes on things I learned in history class that I wanted to share with others. I liked looking at ants and doing experiments on their behaviour.
And then I started to feel ...
I agree that these are pretty malleable. For example, ~1 year ago, I was probably two standard deviations less relentless and motivated on research topics, and probably a standard deviation lower on hustle/resourcefulness.
Interesting! Would be very curious to hear if there were specific things you think caused the change.
Fair. I am also liking the concept of "sanity" and notice people using that word more now. To me, it points at some of the psychological stuff and also at the vibe in the post "What should you change in response to an 'emergency'? And AI risk".
I like "improving log odds of survival" as a handle. I don't like catchy concept names in this domain because they catch on more than understanding of the concept they refer to.
I thought it was interesting when Oli said that there are so many good ideas in mechanism design, and that the central bottleneck is that nobody understands UI design well enough to take advantage of them. Would be very interested if other folks have takes on or links to good mechanism design ideas that are neglected or haven't been properly tried, or to people/blogs that talk about stuff like that.
Cullen O'Keefe is also no longer at OpenAI (as of last month).