Introduction We (Adam Shimi, Joe Collman & myself) are trying to emulate peer review feedback for Alignment Forum posts. This is the second review in the series. The first’s introduction sums up our motivation and approach rather well, so we will not duplicate it here. Instead, let’s dive into today’s reviewed...
Introduction This review is part of a project with Joe Collman and Jérémy Perret to try to get as close as possible to peer review when giving feedback on the Alignment Forum. Our reasons behind this endeavor are detailed in our original post asking for suggestions of works to review;...
Alternate title: learning from fictional evidence. I've seen echoes of this idea elsewhere but couldn't find a description that suits me. My main idea is: you can update from your observed reaction to fiction and/or counterfactuals. The fallacy of generalizing from fictional evidence happens when you treat events having happened...
This week, in the key alignment group, we answered two questions, 5-minute timer style: 1. Map out all of alignment (25 minutes) 2. Create an image/table representing alignment (10 minutes) You are free to stop here and actually try to answer the questions yourself. Here is a link for a...
I want to make an actionable map of AI alignment. After years of reading papers, blog posts, online exchanges, books, and occasionally hidden documents about AI alignment and AI risk, and having extremely interesting conversations about it, most arguments I encounter now feel familiar at best, rehashed at worst. This...
Epistemic status: oversimplification of a process I'm confident about; meant as proof of concept. Related to: Double-Dipping in Dunning-Kruger Expertise comes in different, mostly independent layers. To illustrate them, I will describe the rough process of a curious mind discovering a field of study. Discovery In the beginning, the Rookie...
Rationality is designed to make you win, to help you attain your objectives. One of the most prominent phenomena getting in the way is akrasia, the lack of willpower that prevents us from performing whatever action we want to take. So, wait, do we want to act or not? There are...