Joe Collman — LessWrong

Existing Safety Frameworks Imply Unreasonable Confidence

This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to speak for all of MIRI. Most human endeavors have bounded results. A construction project may result in a functional bridge or a deadly...

Apr 10, 202546

Truthfulness, standards and credibility

-1: Meta Prelude While truthfulness is a topic I’ve been thinking about for some time, I’ve not discussed much of what follows with others. Therefore, at the very least I expect to be missing important considerations on some issues (where I’m not simply wrong). I’m hoping this should make any...

Apr 7, 202212

Review of "Learning Normativity: A Research Agenda"

Introduction We (Adam Shimi, Joe Collman & myself) are trying to emulate peer review feedback for Alignment Forum posts. This is the second review in the series. The first’s introduction sums up our motivation and approach rather well, we will not duplicate it here. Instead, let’s dive into today’s reviewed...

Jun 6, 202137

Review of "Fun with +12 OOMs of Compute"

Introduction This review is part of a project with Joe Collman and Jérémy Perret to try to get as close as possible to peer review when giving feedback on the Alignment Forum. Our reasons behind this endeavor are detailed in our original post asking for suggestions of works to review;...

Mar 28, 202165

A Critique of Non-Obstruction

Epistemic status: either I’m confused, or non-obstruction isn’t what I want. This is a response to Alex Turner’s Non-Obstruction: A simple Concept Motivating Corrigibility. Please read that first, and at least skim Reframing Impact where relevant. It’s all good stuff. I may very well be missing something: if not, it...

Feb 3, 202113

Optimal play in human-judged Debate usually won't answer your question

Epistemic status: highly confident (99%+) this is an issue for optimal play with human consequentialist judges. Thoughts on practical implications are more speculative, and involve much hand-waving (70% sure I’m not overlooking a trivial fix, and that this can’t be safely ignored). Note: I fully expect some readers to find...

Jan 27, 202133

Literature Review on Goal-Directedness

Introduction: Questioning Goals Goals play a central role in almost all thinking in the AI existential risk research. Common scenarios assume misaligned goals, be it from a single AGI (paperclip maximizer) or multiple advanced AI optimizing things we don’t want (Paul Christiano’s What Failure Looks Like). Approaches around this issue...

Jan 18, 202180