Sam F. Brown

My emotional reaction to the current funding situation

I’m allowed to spend two days a week at Trajan House, a building in Oxford which houses the Centre for Effective Altruism (CEA), along with a few EA-related bodies. Two days is what I asked for, and what I received. The rest of the time I spend in the Bodleian Library of the University of Oxford (about £30/year, if you can demonstrate an acceptable “research need”), at a desk in a coworking space in Ethical Property (which houses Refugee Welcome, among other non-EA bodies, for £200/month), in Common Ground (a cafe/coworking space which I’ve recommended to people as a place where the staff explicitly explain, if you ask, that you don’t need to order anything to stay as long as you like), in a large family house whose occupants I’m friends with, and in various cafes and restaurants where I can sit for hours while only drinking mint tea.

I’m allowed to use the hot-desk space at Trajan House because I’m a recipient of an EA Long Term Future Fund grant, to research Alignment. (I call this “AI safety” to most people, and sometimes have to explain that AI stands for Artificial Intelligence.) I judged that 6 months of salary at the level of my previous startup job, with a small expenses budget, came to about £40,000. This is what I asked for, and what I received.

At my previous job I thought I was having a measurable, meaningful impact on climate change. When I started there, I imagined that I’d go on to found my own startup. I promised myself it would be the last time I’d be employed.

When I quit that startup job, I spent around a year doing nothing-much. I applied to Oxford’s Philosophy BPhil, unsuccessfully. I looked at startup incubators and accelerators. But mostly, I researched Alignment groups. I visited Conjecture, and talked to people from DeepMind and the Future of Humanity Institute. What I was trying to do was discern whether Alignment was “real” or not. Certainly, I decided, some of these people were cleverer than me, more hard-working than me, better-informed. Some seem deluded

108 · Sep 9, 2022

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

84 · Jun 13, 2024

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

49 · Nov 8, 2023

AstralCodexTen and Rationality Meetup Organisers’ Retreat — Europe, Middle East, and Africa 2023

25 · Sep 15, 2022


381

Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents

Summary
* Scaffolded LLM agents are, in principle, able to execute arbitrary code to achieve the goals they have been set. One such goal could be self-improvement.
* This post outlines our plans to build a benchmark to measure the ability of LLM agents to modify and improve other LLM...

Jul 22, 2024 · 20

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

We have written a paper on sandbagging for which we present the abstract and brief results in this post. See the paper for more details. Tweet thread here. (Figure: Illustration of sandbagging.) Evaluators may regulate the deployment of AI systems with dangerous capabilities, potentially against the interests of the AI system...

Jun 13, 2024 · 84

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

This post summarizes work done over the summer as part of the Summer 2023 AI Safety Hub Labs programme. Our results will also be published as part of an upcoming paper. In this post, we focus on explaining how we define and evaluate properties of deceptive behavior in LMs and...

Nov 8, 2023 · 49

How to find cool things in a new place

A friend asked me for advice on how to find stuff that's going on in a place. I feel a bit self-conscious positioning myself as some kind of expert here, but I do think I've done a good job of this in Oxford. None of this is galaxy-brain stuff, just...

Jan 24, 2023 · 12

Questions about Value Lock-in, Paternalism, and Empowerment

Pulling a child out of the path of a fast car is the right thing to do, whether or not the child agrees, understands, is grateful, or even is hurt during the rescue. Paternalistic acts like this, when we might argue that a person's straightforward consent ought to be overridden,...

Nov 16, 2022 · 13

AstralCodexTen and Rationality Meetup Organisers’ Retreat — Europe, Middle East, and Africa 2023

TL;DR: Retreat for EMEA meetup organisers, Summer 2023, apply here. Application deadline: 31ˢᵗ Oct 2022. Why a retreat? We think we all benefit from having an international community of rationality and ACX meetup organisers who can go to one another for advice and support. The goal of this retreat is...

Sep 15, 2022 · 25

My emotional reaction to the current funding situation

Sep 9, 2022 · 108
