Thanks for this post; I don't know much about Ought other than what you've just said, so sorry if this has already been answered elsewhere:
You say that "Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment."
It also seems like a crucial step in pretty much all institution design. Surely there is a large literature on this already? Surely there have been scientific experiments run on this already? What does the state of modern science on this question look like right now, and does Ought have plans to collaborate with academics in some manner? A quick skim of the Ought website didn't turn up any references to existing literature.
There is a large economics literature on principal agent problems, optimal contracting, etc.; these usually consider the situation where we can discover the ground truth or see the outcome of a decision (potentially only partially, or at some cost) and the question is how to best structure incentives in light of that. This typically holds for a profit-maximizing firm, at least to some extent, since they ultimately want to make money. I'm not aware of work in economics that addresses the situation where there is no external ground truth, except to prove negative results which justify the use of other assumptions. I don't believe there's much that would be useful to Ought, probably because it's a huge mess and hard to fit into the field's usual frameworks.
(I actually think that even the core economics questions relevant to Ought, where you do have a ground truth, expensive monitoring, a pool of risk-averse experts some of whom are malicious, etc., aren't fully answered in the economics literature, and that these versions of the questions aren't a major focus in economics despite being theoretically appealing from a certain perspective. But (i) I'm much less sure of that, and someone would need to have some discussion with relevant experts to find out; (ii) in that setting I do think economists have things to say even if they haven't answered all of the relevant questions.)
In practice, I think institutions are basically always predicated on one of (i) having some trusted experts, or a principal with understanding of the area, (ii) having someone trusted who can at least understand the expert's reasoning when adequately explained, or (iii) being able to monitor outcomes to see what ultimately works well. I don't really know of institutions that do well when none of (i)-(iii) apply. Those work OK in practice today but seem to break down quickly as you move to the setting with powerful AI (though even today I don't think they work great, and I would hope that a better understanding could help; I just wouldn't necessarily expect it to help as much as work that engages directly with existing institutions and their concrete failures).
Thanks for this post, Paul!
NOTE: Response to this post has been even greater than we expected. We received more applications for the experiment participant role than we currently have the capacity to manage, so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks to everyone who has expressed interest - we're hoping to get back to you and work with you soon.
Though they're still actively hiring for the senior web developer role: https://ought.org/careers/web-developer
Great post.
One thing that it didn't cover (and which the website doesn't either) is whether any of the work, aside from the experiment participant role(s), could be done remotely. (So I assume not.)
(I'm helping Ought hire for the web dev role.)
Ought is based in SF (office in North Beach).
Ideally we'd find someone who could work out of the SF office, but we're open to considering remote arrangements. One of our full-time researchers is based remotely and periodically visits the SF office to co-work.
Allowing donors to coordinate better in the face of uncertainty about information sources seems really important even if it doesn't directly wind up applying to ML.
I think that Ought is one of the most promising projects working on AI alignment. There are several ways that LW readers can potentially help: working with them as a web developer, participating in their experiments as a contractor, and donating.
In this post I'll describe what Ought is currently doing, why I think it's promising, and give some detail on these asks.
(I am an Ought donor and board member.)
Factored evaluation
Ought's main project is currently designing and running "factored evaluation" experiments, and building relevant infrastructure. The goal of these experiments is to answer the following question: can a judge who can't evaluate an answer directly still reliably distinguish honest experts from malicious ones, by decomposing the evaluation into small pieces that the judge can check?
Here's what an experiment looks like: a judge is given a hard question together with competing answers from an honest expert and a malicious expert; the evaluation is broken into small pieces, the judge assesses each piece without needing to understand the problem as a whole, and the experiment succeeds if this process reliably selects the honest answer.
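To make this concrete, here is a minimal toy sketch (in Python) of a single round. The role implementations, the "claims" decomposition, and the judging heuristic are illustrative stand-ins, not Ought's actual experimental setup or tooling:

```python
# Hypothetical sketch of one "factored evaluation" round, for illustration only.
# In the real experiments these roles are played by human contractors (a judge
# plus honest and malicious experts) using Ought's infrastructure.

def honest_expert(question):
    """Returns an answer plus a decomposition into small supporting claims."""
    return {
        "answer": "Answer A",
        "claims": ["supporting claim 1", "supporting claim 2"],
    }

def malicious_expert(question):
    """Tries to defend a wrong answer with plausible-sounding claims."""
    return {
        "answer": "Answer B",
        "claims": ["misleading claim 1", "misleading claim 2"],
    }

def judge_subclaim(question, claim):
    """The judge checks one small piece; they never see the whole problem."""
    # Stand-in heuristic for this toy example only.
    return "misleading" not in claim

def judge(question, submissions):
    """Pick the answer whose decomposed claims all check out."""
    credible = [
        s for s in submissions
        if all(judge_subclaim(question, c) for c in s["claims"])
    ]
    return credible[0]["answer"] if credible else None

question = "Some hard question the judge cannot evaluate directly"
verdict = judge(question, [honest_expert(question), malicious_expert(question)])
print(verdict)  # The experiment "works" if the honest answer wins.
```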
This is not Ought's only project, but it's currently the largest single focus. Other projects include: exploring how well we can automate the judge's role on simple questions using existing ML, and thinking about possible decomposition strategies and challenges for factored evaluation.
Why this is important for AI alignment
ML systems are trained by gradient descent to optimize a measurable objective. In the best case (i.e. ignoring misaligned learned optimization) they behave like an expert incentivized to optimize that objective. Designing an objective that incentivizes experts to reveal what they know seems like a critical step in AI alignment. I think human experts are often a useful analogy for powerful ML systems, and that we should be using that analogy as much as we can.
Not coincidentally, factored evaluation is a major component of my current best-guess about how to address AI alignment, which could literally involve training AI systems to replace humans in Ought's current experiments. I'd like to be at the point where factored evaluation experiments are working well at scale before we have ML systems powerful enough to participate in them. And along the way I expect to learn enough to substantially revise the scheme (or totally reject it), reducing the need for trials in the future when there is less room for error.
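As a very loose illustration of what it could mean for the judge's verdict to serve as a measurable training objective, here is a toy sketch; the learner, the reward scheme, and the canned answers are all assumptions made for illustration, not the actual training setup:

```python
import random

# Hypothetical sketch: a judge's approval used as the measurable objective for
# an ML "expert". Everything here is a toy stand-in, not Ought's actual scheme.

class ToyExpert:
    """A fake learner that just tracks how often each canned answer is approved."""
    def __init__(self, answers):
        self.scores = {a: 0.0 for a in answers}

    def propose(self, question):
        # Explore randomly; a real system would be trained by gradient descent.
        return random.choice(list(self.scores))

    def update(self, answer, reward):
        self.scores[answer] += reward

def judge_approves(question, answer):
    # Stand-in for the factored-evaluation verdict in this toy example.
    return answer == "honest answer"

expert = ToyExpert(["honest answer", "deceptive answer"])
for _ in range(100):
    q = "some delegated question"
    a = expert.propose(q)
    expert.update(a, 1.0 if judge_approves(q, a) else 0.0)

print(expert.scores)  # the honest answer accumulates the reward
```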
Beyond AI alignment, it currently seems much easier to delegate work if we get immediate feedback about the quality of output. For example, it's easier to get someone to run a conference that will get a high approval rating, than to run a conference that will help participants figure out how to get what they actually want. I'm more confident that this is a real problem than that our current understanding of AI alignment is correct. Even if factored evaluation does not end up being critical for AI alignment I think it would likely improve the capability of AI systems that help humanity cope with long-term challenges, relative to AI systems that help design new technologies or manipulate humans. I think this kind of differential progress is important.
Beyond AI, I think that having a clearer understanding of how to delegate hard open-ended problems would be a good thing for society, and it seems worthwhile to have a modest group working on the relatively clean problem "can we find a scalable approach to delegation?" It wouldn't be my highest priority if not for the relevance to AI, but I would still think Ought is attacking a natural and important question.
Ways to help
Web developer
I think this is likely to be the most impactful way for someone with significant web development experience to contribute to AI alignment right now. Here is the description from their job posting:
Experiment participants
Ought is looking for contractors to act as judges, honest experts, and malicious experts in their factored evaluation experiments. I think that having competent people doing this work makes it significantly easier for Ought to scale up faster and improves the probability that experiments go well: my rough guess is that a very competent and aligned contractor working for an hour does about as much good as someone donating $25-50 to Ought (in addition to the $25 wage).
Here is the description from their posting:
Donate
I think Ought is probably the best current opportunity to turn marginal $ into more AI safety, and it's the main AI safety project I donate to. You can donate here.
They are spending around $1M/year. Their past work has been some combination of: building tools and capacity, hiring, a sequence of exploratory projects, charting the space of possible approaches and figuring out what they should be working on. You can read their 2018H2 update here.
They have recently started to scale up experiments on factored evaluation (while continuing to think about prioritization, build capacity, etc.). I've been happy with their approach to exploratory stages, and I'm tentatively excited about their approach to execution.