I'm a 3rd year PhD student at Columbia. My academic interests lie in mechanism design and algorithms related to the acquisition of knowledge. I write a blog on stuff I'm interested in (such as math, philosophy, puzzles, statistics, and elections): https://ericneyman.wordpress.com/
I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:
Yeah, there's definitely value in experts being allowed to submit multiple times, allowing them to update on other experts' submissions. This is basically the frame taken in Chapter 8, where Alice and Bob update their estimate based on the other's estimate at each step. This is generally the way prediction markets work, and I think it's an understudied perspective (perhaps because it's more difficult to reason about than if you assume that each expert's estimate is static, i.e. does not depend on other experts' estimates).
Thanks! Here are some brief responses:
From the high level summary here it sounds like you're offloading the task of aggregation to the forecasters themselves. It's odd to me that you're describing this as arbitrage.
Here's what I say about this anticipated objection in the thesis:
For many reasons, the principal may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate value (as in our example), the principal does not find out whether they originally agreed. Second, even if the principal only seeks to act based on some aggregate of the experts' opinions, their method of aggregation may be different from the one that experts use to collude. For instance, the principal may have a private opinion on the trustworthiness of each expert and wish to average the experts' opinions with corresponding weights. Collusion among the experts denies the principal this opportunity. Third, a principal may wish to track the accuracy of each individual expert (to figure out which experts to trust more in the future, for instance), and collusion makes this impossible. Fourth, the space of collusion strategies that constitute arbitrage is large. In our example above, any report in [0.546, 0.637] would guarantee a profit; and this does not even mention strategies in which experts report different probabilities. As such, the principal may not even be able to recover basic information about the experts' beliefs from their reports.
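To make "arbitrage" concrete, here is a minimal sketch under assumptions of my own choosing, not the example from the thesis: two experts are each paid by the quadratic (Brier-style) scoring rule, score = 1 - (outcome - report)^2, and their beliefs are the hypothetical values 0.3 and 0.7. If both collude to report the average of their beliefs, their total payment goes up no matter what the outcome is. For two experts with beliefs a and b, the gain works out to (a - b)^2 / 2.

```python
# Sketch of arbitrage under the quadratic scoring rule.
# Assumptions (not from the thesis): the scoring rule, the beliefs 0.3 and 0.7,
# and all function names here are hypothetical, chosen for illustration.

def quadratic_score(report: float, outcome: int) -> float:
    """Payment for reporting probability `report` when the event's outcome is 0 or 1."""
    return 1.0 - (outcome - report) ** 2

def total_score(reports: list[float], outcome: int) -> float:
    """Total payment to all experts for the given reports."""
    return sum(quadratic_score(r, outcome) for r in reports)

def collusion_gain(beliefs: list[float], outcome: int) -> float:
    """Extra total payment if every expert reports the mean belief
    instead of their own belief."""
    mean = sum(beliefs) / len(beliefs)
    honest = total_score(beliefs, outcome)
    colluding = total_score([mean] * len(beliefs), outcome)
    return colluding - honest

if __name__ == "__main__":
    beliefs = [0.3, 0.7]  # hypothetical beliefs
    for outcome in (0, 1):
        # The gain is positive for both outcomes, so the profit is risk-free.
        print(outcome, round(collusion_gain(beliefs, outcome), 6))
```

The gain is positive for both outcomes, which is what makes this arbitrage rather than a bet: the experts do not need to agree on a probability to profit. An arbitrage-free rule would have to rule out any such guaranteed gain.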
For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters.
This would indeed be arbitrage-free, but likely not proper: it wouldn't necessarily incentivize each expert to report their true belief; instead, an expert's optimal report is going to be some sort of function of the expert's belief about the joint probability distribution over the experts' beliefs. (I'm not sure how much this matters in practice -- I defer to you on that.)
It's surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
In Chapter 4, we are thinking of experts as having immutable beliefs, rather than beliefs that change upon hearing other experts' beliefs. Is this a silly model? If you want, you can think of these beliefs as each expert's belief after talking to the other experts a bunch. In theory(?) the experts' beliefs should converge (though I'm not actually clear what happens if the experts are computationally bounded); but in practice, experts often don't converge (see e.g. the FRI adversarial collaboration on AI risk).
It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts.
Yup -- in my summary I described "robust aggregation" as "finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures." In fact, you can't do anything interesting in the worst case over all information structures. The assumption I make in the chapter in order to get interesting results is, roughly, that experts' information is substitutable rather than complementary (on average over the information structure). The sort of scenario you describe in your example is the type of example where Alice and Bob's information might be complementary.
Great questions!
(Note: I work with Paul at ARC theory. These views are my own and Paul did not ask me to write this comment.)
I think the following norm of civil discourse is super important: do not accuse someone of acting in bad faith, unless you have really strong evidence. An accusation of bad faith makes it basically impossible to proceed with discussion and seek truth together, because if you're treating someone's words as a calculated move in furtherance of their personal agenda, then you can't take those words at face value.
I believe that this post violates this norm pretty egregiously. It begins by saying that hiding your beliefs "is lying". I'm pretty confident that the sort of belief-hiding being discussed in the post is not something most people would label "lying" (see Ryan's comment), and it definitely isn't a central example of lying. (And so in effect it labels a particular behavior "lying" in an attempt to associate it with behaviors generally considered worse.)
The post then confidently asserts that Paul Christiano hides his beliefs in order to promote RSPs. The post presents very little evidence that this is what's going on, and Paul's account seems consistent with the facts (and I believe him).
So in effect, it accuses Paul and others of lying, cowardice, and bad faith on what I consider to be very little evidence.
Edited to add: What should the authors have done instead? I think they should have engaged in a public dialogue with one or more of the people they call out / believe to be acting dishonestly. The first line of the dialogue should maybe have been: "I believe you have been hiding your beliefs, for [reasons]. I think this is really bad, for [reasons]. I'd like to hear your perspective."
To elaborate on my feelings about the truck:
(Obviously, I think the events "this is at least partially an attack on Paul" and "at least one of the authors of this post is connected to Control AI" are positively correlated, since this post is an attack on Paul. My probabilities are roughly 85% and 97%*, respectively.)
*For a broad-ish definition of "connected to"
I don't particularly see a reason to dox the people behind the truck, though I am not totally sure. My bar against doxxing is pretty high, though I do care about people being held accountable for large scale actions they take.
That's fair. I think that it would be better for the world if Control AI were not anonymous, and I judge the group negatively for being anonymous. On the other hand, I don't think I endorse them being doxxed. So perhaps my request to Connor and Gabriel is: please share what connection you have to Control AI, if any, and share what more information you have permission to share.
(Conflict of interest note: I work at ARC, Paul Christiano's org. Paul did not ask me to write this comment. I first heard about the truck (below) from him, though I later ran into it independently online.)
There is an anonymous group of people called Control AI, whose goal is to convince people to be against responsible scaling policies because they insufficiently constrain AI labs' actions. See their Twitter account and website (also anonymous; Edit: the website now identifies Andrea Miotti of Conjecture as the director). (I first ran into Control AI via this tweet, which uses color-distorting visual effects to portray Anthropic CEO Dario Amodei in an unflattering light, in a way that's reminiscent of political attack ads.)
Control AI has rented a truck that has been circling London's Parliament Square. The truck plays a video of "Dr. Paul Christiano (Made ChatGPT Possible; Government AI adviser)" saying that there's a 10-20% chance of an AI takeover and an overall 50% chance of doom, and of Sam Altman saying that the "bad case" of AGI is "lights out for all of us". The back of the truck says "Responsible Scaling: No checks, No limits, No control". The video of Paul seems to me to be an attack on Paul (but see Twitter discussion here).
I currently strongly believe that the authors of this post are either in part responsible for Control AI, or at least have been working with or in contact with Control AI. That's because of the focus on RSPs and because both Connor Leahy and Gabriel Alfour have retweeted Control AI (which has a relatively small following).
Connor/Gabriel -- if you are connected with Control AI, I think it's important to make this clear, for a few reasons. First, if you're trying to drive policy change, people should know who you are, at minimum so they can engage with you. Second, I think this is particularly true if the policy campaign involves attacks on people who disagree with you. And third, because I think it's useful context for understanding this post.
Could you clarify if you have any connection (even informal) with Control AI? If you are affiliated with them, could you describe how you're affiliated and who else is involved?
EDIT: This Guardian article confirms that Connor is (among others) responsible for Control AI.
Social graces are not only about polite lies but about social decision procedures on maintaining game theoretic equilibria to maintain cooperation favoring payoff structures.
This sounds interesting. For the sake of concreteness, could you give a couple of central examples of this?
I'm curious what disagree votes mean here. Are people disagreeing with my first sentence? Or that the particular questions I asked are useful to consider? Or, like, the vibes of the post?
(Edit: I wrote this when the agree-disagree score was -15 or so.)