The effects of subtracting or adding a "sycophancy vector" to one bias term. TL;DR: By adding, e.g., a "sycophancy vector" to one bias term, we outperform supervised finetuning and few-shot prompting at steering completions to be more or less sycophantic. Furthermore, these techniques are complementary: we show evidence that...
Summary: Lots of agent foundations research is motivated by the idea that alignment techniques found by empirical trial-and-error will fail to generalize to future systems. While such threat models are plausible, agent foundations researchers have largely failed to make progress on addressing them because they tend to make very few...
According to various sources, the US Supreme Court is poised to rule on and potentially overturn the principle of "Chevron deference." Chevron deference is a key legal principle by which the entire federal bureaucracy functions, and the case that established it is perhaps the most cited in American administrative law. Basically, it says that when...
Summary: In this post, we empirically test Epoch AI’s theoretical model of an upper bound on AI training run lengths. According to this model, an upper bound for training run time can be estimated by assuming that the length of a training run is optimized to maximize FLOP/$ subject...
As evidence that compute is a major bottleneck on capabilities has accumulated, many people have become more skeptical of extremely fast takeoff speeds. One major reason for this is that if humans were trying to stop it, it would probably be difficult for an AI to quickly accumulate lots of...
Summary: Arguing about the conjunctivity vs. disjunctivity of AI doom seems potentially unhelpful, as it may distract from crucial object-level questions about the probabilities of particular conjuncts/disjuncts. However, I argue that if we are to use this frame, then we should consider the risk from any particular AGI project...
Summary: AI macrostrategy is the study of high-level questions about how to prioritize resources on the current margin in order to achieve good AI outcomes. AI macrostrategy seems important if it is tractable. However, while few people are working on estimating particular parameters relevant to...