Previously "Lanrian" on here. Research analyst at Open Philanthropy. Views are my own.
SB1047 was mentioned separately so I assumed it was something else. Might be the other ones, thanks for the links.
lobbied against mandatory RSPs
What is this referring to?
Thanks. It still seems to me like the problem recurs. The application of Occam's razor to questions like "will the Sun rise tomorrow?" seems more solid than, e.g., random intuitions I have about how to weigh up various considerations. But the latter still seem like a very weak version of the former. (E.g., both rely on my intuitions; and in both cases, the domain has something in common with cases where my intuitions have worked well before, and something not-in-common.) And so it's unclear to me what non-arbitrary standards I can use to decide whether I should let both, neither, or just the latter be "outweighed by a principle of suspending judgment".
To be clear: The "domain" thing was just meant to be a vague gesture of the sort of thing you might want to do. (I was trying to include my impression of what eg bracketed choice is trying to do.) I definitely agree that the gesture was vague enough to also include some options that I'd think are unreasonable.
Also, my sense is that many people are making decisions based on intuitions similar to yours (albeit with much less of a formal argument for how this can be represented or why it's reasonable). In particular, my impression is that people who are uncompelled by longtermism (despite being compelled by some type of scope-sensitive consequentialism) are often driven by an aversion to very non-robust EV estimates.
If I were to write the case for this in my own words, it might be something like:
I like this formulation because it seems pretty arbitrary to me where you draw the boundary between a credence that you include in your representor vs. not. (Like: what degree of justification is enough? The problem of induction will always provide some degree of arbitrariness.) But if we put this squarely in the domain of ethics, I'm less fussed about it, because I'm already sympathetic to being pretty anti-realist about ethics, and to there being some degree of arbitrariness in choosing what you care about. (And I certainly feel some intuitive aversion to making choices based on very non-robust credences, and it feels interesting to interpret that as an ~ethical intuition.)
Just to confirm, this means that the thing I put in quotes would probably end up being dynamically inconsistent? In order to avoid that, I need to put in an additional step of also ruling out plans that would be dominated from some constant prior perspective? (It’s a good point that these won’t be dominated from my current perspective.)
One upshot of this is that you can follow an explicitly non-(precise-)Bayesian decision procedure and still avoid dominated strategies. For example, you might explicitly specify beliefs using imprecise probabilities and make decisions using the “Dynamic Strong Maximality” rule, and still be immune to sure losses. Basically, Dynamic Strong Maximality tells you which plans are permissible given your imprecise credences, and you just pick one. And you could do this “picking” using additional substantive principles. Maybe you want to use another rule for decision-making with imprecise credences (e.g., maximin expected utility or minimax regret). Or maybe you want to account for your moral uncertainty (e.g., picking the plan that respects more deontological constraints).
Let's say Alice has imprecise credences. And let's say she follows this algorithm: "At each time-step t, I will use Dynamic Strong Maximality to find all plans that aren't dominated. I will pick between them using [some criteria]. Then I will take the action that plan recommends." (And then at the next time-step t+1, she re-does everything in the quotes.)
If Alice does this, does she end up being dynamically inconsistent? (Vulnerable to Dutch books, etc.)
(Maybe it varies depending on the criteria. I'm interested in whether you have a hunch about what the answer would be for the sorts of criteria you listed: maximin expected utility, minimax regret, picking the plan that respects more deontological constraints.)
I.e., I'm interested in: if you want to use Dynamic Strong Maximality to avoid dominated strategies, does that require you to either have the ability to commit to a plan or the inclination to consistently pick your plan from some prior epistemic perspective (like an "updateless" agent might)? Or do you automatically avoid dominated strategies even if you're constantly recomputing your plan?
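To make the question concrete, here's a rough sketch (in Python) of the kind of re-planning loop I have in mind. Everything in it is a placeholder I made up for illustration: the representor, the utilities, and especially `undominated_plans`, which just stands in for whatever set of plans Dynamic Strong Maximality would actually permit.

```python
# Toy re-planning loop with imprecise credences. All numbers and rules are
# placeholders; `undominated_plans` is NOT an implementation of Dynamic Strong
# Maximality, just a stand-in for whatever set of plans it would permit.
from itertools import product

STATES = ("good", "bad")

# Imprecise credences: a finite "representor", i.e. a set of probability distributions.
REPRESENTOR = (
    {"good": 0.3, "bad": 0.7},
    {"good": 0.6, "bad": 0.4},
)

def utility(plan, state):
    # A plan is a tuple of actions over the remaining time-steps; payoffs are made up.
    payoff = {("a", "good"): 2, ("a", "bad"): 0, ("b", "good"): 1, ("b", "bad"): 1}
    return sum(payoff[(action, state)] for action in plan)

def expected_utility(plan, credence):
    return sum(credence[s] * utility(plan, s) for s in STATES)

def undominated_plans(plans):
    # Placeholder dominance filter: discard a plan only if some other plan beats it
    # under *every* credence in the representor.
    return [
        p for p in plans
        if not any(
            all(expected_utility(q, c) > expected_utility(p, c) for c in REPRESENTOR)
            for q in plans if q != p
        )
    ]

def pick_plan(plans):
    # The "[some criteria]" step, here instantiated as maximin expected utility.
    return max(plans, key=lambda p: min(expected_utility(p, c) for c in REPRESENTOR))

# Alice's loop: at each time-step, recompute the permissible plans from scratch,
# take only the first action of the chosen plan, then re-plan at t+1.
HORIZON = 2
actions_taken = []
for t in range(HORIZON):
    plans = list(product(("a", "b"), repeat=HORIZON - t))
    chosen = pick_plan(undominated_plans(plans))
    actions_taken.append(chosen[0])
print(actions_taken)
```

Here the [some criteria] step is instantiated as maximin expected utility; swapping in minimax regret or a deontological tie-breaker would only change `pick_plan`.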
if the trend toward long periods of internal-only deployment continues
Have we seen such a trend so far? I would have thought the trend to date was neutral, or towards shorter periods of internal-only deployment.
Tbc, I'm not really objecting to your list of reasons why this might change in the future. One thing I'd add is that even if calendar-time deployment delays don't change, the gap in capabilities inside vs. outside AI companies could increase a lot if AI speeds up the pace of AI progress.
ETA: Dario Amodei says "Sonnet's training was conducted 9-12 months ago". He doesn't really clarify whether he's talking about the "old" or "new" 3.5. Old and new Sonnet were released in mid-June and mid-October, i.e. 7 and 3 months ago respectively. Combining the 3 vs. 7 months options with the 9-12 months range implies 2, 5, 6, or 9 months of keeping it internal. I think for GPT-4, pre-training ended in August and it was released in March, so that's 7 months from the end of pre-training to release. That's probably on the slower side of the Claude possibilities if Dario was talking about pre-training ending 9-12 months ago, but probably faster than Claude if Dario was talking about post-training finishing that early.
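Spelling out that arithmetic (my own back-of-envelope; it assumes "training was conducted 9-12 months ago" can be read as "training ended 9-12 months ago"):

```python
# Months of internal-only use ~= (months since training ended) - (months since release).
for months_since_training in (9, 12):  # Dario's "9-12 months ago"
    for label, months_since_release in (("old 3.5 Sonnet", 7), ("new 3.5 Sonnet", 3)):
        internal = months_since_training - months_since_release
        print(f"{label}, training ended {months_since_training} months ago: ~{internal} months internal")
# -> 2 or 5 months if he meant the old Sonnet; 6 or 9 months if he meant the new one.
```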
Source? I thought 2016 had the most takers, but that one seems to have ~5% trans. The latest one with results out (2023) has 7.5% trans. Are you counting "non-binary" or "other" as well? Or are you referring to some other survey?