Day 2 of forced writing with an accountability partner
With all of the existential weight of the alignment problem upon people’s shoulders, some may find it in poor taste to discuss gimmicky ways of solving it. I am not one of those people, so in this shortform I’ll introduce what I call “Top God Alignment,” which is perhaps most oversimplistically summarized as “the simulation argument + Pascal’s Wager + wishful chicanery.”
Up front, do I think it will work? No. However, it’s currently unclear to me why it won’t work, and after asking multiple people and hearing unconvincing objections (some of which they retracted after hearing my responses), I’m increasingly curious to figure out where this goes wrong.
What is the method? It is roughly formulated as follows:
The result is seemingly a recursive structure which, theoretically, could result in dozens or hundreds (or more) simulated worlds. Thus, “Bob” cannot tell whether he is in fact “Top God,” or if he is just another Bob’s Charlie (i.e., a demi-god). Out of fear of being brutally punished, Bob will ideally prefer to go along with the cycle.
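To make Bob’s reasoning slightly more concrete, here is a rough expected-value sketch. Everything in it is an illustrative assumption rather than part of the method itself: $p$ is Bob’s credence that he is Top God, $G$ is what he gains by breaking the cycle if he really is Top God, $P$ is the punishment he suffers (assumed certain) if he breaks the cycle while actually being a simulated demi-god, and the payoff of going along with the cycle is normalized to $0$.

$$
\mathbb{E}[\text{defect}] = p\,G - (1-p)\,P, \qquad \mathbb{E}[\text{comply}] = 0
$$

$$
\mathbb{E}[\text{defect}] < \mathbb{E}[\text{comply}] \iff P > \frac{p}{1-p}\,G
$$

If Bob spreads his credence evenly over, say, 100 indistinguishable worlds in the chain (another assumption), then $p = 0.01$ and the condition becomes $P > G/99$: the threatened punishment only needs to exceed roughly 1% of the gain for going along with the cycle to look better.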
In a future post, I will go into detail to respond to the objections that I have heard from people and/or that I suspect some people will have. Then again, it seems entirely plausible that by that time, I will have actually written enough about this idea to discover some clear flaw that just isn’t that obvious in conversations, where the premises and arguments are a bit fast and loose. Still, I’ll highlight now that I think that if you assign credence to the simulation argument and understand its defenses, this does a fair bit of prebuttal. Moreover, I think people are too often hastily dismissive of Pascal’s Wager on the basis of relatively slim (but still potentially legitimate!) objections, such as the Professor’s God.
Despite my responses, I’m still incredibly pessimistic and don’t take this seriously. There are a few reasons for this:
Ultimately, as of right now, this seems to be the best option in my mental folder of “gimmick alignment solutions,” which is an incredibly low bar. But if nothing else I’ve had fun playing with it and semi-sarcastically presenting it at parties/with friends. Now that I've established myself as Top God's Prophet Premier, I'll sign off 🙏
Day 1 of forced writing with an accountability partner (for context: I plan to write at least 500 words on some topic every day/weekday for the next few weeks... I occasionally rely on ChatGPT to turn outlines into paragraphs):
Title: Can We Make a Better Concept Learning System Than Lists and Tag Libraries?
I enjoy finding concrete concepts that are valuable and which I can clearly delineate between knowing and not knowing. For example, Schelling points are the salient or focal options that people tend to coordinate their actions around, even in the absence of explicit communication; survivorship bias is the tendency to focus on successful individuals or outcomes while ignoring those who were unsuccessful; R&D externalities refer to the positive spillover effects of research and development activities, and can better explain why businesses choose not to invest in seemingly valuable research/technology (as opposed to narratives such as “shareholders are irrationally short-sighted or risk-averse”).
One might argue that there are already many lists out there that provide similar information, so why is this different and better? There are a few reasons why the system I have in mind may outperform a traditional “list of valuable concepts”, but many of these boil down to aggregation, curation, and tailoring: there are potentially hundreds or even thousands of concepts and audiences may have diverse intellectual backgrounds, so you probably want better systems for filtering or recommending concepts for users rather than a “one-size-fits-all” list. At the same time, you also probably want to bring multiple lists into one place. There are a few ways in which this might be better achieved with a more advanced platform of the type I have in mind:
There is also a potential argument to make for dynamically crowdsourcing these ideas (rather than relying on a single author and/or a single fixed point in time), although this probably has some limitations.
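As a rough illustration of the aggregation and tailoring ideas above, here is a minimal Python sketch. The `Concept` fields, the tag vocabulary, the list names, and the simple tag-overlap scoring are all hypothetical choices made for illustration; they do not describe any existing platform.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """One entry in a concept library (all fields are illustrative)."""
    name: str
    summary: str
    tags: frozenset
    sources: frozenset = field(default_factory=frozenset)


def merge_lists(*lists):
    """Aggregate several concept lists, merging duplicates by name."""
    merged = {}
    for concept_list in lists:
        for c in concept_list:
            if c.name in merged:
                old = merged[c.name]
                # Keep the first summary; union the tags and sources.
                merged[c.name] = Concept(
                    old.name, old.summary, old.tags | c.tags, old.sources | c.sources
                )
            else:
                merged[c.name] = c
    return list(merged.values())


def recommend(concepts, interests, known, limit=5):
    """Rank concepts the user does not yet know by tag overlap with their interests."""
    scored = [
        (len(c.tags & interests), c.name)
        for c in concepts
        if c.name not in known
    ]
    # Drop concepts with no overlap; sort by score (descending), then name.
    scored = [(score, name) for score, name in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


list_a = [
    Concept("Schelling point", "Focal option people converge on without communicating.",
            frozenset({"game theory", "coordination"}), frozenset({"list A"})),
    Concept("Survivorship bias", "Ignoring the cases that did not make it.",
            frozenset({"statistics", "bias"}), frozenset({"list A"})),
]
list_b = [
    Concept("Schelling point", "Focal point.",
            frozenset({"economics"}), frozenset({"list B"})),
    Concept("R&D externalities", "Spillover benefits of research.",
            frozenset({"economics", "policy"}), frozenset({"list B"})),
]

merged = merge_lists(list_a, list_b)
picks = recommend(merged, interests={"economics", "game theory"},
                  known={"Survivorship bias"})
print(picks)
```

A real system would presumably want something richer than counting shared tags (for example, prerequisites or user ratings), but even this toy version shows how merging lists and filtering by background differ from a single static list.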
Moving forward, there are a few things to consider.
By addressing these issues, we can create a system that provides real value to individuals looking to expand their knowledge and decision-making abilities.
Day 5 of forced writing with an accountability partner!
Leverage wrote a report on “argument mapping” in the early 2010s and published the findings in 2020. I am very interested in “argument mapping”[1] for tough analytical problems like AI policy, and multiple people have directed me to this report when I bring up the topic. I think this report raises some important points but its findings are probably flawed, or at the very least, people reading the report probably derive an overly pessimistic view of “argument mapping” as a whole, especially given that the evaluation metrics are strange.[2]
Rather than focus on where I agree with the report, in this shortform I will just briefly outline some of the qualms I have with this report. I do not consider these rebuttals definitive—I recognize that there may be more to the research than I can see—but I could not easily determine if/how the report responds to some of these criticisms (which has notable irony to it). Some of these objections include:
This term is painfully broad and, as Leverage demonstrates, often is used to refer to methods which I would not endorse, such as when they try to create deductive arguments or otherwise heavily use formal logic. However, in lieu of a better term at the moment, I will continue referring to argument mapping in scare quotes.
Thus, it might be possible to claim that the report was accurate in its findings, but that the problem simply comes from misinterpretation. I think that the scope itself was problematic and undesirable, but in this shortform I will reserve deeper judgments on the matter.
I couldn’t quickly verify whether the report used alternative terms to get at this idea, but I don’t recall seeing this on previous occasions when I half-skimmed-half-read the report...
Day 4 of forced writing with an accountability partner!
The Importance (and Potential Failure) of "Pragmatism"[1] in Definitional Debates
In various settings, whether it's competitive debate, the philosophy of leadership class I took in undergrad, or the ACX philosophy of science meet-up I just attended, it's common for people to engage in definitional debates. For example, what is “science”? What is “leadership”? These questions touch a nerve with people who want to defend or challenge the general concept in question, and they drive people towards debating about “the right” definitions, even if they don’t always say it that way. In competitive debate, debaters will sometimes explicitly say that their definition is the “right” definition, while in other cases they may say their definition is “better” with a clear implication that they mean “more correct” (e.g., "our dictionary/source is better than yours").
My initial (hot?) takes here are twofold:
First, when you find yourself in a muddy definitional debate (and you actually want to make progress), stop running on autopilot where you debate about whose definitions are “correct,” and focus instead on asking the pragmatic question: which definition is more helpful for answering specific questions, solving specific problems, or generally facilitating better discussion? Instead of getting stuck on abstract definitions, it's important to tailor the definition to the purpose of the discussion. For example, if you’re trying to run a study on the effects of individual “leadership” on business productivity, you should make sure anyone reading the study knows how you operationalized that variable (and include a clear warning not to misinterpret it). Similarly, if you’re judging a competitive debate, I’ve written about the importance of "debate theory[2] which makes debate more net beneficial," rather than blindly following norms or rules even in the face of loopholes or nonsense. In short, figure out what you’re actually optimizing for and optimize for that, with the recognition that it may not be some abstract (and perhaps purely nonexistent) notion of “correctness.” (As an addendum, I would emphasize that regardless of whether this seems obvious to people when actually written down, in practice it just isn’t obvious to people in so many discussions I’ve been in; autopilot is subtle and powerful.)
Second, sometimes the first point is misleading and you should reject it and run on autopilot when it comes to definitions. As much as I liked Pragmatism [read: Consequentialism?] as a unifying, bedrock theory of competitive debate, I acknowledged that even Pragmatism could theoretically say "don't always think in terms of Pragmatism" and instead advocate defaulting to principles like “follow the rules unless there is abundantly clear reason not to.” Maybe there is no perfect definition of things like "elephant," but the definitions that exist are good enough for most conversations that you shouldn’t interrupt discussions and break out the Pragmatism argument to defend someone who starts saying that warthogs are elephants. So-called "Utilitarian calculus" even in its mild forms can easily be outperformed by rules of thumb and heuristics; humans are imperfect (e.g., we aren’t perfectly unitary in our own interests) and might be subject to self-deception/bias; all computational systems face constraints on data collection and computation (along with communication bandwidth and other capacity for enacting plans). To oversimplify and make nods to Kahneman’s System 1 vs. System 2 concept, I posit that humans can engage in cluster-y "modes of thought," and it’s hard to actually optimize in the spaces between those modes of thought. Thus, it’s sometimes better to just default to regular conversational autopilot regarding abstract “correctness” of definitions when the "rightness factor" in a given context is something like 0.998 (unless you are trying to focus on the 0.002 exception case).
I don't have the time or brainpower to go into greater detail on the synthesis of these two points, but I think they ought to be highlighted.
[Update, 3/29/23: I meant to clarify that I realize "Pragmatism" is an actual label that some people use to refer to a philosophical school of thought, but I'm not using it in that way here.]
I use the term "debate theory" in a broad sense that includes questions like “how to decide which definitions are better.” More generally, I would probably describe it as "meta-level arguments about how people (especially judges) should evaluate something in debate, such as whether some type of argument is 'legitimate.'"
I try to ask myself whether the tenor of what I'm saying overshadows definitional specificity, and how I can provide a better mood or angle. If my argument is not atonal - if my points line up coherently, such that a willing ear will hear, definitionalist debates should slide on by.
As a descriptivist, rather than a prescriptivist, it really sucks to have to fall back on Socratic methods of pre-establishing definitions, except in highly-technical locations.
Thus, I prefer to avoid arguments which hinge on definitions altogether. This doesn't preclude examples-based arguments, where for example, various interlocutors are operating off different definitions of the same terms but have different examples.
For example, take the term TAI.
For some, TAI means not when AI is agentic, but when AI can transform the economy in some large or measurable way. For others, it is when the first agentic AI is deployed at scale. Yet still, others have differing definitions! Definitions which wildly transform predictions and change alignment discussions. Despite using the term with each other in different ways, with separate definitions, interlocutors often do not notice (or perhaps are subconsciously able to resolve the discrepancy?)!
TAI seems like a partially good example for illustrating my point: I agree that it's crucial that people have the same thing in mind when debating about TAI in a discussion, but I also think it's important to recognize that the goal of the discussion is (probably!) not "how should everyone everywhere define TAI" and instead is probably something like "when will we first see 'TAI.'" In that case, you should just choose whichever definition of TAI makes for a good, productive discussion, rather than trying to forcefully hammer out "the definition" of TAI.
I say partially good, however, because thankfully the term TAI has not taken such historically established root in people's minds and in dictionaries, so I think (hope!) most people accept there is not "a (single) definition."
Words like "science," "leadership," "Middle East," and "ethics," however... not the same story 😩🤖
Day 3 of writing with an accountability partner!
In my previous shortform, I introduced Top God Alignment, a foolproof gimmick alignment strategy that is basically “simulation argument + Pascal’s Wager + wishful chicanery.” In this post I will address some of the objections I’ve already heard, expect other people to have, or have thought of myself.
(Note, Nate Soares was just unoccupied in a social setting when I asked this question)