Zvi analyzes Michael Lewis' book "Going Infinite" about Sam Bankman-Fried and FTX. He argues the book provides clear evidence of SBF's fraudulent behavior, despite Lewis seeming not to fully realize it. Zvi sees SBF as a cautionary tale about the dangers of pursuing maximalist goals without ethical grounding.

If you don't believe in your work, consider looking for other options

I spent 15 months working for ARC Theory. I recently wrote up why I don't believe in their research. If one reads my posts, I think it should become very clear that either ARC's research direction is fundamentally unsound, or I'm still misunderstanding some of the very basics after more than a year of trying to grasp it. In either case, I think it's pretty clear that it was not productive for me to work there. Throughout writing my posts, I felt an intense shame imagining readers asking the very fair question: "If you think the agenda is so doomed, why did you keep working on it?"[1]

In my first post, I write: "Unfortunately, by the time I left ARC, I became very skeptical of the viability of their agenda." This is not quite true. I was very skeptical from the beginning, for largely the same reasons I expressed in my posts. But at first I told myself that I should stay a little longer: either they manage to convince me that the agenda is sound, or I demonstrate that it doesn't work, in which case I free up the labor of the group of smart people working on the agenda. I think this was initially a somewhat reasonable position, though it was already in large part motivated reasoning.

But half a year after joining, I don't think this theory of change was very tenable anymore. It was becoming clear that our arguments were going in circles. I couldn't convince Paul and Mark (the two people thinking the most about the big-picture questions), nor could they convince me. Eight months in, two friends visited me in California, and they noticed that I always derailed the conversation when they asked me about my research. I should have seen it as an important sign that I was ashamed to talk about my research to my friends, because I was afraid they would see how crazy it was. I should have quit then, but I stayed for another seven months. I think this was largely due to cowardice.
TsviBT
Periodic reminder: AFAIK (though I didn't look much) no one has thoroughly investigated whether there's some small set of molecules, delivered to the brain easily enough, that would have some major regulatory effects resulting in greatly increased cognitive ability. (Feel free to prove me wrong with an example of someone plausibly doing so, i.e. looking hard enough and thinking hard enough that if such a thing was feasible to find and do, then they'd probably have found it--but "surely, surely, surely someone has done so because obviously, right?" is certainly not an accepted proof. And don't call me Shirley!) I'm simply too busy, but you're not! https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods#Signaling_molecules_for_creative_brains
ryan_greenblatt
Sometimes people talk about how AIs will be very superhuman at a bunch of (narrow) domains. A key question related to this is how much this generalizes. Here are two different possible extremes for how this could go:

1. It's effectively like an attached narrow weak AI: The AI is superhuman at things like writing ultra-fast CUDA kernels, but from the AI's perspective, this is sort of like it has a weak AI tool attached to it (in a well-integrated way) which is superhuman at this skill. The part which is writing these CUDA kernels (or otherwise doing the task) is effectively weak and can't draw in a deep way on the AI's overall skills or knowledge to generalize (likely it can shallowly draw on these in a way which is similar to the overall AI providing input to the weak tool AI). Further, you could actually break out these capabilities into a separate weak model that humans can use. Humans would use this somewhat less fluently as they can't use it as quickly and smoothly, due to being unable to instantaneously translate their thoughts and not being absurdly practiced at using the tool (like AIs would be), but the difference is ultimately mostly convenience and practice.

2. Integrated superhumanness: The AI is superhuman at things like writing ultra-fast CUDA kernels via a mix of applying relatively general (and actually smart) abilities, having internalized a bunch of clever cognitive strategies which are applicable to CUDA kernels and sometimes to other domains, as well as domain-specific knowledge and heuristics. (Similar to how humans learn.) The AI can access and flexibly apply all of the things it learned from being superhuman at CUDA kernels (or whatever skill), and with a tiny amount of training/practice it can basically transfer all these things to some other domain, even if the domain is very different. The AI is at least as good at understanding and flexibly applying what it has learned as humans would be if they learned the (superhuman) skill to the same extent.
silentbob
One thing that confused me about transformers is the question of when (as in, after how many layers) each embedding "flips" from representing the original token to finally representing the prediction of the next token. By now, I think the answer is simply this: each embedding represents both at the same time (and more).

For instance, in GPT-3 there are 12,288 embedding dimensions. At first I thought that all of them initially encode the original token, and after going through all the layers they eventually all encode the next token, and somewhere in the layers between this shift must happen. But what, upon some reflection, makes much more sense would be something very roughly like, say:

* some 1,000 dimensions encode the original token
* some other 1,000 dimensions encode the prediction of the next token
* the remaining 10,288 dimensions encode information about all available context (which will start out "empty" and get filled with meaningful information through the layers).

In practice, things are of course much less clean, and probably most dimensions play some role in all of these things, to different degrees, as all of this is learned through gradient descent and hence will be very noisy and gradual. Additionally, there's the whole positional-encoding thing, which is also part of the embeddings and makes clear distinctions even more difficult. But the key point remains that a single embedding encodes many things, only one of which is the prediction, and this prediction is already there from the beginning (when it's still very superficial and bad) and then, together with the rest of the embedding, gets refined more and more throughout the layers.

Another misconception I had was that embedding and unembedding are very roughly symmetric operations that just "translate" from token space to embedding space and vice versa[1]. This made sense in relation to the initial & naive "embeddings represent tokens" interpretation, but with the updated view as described...
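One quick way to see this empirically (my own illustrative sketch, not part of the original shortform) is the community's "logit lens" trick: project each layer's residual-stream activations through the final layer norm and the unembedding, and look at which next-token prediction is already present. The snippet below assumes the Hugging Face `transformers` library and the small GPT-2 model.

```python
# Rough "logit lens" sketch: project intermediate-layer activations through the
# final layer norm and the (tied) unembedding to see how early a next-token
# prediction is already present in the residual stream.
# Assumes the Hugging Face `transformers` library and the small GPT-2 model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds the embedding output plus one entry per layer.
for layer_idx, h in enumerate(out.hidden_states[:-1]):
    # Decode the residual stream at the last position as if it were the final layer.
    logits = model.lm_head(model.transformer.ln_f(h[:, -1, :]))
    print(f"layer {layer_idx:2d}: {tokenizer.decode(logits.argmax(dim=-1))!r}")

print("final prediction:", repr(tokenizer.decode(out.logits[:, -1, :].argmax(dim=-1))))
```

Typically the top guess at early layers is superficial and gets refined toward the model's final prediction layer by layer, which matches the "prediction is there from the beginning and gets refined" picture.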
Elizabeth
Sometimes people deliberately fill their environment with yes-men and drive out critics. Pointing out what they're doing doesn't help, because they're doing it on purpose. However, there are ways well-intentioned people end up driving out critics unintentionally, and those are worth talking about.

The Rise and Fall of Mars Hill Church (podcast) is about a guy who definitely drove out critics deliberately. Mark Driscoll fired people, led his church to shun them, and rearranged the legal structure of the church to consolidate power. It worked, and his power was unchecked until the entire church collapsed. Yawn. What's interesting is who he hired after the purges. As described in a later episode, his later hiring focused on people who were executives in the secular world. These people were great at executing on tasks, but unopinionated about what their task should be. Whatever Driscoll said was what they did.

This is something a good, feedback-craving leader could have done by accident. Hiring people who are good at the tasks you want them to do is a pretty natural move. But I think the speaker is correct (alas, I didn't write down his name) that this is anti-correlated at the tails: the best executors become so by not caring about what they're executing.

So if you're a leader and want to receive a healthy amount of pushback, it's not enough to hire hypercompetent people and listen when they push back. You have to select specifically for the ability to push back (including both willingness and having good opinions).

Popular Comments

The key question is whether you can find improvements which work at large scale using mostly small experiments, not whether the improvements work just as well at small scale. The 3 largest algorithmic advances discussed here (Transformer, MoE, and MQA) were all originally found at tiny scale (~1 hr on an H100 or ~1e19 FLOP[1], which is ~7 orders of magnitude smaller than current frontier training runs).[2]

This paper looks at how improvements vary with scale, and finds the best improvements have returns which increase with scale. But we care about predictability given careful analysis and scaling laws, which aren't really examined.

> We found that, historically, the largest algorithmic advances couldn't just be scaled up from smaller versions. They needed to have large amounts of compute to develop and validate

This is false: the largest 3 advances they identify were all first developed at tiny scale. To be clear, the exact versions of these advances used in modern AIs are likely based on higher-compute experiments. But the returns from these more modern adaptations are unclear (and plausibly these adaptations could be found with small experiments using careful scaling analysis).

---

Separately, as far as I can tell, the experimental results in the paper shed no light on whether gains are compute-dependent (let alone predictable from small scale). Of the advances they experimentally test, only one (MQA) is identified as compute-dependent. They find that MQA doesn't improve loss (at small scale). But this isn't how MQA is supposed to help: it is supposed to improve inference efficiency, which they don't test! So these results only confirm that a bunch of innovations (RoPE, FA, LN) are in fact compute-independent.

Ok, so does MQA improve inference at small scale? The paper says:

> At the time of its introduction in 2019, MQA was tested primarily on small models where memory constraints were not a major concern. As a result, its benefits were not immediately apparent. However, as model sizes grew, memory efficiency became increasingly important, making MQA a crucial optimization in modern LLMs

Memory constraints not being a major concern at small scale doesn't mean it didn't help then (at the time, I think people didn't care as much about inference efficiency, especially decoder inference efficiency). Separately, the inference performance improvements at large scale are easily predictable with first-principles analysis! The post misses all of this by saying:

> MQA, then, by providing minimal benefit at small scale, but much larger benefit at larger scales —is a great example of the more-general class of a compute-dependent innovation.

I think it's actually unclear if there was minimal benefit at small scale (maybe people just didn't care much about decoder inference efficiency at the time), and further, the inference efficiency gain at large scale is easily predictable, as I noted! The post says:

> compute-dependent improvements showed minimal benefit or actually hurt performance.

But, as I've noted, they only empirically tested MQA, and those results are unclear! The transformer is well known to be a huge improvement even at very small scale. (I'm not sure about MoE.)

---

FAQ:

Q: Ok, but surely the fact that returns often vary with scale makes small scale experiments less useful?

A: Yes, returns varying with scale would reduce predictability (all else equal), but by how much?
If returns improve in a predictable way, that would be totally fine. Careful science could (in principle) predict big gains at large scale despite minimal or negative gains at small scale.

Q: Ok, sure, but if you actually look at modern algorithmic secrets, they are probably much less predictable from small to large scale. (Of course, we don't know that much with public knowledge.)

A: Seems quite plausible! In this case, we're left with a quantitative question of how predictable things are, whether we can identify if something will be predictable, and whether there are enough areas of progress which are predictable.

---

Everyone agrees compute is a key input; the question is just how far massively accelerated, much more capable, and vastly more prolific labor can push things.

---

This was also posted as a (poorly edited) tweet thread here.

---

1. While 1e19 FLOP is around the scale of the final runs they included in each of these papers, these advances are pretty likely to have been initially found at (slightly) smaller scale, like maybe 5-100x lower FLOP. The larger runs were presumably helpful for verifying the improvement, though I don't think they were clearly essential; probably you could have instead done a bunch of careful scaling analysis. ↩︎

2. Also, it's worth noting that Transformer, MoE, and MQA are selected for being large single advances, making them unrepresentative. Large individual advances are probably typically easier to identify, making them more likely to be found earlier (and at smaller scale). We'd also expect large single improvements to be more likely to exhibit returns over a large range of different scales. But I didn't pick these examples; they were just the main examples used in the paper! ↩︎
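As a toy illustration of what "careful scaling analysis" could look like (my own sketch with made-up numbers, not from the comment or the paper): fit a simple power law to small-scale runs with and without a candidate improvement, then extrapolate to frontier scale and ask whether the gain is predicted to persist.

```python
# Minimal sketch of scaling-law extrapolation. The loss numbers below are
# hypothetical; a real analysis would use many careful small-scale runs.
import numpy as np

compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])        # training FLOP
loss_baseline = np.array([3.90, 3.70, 3.52, 3.37, 3.24])  # hypothetical losses
loss_improved = np.array([3.88, 3.66, 3.46, 3.29, 3.14])  # hypothetical losses

def fit_power_law(c, l):
    # Fit log(loss) = alpha * log(compute) + beta, i.e. a pure power law.
    alpha, beta = np.polyfit(np.log(c), np.log(l), 1)
    return lambda x: np.exp(alpha * np.log(x) + beta)

base_law = fit_power_law(compute, loss_baseline)
impr_law = fit_power_law(compute, loss_improved)

frontier = 1e26  # ~7 orders of magnitude above the small-scale runs
print("predicted frontier loss, baseline :", base_law(frontier))
print("predicted frontier loss, improved :", impr_law(frontier))
```

Whether this kind of extrapolation is reliable for real algorithmic changes is exactly the open quantitative question discussed above.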
Getting to the point where mechanical engineering is "easy to verify" seems extremely challenging to me. I used to work in manufacturing. Basically everyone I know in the field has completely valid complaints about mechanical engineers who are mostly familiar with CAD, simulations, and textbook formulas, because they design parts that ignore real-world manufacturing constraints. AI that designs with simulations seems likely to produce the same result. Additionally, I would guess that today's humanoid robots are already good enough on the mechanical side, and they could become self-replicating if they were just more intelligent and dexterous.

One example of the sort of problem that could be difficult to simulate: I was working on a process where a robot automatically loaded parts into a CNC machine. The CNC machine produced metal chips as it removed material from the part. The chips would typically be cleared away by a stream of coolant from a mounted hose. Under certain angles of the hose, chips would accumulate in the wrong locations over the course of multiple hours, interfering with the robot's placement of the part. Even if the hoses were initially positioned correctly, they could move because someone bumped them when inspecting something, or due to vibration. Simulating how chips come off the part, how coolant flow moves them in the machine, etc., requires an incredible level of fidelity and could be potentially intractable. And this is a very constrained manufacturing task that doesn't really have to interact with the real world at all.

In general, prototyping something that works is just pretty easy. The challenge is more:

* How to manufacture something that will be reliable over the course of many years, even when falling, being exposed to dust and water, etc.?
* How to manufacture something efficiently at a good price and quality?
* etc.

I had some discussion on AI and the physical world here: https://www.lesswrong.com/posts/r3NeiHAEWyToers4F/frontier-ai-models-still-fail-at-basic-physical-tasks-a
Strong disagree voted. To me this is analogous to saying that, because Leonardo da Vinci tried to design a flying machine and believed it to be possible despite not really understanding aerodynamics, the Wright brothers' belief that the aeroplane they designed would fly "can't really be based on those technical details in any deep or meaningful way."

"Maybe a thing smarter than humans will eventually displace us" is really not a very complicated argument, and no one is claiming it is. So it should be part of our hypothesis class, and various people like Turing thought of it well before modern ML. The "rationally grounded in a technical understanding of today's deep learning systems" part is about how we update our probabilities of the hypotheses in our hypothesis class, and how we can comfortably say "yes, terrible outcomes still seem plausible", as they did on priors without needing to look at AI systems at all (my probability is moderately lower than it would have been without looking at AIs at all, but with massive uncertainty).

Intuition and rigour agreeing is not some kind of highly suspicious gotcha.

Recent Discussion

A lot of our work involves "redunds".[1] A random variable $\Gamma$ is a(n exact) redund over two random variables $X_1, X_2$ exactly when both

$$X_1 \to X_2 \to \Gamma \qquad \text{and} \qquad X_2 \to X_1 \to \Gamma$$

Conceptually, these two diagrams say that $X_2$ gives exactly the same information about $\Gamma$ as all of $X$, and $X_1$ gives exactly the same information about $\Gamma$ as all of $X$; whatever information $X$ contains about $\Gamma$ is redundantly represented in $X_1$ and in $X_2$. Unpacking the diagrammatic notation and simplifying a little, the diagrams say $P[\Gamma{=}\gamma \mid X_1{=}x_1] = P[\Gamma{=}\gamma \mid X_2{=}x_2] = P[\Gamma{=}\gamma \mid X_1{=}x_1, X_2{=}x_2]$ for all $(x_1, x_2, \gamma)$ such that $P[X_1{=}x_1, X_2{=}x_2] > 0$.

The exact redundancy conditions are too restrictive to be of much practical relevance, so we are more interested in approximate redunds. Approximate redunds are defined by approximate versions of the same two diagrams: each diagram is required to hold only up to an approximation error $\epsilon$, measured in KL divergence.

Unpacking the diagrammatic notation, these two diagrams say

$$D_{KL}\big(P[X_1, X_2, \Gamma] \,\big\|\, P[X_1, X_2]\, P[\Gamma \mid X_2]\big) \le \epsilon \quad \text{and} \quad D_{KL}\big(P[X_1, X_2, \Gamma] \,\big\|\, P[X_1, X_2]\, P[\Gamma \mid X_1]\big) \le \epsilon$$
This bounty problem is about the existence of a(n approximate) maximal redund $\Lambda$: a redund which contains (approximately) all the information about $X$ contained in any other (approximate) redund $\Gamma$. Diagrammatically, a maximal redund $\Lambda$ satisfies (approximately) the diagram $X \to \Lambda \to \Gamma$ for every other (approximate) redund $\Gamma$ over $X_1, X_2$.
Finally, we'd...

I dunno how obvious this is for people who want to try for the bounty, but I only now realized that you can express the criteria for a redund as inequalities on mutual information, and I find mutual information much nicer to work with, if only for convenience of notation. Proof:

Let's take the criterion for a redund $\Gamma$ w.r.t. $X_2$:

$$D_{KL}\big(P[X_1, X_2, \Gamma] \,\big\|\, P[X_1, X_2]\, P[\Gamma \mid X_2]\big) \le \epsilon$$

Expand the expression for the KL divergence:

$$\sum_{x_1, x_2, \gamma} P[x_1, x_2, \gamma] \log \frac{P[x_1, x_2, \gamma]}{P[x_1, x_2]\, P[\gamma \mid x_2]} \le \epsilon$$

Expand the joint distribution:

$$\sum_{x_1, x_2, \gamma} P[x_1, x_2, \gamma] \log \frac{P[x_1, x_2]\, P[\gamma \mid x_1, x_2]}{P[x_1, x_2]\, P[\gamma \mid x_2]} = \sum_{x_1, x_2, \gamma} P[x_1, x_2, \gamma] \log \frac{P[\gamma \mid x_1, x_2]}{P[\gamma \mid x_2]} = I(\Gamma; X_1 \mid X_2) \le \epsilon$$
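As a quick numerical sanity check of this identity (my own sketch, not part of the original comment), one can compute both the KL form and $I(\Gamma; X_1 \mid X_2)$ directly from an explicit joint distribution:

```python
# Check numerically that the KL-based redund criterion w.r.t. X2 equals the
# conditional mutual information I(Gamma; X1 | X2), using an arbitrary joint
# distribution P[x1, x2, gamma]. (Illustrative sketch only.)
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((3, 4, 2))               # joint over (X1, X2, Gamma), all positive
P /= P.sum()

P_x1x2 = P.sum(axis=2)                  # P[x1, x2]
P_x2g = P.sum(axis=0)                   # P[x2, gamma]
P_x2 = P_x1x2.sum(axis=0)               # P[x2]
P_g_given_x2 = P_x2g / P_x2[:, None]    # P[gamma | x2]

# KL( P[x1,x2,gamma] || P[x1,x2] * P[gamma|x2] )
Q = P_x1x2[:, :, None] * P_g_given_x2[None, :, :]
kl = np.sum(P * np.log(P / Q))

# I(Gamma; X1 | X2) = sum P * log( P[gamma|x1,x2] / P[gamma|x2] )
P_g_given_x1x2 = P / P_x1x2[:, :, None]
cmi = np.sum(P * np.log(P_g_given_x1x2 / P_g_given_x2[None, :, :]))

print(kl, cmi)   # should agree up to floating-point error
```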

... (read more)

This is a linkpost for https://gwern.net/fiction/october

Clarity didn't work, trying mysterianism

gwern

I appreciate everyone's comments here, they were very helpful. I've heavily revised the story to fix the issues with it, and hopefully it will be more satisfactory now.

We should probably try to understand the failure modes of the alignment schemes that AGI developers are most likely to attempt.

I still think Instruction-following AGI is easier and more likely than value aligned AGI. I’ve updated downward on the ease of IF alignment, but upward on how likely it is. IF is the de-facto current primary alignment target (see definition immediately below), and it seems likely to remain so until the first real AGIs, if we continue on the current path (e.g., AI 2027).

If this approach is doomed to fail, best to make that clear well before the first AGIs are launched. If it can work, best to analyze its likely failure points before it is tried.

Definition of IF as an alignment target

What I mean by IF...

juggins
Do you have any quick examples of value-shaped interpretations that conflict? So perhaps the level of initiative the AI takes? E.g. a maximally initiative-taking AI might respond to 'fetch me coffee' by reshaping the Principal's life so they get better sleep and no longer want the coffee. I think my original reference to 'perfect' value understanding is maybe obscuring these tradeoffs (maybe unhelpfully), as in theory that includes knowledge of how the Principal would want interpretative conflicts managed.

> Do you have any quick examples of value-shaped interpretations that conflict?

  1. Someone trying but failing to quit smoking. On one interpretation, they don't really want to smoke, smoking is some sort of mistake. On another interpretation, they do want to smoke, the quitting-related behavior is some sort of mistake (or has a social or epistemological reason).

    This example stands in for other sorts of "obvious inconsistency," biases that we don't reflectively endorse, etc. But also consider cases where humans say they don't want something but we (outside the th

... (read more)
Farkas
I did a high-level exploration of the field a few years ago. It was rushed and optimized more for getting it out there than for rigor and comprehensiveness, but hopefully it's still a decent starting point.

I personally think you'd wanna first look at the dozens of molecules known to improve one or another aspect of cognition in diseases (e.g. Alzheimer's and schizophrenia) that were never investigated for mind enhancement in healthy adults. Given that some of these show very promising effects (and are often literally approved for cognitive enhancement in diseased populations), given that many of the best molecules we have right now were initially also just approved for some pathology (e.g. methylphenidate, amphetamine, modafinil), and given that there is no incentive for the pharmaceutical industry to conduct clinical trials on healthy people (the FDA etc. do not recognize healthy enhancement as a valid indication), there seems to even be a sort of overhang of promising candidate molecules that were just never rigorously tested for cognitive enhancement in healthy adults.

https://forum.effectivealtruism.org/posts/hGY3eErGzEef7Ck64/mind-enhancement-cause-exploration

Appendix C includes a list of 'almost deployable' candidates: Amantadine, Amisulpride, Amphetamine (incl. dexamphetamine, levoamphetamine), Aripiprazole, Armodafinil, Atomoxetine, Brexpiprazole, Bupropion, Carbidopa-levodopa, Clonidine, Desvenlafaxine, Donepezil, Duloxetine, Entacapone, Folic acid, Galantamine, Ginkgo biloba, Guanfacine, Istradefylline, Ketamine, Lisdexamphetamine, Memantine, Methamphetamine, Methylphenidate (incl. dexmethylphenidate), Modafinil, Opicapone, Piracetam, Pitolisant, Pramipexole, Rasagiline, Reboxetine, Rivastigmine, Ropinirole, Rotigotine, Safinamide, Selegiline, Sodium oxybate, Tacrine, Tolcapone, Venlafaxine, Viloxazine, Vortioxetine.
Mateusz Bagiński
Perhaps you misread the OP as saying "small molecules" rather than "small set of molecules".
dr_s

Fair, though I generally conflated them because if your molecules aren't small, then due to sheer combinatorics the set of possible candidates becomes exponentially massive. And then the question is "ok, but where are we supposed to look, and by which criterion?".

TsviBT
Thanks. One of the first places I'd look would be hormones, which IIUC don't count as small molecules? Though maybe natural hormones have already been tried? But I'd wonder about more obscure or risky ones, e.g. ones normally only active in children.

tl;dr:

From my current understanding, one of the following two things should be happening and I would like to understand why it doesn’t:

Either

  1. Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.

    Or

  2. If pausing AI is much more popular than the organization PauseAI, that is a problem that should be addressed in some way.

 

Pausing AI

There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.

 

I am aware that many people interested in AI Safety do not want to prevent AGI from being built EVER, mostly based on transhumanist or longtermist reasoning.

Many people in AI Safety seem to be on board with the goal of “pausing AI”, including, for example,...

> Obviously P(doom | no slowdown) < 1.

This is not obvious. My P(doom|no slowdown) is like 0.95-0.97, the difference from 1 being essentially "maybe I am crazy or am missing something vital when making the following argument".

Instrumental convergence suggests that the vast majority of possible AGI will be hostile. No slowdown means that neural-net ASI will be instantiated. To get ~doom from this, you need some way to solve the problem of "what does this code do when run" with extreme accuracy in order to only instantiate non-hostile neural-net ASI (you nee... (read more)

Epistemic status: I feel that naming this axis deconfuses me about agent foundations about as much as writing the rest of this sequence so far - so it is worth a post even though I have less to say about it. 

I think my goal in studying agent foundations is a little atypical. I am usually trying to build an abstract model of superintelligent agents and make safety claims based on that model.

For instance, AIXI models a very intelligent agent pursuing a reward signal, and allows us to conclude that it probably seizes control of the reward mechanism by default. This is nice because it makes our assumptions fairly explicit. AIXI has epistemic uncertainty but no computational bounds, which seems like a roughly appropriate model for agents much...

I meant what I said at a higher level of abstraction - optimization pressure may destroy leaky abstractions. I don’t think value learning immediately solves this. 

Vanessa Kosoy
Btw, what are some ways we can incorporate heuristics into our algorithm while staying on level 1-2?

1. We don't know how to prove the required desiderata about the heuristic, but we can still reasonably conjecture them and support the conjectures with empirical tests.

2. We can't prove or even conjecture anything useful-in-itself about the heuristic, but the way the heuristic is incorporated into the overall algorithm makes it safe. For example, maybe the heuristic produces suggestions together with formal certificates of their validity. More generally, we can imagine an oracle-machine (where the heuristic is slotted into the oracle) about which we cannot necessarily prove something like a regret bound w.r.t. the optimal policy, but we can prove (or at least conjecture) a regret bound w.r.t. some fixed simple reference policy. That is, the safety guarantee shows that no matter what the oracle does, the overall system is not worse than "doing nothing". Maybe, modulo weak provable assumptions about the oracle, e.g. that it satisfies a particular computational complexity bound.

3. [Epistemic status: very fresh idea, quite speculative but intriguing.] We can't find even a guarantee like the above for a worst-case computationally bounded oracle. However, we can prove (or at least conjecture) some kind of "average-case" guarantee. For example, maybe we have a high probability of safety for a random oracle. However, assuming a uniformly random oracle is quite weak. More optimistically, maybe we can prove safety even for any oracle that is pseudorandom against some complexity class C1 (where we want C1 to be as small as possible). Even better, maybe we can prove safety for any oracle in some complexity class C2 (where we want C2 to be as large as possible) that has access to another oracle which is pseudorandom against C1. If our heuristic is not actually in this category (in particular, C2 is smaller than P and our heuristic doesn't lie in C2), this doesn't formally guarantee...
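To make point 2 slightly more concrete, here is a toy sketch (my own illustration, not Vanessa Kosoy's actual proposal): an untrusted heuristic proposes an answer, a cheap verifier checks it, and the system falls back to a fixed safe reference policy ("do nothing") whenever verification fails, so the overall behavior is never worse than the reference policy no matter what the heuristic outputs.

```python
# Toy "heuristic with certificate checking" sketch. The task and the greedy
# oracle below are hypothetical stand-ins: pick a subset of items with total
# weight <= budget. The verifier checks feasibility; unverified proposals are
# replaced by the safe reference policy (the empty subset).
from typing import Callable, List, Tuple

Item = Tuple[float, float]   # (value, weight)
Proposal = List[int]         # indices of chosen items

def verify(items: List[Item], proposal: Proposal, budget: float) -> bool:
    """Cheap certificate check: indices valid, no repeats, within budget."""
    if len(set(proposal)) != len(proposal):
        return False
    if any(i < 0 or i >= len(items) for i in proposal):
        return False
    return sum(items[i][1] for i in proposal) <= budget

def guarded_policy(items: List[Item], budget: float,
                   oracle: Callable[[List[Item], float], Proposal]) -> Proposal:
    """Use the oracle's suggestion only if it verifies; otherwise do nothing."""
    suggestion = oracle(items, budget)
    return suggestion if verify(items, suggestion, budget) else []

def greedy_oracle(items: List[Item], budget: float) -> Proposal:
    # An arbitrary (possibly buggy or adversarial) heuristic.
    order = sorted(range(len(items)), key=lambda i: -items[i][0] / items[i][1])
    chosen, total = [], 0.0
    for i in order:
        if total + items[i][1] <= budget:
            chosen.append(i)
            total += items[i][1]
    return chosen

items = [(10.0, 4.0), (7.0, 3.0), (3.0, 2.0)]
print(guarded_policy(items, budget=5.0, oracle=greedy_oracle))
```

The real proposals above concern regret bounds and complexity-theoretic assumptions rather than toy verification, but the structural idea is the same: safety comes from how the untrusted component is wrapped, not from trusting the component itself.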
Cole Wyeth
I agree with this - and it seems unrealistic to me that we will do better than 3 or 4. 
Vanessa Kosoy
Hold my beer ;)

Confidence notes: I am a physicist working on computational materials science, so I have some familiarity with the field, but I don't know much about R&D firms or economics. Some of the links in this article were gathered from a post at pivot-to-ai.com and the BS detector.

The paper "Artificial Intelligence, Scientific Discovery, and Product Innovation" was published as an Arxiv preprint last December, roughly 5 months ago, and was submitted to a top economics journal. 

The paper claimed to show the effect of an experiment at a large R&D company. It claimed the productivity of a thousand materials scientists was tracked before and after the introduction of a machine learning material generation tool. The headline result was that the AI caused a 44% increase in materials discovery at the...

Rasool

Also picked up here

Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum of rival resource‑grabbers it may encounter while expanding through the cosmos. The purpose of this simulation is to see what kind of ASI (artificial superintelligence) we humans end up creating. The paperclip maximizer likely runs a vast ensemble of biology‑to‑ASI simulations, sampling the superintelligences that evolved life tends to produce. Because the paperclip maximizer seeks to reserve maximum resources for its primary goal (which despite the name almost certainly isn’t paperclip production) while still creating many simulations, it likely reduces compute costs by trimming fidelity: most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities.  Arguments in support of this thesis include:

  1. The space of
...
dirk

I don't think this post makes compelling arguments for its premises. Downvoted.

Sausage Vector Machine
I don't think this scenario is likely. Except for degenerate cases, an ASI would have to continue to grow and evolve well beyond the point at which a simulation would need to stop, to avoid consuming an inordinate amount of resources. And, to take an analogy, studying human psychology based on prokaryotic life forms that will someday evolve into humans seems inefficient. If I were preparing for a war with an unknown superintelligent opponent, I would probably be better off building weapons and studying (super)advanced game theory.

Which ideas seem slightly more likely to me?

* Our universe might be a breeding ground for new minds needed for some autonomous tasks. // Suffering could even be one of the main points of the required training.
* We could be a brainstorming vat or a focus group for a new product. The product could be anything, but for it to make sense to be marketed to superintelligent beings, it might be something from the experiential line, like the sense of hearing or the qualia of dawn. // Suffering might be duly compensated per corporate policy.
* This is a correctional facility. I can't help but make a silly reference to the TV series Hard Time on Planet Earth, but a more appropriate reference would be Dante's Inferno. // Suffering is the whole point.
* This could be an attempt by the final universal Omega Point mind to reflect on its origins during the last few millennia before the heat death of the universe, in order to get some insight into why things went the way they did and not otherwise, and whether things could have been different. It could also be an attempt at gaining redemption for real or perceived mistakes. // All the suffering is real, but the sufferer is the Omega Point mind itself.

Although I think these are slightly more likely than the proposed hypothesis, they are still not very likely. However, it seems logical that there should be many more simulated worlds than real ones. So I believe it is reasonable to think about some
James_Miller
An implicit assumption of the post (which should have been made explicit) is that the cost per simulation is tiny. This is like in WWII, when the US would send a long-range bomber to take photos of Japan. I agree with your last paragraph, and I think it gets at what consciousness is. Is the program's existence enough to generate consciousness, or does the program have to run to create conscious observers?

Background 1: Preferences-over-future-states (a.k.a. consequentialism) vs ~~Preferences-over-trajectories~~ other kinds of preferences

(Note: The original version of this post said "preferences over trajectories" all over the place. Commenters were confused about what I meant by that, so I have switched the terminology to "any other kind of preference" which is hopefully clearer.)

The post Coherent decisions imply consistent utilities (Eliezer Yudkowsky, 2017) explains how, if an agent has preferences over future states of the world, they should act like a utility-maximizer (with utility function defined over future states of the world). If they don’t act that way, they will be less effective at satisfying their own preferences; they would be “leaving money on the table” by their own reckoning. And there are externally-visible signs of agents being suboptimal in that...
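To make "leaving money on the table" concrete, here is a toy sketch (my own illustration, not from the post or the linked essay): an agent with cyclic preferences over outcomes will pay a small fee for every trade it prefers, and after cycling through the trades it ends up where it started with strictly less money.

```python
# Toy money-pump: cyclic preferences A > B > C > A. The agent pays a fee for
# each preferred swap and, after a full cycle, holds its original item with
# less money, i.e. it is exploitable in a way a utility-maximizer is not.
def money_pump(prefers, start_item, cycle, fee=1.0, rounds=3):
    item, money = start_item, 0.0
    for _ in range(rounds):
        for offered in cycle:
            if prefers(offered, item):   # agent accepts any swap it prefers, paying the fee
                item, money = offered, money - fee
    return item, money

# Cyclic (incoherent) preferences: A > B, B > C, C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
prefers = lambda x, y: (x, y) in cyclic

final_item, final_money = money_pump(prefers, start_item="A", cycle=["C", "B", "A"])
print(final_item, final_money)   # ends holding "A" again, but with negative money
```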

I think I like the thing I wrote here:

> To be more concrete, if I’m deciding between two possible courses of action, A and B, “preference over future states” would make the decision based on the state of the world after I finish the course of action—or more centrally, long after I finish the course of action. By contrast, “other kinds of preferences” would allow the decision to depend on anything, even including what happens during the course-of-action.

> By “world” I mean “reality” more broadly, possibly including the multiverse or whatever the agent cares about... (read more)