Zvi analyzes Michael Lewis' book "Going Infinite" about Sam Bankman-Fried and FTX. He argues the book provides clear evidence of SBF's fraudulent behavior, despite Lewis seeming not to fully realize it. Zvi sees SBF as a cautionary tale about the dangers of pursuing maximalist goals without ethical grounding.
A lot of our work involves "redunds".[1] A random variable $\Gamma$ is a(n exact) redund over two random variables $X_1, X_2$ exactly when both of the following diagrams hold:

$$X_1 \to X_2 \to \Gamma \qquad\qquad X_2 \to X_1 \to \Gamma$$

Conceptually, these two diagrams say that $X_2$ gives exactly the same information about $\Gamma$ as all of $X = (X_1, X_2)$, and $X_1$ gives exactly the same information about $\Gamma$ as all of $X$; whatever information $X$ contains about $\Gamma$ is redundantly represented in $X_1$ and in $X_2$. Unpacking the diagrammatic notation and simplifying a little, the diagrams say $P[\Gamma | X_1, X_2] = P[\Gamma | X_1] = P[\Gamma | X_2]$ for all $X_1, X_2$ such that $P[X_1, X_2] > 0$.
The exact redundancy conditions are too restrictive to be of much practical relevance, so we are more interested in approximate redunds. Approximate redunds are defined by $\epsilon$-approximate versions of the same two diagrams. Unpacking the diagrammatic notation, these two diagrams say

$$D_{KL}\big(P[\Gamma, X_1, X_2] \,\big\|\, P[X_1, X_2]\, P[\Gamma | X_1]\big) \le \epsilon \qquad\text{and}\qquad D_{KL}\big(P[\Gamma, X_1, X_2] \,\big\|\, P[X_1, X_2]\, P[\Gamma | X_2]\big) \le \epsilon.$$
This bounty problem is about the existence of a(n approximate) maximal redund $\Lambda$: a redund which contains (approximately) all the information about $X$ contained in any other (approximate) redund $\Gamma$. Diagrammatically, a maximal redund satisfies, for every (approximate) redund $\Gamma$,

$$X \to \Lambda \to \Gamma, \qquad \text{i.e. } I(\Gamma; X \mid \Lambda) \approx 0.$$
Finally, we'd...
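As a concrete (if trivial) numerical sketch of the two redund conditions above (my own illustration, not from the post): a toy joint distribution in which $X_1$, $X_2$, and $\Gamma$ all copy the same fair coin flip, together with the two KL divergences from the approximate-redund conditions. Both come out to zero, so $\Gamma$ is an exact redund here.

```python
import numpy as np

# Toy joint distribution P[X1, X2, G] over binary variables, indexed as (x1, x2, g).
# Here X1 = X2 = a single fair coin flip and G copies that shared bit,
# so G should come out as an exact redund over X1, X2.
P = np.zeros((2, 2, 2))
for bit in (0, 1):
    P[bit, bit, bit] = 0.5

def kl(p, q):
    """KL divergence between two joint arrays, ignoring zero-probability cells of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def redund_epsilons(P):
    """The two KL divergences appearing in the approximate-redund conditions."""
    P_x1x2 = P.sum(axis=2, keepdims=True)                                             # P[X1, X2]
    P_g_given_x1 = P.sum(axis=1, keepdims=True) / P.sum(axis=(1, 2), keepdims=True)   # P[G | X1]
    P_g_given_x2 = P.sum(axis=0, keepdims=True) / P.sum(axis=(0, 2), keepdims=True)   # P[G | X2]
    eps_1 = kl(P, P_x1x2 * P_g_given_x1)  # how badly the diagram X2 -> X1 -> G fails
    eps_2 = kl(P, P_x1x2 * P_g_given_x2)  # how badly the diagram X1 -> X2 -> G fails
    return eps_1, eps_2

print(redund_epsilons(P))  # (0.0, 0.0): both diagrams hold exactly
```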
I don't know how obvious this is to people who want to try for the bounty, but I only now realized that you can express the redund criterion as an inequality on (conditional) mutual information, and I find mutual information much nicer to work with, if only for convenience of notation. Proof:
Let's take the criterion for $\Gamma$ to be a redund w.r.t. $X_2$ over $(X_1, X_2)$,

$$D_{KL}\big(P[\Gamma, X_1, X_2] \,\big\|\, P[X_1, X_2]\, P[\Gamma | X_2]\big) \le \epsilon,$$

expand the expression for the KL divergence:

$$\sum_{X_1, X_2, \Gamma} P[\Gamma, X_1, X_2] \log \frac{P[\Gamma, X_1, X_2]}{P[X_1, X_2]\, P[\Gamma | X_2]} \le \epsilon,$$

expand the joint distribution as $P[\Gamma, X_1, X_2] = P[X_1, X_2]\, P[\Gamma | X_1, X_2]$:

$$\sum_{X_1, X_2, \Gamma} P[\Gamma, X_1, X_2] \log \frac{P[\Gamma | X_1, X_2]}{P[\Gamma | X_2]} \le \epsilon$$
...
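The claim, then, is that the redund criterion w.r.t. $X_2$ is equivalent to $I(\Gamma; X_1 \mid X_2) \le \epsilon$. Here is a quick numerical sanity check of that identity (my own sketch, not part of the comment): compute the KL divergence from the criterion directly, and compute the conditional mutual information independently from joint entropies, on a random joint distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random strictly positive joint distribution P[X1, X2, G], indexed as (x1, x2, g).
P = rng.random((3, 4, 2))
P /= P.sum()

# Left-hand side: the KL divergence from the redund criterion w.r.t. X2,
# D_KL( P[G, X1, X2] || P[X1, X2] P[G | X2] ).
P_x1x2 = P.sum(axis=2, keepdims=True)                                            # P[X1, X2]
P_g_given_x2 = P.sum(axis=0, keepdims=True) / P.sum(axis=(0, 2), keepdims=True)  # P[G | X2]
lhs = float(np.sum(P * np.log(P / (P_x1x2 * P_g_given_x2))))

# Right-hand side: the conditional mutual information I(G; X1 | X2),
# computed independently via I(G; X1 | X2) = H(G,X2) + H(X1,X2) - H(X2) - H(G,X1,X2).
def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

rhs = H(P.sum(axis=0)) + H(P.sum(axis=2)) - H(P.sum(axis=(0, 2))) - H(P)

print(lhs, rhs)  # the two values agree up to floating-point error
```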
Clarity didn't work, trying mysterianism
I appreciate everyone's comments here, they were very helpful. I've heavily revised the story to fix the issues with it, and hopefully it will be more satisfactory now.
We should probably try to understand the failure modes of the alignment schemes that AGI developers are most likely to attempt.
I still think Instruction-following AGI is easier and more likely than value aligned AGI. I’ve updated downward on the ease of IF alignment, but upward on how likely it is. IF is the de-facto current primary alignment target (see definition immediately below), and it seems likely to remain so until the first real AGIs, if we continue on the current path (e.g., AI 2027).
If this approach is doomed to fail, best to make that clear well before the first AGIs are launched. If it can work, best to analyze its likely failure points before it is tried.
What I mean by IF...
Do you have any quick examples of value-shaped interpretations that conflict?
Someone trying but failing to quit smoking. On one interpretation, they don't really want to smoke, smoking is some sort of mistake. On another interpretation, they do want to smoke, the quitting-related behavior is some sort of mistake (or has a social or epistemological reason).
This example stands in for other sorts of "obvious inconsistency," biases that we don't reflectively endorse, etc. But also consider cases where humans say they don't want something but we (outside the th
Fair, though generally I conflated them because if your molecules aren't small, then due to sheer combinatorics the set of possible candidates becomes exponentially massive. And then the question is "ok, but where are we supposed to look, and by which criterion?".
tl;dr:
From my current understanding, one of the following two things should be happening, and I would like to understand why it isn't:
Either
Everyone in AI Safety who thinks slowing down AI is currently broadly a good idea should publicly support PauseAI.
Or
There does not seem to be a legible path to prevent possible existential risks from AI without slowing down its current progress.
I am aware that many people interested in AI Safety do not want to prevent AGI from being built EVER, mostly based on transhumanist or longtermist reasoning.
Many people in AI Safety seem to be on board with the goal of “pausing AI”, including, for example,...
Obviously P(doom | no slowdown) < 1.
This is not obvious. My P(doom|no slowdown) is like 0.95-0.97, the difference from 1 being essentially "maybe I am crazy or am missing something vital when making the following argument".
Instrumental convergence suggests that the vast majority of possible AGIs will be hostile. No slowdown means that neural-net ASI will be instantiated. To avoid ~doom from this, you need some way to solve the problem of "what does this code do when run" with extreme accuracy in order to only instantiate non-hostile neural-net ASI (you nee...
Epistemic status: I feel that naming this axis deconfuses me about agent foundations about as much as writing the rest of this sequence so far - so it is worth a post even though I have less to say about it.
I think my goal in studying agent foundations is a little atypical. I am usually trying to build an abstract model of superintelligent agents and make safety claims based on that model.
For instance, AIXI models a very intelligent agent pursuing a reward signal, and allows us to conclude that it probably seizes control of the reward mechanism by default. This is nice because it makes our assumptions fairly explicit. AIXI has epistemic uncertainty but no computational bounds, which seems like a roughly appropriate model for agents much...
I meant what I said at a higher level of abstraction - optimization pressure may destroy leaky abstractions. I don’t think value learning immediately solves this.
Confidence notes: I am a physicist working on computational material science, so I have some familiarity with the field, but don't know much about R&D firms or economics. Some of the links in this article were gathered from a post at pivot-to-ai.com and the BS detector.
The paper "Artificial Intelligence, Scientific Discovery, and Product Innovation" was published as an Arxiv preprint last December, roughly 5 months ago, and was submitted to a top economics journal.
The paper claimed to report the results of an experiment at a large R&D company. It claimed the productivity of a thousand material scientists was tracked before and after the introduction of a machine learning material generation tool. The headline result was that the AI caused a 44% increase in materials discovery at the...
Our universe is probably a computer simulation created by a paperclip maximizer to map the spectrum of rival resource‑grabbers it may encounter while expanding through the cosmos. The purpose of this simulation is to see what kind of ASI (artificial superintelligence) we humans end up creating. The paperclip maximizer likely runs a vast ensemble of biology‑to‑ASI simulations, sampling the superintelligences that evolved life tends to produce. Because the paperclip maximizer seeks to reserve maximum resources for its primary goal (which despite the name almost certainly isn’t paperclip production) while still creating many simulations, it likely reduces compute costs by trimming fidelity: most cosmic details and human history are probably fake, and many apparent people could be non‑conscious entities. Arguments in support of this thesis include:
I don't think this post makes compelling arguments for its premises. Downvoted.
(Note: The original version of this post said "preferences over trajectories" all over the place. Commenters were confused about what I meant by that, so I have switched the terminology to "any other kind of preference" which is hopefully clearer.)
The post Coherent decisions imply consistent utilities (Eliezer Yudkowsky, 2017) explains how, if an agent has preferences over future states of the world, they should act like a utility-maximizer (with utility function defined over future states of the world). If they don’t act that way, they will be less effective at satisfying their own preferences; they would be “leaving money on the table” by their own reckoning. And there are externally-visible signs of agents being suboptimal in that...
I think I like the thing I wrote here:
To be more concrete, if I’m deciding between two possible courses of action, A and B, “preference over future states” would make the decision based on the state of the world after I finish the course of action—or more centrally, long after I finish the course of action. By contrast, “other kinds of preferences” would allow the decision to depend on anything, even including what happens during the course-of-action.
By “world” I mean “reality” more broadly, possibly including the multiverse or whatever the agent cares abo...
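A minimal toy sketch of this distinction (my own illustration; the state names and scores are made up): two courses of action, A and B, that end in the same final state, scored once by a utility over final states and once by a preference that also depends on what happens along the way. The two kinds of preference can rank A and B differently.

```python
# Two courses of action, represented as trajectories (sequences of world states)
# that happen to end in the same final state.
trajectory_A = ["start", "keep_promise", "goal_reached"]
trajectory_B = ["start", "break_promise", "goal_reached"]

def final_state_utility(trajectory):
    """A 'preference over future states': only the state after the course of action matters."""
    return 1.0 if trajectory[-1] == "goal_reached" else 0.0

def path_sensitive_utility(trajectory):
    """Another kind of preference: what happens during the course of action also matters."""
    score = 1.0 if trajectory[-1] == "goal_reached" else 0.0
    if "break_promise" in trajectory:
        score -= 0.5  # penalize the path taken, not the destination reached
    return score

# The final-state preference is indifferent between A and B;
# the path-sensitive preference ranks A above B.
print(final_state_utility(trajectory_A), final_state_utility(trajectory_B))        # 1.0 1.0
print(path_sensitive_utility(trajectory_A), path_sensitive_utility(trajectory_B))  # 1.0 0.5
```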