User Comment Replies

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

dx266mo10

I recommend https://agentfoundations.study/, and much of https://www.aisafety.com/stay-informed,

Currently these two links include the commas so they redirect to 404 pages

2plex6mo

Oh, yup, thanks, fixed.

dx26's Shortform

dx267mo84

We (or at least a majority of humans) do still have inner desires to have kids, though; they just get balanced out by other considerations, mostly creature comforts/not wanting to deal with the hassle of kids. But yeah, evolution did not foresee birth control, so that's a substantial misgeneralization.

We are still a very successful species overall according to IGF, but birth rates continue to decline, which is why I made my last point about inner alignment possibly drifting farther and farther away the stronger the inner optimizer (e.g. human culture) becomes.

dx26's Shortform

dx267mo43

I saw that Katja Grace has said something similar here; I'm just putting my own spin on the idea.

The relevance of the evolutionary analogy for inner alignment has been long discussed in this community, but one observation that seems to not be mentioned is that humans are still... pretty good at inclusive genetic fitness? Even in way-out-of-distribution environments like modern society, we still have strong desires to eat food, stay alive, find mates and reproduce (although the last one has relatively decreased recently; IGF hasn't totally generalized). We ... (read more)

3JBlack7mo

I think one argument is that optimizing for IGF basically gives humans two jobs: survive, and have kids. Animal skulls are evidence that the "survive" part can be difficult. We've nailed that one, though. Very few humans in developed countries die before reaching an age suitable for having kids. I doubt that there are any other animal species that come close to us in that metric. Almost all of us have "don't die" ingrained pretty deeply. It's looking like we are moving toward failing pretty heavily on the second "have kids" job though, and you would think that would be the easier one. So if there's a 50% failure rate on preserving outer optimizer values within the inner optimizer, that's actually pretty terrible.

Can someone, anyone, make superintelligence a more concrete concept?

dx267mo21

The thing is, there exists lots of popular movies about rogue AIs taking over the world -- 2001, Terminator, etc etc -- so the concept should already exist in popular culture. The roadblocks seem to be:

The threat somehow doesn't seem as tangible or threatening as, for example, ISIS developing a bioweapon or the CCP permanently dominating the world. One explanation is that the reference class for "enemy does bad things with new technology" or other near-term threat models contains lots of examples throughout history, whereas "species smarter than humans"

dx269mo41

In this case, the starving person presumably has to press the button or else starve to death, and thus has no bargaining power. The other person only has to offer the bare minimum beyond what the starving person needs to survive, and the starving person must take the deal. In Econ 101 (assuming away monopolies, information asymmetry, etc.), exploited workers do have bargaining power by being able to work for other companies, hence why companies can’t just do stupid, spiteful actions in the long term.

2Viliam9mo

I guess this is the part where the real life often differs from the simplified model. Information asymmetry seems to be the norm. In small cities, big companies can become local monopolies on providing job opportunities of a certain kind (yes, you could choose to work for a different company, but that could be 3 extra hours of commute). Companies providing rare kinds of job opportunities love to have NDAs, non-poaching agreements, etc.

Coherence of Caches and Agents

dx261y32

It might be relevant to note that the meaningfulness of this coherence definition depends on the chosen environment. For instance, in an deterministic forest MDP where an agent at a state $s$ can never return to $s$ for any $s$ and there is only one path between any two states, suppose we have a deterministic policy $π$ and let $s_{1} = π (s)$ , $s_{2} = π (s_{1})$ , etc. Then for the zero-current-payoff Bellman equations, we only need that $V (s_{1}) > V (s^{'})$ for any successor $s^{'}$ from $s$ , $V (s_{2}) >$ ... (read more)

Measuring Coherence of Policies in Toy Environments

dx262y10

Right, I think this somewhat corresponds to the "how long it takes a policy to reach a stable loop" (the "distance to loop" metric), which we used in our experiments.

What did you use your coherence definition for?

3Garrett Baker2y

They are related, but time-to-loop fails when there are many loops a random policy is likely to access. For example, if a “do nothing” action is the default, your agent will immediately enter a loop, but the sum of the absolute values of the real parts of the eigenvales will be very high (the number of states in the environment).

4Garrett Baker2y

Its a long story, but I wanted to see what the functional landscape of coherence looked like for goal misgeneralizing RL environments after doing essential dynamics. Results forthcoming.

LESSWRONG
LW

All of dx26's Comments + Replies