nealeratzlaff — LessWrong


Replying to Collider bias as a cognitive blindspot?

The difficulty of reasoning correctly with probabilities reminds me of something Geoff Hinton said about working in high-dimensional space (paraphrasing): "when we try to imagine high dimensions, we all just imagine a 3D surface and say 'N dimensions' really loud in our heads". I have a habit of using probabilities whenever I try to reason about something, but I'm becoming increasingly sure that my Bayes net (or causal graph) is badly wired, with wrong probabilities everywhere.

I see quite a few papers on PubMed discussing collider bias with regard to obesity-associated health risks. Unfortunately, the effect is probably in full swing in COVID research as well.
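For readers who want to see the effect rather than imagine it, here is a minimal simulation sketch of collider bias. Everything in it is an illustrative assumption (the function names, the Gaussian variables, the selection threshold); it is not taken from any of the papers mentioned. Two independent traits are drawn, and then only the cases where their sum exceeds a threshold are kept, which is what conditioning on a collider amounts to.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Return (correlation in full sample, correlation in selected sample).

    x and y are independent by construction. Selection depends on x + y,
    so x + y > threshold plays the role of the collider (hypothetical setup).
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rng.gauss(0, 1) for _ in range(n)]

    # Keep only the cases that pass the selection filter.
    selected = [(x, y) for x, y in zip(xs, ys) if x + y > threshold]
    sx = [x for x, _ in selected]
    sy = [y for _, y in selected]

    return pearson(xs, ys), pearson(sx, sy)


if __name__ == "__main__":
    corr_all, corr_selected = simulate_collider()
    print(f"full sample:     {corr_all:+.3f}")
    print(f"selected sample: {corr_selected:+.3f}")
```

In the full sample the correlation should sit near zero, while in the selected sample it should be clearly negative, even though neither variable influences the other. A study that only observes hospitalized patients, for example, is in the position of the selected sample.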

Avoiding Side Effects in Complex Environments

TurnTrout

TurnTrout, nealeratzlaff

Previously: Attainable Utility Preservation: Empirical Results; summarized in AN #105

Our most recent AUP paper was accepted to NeurIPS 2020 as a spotlight presentation:

Reward function specification can be difficult, even in simple environments. Rewarding the agent for making a widget may be easy, but penalizing the multitude of possible negative side effects is hard. In toy environments, Attainable Utility Preservation (AUP) avoided side effects by penalizing shifts in the ability to achieve randomly generated goals. We scale this approach to large, randomly generated environments based on Conway’s Game of Life. By preserving optimal value for a single randomly generated reward function, AUP incurs modest overhead while leading the agent to complete the specified

... (read 553 more words →)
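The idea of "penalizing shifts in the ability to achieve randomly generated goals" can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the name `aup_reward`, the tabular Q-value dictionaries, and the weight `lam` are assumptions made for the example, and the normalization of the penalty term is omitted. The penalty compares each auxiliary Q-value for the chosen action against the same Q-value for doing nothing.

```python
def aup_reward(primary_reward, aux_q_tables, state, action, noop, lam=0.1):
    """Primary reward minus a scaled attainable-utility penalty (sketch).

    aux_q_tables: list of dicts mapping (state, action) -> Q-value, one per
                  auxiliary reward function (hypothetical tabular stand-in).
    noop:         the "do nothing" action used as the baseline.
    lam:          penalty weight (assumed hyperparameter).
    """
    # Average absolute shift in attainable value, relative to the no-op.
    penalty = sum(
        abs(q[(state, action)] - q[(state, noop)]) for q in aux_q_tables
    ) / len(aux_q_tables)
    return primary_reward - lam * penalty


if __name__ == "__main__":
    # One auxiliary goal: "break" destroys the ability to achieve it.
    aux = [{("s", "break"): 0.0, ("s", "noop"): 1.0}]
    print(aup_reward(1.0, aux, "s", "break", "noop", lam=0.5))
    print(aup_reward(0.0, aux, "s", "noop", "noop", lam=0.5))
```

With a single entry in `aux_q_tables`, this corresponds to the single-auxiliary-reward setting the abstract describes. Taking the no-op itself never incurs a penalty under this form, since the comparison is against the same action.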

Attainable Utility Preservation: Empirical Results

TurnTrout

TurnTrout, nealeratzlaff

Reframing Impact has focused on supplying the right intuitions and framing. Now we can see how these intuitions about power and the AU landscape both predict and explain AUP's empirical success thus far.

Conservative Agency in Gridworlds

Let's start with the known and the easy: avoiding side effects^[1] in the small AI safety gridworlds (for the full writeup on these experiments, see Conservative Agency). The point isn't to get too into the weeds, but rather to see how the weeds still add up to the normalcy predicted by our AU landscape reasoning.

In the following MDP levels, the agent can move in the cardinal directions or do nothing ( $\emptyset$ ). We give the agent a reward... (read 2772 more words →)