All of Coding2077's Comments + Replies

This was very satisfying for me to read!

Not only did I find this story a very convincing example of the point this article is trying to make ("You don't know how bad most things are nor precisely how they're bad." and, relatedly, "Reality has a surprising amount of detail.").

But the writing was great as well! The fact that you were not a complete novice, but someone who tried to follow along with the piano tuner's every step and still failed to predict each next little problem the piano tuner identified, made for a great reading experience for me. It evoked...

I found your reply really interesting.

Because I find it so interesting and want to understand it: What does the "RLed" in "Unfortunately it seems to me that humans are RLed pretty hard by doing a lot of playing of these games" mean? That term is not familiar to me.

the gears to ascension
Like Seth said, I just mean reinforcement learning. Described in more typical language: people take their feelings of success from whether they're winning at the player-vs-environment and player-vs-player contests one encounters in everyday life; opportunities to change what contests are possible are unfamiliar. I also think there are decision theory issues[1] humans have. And then of course people do in fact have different preferences and moral values. But even among people where neither issue is in play, I think people have pretty bad self-misalignment as a result of taking what-feels-good-to-succeed-at feedback from circumstances that train them into habits that work well in the original context, and which typically fail badly to produce useful behavior in contexts like "you can massively change things for the better". "Being prepared for unreasonable success" is a common phrase referring to this issue, I think.

[1] In case this is useful context: a decision theory is a small mathematical expression which roughly expresses "what part of past, present, and future do you see as you-which-decides-together", or, stated slightly more technically, the expression that defines how you consider counterfactuals when evaluating possible actions you "could [have] take[n]". I'm pretty sure humans have some native one, and it's not exactly any of the ones that are typically discussed, but rather something vaguely in the direction of active inference, though people vary in which of the typically discussed ones they approximate. The commonly discussed ones around these parts are EDT, CDT, and the LDTs { FDT, UDT, LIDT, ... }.
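For concreteness: Newcomb's problem is the standard toy case where EDT-style and CDT-style counterfactual evaluation come apart. Here is a minimal Python sketch of that divergence (not anything from the comment above; the payoffs and 99% predictor accuracy are invented illustrative assumptions):

```python
# Toy Newcomb's problem (all numbers assumed for illustration):
# a predictor fills an opaque box with $1,000,000 iff it predicts
# you will take only that box; a transparent box always holds $1,000.

ACCURACY = 0.99  # assumed predictor accuracy

def edt_value(action: str) -> float:
    # EDT-style evaluation: treat the action as evidence about the
    # prediction, so P(box full | one-box) = ACCURACY.
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return 1_000_000 * p_full + (1_000 if action == "two-box" else 0)

def cdt_value(action: str, p_full: float = 0.5) -> float:
    # CDT-style evaluation: the prediction is causally upstream of the
    # choice, so it is held fixed at some prior p_full regardless of
    # which action is evaluated.
    return 1_000_000 * p_full + (1_000 if action == "two-box" else 0)

for action in ("one-box", "two-box"):
    print(f"{action}: EDT={edt_value(action):>9.0f}  CDT={cdt_value(action):>9.0f}")
# EDT recommends one-boxing; CDT recommends two-boxing for any fixed p_full.
```

The point of the sketch is just that the two evaluation rules disagree on the same physical situation; whatever "native" decision theory humans run presumably sits somewhere else again.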
Seth Herd
Reinforcement learning.

Thanks for the explanation. This makes a lot of sense to me now. I'm glad I asked!

While I agree that there is value in "don't tie yourself up in knots overthinking", my intuition tells me that there is a lot of value in just knowing about / considering that there is more information about a situation to be had, which might, in theory, influence my decision about that situation in important ways. It changes how I engage with all kinds of situations beforehand, and also after the fact. So considering the motivations and backstories of the people...

This sounds intuitively interesting to me.

Can you maybe give an example or two (or one example and one counterexample) to help illustrate how a moral principle displaying "robustness to auxiliary information" operates in practice, versus one that does not? Specifically, I'm interested in understanding how the variance in outcomes might manifest with the addition of new information.

rpglover64
Let's consider the trolley problem. One consequentialist solution is "whichever choice leads to the best utility over the lifetime of the universe", which is intractable. This meta-principle rules it out as follows: if, for example, you learned that one of the 5 was on the brink of starting a nuclear war and the lone one was on the brink of curing aging, that would say switch, but if the two identities were flipped, it would say stay; and generally, there are too many unobservables to consider. By contrast, a simple utilitarian approach of "always switch" is allowed by the principle, as are approaches that take into account demographics or personal importance. The principle also suggests that killing a random person on the street is bad, even if the person turns out to be plotting a mass murder, and conversely, that a doctor saving said person's life is good.

Two additional cases where the principle may be useful and doesn't completely correspond to common sense:

* I once read an article by a former vegan arguing against veganism and vegetarianism; one example was the fact that the act of harvesting grain involves many painful deaths of field mice, and that's not particularly better than killing one cow. Applying the principle, this suggests that suffering or indirect death cannot straightforwardly be the basis for these dietary choices, and that consent is on shaky ground.
* When thinking about building a tool (like the LW infrastructure) that could be either hugely positive (because it leads to aligned AI) or hugely negative (because it leads to unaligned AI by increasing AI discussions), and there isn't really a way to know which, you are morally free to build it or not; any steps you take to increase the likelihood of a positive outcome are good, but you are not required to stop building the tool due to a huge unknowable risk. Of course, if there's compelling reason to believe that the tool is net-negative, that reduces the variance and suggests that you shouldn't build it.
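To make the variance framing concrete, here is a hypothetical Python sketch (the impact values and their distribution are invented purely for illustration) comparing how much each rule's verdict swings as the unobservables are filled in:

```python
import random

random.seed(0)

def sample_world() -> list[int]:
    # Unobservable long-run impact of each of the six people: mostly
    # ordinary lives (+1), occasionally a would-be mass murderer (-1000)
    # or someone about to cure aging (+1000). Numbers are invented.
    return [random.choice([1] * 8 + [-1000, 1000]) for _ in range(6)]

def full_consequentialist_says_switch(world: list[int]) -> bool:
    # "Best utility over the lifetime of the universe": switch iff the
    # five on the main track are jointly worth more than the lone person.
    return sum(world[:5]) > world[5]

worlds = [sample_world() for _ in range(10_000)]
verdicts = [full_consequentialist_says_switch(w) for w in worlds]
print(f"intractable rule says 'switch' in "
      f"{100 * sum(verdicts) / len(verdicts):.1f}% of sampled worlds")

# "Always switch" says 'switch' in 100% of worlds: zero variance under
# auxiliary information, which is what the meta-principle asks for,
# while the intractable rule's verdict flips with the unobservables.
```

The sketch is only meant to show the shape of the argument: a rule whose recommendation depends on unobservables has high variance as information is added, and the meta-principle penalizes exactly that.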