Yesterday evening, I pasted to two IRC channels an excerpt of what someone had written. In the context of the original text, that excerpt had seemed to me like harmless if somewhat raunchy humor. What I didn't realize at the time was that, stripped of its context, the excerpt made its author come off looking like a jerk, and that by laughing at it I came off looking like something of a jerk as well.
Two people, both of whom I have known for many years now and whose opinions I value, approached me by private message and pointed out that this may not have been the smartest thing to do. My initial reaction was defensive, but I soon realized that they were right and thanked them for pointing it out to me. Putting on a growth mindset, I decided to treat the event as a positive one, since in the future I'd know better.
Later that evening, as I lay in bed waiting to fall asleep, the episode replayed itself in my mind. I learnt long ago that trying to push such replays out of my mind would just make them take longer and make them feel worse. So I settled back to just observing the replay and waiting for it to go away. As I waited, I started thinking about what kind of lower-level neural process this feeling might be a sign of.
Artificial neural networks use what is called a backpropagation algorithm to learn from mistakes. First the network is provided some input, then it computes some value, and then the obtained value is compared to the expected value. The difference between the obtained and expected value is the error, which is then propagated back from the end of the network to the input layer. As the error signal works its way through the network, neural weights are adjusted in such a fashion as to produce a different output the next time.
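To make the description above concrete, here is a minimal sketch of a single backpropagation step for a tiny network with one hidden layer. The layer sizes, starting weights, learning rate and the name train_step are all assumptions picked for illustration, not anything taken from a particular library.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w_hidden, w_out, x, target, lr=0.5):
    """One forward pass plus one backpropagation update (weights change in place).

    Returns the loss 0.5 * error**2 measured before the update.
    """
    # Forward pass: input -> hidden layer -> single output.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

    # The error: difference between the obtained and the expected value.
    error = output - target

    # Backward pass: error signal at the output unit...
    delta_out = error * output * (1.0 - output)
    # ...propagated back to each hidden unit through the (old) output weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]

    # Adjust the weights so the same input gives a different output next time.
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
    for j, ws in enumerate(w_hidden):
        for i, xi in enumerate(x):
            ws[i] -= lr * delta_hidden[j] * xi

    return 0.5 * error ** 2

# Hypothetical starting weights and a single training example.
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]
losses = [train_step(w_hidden, w_out, x=[1.0, 0.0], target=1.0) for _ in range(200)]
print(losses[0], losses[-1])
```

Repeating the step on the same example should shrink the loss, which is all this toy is meant to show.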
Backprop is known to be biologically unrealistic, but there are more realistic algorithms that work in a roughly similar manner. The human brain seems to be using something called temporal difference learning. As Roko described it: "Your brain propagates the psychological pain 'back to the earliest reliable stimulus for the punishment'. If you fail or are punished sufficiently many times in some problem area, and acting in that area is always preceded by [doing something], your brain will propagate the psychological pain right back to the moment you first begin to [do that something]".
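The "propagating back" that Roko describes can be illustrated with a toy version of TD(0), the simplest temporal difference update. This is only a sketch under assumptions of my own: a fixed chain of four made-up states, a punishment of -1 at the very end, and arbitrary values for the learning rate alpha and discount gamma. It makes no claim about how the brain actually implements anything.

```python
def td0_episode(values, rewards, alpha=0.5, gamma=1.0):
    """Walk once through a fixed chain of states, updating values in place.

    rewards[t] is what is received on leaving state t; the episode ends
    after the last state, whose successor is treated as having value 0.
    """
    n = len(values)
    for t in range(n):
        next_value = values[t + 1] if t + 1 < n else 0.0
        # TD error: how much worse (or better) things turned out than predicted.
        td_error = rewards[t] + gamma * next_value - values[t]
        values[t] += alpha * td_error
    return values

# Hypothetical chain: three neutral steps, then a punishment at the end.
values = [0.0, 0.0, 0.0, 0.0]
rewards = [0.0, 0.0, 0.0, -1.0]

history = []
for _ in range(30):
    td0_episode(values, rewards)
    history.append(list(values))

print(history[0])  # after one episode only the last state has turned negative
print(values)      # after many episodes the earliest state is negative too
```

After the first episode only the state right before the punishment feels bad. With each repetition the negative value creeps one step earlier, until even the first state in the chain predicts the punishment.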
As I lay there in bed, I couldn't help the feeling that something similar to those two algorithms was going on. The main thing that kept repeating itself was not the actual action of pasting the quote to the channel or laughing about it, but the admonishments from my friends. Being independently rebuked for something by two people I considered important: a powerful error signal that had to be taken into account. Their reactions filling my mind: an attempt to re-set the network to the state it was in soon after the event. The uncomfortable feeling of thinking about that: negative affect flooding the network as it was in that state, acting as a signal to re-adjust the neural weights that had caused that kind of an outcome.
After those feelings had passed, I thought about the episode again. Now I felt silly for committing that faux pas, for now it felt obvious that the quote would come across badly. For a moment I wondered if I had just been unusually tired, or distracted, or otherwise out of my normal mode of thought to not have seen that. But then it occurred to me - the judgment of this being obviously a bad idea was produced by the network that had just been rewired in response to social feedback. The pain of the feedback had been propagated back to the action that caused it, so just thinking about doing that (or thinking about having done that) made me feel stupid. I have no way of knowing whether the "don't do that, idiot" judgment is something that would actually have been produced had I been paying more attention, or if it's a genuinely new judgment that wouldn't have been produced by the old network.
I tend to be somewhat amused by the people who go about claiming that computers can never be truly intelligent, because a computer doesn't genuinely understand the information it's processing. I think they're vastly overestimating how smart we are, and that a lot of our thinking is just relatively crude pattern-matching, with various patterns (including behavioral ones) being labeled as good or bad after the fact, as we try out various things.
On the other hand, there would probably have been one way to avoid that incident. We do have the capacity for reflective thought, which allows us to simulate various events in our heads without needing to actually undergo them. Had I actually imagined the various ways in which people could interpret that quote, I would probably have relatively quickly reached the conclusion that yes, it might easily be taken as jerk-ish. Simply imagining that reaction might then have provided the decision-making network with a similar, albeit weaker, error signal and taught it not to do that.
However, there's the question of combinatorial explosions: any decision could potentially have countless consequences, and we can't simulate them all. (See the epistemological frame problem.) So in the end, knowing the answer to the question of "which actions are such that we should pause to reflect upon their potential consequences" is something we need to learn by trial and error as well.
So I guess the lesson here is that you shouldn't blame yourself too much if you've done something that feels obviously wrong in retrospect. That decision was made by an earlier version of you. Although it feels obvious now, that version of you might literally have had no way of knowing that it was making a mistake, as it hadn't been properly trained yet.