
Since this comment is being upvoted, I have to ask, how would being autistic affect your decision-making in that situation?

I think (incorrectly?) that everyone, except maybe children and drunk people, would remain quiet, and would get angry or not depending on what they care about and on models of the situation that vary from person to person.

I mean, think of everything that would need to go wrong for someone to scream “The emperor is naked!”:

They would need to be certain about what is going through the emperor’s mind. It seems more likely that the emperor is drunk than that he is a nudist. You don’t know how a drunk person with so much power would react, so the best action is not to speak.

Even if they thought it was safe, it is already obvious to everyone that the emperor is naked, so screaming would not give anyone any new information.

And even if they thought that it would be informative, how would they expect it to lead to good outcomes? Because it would increase their social status or decrease the emperor’s? That seems like something that autism would make less likely.

And after hearing the emperor say that he thinks he is clothed, they might get angry if they care a lot about not lowering his status, or they might pretend to be angry so that the child doesn’t put himself in potential danger again, or to receive whatever benefits come from siding with the man in power. Everyone else would just find the situation amusing.

Yes, I’m aware of that; I tried to find a better proof but failed. Attempts based on trying to compute the maximum possible change (instead of figuring out how to get a desired change) are doomed. Changing the last bit isn’t an infinitesimal change, so using calculus to compute the maximum derivative won’t work. EfficientNets use swish activations rather than ReLUs, and swish isn’t locally linear, so we would have to deal with the chaotic dynamics that show up whenever non-linear functions are applied iteratively. The sigmoid inside the swish does eventually saturate because of floating-point arithmetic, making it effectively locally linear, but then we have to deal with the details of the exact binary representations.
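
For concreteness, here is a quick numpy check of that saturation claim (the specific thresholds are just what float32 happens to give, nothing load-bearing):

```python
import numpy as np

# In float32 the sigmoid inside swish(x) = x * sigmoid(x) saturates exactly:
# it rounds to 1.0 once exp(-x) drops below half an ulp of 1, and it becomes
# exactly 0.0 once exp(-x) overflows to inf on the negative side.
def sigmoid(x):
    x = np.float32(x)
    return np.float32(1) / (np.float32(1) + np.exp(-x))

print(sigmoid(17.0) == 1.0)    # True: 1 + exp(-17) rounds to 1.0 in float32
print(sigmoid(-88.0))          # ~6e-39, subnormal but not yet exactly 0
print(sigmoid(-104.0) == 0.0)  # True: exp(104) overflows to inf (numpy warns about it)
```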

There might be another solution, though: the softmax is the n-input generalization of the sigmoid, so it can also saturate to exact 0s and 1s. We could try to overfit a network using a very high learning rate so that, for one randomly generated but fixed image, it predicts some class with 100% certainty, and for the same image with some pixel changed it predicts another class, also with 100% certainty. Then, if this works, we could try training it on MNIST, adding the two original images to every mini-batch with a loss multiplier much greater than the one for the regular samples. That way the answer to your question becomes 1.
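
The MNIST step might look roughly like this (an untested sketch: `model`, `train_loader`, the 3×224×224 input shape, the target classes, the learning rate and the multiplier are all placeholders I’m picking arbitrarily):

```python
import torch
import torch.nn.functional as F

# Two fixed images that differ in a single pixel, pushed towards different
# classes with a much larger loss weight than the regular MNIST samples.
# `model` and `train_loader` are assumed to already exist.
torch.manual_seed(0)
x_a = torch.rand(1, 3, 224, 224)            # randomly generated but fixed image
x_b = x_a.clone()
x_b[0, :, 0, 0] = 1.0 - x_b[0, :, 0, 0]     # same image with one pixel changed
y_a, y_b = torch.tensor([0]), torch.tensor([1])

special_weight = 100.0                      # much greater than for the regular samples
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)  # very high learning rate

for images, labels in train_loader:
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss = loss + special_weight * (F.cross_entropy(model(x_a), y_a)
                                    + F.cross_entropy(model(x_b), y_b))
    loss.backward()
    optimizer.step()
```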

If nobody comes up with anything better, when I get some free time I will try to implement the binarization approach and then I will send you the code.

That might actually be easy to prove with some effort (or it might not). Consider this strategy:

Let’s assume that the inputs to the system are PNG images with 8-bit values between 0 and 255, which are converted into floating-point tensors before entering the net, and that the bits you are talking about are those of the original images. Let’s also assume that the upscaling from the small MNIST images to the input of the net is such that each float of the tensor corresponds to exactly one value in the original image (that is, there is no interpolation), and that we are using the exact architecture of table 1 of the EfficientNet paper (https://arxiv.org/pdf/1905.11946v5.pdf).

Then we might be able to train the network with the weights of the first Conv3x3 frozen to the identity (1 in the centre of the kernel and 0s around it), so that the 32 channels of the next layer receive 2x downscaled copies of the image. We also remove the first batch norm layer; it will be added back later. Losing the ability to perform useful computations in this layer is probably not enough to push the network below 99% accuracy, given that MNIST is so simple.
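
As a sketch of that frozen stem (a standalone stand-in, since the exact attribute path to the layer depends on the implementation; I’m also assuming the greyscale image enters through a single input channel):

```python
import torch
from torch import nn

# Stand-in for the first Conv3x3 of table 1 (stride 2, 32 output channels),
# frozen to an identity kernel so each output channel is a 2x downscaled copy.
stem_conv = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1, bias=False)
with torch.no_grad():
    stem_conv.weight.zero_()
    stem_conv.weight[:, 0, 1, 1] = 1.0    # 1 in the centre of the kernel, 0s around it
stem_conv.weight.requires_grad_(False)    # frozen during training

x = torch.rand(1, 1, 28, 28)
assert torch.allclose(stem_conv(x)[0, 0], x[0, 0, ::2, ::2])  # a downscaled copy of the input
```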

If this works, then we edit the weights of the first Conv layer and the parameters of the batch norm layer (if those can be called weights) to the following values, assuming float32 (a sketch of the edits follows the list):

Central value of kernel: 2**-149

Division value in batch norm: 2**-49

Multiplication value of the batch norm: 2**100

All other values: the neutral element of their respective operations

(check https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)
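
Concretely, something like this (again with standalone stand-ins for the two layers; note that the rescaling has to stay split across the division and the multiplication, because 2**149 itself is not representable in float32):

```python
import torch
from torch import nn

# Stand-ins for the first Conv3x3 and the batch norm that gets added back.
# In eval mode, BatchNorm2d computes (x - running_mean) / sqrt(running_var + eps) * weight + bias.
stem_conv = nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1, bias=False)
stem_bn = nn.BatchNorm2d(32, eps=0.0)
stem_bn.eval()                                   # use the running statistics set below

with torch.no_grad():
    stem_conv.weight.zero_()
    stem_conv.weight[:, 0, 1, 1] = 2.0 ** -149   # central value of the kernel

    stem_bn.running_mean.zero_()                 # neutral element of the subtraction
    stem_bn.running_var.fill_(2.0 ** -98)        # sqrt(2**-98) = 2**-49, the division value
    stem_bn.weight.fill_(2.0 ** 100)             # multiplication value of the batch norm
    stem_bn.bias.zero_()                         # neutral element of the addition
```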

This will turn all values of 127 (in the PNG) and below into exactly 0, and all values above into exactly 1, because of numerical underflow (I checked this with numpy). From the swish activation onwards the network will then process the image exactly the same way it would process the binarized version, and in all likelihood assign it the same class.
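
Here is the numpy check, with one extra assumption on top of the setup above: that the 8-bit values get divided by 255 when the PNG is converted to a float tensor (which is what, e.g., torchvision’s ToTensor does); the rounding only works out this way if the pixels live in [0, 1]:

```python
import numpy as np

# 8-bit PNG values assumed to be scaled to [0, 1] on conversion to float32.
pixels = np.arange(256, dtype=np.float32) / np.float32(255)

kernel = np.float32(2.0 ** -149)   # central value of the conv kernel
divisor = np.float32(2.0 ** -49)   # division value of the batch norm
gamma = np.float32(2.0 ** 100)     # multiplication value of the batch norm

# Pixels <= 127/255 round to exactly 0 when multiplied by the smallest subnormal;
# pixels >= 128/255 round to exactly 2**-149, which the two rescalings turn into 1.
out = pixels * kernel / divisor * gamma
print(out[126:130])                                     # [0. 0. 1. 1.]
print((out[:128] == 0).all(), (out[128:] == 1).all())   # True True
```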

127 in base 2 is 01111111, and 128 is 10000000. Flipping only the least significant bit changes a pixel value by exactly 1, and the only change of 1 that crosses the 127/128 threshold is 127 → 128, which flips every bit rather than just the last one. So you can’t go from a perceived black pixel to a perceived white pixel by changing only the last bit, the binarized image that the network effectively sees stays the same, and the maximum difference in class probability that you can obtain by changing only the least significant bit is zero.
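
(As a one-line check of that last step:)

```python
# Flipping only the least significant bit never moves a pixel across the
# 127/128 binarization threshold, so the perceived image is unchanged.
assert all((v <= 127) == ((v ^ 1) <= 127) for v in range(256))
```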