[Epistemic status: I assign a 70% chance that this model proves to be useful, 30% chance it describes things we are already trying to do to a large degree, and won't cause us to update much.] 

I'm going to talk about something that's a little weird, because it uses some results from some very recent ML theory to make a metaphor about something seemingly entirely unrelated - norms surrounding discourse. 

I'm also going to reach some conclusions that surprised me when I finally obtained them, because it caused me to update on a few things that I had previously been fairly confident about. This argument basically concludes that we should adopt fairly strict speech norms, and that there could be great benefit to moderating our discourse well. 

I argue that in fact, discourse can be considered an optimization process and can be thought of in the same way that we think of optimizing a large function. As I will argue, thinking of it in this way will allow us to make a very specific set of norms that are easy to think about and easy to enforce. It is partly a proposal for how to solve the problem of dealing with speech that is considered hostile, low-quality, or otherwise harmful. But most importantly, it is a proposal for how to ensure that the discussion always moves in the right direction: Towards better solutions and more accurate models. 

It will also help us avoid something I'm referring to as "mode collapse" (where new ideas generated are non-diverse and are typically characterized by adding more and more details to ideas that have already been tested extensively). It's also highly related to the concepts discussed in the Death Spirals and the Cult Attractor portion of the Sequences. Ideally, we'd like to be able to make sure that we're exploring as much of the hypothesis space as possible, and there's good reason to believe we're probably not doing this very well.  

The challenge: Making sure we're searching for the global optimum in model-space sometimes requires reaching out blindly into the frontiers, the not well-explored regions, which runs the risk of ending up somewhere very low-quality or dangerous. There are also sometimes large gaps between very different regions of model-space where the quality of the model is very low in-between, but very high on each side of the gap. This requires traversing through potentially dangerous territory and being able to survive the whole way through.

(I'll be using terms like "models" and "hypotheses" quite often, and I hope this isn't confusing. I am using them very broadly, to refer to both theoretical understandings of phenomena and blueprints for practical implementations of ideas.)

We desire to have a set of principles which allows us to do this safely - to think about models of the world that are new and untested, solutions for solving problems that have never been done in a similar way - and they should ensure that, eventually, we can reach the global optimum. 

Before we derive that set of principles, I am going to introduce a topic of interest from the field of Machine Learning. This topic will serve as the main analogy for the rest of this piece, and serve as a model for how the dynamics of discourse should work in the ideal case. 

I. The Analogy: Generative Adversarial Networks

For those of you who are not familiar with the recent developments in deep learning, Generative Adversarial Networks (GANs)[intro pdf here] are a new class of generative models that are ideal for producing high-quality samples from very high-dimensional, complex distributions. They have caused great buzz and hype in the deep-learning community due to how impressive some of the samples they produce are, and how efficient they are at generation.

Put simply, a generator model and a critic (sometimes called a discriminator) model perform a two-player game where the critic is trained to distinguish between samples produced by the generator and the "true" samples taken from the data distribution. In turn, the generator is trained to maximize the critic's loss function. Both models are usually parametrized by deep neural networks and can be trained by taking turns running a gradient descent step on each. The Nash equilibrium of this game is when the generator's distribution matches that of the data distribution perfectly. This is never really borne out in practice, but sometimes it gets so close that we don't mind. 
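For reference, the two-player game described above is usually written as the minimax objective below, taken from the original GAN formulation. The symbols are the standard ones and are not used elsewhere in this post: G is the generator, D is the critic, p_data is the data distribution, and p_z is the noise distribution the generator samples from.

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

The critic pushes this value up by telling real samples from generated ones, and the generator pushes it down, which is the same thing as maximizing the critic's loss.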

GANs have one principal failure mode, often called "mode collapse" (a term I'm going to appropriate to refer to a much broader concept), which is often thought to be due to the instability of the system. It was often believed that, if a careful balance between the generator and critic could not be maintained, one would eventually overpower the other - leading the critic to provide either useless or overly harsh information to the generator. Useless information will cause the generator to update very slowly or not at all, and overly harsh information will lead the samples to "collapse" to a small region of the data space containing the easiest targets for the generator to hit.

This problem was essentially solved earlier this year due to a series of papers that propose modifications to the loss functions that GANs use, and, most crucially, add another term to the critic's loss which stabilizes the gradient (with respect to the inputs) to have a norm close to one. It was recognized that we actually desire an extremely powerful critic so that the generator can make the best updates it possibly can, but the updates themselves can't go beyond what the generator is capable of handling. With these changes to the GAN formulation, it became possible to use crazy critic networks such as ultra-deep ResNets and train them as much as desired before updating the generator network.  
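The extra term described above matches what is usually called a gradient penalty. Assuming that reading, here is a minimal toy sketch of the idea, not a real GAN implementation. It uses a hypothetical linear critic D(x) = w . x, chosen only because its gradient with respect to the input is simply w, so no automatic differentiation is needed. The function name and the weight `lam=10.0` are illustrative assumptions.

```python
import math

def gradient_penalty(weights, lam=10.0):
    """Penalty lam * (||grad_x D(x)|| - 1)^2 for a linear critic D(x) = w . x.

    For a linear critic the gradient with respect to the input is just w,
    so the penalty does not depend on x. Real critics are deep networks and
    would need automatic differentiation; this toy avoids that.
    """
    grad_norm = math.sqrt(sum(w * w for w in weights))
    return lam * (grad_norm - 1.0) ** 2

# A critic whose input-gradient has norm one pays (almost) nothing.
print(gradient_penalty([0.6, 0.8]))
# A critic whose input-gradient has norm five is penalized heavily.
print(gradient_penalty([3.0, 4.0]))
```

The penalty is zero only when the gradient norm is exactly one, and it grows quadratically as the norm drifts away in either direction. That is the property the rest of this post borrows.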

The principle behind their operation is rather simple to describe, but unfortunately, it is much more difficult to explain why they work so well. However, I believe that as long as we know how to make one, and know the specific implementation details that improve their stability, their principles can be applied more broadly to achieve success in a wide variety of regimes.

II. GANs as a Model of Discourse

In order to use GANs as a tool for conceptual understanding of discourse, I propose to model the dynamics of debate as a collection of hypothesis-generators and hypothesis-critics. This could be likened to the structure of academia - researchers publish papers, they go through peer-review, the work is iterated on and improved - and over time this process converges to more and more accurate models of reality (or so we hope). Most individuals within this process play both roles, but in theory this process would still work even if they didn't. For example, Isaac Newton was a superb hypothesis generator, but he also had some wacky ideas that most of us would consider to be obviously absurd. Nevertheless, calculus and Newtonian physics became a part of our accepted scientific knowledge, and alchemy didn't. The system adopted and iterated on his good ideas while throwing away the bad.

Our community should be capable of something similar, while doing it more efficiently and not requiring the massive infrastructure of academia. 

A hypothesis-generator is not something that just randomly pulls out a model from model-space. It proposes things that are close modifications of things it already holds to be likely within its model (though I expect this point to be debatable). Humans are both hypothesis-generators and hypothesis-critics. And as I will argue, that distinction is not quite as sharply defined as one would think. 

I think there has always been an underlying assumption within the theory of intelligence that creativity and recognition / distinction are fundamentally different. In other words, one can easily understand Mozart to be a great composer, but it is much more difficult to be a Mozart. Naturally this belief made its way into the field of Artificial Intelligence too, and became somewhat of a dogma. Computers might be able to play Chess, they might be able to play Go, but they aren't doing anything fundamentally intelligent. They lack the creative spark, they work on pure brute-force calculation only, with maybe some heuristics and tricks that their human creators bestowed upon them.

GANs seem to defy this principle. Trained on a dataset of photographs of human faces, a GAN generator learns to produce near-photo-realistic images that nonetheless do not fully match any of the faces the critic network saw (one of the reasons why CelebA was such a good choice to test these on), and are therefore in some sense producing things which are genuinely original. It may have once been thought that there was a fundamental distinction between creation and critique, but perhaps that's not really the case. GANs were a surprising discovery, because they showed that it was possible to make impressive "creations" by starting from random nonsense and slowly tweaking it in the direction of "good" until it eventually got there (well okay, that's basically true for the whole of optimization, but it was thought to be especially difficult for generative models).

What does this mean? Could someone become a "Mozart" by beginning a musical composition from random noise and slowly tweaking it until it became a masterpiece?

The above seems to imply "yes, perhaps." However, this is highly contingent on the quality of the "tweaking." It seems possible only as long as the directions to update in are very high quality. What if they aren't very high quality? What if they point nowhere, or in very bad directions?

I think the default distribution of discourse is characterized by a large number of these directionless, low-quality contributions, and that this is likely one of the main factors behind mode collapse. This is related to what has been noted before: Too much intolerance for imperfect ideas (or ideas outside of established dogma) in a community prevents useful tasks from being accomplished, and progress from being made. Academia does not seem immune to this problem. Where low-quality or hostile discussion is tolerated is where this risk is greatest.

Fortunately, making sure we get good "tweaks" seems to be the easy part. Critique is in high abundance. Our community is apparently very good at it. We also don't need to worry much about the ratio of hypothesis-generators to hypothesis-critics, as long as we can establish good principles that allow us to follow GANs as closely as possible. The nice feature of the GAN formulation is that you are allowed to make the critic as powerful as you want. In fact, the critic should be more powerful than the generator (If the generator is too powerful, it just goes directly to the argmax of the critic). 

(In addition, any collection of generators is a generator, and any collection of critics is a critic. So this formulation can be applied to the community setting).

III. The Norm One Principle

So the question then becomes, how do we take an algorithm governing a game between models much simpler than a human, and apply the same tweaks, which consist of nothing more than a few very simple equations, to discourse between humans?

Here what I devise is a strategy for taking the concept of the norm of the critic gradient being as close to one as possible, and using that as a heuristic for how to structure appropriate discourse. 

(This is where my argument gets more speculative and I expect to update this a lot, and where I welcome the most criticism).

What I propose is that we begin modeling the concept of "criticism" based on how useful it is to the idea-generator receiving the criticism. Under this model, I think we should start breaking down criticism into two fundamental attributes:

  1. Directionality - does the criticism contain highly useful information, such that the "generator" knows how to update their model / hypothesis / proposal?
  2. Magnitude - Is the criticism too harsh, does it point to something completely unlike the original proposal, or otherwise require changes that aren't feasible for the generator to make?

My claim is that any contribution to a discussion should satisfy the "Norm One Principle." In other words, it should have a well-defined direction, and the quantity of change should be feasible to implement.

If a critique can satisfy our requirements for both directionality and magnitude, then it serves a useful purpose. The inverse claim to this is that if we can't follow these requirements, we risk falling into mode collapse, where the ideas commonly proposed are almost indistinguishable from the ones which preceded them, and ideas which deviate too far from the norm are harshly condemned and suppressed.

I think it's natural to question whether or not restricting criticism to follow certain principles is a form of speech suppression that prevents useful ideas from being considered. But the pattern I'm proposing doesn't restrict the "generation" process, the creative aspect which produces new hypotheses. It doesn't restrict the topics that can be discussed. It only restricts the criticism of those hypotheses, such that they are maximally useful to the source of the hypothesis. 

One of the primary fears behind having too much criticism is that it discourages people from contributing because they want to avoid the negative feedback. But under the Norm One Principle, I think it is useful to distinguish between disagreement and criticism. I think if we're following these norms properly, we won't need to consider criticism to be a negative reward. In fact, criticism can be positive. Agreement could be considered "criticism in the same direction you are moving in." Disagreement would be the opposite. And these norms also eliminate the kind of feedback that tends to be the most discouraging. 

For example, some things which violate "Norm One":

  • Ad hominem attacks (typically directionless). 
  • Affective Death Spirals (unlimited praise or denunciation is usually directionless, and usually very high magnitude). 
  • Signs that cause aversion (things I "don't like", that trigger my System 1 alarms, which probably violates both directionality and magnitude). 
  • Lengthy lists of changes to make (norm greater than one, ideally we want to try to focus on small sets of changes that have the highest priority). 
  • Repetition of points that have already been made (norm greater than one). 

One of my strongest hopes is that whoever is playing the part of the "generator" is able to compile the list of critiques easily and use them to update somewhere close to the optimal direction. This would be difficult if the sum of all critiques is either directionless (many critics point in opposite or near-opposite directions) or very high-magnitude (critics simply say to get as far away from here as possible).

But let's suppose that each individual criticism satisfies the Norm One principle. We will also assume that the generator is weighing each critique by their respect for whoever produced it, which I think is highly likely. Then the generator should be able to move in a direction unless the directions completely cancel out. It is unlikely for this to happen - unless there is very strong epistemic disagreement in the community over some fundamental assumptions (in which case the conversation should probably move over to that).
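As a rough illustration of the paragraph above, here is a minimal sketch that treats each critique as a vector, rescales it to norm one, and combines the critiques as a respect-weighted average. Representing critiques as lists of numbers, and the names `unit` and `combined_update`, are assumptions made for the example only. Real criticism does not come with coordinates.

```python
import math

def unit(vector):
    """Rescale a critique to norm one; a zero vector has no direction."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0:
        raise ValueError("directionless critique: nothing to update on")
    return [c / norm for c in vector]

def combined_update(critiques, respect):
    """Respect-weighted average of norm-one critiques."""
    total = sum(respect)
    update = [0.0] * len(critiques[0])
    for critique, weight in zip(critiques, respect):
        for i, component in enumerate(unit(critique)):
            update[i] += weight * component / total
    return update

# Two critics pull in exactly opposite directions, a third pulls sideways.
critiques = [[1.0, 0.0], [-2.0, 0.0], [0.0, 3.0]]
# With equal respect the opposing pair cancels and only the third remains.
print(combined_update(critiques, respect=[1.0, 1.0, 1.0]))
# With more respect for the first critic, the update leans their way.
print(combined_update(critiques, respect=[3.0, 1.0, 1.0]))
```

In this toy model the generator is left with nowhere to go only when the weighted directions cancel exactly, which is the case where the underlying disagreement itself should become the topic.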

In addition, it also becomes less likely for the directions to cancel out as the number of inputs increases. Thus, it seems that proposals for new models should be presented to a wide audience, and we should avoid the temptation to keep our proposals hidden to all except for a small set of people we trust.

So I think that in general, this proposed structure should tend to increase the amount of collective trust we have in the community, and that it favors transparency and favors diversity of viewpoints. 

But what of the possible failure modes of this plan? 

This model should fail if the specific details of its implementation either remove too much discussion, or fail to deal with individuals who refuse to follow the norms and refuse to update. Any implementation should allow room for anyone to update. Someone who posts an extremely hostile, directionless comment should be allowed chances to modify their contribution. The only scenario in which the "banhammer" becomes appropriate is when this model fails to apply: The cardinal sin of rationality, the refusal to update. 

IV. Building the Ideal "Generator"

As a final point, I'll note that the above assumes that generators will be able to update their models incrementally. The easy part, as I mentioned, was obtaining the updates, the hard part is accumulating them. This seems difficult with the infrastructure we have in place. What we do have is a good system for posting proposals and receiving feedback (The blog post / comment thread set-up), but this assumes that each "generator" is keeping track of their models by themselves and has to be fully aware of the status of other models on their own. There is no centralized "mixture model" anywhere that contains the full set of models weighted by how much probability they are given by the community. Currently, we do not have a good solution for this problem. 
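To make the "mixture model" idea concrete, here is a minimal sketch that averages each member's probability assignments into a single community-wide weighting over models. Everything in it is an assumption made for illustration: the function name, the member and model names, and especially the choice to weight every member equally. It is not a description of any existing system.

```python
def community_mixture(credences):
    """Average each member's probability assignments into one mixture.

    `credences` maps a member name to a dict of {model name: probability}.
    A model a member did not mention counts as probability 0 for them.
    """
    models = sorted({m for dist in credences.values() for m in dist})
    members = len(credences)
    return {
        m: sum(dist.get(m, 0.0) for dist in credences.values()) / members
        for m in models
    }

# Hypothetical members and models, for illustration only.
credences = {
    "alice": {"model_a": 0.7, "model_b": 0.3},
    "bob": {"model_a": 0.2, "model_b": 0.5, "model_c": 0.3},
}
for model, weight in community_mixture(credences).items():
    print(model, round(weight, 3))
```

If every member's assignments sum to one, the mixture does too. The hard part noted above, keeping such a table current as models and opinions change, is untouched by this sketch.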

However, it seems that the first conception of Arbital was centered around finding a solution to this kind of problem:

Arbital has bigger ambitions than even that. We all dream of a world that eliminates the duplication of effort in online argument - a world where, the same way that Wikipedia centralized the recording of definite facts, an argument only needs to happen once, instead of being reduplicated all over the Internet; with all the branches of the argument neatly recorded in the same place, along with some indication of who believes what. A world where 'just check Arbital' had the same status for determining the current state of debates, as 'just check Wikipedia' now has when somebody starts arguing about the population of Melbourne. There's entirely new big subproblems and solutions, not present at all in the current Arbital, that we'd need to tackle that considerably more difficult problem. But to solve 'explaining things' is something of a first step. If you have a single URL that you can point anyone to for 'explaining Bayes', and if you can dispatch people to different pages depending on how much math they know, you're starting to solve some of the key subproblems in removing the redundancy in online arguments.

If my proposed model is accurate, then it suggests that the problem Arbital aims to solve is in fact quite crucial to solve, and that the developers of Arbital should consider working through each obstacle they face without pivoting from this original goal. I feel confident enough that this goal should be high priority that I'd be willing to support its development in whatever way is deemed most helpful and is feasible for me (I am not an investor, but I am a programmer and would also be capable of making small donations, or contributing material). 

The only thing this model would require of Arbital is that it be as open as possible to contribution, with heavy moderation or filtering of contributed content (but importantly not the other way around, where it is closed to a small group of trusted people).

Currently, the incremental changes that would have to be made to LessWrong and related sites like SSC would simply be increased moderation of comment quality. Otherwise, any further progress on the problem would require overcoming much more serious obstacles requiring significant re-design and architecture changes. 

Everything I've written above is also subject to the model I've just outlined, and therefore I expect to make incremental updates as feedback to this post accrues.

My initial prediction for feedback to this post is that the ideas might be considered helpful and offer a useful perspective or a good starting point, but that there are probably many details that I have missed that would be useful to discuss, or points that were not quite well-argued or well thought-out. I will look out for these things in the comments.   


Really great post. I really enjoyed the theoretical justification for a very practical idea. Overall I found the machine learning argument caused me to update significantly in favor of "norm-one" criticism. Some comments and questions:

1) It's not that clear to me how to estimate the "norm" of one's criticism. We aren't going to do math to compute this stuff. What kind of heuristics can we use? Notably the community requires some degree of consistency in how people estimate criticism norms.

2) If you strongly disagree with a proposition X it might be hard to give any norm-one criticism. Maybe someone is suggesting plan X and you very strongly think they should abandon the plan. It might feel dishonest, insincere, or immoral to give advice on how to make plan X go slightly less badly.

3) Say a friend of yours asks you to critique their writing. This advice basically says you should hold back on some/much of your feedback. In theory you should try to only send the feedback that's most useful but fits inside a "norm-one" limit. This seems different from the "wall of red ink" technique that is commonly praised in writing circles. (Though I find the walls of red ink demoralizing, I am not a writer.)

4) Is it ever useful for someone to say: "Ignore the norm-one limit. Just give me all the criticism you have". Will it become "low status" not to ask for unlimited-norm-criticism?

1) It's not that clear to me how to estimate the "norm" of one's criticism. We aren't going to do math to compute this stuff. What kind of heuristics can we use? Notably the community requires some degree of consistency in how people estimate criticism norms.

I think that in any situation in which the overall quality of a contribution must be estimated, we will have the same problem. Ultimately, I believe it is going to require either some kind of averaged community sentiment, in a similar way to how things are upvoted / downvoted right now, or require heavy moderator involvement (and have lots of mods). Personally I think moderators have pretty good incentives to be honest and thorough in their judgement (since they could easily lose their status by making poor calls). I think they could be encouraged to notify people which portions of their comments need to be edited or removed, and allow time for changes like that to happen before taking any disciplinary actions. Being objectively close to norm one is probably not possible, but it is much more possible to determine when things are far away from this norm, which I think is the important thing.

2) If you strongly disagree with a proposition X it might be hard to give any norm-one criticism. Maybe someone is suggesting plan X and you very strongly think they should abandon the plan. It might feel dishonest, insincere, or immoral to give advice on how to make plan X go slightly less badly.

I think it is possible to decouple norm-one criticism from your overall appraisal of the plan itself. Personally, I believe it is possible to be sincere when giving advice on how to slightly improve the plan without stating any disapproval you may have. It may not be the most candid and transparent summary of your feelings, and I realize that some might feel difficulty repressing the urge to express them, but if I have to be honest about what my plan suggests, then that is what I believe has to be done.

There might yet be a place for overall appraisal to be given in each critique, separately from the rest of the critique that follows norm one. But I still think it is good to avoid appraisal that is overly negative. The reason I'm not very worried about this particular issue is that, for most proposals or plans that require collective action, there has to be a level of support that must be reached before any progress on it can be made. Therefore, I do not think there is much risk in not making disapproval well-known. You can simply opt-out of participation. I think there is room for exceptions in the case that someone is planning on taking dangerous actions by themselves, in which case it might be the correct action to try and stop them.

3) Say a friend of yours asks you to critique their writing. This advice basically says you should hold back on some/much of your feedback. In theory you should try to only send the feedback that's most useful but fits inside a "norm-one" limit. This seems different from the "wall of red ink" technique that is commonly praised in writing circles. (Though I find the walls of red ink demoralizing, I am not a writer.)

Hm, I'm not at all familiar with the "wall of red ink" technique. I too would feel completely overwhelmed by that kind of thing. Funnily, just by randomly Googling a bit I found a writing education company called "NoRedInk".

4) Is it ever useful for someone to say: "Ignore the norm-one limit. Just give me all the criticism you have". Will it become "low status" not to ask for unlimited-norm-criticism?

That's a difficult question. I think it is possible that asking for unlimited criticism could become a status-signalling kind of thing, but I also feel that it wouldn't be subtle enough for it to really work, especially if the norm-one limit is a visible community principle. Then it might be possible to get called-out for doing that.

Magnitude - Is the criticism too harsh, does it point to something completely unlike the original proposal, or otherwise require changes that aren't feasible for the generator to make?

I'm confused, I thought the point was to avoid getting stuck in local maxima. Discouraging criticisms that are too harsh or demand too many changes seems a weird way of doing that.

The point is that when someone is exploring / testing an idea, it might be better for them to explore the region of small updates around the original proposal, instead of easily giving up and trying something completely different. Many ideas fail because of small details that were gotten wrong. When criticism is too harsh, it prevents people from doing even this. They might instead just keep proposing something close to what's already being tried. This is how you actually end up in a local minimum.

Not sure if this is an example of what you mean, but similar discussions remind me of Ignaz Semmelweis -- he was a doctor who noticed that some patients were dying disproportionately, he made a hypothesis about the possible cause, he changed his behavior based on the hypothesis, and he experimentally observed that fewer patients were now dying.

What happened next? The medical community found a technical mistake in his hypothesis, and instead of saying "well, this is technically wrong, but it also seems to be approximately right, we should explore the neighborhood of this hypothesis", they focused on getting the man fired. Later, the correct hypothesis (germ theory) was actually found in the neighborhood of his hypothesis.

I understood your article to mean that this is the type of mistake we should avoid. That we should be able to explore the neighborhood of the "technically wrong" ideas. (Without going to the opposite extreme and accepting all ideas indiscriminately, of course.)

This also seems related to the idea of "steelmanning"; instead of finding a small mistake and dismissing the whole thing, try to fix the mistake, and think again about the stronger version of the original hypothesis.

I'll throw in a heuristic that I've seen Tyler Cowen use:

Instead of pointing out that X is wrong, ask under which set of circumstances might X be right.

My heuristic is: "assume that people's reported experiences are usually true, but their interpretations and conclusions are usually wrong". It is not used for science, but for everyday life. And it assumes basic trustworthiness of the person, i.e. not having previous experience of the person misreporting facts.

To summarize, the best kind of criticism is proposing small improvements ("constructive criticism"), because it gives new ideas a chance to grow before they compete with conventional ones. That seems insightful and I agree, thanks for writing that! Even when an idea is so broken that small tweaks won't help, like a perpetual motion machine, it might be useful to take a cue from the trisector article and find some way in which the idea succeeds, conspicuously ignoring the ways in which it fails. An extreme form of that technique is called "fogging" and it's hilarious to use.


I found this to be useful. I had not explicitly reasoned about the hypothesis generation and subsequent iteration process like this.

For this part about updating with regards to criticism:

One of my strongest hopes is that whoever is playing the part of the "generator" is able to compile the list of critiques easily and use them to update somewhere close to the optimal direction. This would be difficult if the sum of all critiques is either directionless (many critics point in opposite or near-opposite directions) or very high-magnitude (critics simply say to get as far away from here as possible).

I'm curious exactly what that might entail. Are there any good examples you can give where someone gives a hypothesis, and then some critique in a certain direction / magnitude causes them to shift? What is the analogy when applied to, say, posts on LW about motivation, for example?

(Maybe someone gives an equation for motivation that satisfies certain qualities. And then someone critiques by bringing up an important quality the equation misses out?)

I'm curious exactly what that might entail. Are there any good examples you can give where someone gives a hypothesis, and then some critique in a certain direction / magnitude causes them to shift?

Well, I think the recent Dragon Army post and subsequent discussion was a good example. It generated a huge volume of critique, much of it following Norm One, and some of it not. The stuff that did follow Norm One actually did point mainly in the same direction, and mostly consisted of suggestions for how to make the system more robust to failure and implementing proper safe-guards. This did seem to cause Duncan to update his plan in that direction, and made it a lot more palatable to some (consider Scott Alexander's shift of opinion on it).

Contrast that with the more hostile criticism from that discussion, which probably caused no one to update in any direction, and if anything made it more likely for people to become entrenched in their views.


Cool. I think I agree with the general spirit of making criticism that follows these guidelines.

I think the thing I'm having trouble parsing is how to translate typical critique into the style of Norm One. My current interpretation is something like, "Give small, incremental suggestions that they can actually implement, rather than larger, more nebulous vague-pointing things. (And if you want to do the nebulous thing, maybe only do that after giving the small incremental stuff and use the nebulous thing as more of a goalpost of what you intend by giving the incremental stuff.)"

Excellent post, congratulations!

It's a > ~80% match with my models, which I have independently derived from the mechanisms of individual human cognition (that also can be thought of as having two components that work in an adversarial setup), so I guess that counts as additional evidence for both of us.

I guess positive feedback should also have a direction, and in this case I would spend most of my norm on reinforcing the way of thinking that you used to generate this post: noticing a deep correspondence between abstract objects in different domains, and drawing useful conclusions from it.

Hint: if you make a single connection between ML and social norms, you have two ways to draw useful conclusions. If you add human cognition to the mix, you form a triangle with three bidirectional correspondences, and that gives you six ways to generate useful cross-domain insights. [This generalizes to thinking efficiently about anything.]

A simple old-school internet forum could be the technical solution, with forums and subforums, where topics go up and down depending on commenting and pinning. But don't rush to create it. There is also a need for a social solution, so that everybody contributes in only one place. Maybe we need a system to move comments and ideas to one place from all the others - or is that reddit?