The game theory textbook "A Course in Microeconomic Theory" (Kreps) addresses this situation. Quoting from page 516:
We will give an exact analysis of this problem momentarily (in smaller type), but you should have no difficulty seeing the basic trade-off; too little punishment, triggered only rarely, will give your opponent the incentive to try to get away with the noncooperative strategy. You have to punish often enough and harshly enough so that your opponent is motivated to play [cooperate] instead of [defect]. But the more often/more harsh is the punishment, the less are the gains from cooperation. And even if you punish in a fashion that leads you to know that your opponent is (in her own interests) choosing [cooperate] every time (except when she is punishing), you will have to "punish" in some instances to keep your opponent honest.
We know that Tit-for-Tat and its variants do very well in iterated Prisoner's Dilemma tournaments. However, such tournaments are a bit unrealistic in that they give the agents instant and complete information about each other's actions. What if this signal is obscured? Suppose, for example, that if I press "Cooperate", there is a small chance that my action is reported to you as "Defect", presumably causing you to retaliate; and conversely, if I press "Defect", there is a chance that you see "Cooperate", thus letting me get away with cheating. Does this affect the optimal strategy? Does the probability of getting wrong information matter? What if it is asymmetric, i.e. P(observe C | actual D) != P(observe D | actual C)?
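The noisy-observation game described above is easy to simulate. Below is a minimal sketch, not a definitive model: the payoff values (T=5, R=3, P=1, S=0) are the conventional Prisoner's Dilemma numbers, the strategy functions and the `forgiveness` parameter are my own illustrative choices, and the two flip probabilities correspond to the asymmetric error rates P(observe D | actual C) and P(observe C | actual D) from the question.

```python
import random

# Conventional PD payoffs, keyed by (my_move, opponent_move):
# mutual cooperation = 3, mutual defection = 1, temptation = 5, sucker = 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(observed):
    # Cooperate first, then copy the opponent's last *observed* move.
    return observed[-1] if observed else "C"

def generous_tft(observed, forgiveness=0.1):
    # Like Tit-for-Tat, but after an observed defection it forgives
    # with some probability -- a common fix for noise-driven feuds.
    if observed and observed[-1] == "D" and random.random() > forgiveness:
        return "D"
    return "C"

def corrupt(move, p_c_to_d, p_d_to_c):
    # Report a possibly wrong observation of `move`.
    # p_c_to_d = P(observe D | actual C), p_d_to_c = P(observe C | actual D).
    if move == "C" and random.random() < p_c_to_d:
        return "D"
    if move == "D" and random.random() < p_d_to_c:
        return "C"
    return move

def play(strat_a, strat_b, rounds, p_c_to_d, p_d_to_c):
    # Each player chooses based on its (noisy) observation history of the other.
    seen_by_a, seen_by_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(seen_by_a)
        b = strat_b(seen_by_b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        seen_by_a.append(corrupt(b, p_c_to_d, p_d_to_c))
        seen_by_b.append(corrupt(a, p_c_to_d, p_d_to_c))
    return score_a, score_b
```

With zero noise, Tit-for-Tat against itself cooperates forever and each side scores 3 per round. Turning on even a small chance of a C being reported as a D lets one accidental "observed defection" trigger an alternating retaliation spiral between two Tit-for-Tat players, which is exactly why forgiving variants become attractive under noise.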