The target network has access to more information than the Q-network does, and thus is a better predictor
Some additional context for anyone confused by this (as I was): "more information" is not a statement about training data, but instead about which prediction task the target network has to do.
In other words, the target network has an "easier" prediction task: predicting the return from a state further in the future. I was confused because I thought the suggestion was that the target network has been trained for longer, but it hasn't, it's just an old checkpoint of the Q-network.
The differentiation between the Q- and target-networks here is actually not very important; the key point would hold even if you were using the Q-network itself to compute targets. That is: predicting Q(s′, a′; θᵢ⁻) is "easier" than predicting Q(s, a; θᵢ) for any network, since s′ is one step further along the trajectory, so the first prediction task is in some sense a subset of the second.
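To make the "old checkpoint" point concrete, here is a minimal sketch of the DQN target computation, using a hypothetical linear Q-function and made-up parameter names (`theta`, `theta_minus`). It is not any particular implementation, just an illustration that the target network is literally a stale copy of the Q-network's weights, evaluated at the next state s′:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 3

theta = rng.normal(size=(n_features, n_actions))  # online Q-network parameters
theta_minus = theta.copy()                        # target network: an old checkpoint,
                                                  # NOT a longer-trained model

def q_values(params, state):
    """Q(s, .) under a toy linear function approximator."""
    return state @ params

gamma = 0.99
state = rng.normal(size=n_features)
next_state = rng.normal(size=n_features)
reward = 1.0
action = 0

# The TD target evaluates the frozen checkpoint at the *next* state s' --
# the "easier", one-step-closer prediction task discussed above.
td_target = reward + gamma * q_values(theta_minus, next_state).max()

# The online network is trained to predict Q(s, a; theta) at the current state.
prediction = q_values(theta, state)[action]
td_error = td_target - prediction

# Every so often the checkpoint is refreshed: after syncing, the two
# networks have identical weights, which is all "target network" means.
theta_minus = theta.copy()
```

The sync step at the end is the whole relationship between the two networks: between syncs, θ⁻ is simply θ from a few thousand gradient steps ago.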