They don't test that directly. From what they report, the average looks more accurate than the second guess, but not statistically significantly so. The average is 7.6 better than the first guess (mean errors of 123.2 vs. 130.8, comparing the averages of only those in the dialectical bootstrapping condition with the first guesses of all participants). The second guess of those in the dialectical bootstrapping condition is only 4.5 better than their first guess, a difference that is not reliably different from zero (95% CI = -1.0 to +10.4).
"Dialectical Bootstrapping" is a simple procedure that may improve your estimates. This is how it works:
Herzog and Hertwig find that the average of the two estimates (in a historical-date estimation task) is more accurate than the first estimate (Edit: or than the average of two estimates made without the "assume you're wrong" manipulation). To put the finding in an OB/LW-centric way, this procedure (sometimes, partially) avoids Cached Thoughts.
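To make the averaging step concrete, here is a minimal Python sketch. It is only an illustration, not the authors' analysis code: the function names are made up for this example, and the event year and both guesses are hypothetical numbers, not data from the study. It shows why averaging helps most when the two guesses land on opposite sides of the truth.

```python
def average_estimate(first_guess: float, second_guess: float) -> float:
    """Average the first guess with the 'assume you were wrong' second guess."""
    return (first_guess + second_guess) / 2.0


def abs_error(estimate: float, truth: float) -> float:
    """Absolute distance between an estimate and the true value."""
    return abs(estimate - truth)


# Hypothetical example values (NOT from Herzog and Hertwig's data).
truth = 1789.0          # assumed true year of some historical event
first_guess = 1750.0    # initial estimate
second_guess = 1810.0   # second estimate, made after assuming the first was off

averaged = average_estimate(first_guess, second_guess)

first_error = abs_error(first_guess, truth)      # 39.0
second_error = abs_error(second_guess, truth)    # 21.0
averaged_error = abs_error(averaged, truth)      # 9.0

print(f"first guess error:  {first_error}")
print(f"second guess error: {second_error}")
print(f"averaged error:     {averaged_error}")
```

Here the two guesses bracket the truth, so their errors partly cancel and the average beats both. When both guesses fall on the same side, the average's error is simply the mean of the two errors, so it can never be worse than the worse guess. Whether the second guess is independent enough of the first to bracket the truth in practice is exactly what the manipulation is meant to encourage.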