This is the way I would do it, also taking into account EY's point of not hiding away new comments:
Assume each comment has an ‘upvote rate’ U, such that the probability that a comment at the age of t has u upvotes is a Poisson distribution with parameter Ut,
and similarly for downvotes,
If the prior probability distribution for U and D is P(U, D), their posterior probability distribution will be
Then, you sort comments according to a functional of the posterior pdf of U and D; in analogy with expected utility maximization you could use the posterior expectation value of some function f(U, D), but other choices would be possible. (This reduces to your proposal when you take f(U, D) = U/(U + D).)
Of course this model isn't entirely realistic because U and D ought to vary with time (according to timezone, how old the thread is and whether it's currently linked to from the main page, etc.), but the main effect of disregarding this (pretending that a comment has the same probability of getting upvoted in the 10,000th hour after its publication as in the 1st hour) would be to cause very recent comments to be sorted higher, which IMO is a Good Thing anyway.
I think some factor for decreasing votes over time should be included. Exponentially decaying rates seem reasonable, and the decay time constant can be calibrated with the overall data in the domain (assuming we have data on voting times available).
In How Not To Sort By Average Rating, Evan Miller gives two wrong ways to generate an aggregate rating from a collection of positive and negative votes, and one method he thinks is correct. But the "correct" method is complicated, poorly motivated, insufficiently parameterized, and founded on frequentist statistics. A much simpler model based on a prior beta distribution has more solid theoretical foundation and would give more accurate results.
Evan mentions the sad reality that big organizations are using obviously naive methods. In contrast, more dynamic sites such as Reddit have adopted the model he suggested. But I fear that it would cause irreparable damage if the world settles on this solution.
Should anything be done about it? What can be done?
This is also somewhat meta in that LW also aggregates ratings, and I believe changing the model was once discussed (and maybe the beta model was suggested).
In the Bayesian model, as in Evan's model, we assume for every item there is some true probability p of upvoting, representing its quality and the rating we wish to give. Every vote is a Bernoulli trial which gives information on p. The prior for p is the beta distribution with some parameters a and b. After observing n actual votes, of which k are positive, the parameters of the posterior distribution are a+k and b+(n-k), so the posterior mean of p is (a+k)/(a+b+n). This gives the best estimate for the true quality, and reproduces all the desired effects - convergence to the proportion of positive ratings, where items with insufficient data are pulled towards the prior mean.
The specific parameters a and b depend on the quality distribution in the specific system. a/(a+b) is the average quality and can be taken as simply the empirical proportion of positive votes among all votes in the system. a+b is an inverse measure of variance - a high value means most items are average quality, and a low value means items are either extremely good or extremely bad. It is harder to calibrate, but can still be done using the overall data (e.g., MLE from the entire voting data).
For the specific problem of sorting, there are other considerations than mere quality. A comment can be in universal agreement, but not otherwise interesting or notable. These may not deserve as prominent a mention as controversial comments which provoke stronger reactions. For this purpose, the "sorting rating" can be multiplied by some function of the total number of votes, such as the square root. If the identity function is used, this becomes similar to a simple difference between the number of positive and negative votes.