Well, there are a couple of issues here. First, log P(data|model) is a concave function of the parameters for logistic regression, so unless log P(model) is also concave, the maximization may get stuck in a local optimum instead of reaching the global one.
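To spell out why concavity of the log-prior matters, here is a short sketch of the decomposition. The symbols $\theta$ (the parameter vector) and $D$ (the data) are my notation, not the original comment's.

```latex
\begin{align*}
\log P(\theta \mid D)
  &= \log P(D \mid \theta) + \log P(\theta) - \log P(D), \\
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta} \bigl[ \log P(D \mid \theta) + \log P(\theta) \bigr].
\end{align*}
% log P(D) does not depend on theta, so it drops out of the argmax.
% A sum of concave functions is concave, so if both terms in the
% brackets are concave, every local maximum is a global maximum.
```

A Gaussian or a Laplace prior both have concave log-densities, so with either of those the objective stays concave. A prior with several modes would not give that guarantee.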
Second, the proper Bayesian thing to do would be to sample from the posterior, not maximize it. For instance, in logistic regression the model is given by a vector of parameters, theta. Suppose we actually believed that the prior on theta was proportional to exp(-|theta|), where |theta| is the sum of the absolute values of the coordinates of theta. Then maximizing P(model|data) will tend to give you solutions where most of the entries of theta are exactly 0, whereas the actual posterior is a continuous density and places zero probability mass on such solutions.
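Here is a rough numerical sketch of that contrast, under assumptions of my own: synthetic data, a proximal-gradient (ISTA) solver for the maximization, and a plain random-walk Metropolis sampler for the posterior. The function names, sizes, and step sizes are all illustrative and not from any particular library.

```python
import numpy as np

def log_lik(theta, X, y):
    # Logistic log-likelihood for labels y in {0, 1}.
    z = X @ theta
    return np.sum(y * z - np.logaddexp(0.0, z))

def grad_log_lik(theta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (y - p)

def map_l1(X, y, steps=5000):
    # Proximal gradient ascent on log_lik(theta) - sum(|theta|),
    # i.e. the log-posterior under the exp(-|theta|) prior.
    # Step size 1/L, where L = sigma_max(X)^2 / 4 bounds the curvature.
    lr = 4.0 / (np.linalg.norm(X, 2) ** 2)
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        v = theta + lr * grad_log_lik(theta, X, y)
        # Soft-thresholding: this step is what produces exact zeros.
        theta = np.sign(v) * np.maximum(np.abs(v) - lr, 0.0)
    return theta

def sample_posterior(X, y, n_samples=2000, step=0.1, seed=1):
    # Random-walk Metropolis targeting the same unnormalized posterior.
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(X.shape[1])
    logp = log_lik(theta, X, y) - np.abs(theta).sum()
    out = []
    for _ in range(n_samples):
        prop = theta + step * rng.standard_normal(theta.size)
        logp_prop = log_lik(prop, X, y) - np.abs(prop).sum()
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
        out.append(theta.copy())
    return np.array(out)

# Synthetic data (an assumption): 15 points, 30 features, one real signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((15, 30))
true_theta = np.zeros(30)
true_theta[0] = 2.0
y = (rng.uniform(size=15) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)

theta_map = map_l1(X, y)
samples = sample_posterior(X, y)
n_zero_map = int(np.sum(theta_map == 0.0))    # exact zeros in the MAP estimate
n_zero_samples = int(np.sum(samples == 0.0))  # exact zeros across all samples
print(n_zero_map, n_zero_samples)
```

The maximizer lands on coordinates that are exactly zero, while the sampled thetas have no exactly-zero coordinates, because the posterior is a continuous density. The sampler is not tuned and is only meant to illustrate the point, not to serve as a practical inference method.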
On the second point: fair enough, though even under Bayes it's sometimes reasonable to want a single answer, since you only get to actually do one thing.
If you have that prior and maximizing P(model|data) gives you solutions where either P(data|model) or P(model) is zero, you're screwing up multiplication, since the posterior is proportional to their product.
Question in title.
This is obviously subjective, but I figure there ought to be some "go-to" paper. Maybe I've even seen it once, but can't find it now and I don't know if there's anything better.
Links to multiple papers with different focuses would be welcome. For my current purpose I have a preference for one that aims low and isn't too long.