Aesthetics is confusing me a bit right now. You might also ask the question "why?" with painting or architecture, for example. I am singling out music because thinking about how we understand music is what got me onto the question.
Neurological problems can separately disable pitch/melody recognition, rhythm recognition and emotional reaction to music, and people can lose all of these without losing speech and speech processing. This is odd. Liking music is then some messy neurological process with its own special pathways. And it's probably not all that complicated, from a brain standpoint, just fuzzy and parallel.
What do we know? We know that we don't generally like contextless musical objects, but instead enjoy relationships between musical objects, especially with some rhythm. And yet we enjoy music "in the moment" (a musical object in the context of the last few measures) without having to listen to a whole piece. We tend to ascribe emotion to music (particularly the stress patterns, which seem vaguely connected to speech), and people can express themselves through music. Music can differ from culture to culture, but we usually like discrete, repeating scales and rhythms that partially repeat.
One proposal is that we form a vague (consistent with many possibilities) model of what the musician is likely to do next, and enjoy it when the model feels accurate. The emotional content also suggests a "musical grammar" model, where different elements of the music communicate things to us and what we enjoy is deciphering the communication and experiencing the communicated emotions. I'd enjoy it if people had more suggestions and possible experiments. Going more abstract, should these proposals generalize well to other sorts of aesthetics, or should we assume that since it's probably all different neurons we shouldn't try too hard? If the latter, why do we feel like we enjoy aesthetic pursuits in similar ways?
The paper isn't particularly long, if you haven't read it.
It doesn't attempt to explain music at a cultural level, only an individual one. You don't need a theory of aesthetics to explain why people would decide to like whatever their peers do; there's plenty of general psychology to cover that.
As for different musical tastes, the compression algorithm that the model is based around is subjective and adaptive. So mine can be different from yours (though there's a fair amount that humans on the whole will tend to have in common), and yours can change over time (especially in response to new data).
In particular, if you've been exposed to a lot of e.g. reggae music, then your algorithm will likely be especially efficient at compressing reggae. So it will seem more accessible to you. If I've been exposed to only a little reggae, it will likely seem less accessible, but more interesting: the compression algorithm can detect the presence of order and structure, but still has work to do in uncovering and utilizing all the regularity that's there. And if someone has never heard any music but classical, reggae could be incomprehensible gibberish to that person (read: they won't like it), because it clashes with their existing model and expectations so drastically.
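To make the exposure point concrete, here is a rough toy sketch, not anything taken from the paper. It assumes that an off-the-shelf compressor (zlib) primed with a preset dictionary can stand in for a listener's adaptive compressor, and that a "genre" is just a pool of random note phrases. The genre labels, `make_motifs`, `make_piece` and `compressed_size` are all made up for illustration. It only covers the "more accessible" half of the argument; it says nothing about the "more interesting" half, which depends on the compressor improving over time.

```python
import random
import zlib

NOTES = "CDEFGAB"  # toy alphabet; real music is obviously richer than this


def make_motifs(seed, count=20, length=16):
    """Hypothetical 'genre': a fixed pool of random note phrases."""
    rng = random.Random(seed)
    return [
        "".join(rng.choice(NOTES) for _ in range(length)).encode()
        for _ in range(count)
    ]


def make_piece(motifs, seed, phrases=12):
    """A 'piece' is a sequence of phrases drawn from one genre's pool."""
    rng = random.Random(seed)
    return b"".join(rng.choice(motifs) for _ in range(phrases))


def compressed_size(piece, heard_before=b""):
    """Size of the piece after compression, optionally primed with prior listening."""
    if heard_before:
        # zdict seeds the compressor with byte strings it can refer back to
        comp = zlib.compressobj(level=9, zdict=heard_before)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(piece) + comp.flush())


reggae = make_motifs(seed=1)      # label is arbitrary, just a phrase pool
classical = make_motifs(seed=2)   # a different, unrelated phrase pool
new_reggae_piece = make_piece(reggae, seed=3)

sizes = {
    "no prior listening": compressed_size(new_reggae_piece),
    "classical only": compressed_size(new_reggae_piece, b"".join(classical)),
    "lots of reggae": compressed_size(new_reggae_piece, b"".join(reggae)),
}
for listener, size in sizes.items():
    print(f"{listener}: {size} bytes")
```

The listener primed with the same phrase pool should need noticeably fewer bytes for the new piece than the other two, which is the toy analogue of the genre feeling more accessible.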