Sure, you can model music composition as an RL task: the AI composes a song, then predicts how much a human will like it, and tries to produce songs that are more and more likely to be liked.
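To make that concrete, here is a minimal sketch of such a loop in PyTorch. Everything in it is hypothetical (the `Composer` and `LikabilityModel` classes, `NUM_NOTES`, `SONG_LEN`), not any real system, and it glosses over the hard part: the reward model would itself have to be trained on actual human ratings.

```python
# Hypothetical sketch: a composer policy trained with REINFORCE against
# a learned model of human preference. Not a real system.
import torch
import torch.nn as nn

NUM_NOTES = 128   # size of the note vocabulary
SONG_LEN = 32     # notes per generated song

class Composer(nn.Module):
    """Policy: emits a distribution over the next note."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(NUM_NOTES, 64, batch_first=True)
        self.head = nn.Linear(64, NUM_NOTES)

    def forward(self, onehot_seq):
        out, _ = self.rnn(onehot_seq)
        return self.head(out[:, -1])   # logits for the next note

class LikabilityModel(nn.Module):
    """Reward model: predicts how much a human would like a song.
    In practice it would be trained on human ratings (not shown)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(NUM_NOTES, 64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, onehot_seq):
        out, _ = self.rnn(onehot_seq)
        return self.head(out[:, -1]).squeeze(-1)   # scalar score

def compose(policy):
    """Sample a song note-by-note, keeping log-probs for REINFORCE."""
    seq = torch.zeros(1, 1, NUM_NOTES)   # start token: all zeros
    log_probs = []
    for _ in range(SONG_LEN):
        dist = torch.distributions.Categorical(logits=policy(seq))
        note = dist.sample()
        log_probs.append(dist.log_prob(note))
        onehot = torch.zeros(1, 1, NUM_NOTES)
        onehot[0, 0, note] = 1.0
        seq = torch.cat([seq, onehot], dim=1)
    return seq[:, 1:], torch.stack(log_probs).sum()

policy, reward_model = Composer(), LikabilityModel()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(100):
    song, log_prob = compose(policy)
    reward = reward_model(song).detach().item()  # predicted human rating
    loss = -log_prob * reward   # REINFORCE: raise P(songs scored highly)
    opt.zero_grad(); loss.backward(); opt.step()
```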
Another interesting thing AlphaGo did was to start by predicting what moves a human would make, and only then switch to reinforcement learning. For a music AI, you would start with a model that can predict the next note in a song, then switch to RL and adjust its predictions so that it is more likely to produce songs humans like and less likely to produce ones we don't.
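Here is a hedged sketch of that first, supervised phase: train a next-note predictor on human songs with a cross-entropy loss. The corpus here is random stand-in data; a real version would use actual transcribed music. After this phase, the same network could be fine-tuned with a REINFORCE-style update like the one sketched above.

```python
# Hypothetical sketch: supervised next-note prediction on human songs,
# the pretraining phase before any RL fine-tuning.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NOTES = 128

class NotePredictor(nn.Module):
    """Next-note predictor, trained on human songs before any RL."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(NUM_NOTES, 64, batch_first=True)
        self.head = nn.Linear(64, NUM_NOTES)

    def forward(self, onehot_seq):
        out, _ = self.rnn(onehot_seq)
        return self.head(out)   # next-note logits at every timestep

model = NotePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for a real corpus: random one-hot note sequences of
# shape (batch, length, NUM_NOTES).
human_songs = F.one_hot(
    torch.randint(0, NUM_NOTES, (8, 33)), NUM_NOTES
).float()

for song in human_songs.split(1):
    # Predict note t+1 from notes up to t.
    inputs, targets = song[:, :-1], song[:, 1:].argmax(dim=-1)
    logits = model(inputs)                          # (1, 32, NUM_NOTES)
    loss = F.cross_entropy(logits.transpose(1, 2), targets)
    opt.zero_grad(); loss.backward(); opt.step()
```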
However, automated composition is something a lot of people have experimented with before, and so far nothing works really well.
One difference is that feedback from human judgement comes much more slowly than win/lose feedback in a game, where AlphaGo can play millions of games against itself.
There have been a couple of brief discussions of this in the Open Thread, but it seems likely to generate more, so here's a place for it.
The original paper in Nature about AlphaGo.
Google Asia Pacific blog, where results will be posted.
DeepMind's YouTube channel, where the games are being live-streamed.
Discussion on Hacker News after AlphaGo's win of the first game.