The Fiction Genome Project — LessWrong

by [anonymous]

29th Jun 2012

The Music Genome Project is what powers Pandora. According to Wikipedia:

The Music Genome Project was first conceived by Will Glaser and Tim Westergren in late 1999. In January 2000, they joined forces with Jon Kraft to found Pandora Media to bring their idea to market.[1] The Music Genome Project was an effort to "capture the essence of music at the fundamental level" using almost 400 attributes to describe songs and a complex mathematical algorithm to organize them. Under the direction of Nolan Gasser, the musical structure and implementation of the Music Genome Project, made up of 5 Genomes (Pop/Rock, Hip-Hop/Electronica, Jazz, World Music, and Classical), was advanced and codified.

A given song is represented by a vector (a list of attributes) containing approximately 400 "genes" (analogous to trait-determining genes for organisms in the field of genetics). Each gene corresponds to a characteristic of the music, for example, gender of lead vocalist, level of distortion on the electric guitar, type of background vocals, etc. Rock and pop songs have 150 genes, rap songs have 350, and jazz songs have approximately 400. Other genres of music, such as world and classical music, have 300–500 genes. The system depends on a sufficient number of genes to render useful results. Each gene is assigned a number between 1 and 5, in half-integer increments.[2]

Given the vector of one or more songs, a list of other similar songs is constructed using a distance function. Each song is analyzed by a musician in a process that takes 20 to 30 minutes per song.[3] Ten percent of songs are analyzed by more than one technician to ensure conformity with the in-house standards and statistical reliability. The technology is currently used by Pandora to play music for Internet users based on their preferences. Because of licensing restrictions, Pandora is available only to users whose location is reported to be in the USA by Pandora's geolocation software.[4]
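To make the quoted description concrete, here is a minimal sketch of ranking songs by distance between gene vectors. The quote does not say which distance function Pandora uses, so Euclidean distance is an assumption, and the three-gene catalog, the song names, and the helper names (`snap_to_half`, `most_similar`) are all hypothetical.

```python
import math

def snap_to_half(value):
    """Clamp a raw score to the 1..5 range and round it to a half step."""
    clamped = min(5.0, max(1.0, value))
    return round(clamped * 2) / 2

def distance(a, b):
    """Euclidean distance between two gene vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(seed, catalog, k=2):
    """Return the names of the k catalog songs closest to the seed vector."""
    ranked = sorted(catalog.items(), key=lambda item: distance(seed, item[1]))
    return [name for name, _ in ranked[:k]]

# Toy catalog with three made-up genes per song instead of roughly 400.
catalog = {
    "song_a": [4.5, 1.0, 3.0],
    "song_b": [4.0, 1.5, 3.0],
    "song_c": [1.0, 5.0, 2.0],
}
seed = [4.5, 1.0, 3.5]

print(most_similar(seed, catalog))
```

With real data the vectors would be far longer and the genes would probably need weights, but the ranking step would look much the same.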

Eminent lesswronger, strategist, and blogger, Sebastian Marshall, wonders:

Personally, I was thinking of doing a sort of “DNA analysis” of successful writing. Have you heard of the Music Genome Project? It powers Pandora.com.

So I was thinking, you could probably do something like that for writing, and then try to craft a written work with elements known to appeal to people. For instance, if you wished to write a best selling detective novel, you might do an analysis of when the antagonist(s) appear in the plot for the first time. You might find that 15% of bestsellers open with the primary antagonist committing their crime, 10% have the antagonist mixed in quickly into the plot, and 75% keep the primary antagonist a vague and shadowy figure until shortly before the climax.

I don’t know if the pattern fits that – I don’t read many detective novels – but it would be a bit of a surprise if it did. You might think, well, hey, I better either introduce the antagonist right away having them commit their crime, or keep him shadowy for a while.

Or, to use an easier example – perhaps you could wholesale adopt the use of engineering checklists into your chosen discipline? It seems to me like lots of fields don’t use checklists that could benefit tremendously from them. I run this through my mind again and again – what kind of checklist could be built here? I first came across the concept of checklists being adopted in surgery from engineering, and then having surgical accidents and mistakes go way down.

Some people at TV Tropes came across that article, and thought that their wiki's database might be a good starting point to make this project a reality. I came here to look for the savvy, intelligence, and level of technical expertise in all things AI and NIT that I've come to expect of this site's user-base, hoping that some of you might be interested in having a look at the discussion, and, perhaps, would feel like joining in, or at least sharing some good advice.

Thank you. (Also, should I make this post "Discussion" or "Top Level"?)

[-]John_Maxwell14y110

Discussion for sure IMO.

[-][anonymous]14y-40

You don't think the mathematization and rationalization of the criteria for what makes fiction good or bad is an interesting application of Bayesian rationality over subjective prejudices and intellectual pomposity?

[-]magfrump14y60

I would guess that the reason this feels like a "discussion" piece is that it is (mostly) about a reference to an outside-less-wrong project, whereas pieces in "main" are usually more self-contained academic summaries or longer expositions (in the form of sequences) on a specific topic.

Here's a referent I may have been using without realizing for a while: Things I like in main are things that I'd love to have in a physical "LessWrong Main" coffee table book. While the subject matter of this post could make it there, the format (mostly quotes and links to an outside discussion) doesn't make it seem like book material.

[-][anonymous]13y00

Seems like a reasonable criterion.

[-]jmmcd14y00

I think it's a clear-cut case of taking a very shallow view of a very deep area of human endeavour. The idea that an art form can be "solved" rationally, and that such a solution could trump subjective preference, is hopelessly naive.

[-][anonymous]13y00

There seems to be a confusion of terms. I don't remember suggesting anything like that; you seem to be projecting viewpoints that would have a similar-looking output onto this, instead of seeing the suggestion for what it is: a humble yet ambitious attempt to get objective information on what people like, and to provide it to them. Surely incomplete, superficial progress in the scientific study of this field is better than no progress at all?

[-]jmmcd13y00

I was responding to "the mathematization and rationalization of the criteria for what makes fiction good or bad" which doesn't sound humble and does sound a bit like solving the art form. And I was responding to "over subjective prejudices" (my emphasis), which does sound like you think subjectivity, the core issue, can be trumped. I still read your GP comment the same way, but if you think I misinterpreted then I won't argue, of course.

I do think there is room for an approach which strictly distinguishes between "art" and "craft" aspects of an art form, and applies reductive or analytical methods to the latter.

[-][anonymous]13y00

If you think art forms aren't ultimately "solvable" in some way, you're putting a rather hard limit on the achievements an AI could make. That would be an interesting replacement for the Turing Test: "artificial, mathematical beings suffer from creative sterility; they can't make good art, and they can't tell good art from bad". Is that what you're suggesting?

As for trumping subjectivity, it's more that I'd like to build a "critic" or "recommender" that isn't burdened by personal Bias Steamroller effects. Its bias would be the bias of the public at large, on average. It doesn't "trump subjectivity", but it mitigates its effect for the sake of recommending to people what they are most likely to like.

[-]jmmcd13y00

Recommender systems are a respectable topic in machine learning. That is quite different from "try[ing] to craft a written work with elements known to appeal to people" (OP) or "the mathematization and rationalization of the criteria for what makes fiction good or bad" (GGP).

No, I'm not suggesting any particular limit on what an AI might do in creative domains. I do think a domain like writing or reviewing fiction is probably AI-hard, meaning that sub-AI approaches like statistical machine learning won't be enough.

[-][anonymous]13y00

I certainly agree with that, but the way to eat an elephant is one mouthful at a time.

[-]jmmcd14y100

I think the Music Genome project is misleadingly-named. A genome is generative: there is a mapping from a genome to an organism. There is no reverse mapping. In the case of music, there is a reverse mapping from a piece of music to these 400 odd features, but there's no forward mapping.

And that misleading name has indeed misled the person who suggests using the detective novel features as a generative representation. They're not generative. They're just observed features. It's a useful distinction to keep in mind.
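A toy illustration of the distinction, assuming nothing about how a real fiction "gene" set would be defined: feature extraction runs from a finished work to its features, and because it is many-to-one, the features cannot be run backwards to recover or generate a work. The two features and both sample sentences below are invented for the example.

```python
def extract_features(story):
    """Reverse mapping: observe a few coarse features of a finished story."""
    words = story.split()
    return {
        "word_count": len(words),
        "mentions_detective": "detective" in story.lower(),
    }

story_one = "The detective found the letter"
story_two = "A detective lost his hat"

# Two different stories collapse onto the same feature vector, so the
# vector alone cannot tell us which story (if any) to write.
print(extract_features(story_one))
print(extract_features(story_one) == extract_features(story_two))
```

Adding more features shrinks the set of stories that share a vector, but it does not supply a procedure for producing one of them.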

[-]Lapsed_Lurker14y100

Just reading the title of this post, TVTropes came to mind, and there it was when I read it, which made me feel both good that I had made a successful prediction, and worried that it was probably me being biased by not remembering all the fleeting predictions that don't come true.

[-]erratio14y80

My experience using Pandora is that I usually have to use the same seed pieces a number of times before the algorithm will correctly pick up on the common features I like. This isn't too surprising; if there are a lot of features then presumably there are a ton of different pieces that are equidistant from my starting seed, and only a subset of them are in the right direction in featurespace. It's a model that works well enough for Pandora, because I can get a feel for a piece of music very quickly, and it will take me at most half an hour to pin down the feature area I'm looking for. I question whether this same model is feasible for fiction, because while you can select for things like writing style or camera technique within a couple of minutes, the time investment for most other features is going to be much higher. (Examples: is foreshadowing done well or not? Do characters change realistically in response to events over the course of the fiction? Is the ending logically and/or emotionally satisfying? Is there troubling subtext? Is the narrator reliable?)
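The equidistance point can be sketched in two dimensions. This is only an illustration under assumed mechanics: averaging the seeds into a centroid is a guess at how extra seeds might be combined, not a description of Pandora's actual algorithm, and the coordinates and candidate names are made up.

```python
import math

def centroid(points):
    """Average several seed vectors into a single target point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def rank(target, candidates):
    """Order candidate names from nearest to farthest from the target."""
    return sorted(candidates, key=lambda name: math.dist(target, candidates[name]))

# Three hypothetical pieces, each exactly 1.0 away from the first seed.
candidates = {"a": (4, 3), "b": (3, 4), "c": (2, 3)}
first_seed = (3, 3)
print({name: math.dist(first_seed, point) for name, point in candidates.items()})

# A second liked piece indicates a direction, which breaks the tie.
target = centroid([first_seed, (5, 3)])
print(rank(target, candidates))
```

With hundreds of features there are many more directions to move in, which fits the observation that several seeds or several rounds of feedback are needed before the station settles.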

[-][anonymous]14y00

Fiction "genes" will definitely be more complex creatures than music "genes". We've actually tackled the issues you raise in this post, in the linked thread.

[-]novalis14y40

In my experience, Pandora simply doesn't tend to give me music that I like even when I put in an artist that I like.

This is probably because some human came up with the categories. Humans aren't very good at that -- machines are much better at figuring this sort of thing out. Of course, getting machines to read fiction would be difficult, but without that, you're likely to end up with formulas that work fine if one is already a writer and not so much otherwise.

[-]stcredzero14y80

In my experience, Pandora simply doesn't tend to give me music that I like even when I put in an artist that I like.

Yes, Pandora does give me music with qualities in common with the music I like. It's just that those aren't the qualities that make me really like the music. Instead, I just get ho-hum doppelgangers of bands that I like.

[-]DanArmak14y20

Never used Pandora, but it sounds suspicious that it has "world music" as a genre with 'genes'. The definition is more or less "non-Western traditional music from everywhere in the world". It sounds like "all of music" wouldn't be much more diverse than "all non-Western music", so why group by genre at all? Alternatively, if Western music contains most of the musical diversity in the world, that's an interesting result I'd love to read about.

[-][anonymous]14y40

"World music" usually means "non-Western music marketed to Westerners". You won't get Indian or Arabian "Pop" genres labeled "World Music". As a matter of fact, you're unlikely to find them at all.

[-]falenas10814y10

When I started, I just put in every artist that I like. After a few days of more downvoting than I'd like, Pandora got pretty good at predicting. It just needed more data points on what I liked and disliked in music.

[-]NancyLebovitz14y00

I found that upvoting and downvoting on Pandora made the results insipid. It works better for me to just have a bunch of stations based on music I like, and then shuffle them.

[-]thomblake14y00

I've used Pandora extensively since around its launch. I find it's really good if I'm looking for a 'feel' - I tend to have music that's for particular purposes, like "Dungeons and Dragons combat music" - and with a couple of appropriate seeds and a little grooming, such stations work out really well.

One thing that consistently annoys me about Pandora is the amount of fake constraints they put on the system. Like, "Kids' music" doesn't come up on 'Adult' stations, so TMBG's kids' albums don't interact with their adult albums at all. And things like "artist" and "humorous lyrics" pretty much trump everything else, so if you seed a station with "Fake Bjork song" you have no chance of getting Bjork but probably will get other Liam Lynch songs and stuff like Tenacious D.

[-][anonymous]14y00

As someone who doesn't live in the USA and thus doesn't have access to Pandora, I'm finding your experiences a wealth of interesting information.

[-]Risto_Saarelma14y20

sigfpe on recommendation systems:

Whenever I go to amazon.com I get presented with recommendations based on my purchase and browsing history. It's boring. I know all about the books that are similar to what I already have. I want to see books I don't know about. I want to see other people's recommendations.

When I enter a physical bookstore I'm often presented with a curated display of books on a theme I hadn't really thought about, or sometimes employee recommendations. It's so much more interesting than a list generated by me sampling a list that was itself generated by me sampling a list that was ultimately based on stuff that I already have.

[-]Viliam_Bur14y20

I know all about the books that are similar to what I already have.

This could be fixed in user interface by a button that means "I already know about this book, and I will not buy it from you, so don't show me this book again". And the system would instead display another similar book... until you find something interesting you don't know.

[-]CronoDAS14y20

Amazon.com does have a "not interested" button.

[-]Manfred14y10

Yeah, turning around to make this generative is difficult. And the result might not even be too good, because the different "genes" aren't independent with respect to the quality of the song or book.

For example, maybe detective novels with a clear villain are very successful. And maybe detective novels whose protagonist has a day job other than "detective" are very successful. But maybe novels with both properties are unsuccessful - they work against each other, and you'd be better off just picking one route. Accounting for the effect of each gene on each other gene is doable, but it changes the number of things you have to keep track of from O(n) to O(10^n).

Still, it's a cool idea.

[-]Baughn13y00

This feels like nit-picking, but.. O(2^n), surely.
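For a sense of scale, here is a small sketch of the counts involved, assuming each gene is simply present or absent. That binary simplification is an assumption made for the example; with graded genes (say nine levels from 1 to 5 in half steps) the number of combinations would be 9^n instead. The function name `interaction_counts` is hypothetical.

```python
from math import comb

def interaction_counts(n):
    """Count the quantities to estimate for n binary genes."""
    return {
        "independent_effects": n,             # one effect per gene: O(n)
        "pairwise_interactions": comb(n, 2),  # each gene against each other gene
        "all_gene_subsets": 2 ** n,           # every possible combination: O(2^n)
    }

print(interaction_counts(10))
print(interaction_counts(400)["pairwise_interactions"])
```

Pairwise interactions alone grow only quadratically, so "each gene on each other gene" stays tractable; it is tracking every combination of genes that blows up exponentially.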

[-]pleeppleep14y00

I saw the title of this post and clicked to see how long it would take for someone to bring up TV Tropes.

[-]NancyLebovitz14y00

Another way of gathering information-- your e-reader reads you.
