STORY (skippable)
Every Who down in Whoville enjoys Christmas Day
But you are the Grinch! And whatever they say
You HATE Christmas Day, with those terrible lights
(They shine into your eyes, and they shine much too bright)
You HATE Christmas Day, with those huge awful trees
(You think you're allergic...they're making you sneeze)
But you HATE most of all all those Christmasy toys
That bring Christmas morning that Christmasy noise
(Oh god, all the noise! All the noise, noise, NOISE, NOISE!)
(If there's one thing you hate, it's the noise, noise, NOISE, NOISE!!!!!!)
And tomorrow you know all those noises will start
As the presents bring joy to each Who child's heart
And they'll show all this joy (as Who children do)
By making loud noises and bothering you!
But as you are leaving earplugs by your bed
And preparing the pillows to cover your head
You realize there might be a thing you can do!
You can't steal the presents (they'd know it was you)
But for fifty-three years you've been hearing this riot
And sometimes it's louder! And sometimes more quiet!
For the more a Who child enjoys all their toys
The more they will make all that terrible noise
But what if - you think - they were feeling less glee?
Yes, that is the way, it's so clear now you see!
For it's long been tradition (among all the Whos)
That when you give presents, each child must get two
And it's also tradition there (don't ask me why)
To distribute them randomly, closing your eyes
So at night, when the Whos sleep and dream of their joy
You'll creep in and rearrange all of those toys
And the very next morn, when the Who children wake,
They'll run down the stairs and their presents they'll take
But then, why, instead of the gifts that they're hoping
You'll give each one gifts that will make them start moping!
A doll for the Who who thinks dolls are for babies!
Puppies for the Whos who think they carry rabies!
And then, why at last, why on Christmas this year
You'll sleep soothed by the sound of those Who children's tears!
But the whisper, so soft, from your treacherous heart:
You could bring them more joy, and no more stand apart...
DATA & OBJECTIVE:
- Each Who child has been given two presents. Based on long Who tradition, you believe these presents have historically been allocated among the children randomly.
- You must allocate two presents to each child, and may not assign any child two of the same present, but other than that you may distribute the gifts however you wish.
- Your goal is to minimize the amount of noise produced (by all children combined). You have a dataset of past presents received and noise made to help with this.
- (You could instead/also submit a solution where you invert the goal and try to maximize the noise. But why would you want to do that?)
The ten current Who children are:
Name | Age | Gender |
Andy Sue Who | 12 | M |
Betty Drew Who | 11 | F |
Sally Sue Who | 11 | F |
Phoebe Drew Who | 9 | F |
Freddie Lou Who | 8 | M |
Eddie Sue Who | 8 | M |
Cindy Drew Who | 6 | F |
Mary Lou Who | 6 | F |
Ollie Lou Who | 5 | M |
Johnny Drew Who | 4 | M |
You have the following 20 toys to allocate among them (remember, exactly two toys to each child, and no child may get two of the same toy, or you'll give yourself away!):
- Four Blum-Bloopers
- Four Fum-Foozlers
- Two Gah-Ginkas
- Three Sloo-Slonkers
- Three Trum-Troopas
- Four Who-Whonkers
An answer key and leaderboard will be posted on Monday the 10th (my holiday schedule permitting).
As usual, working together is allowed and encouraged, but for the sake of those who wish to work alone please spoiler-tag (type '>!' at the start of a line) any comments with information on the dataset.
Thank you to abstractapplic for feedback on a draft of this! (For clarification, abstractapplic has no inside information on the dataset and can still play the scenario).
Approach:
I split the problem into two parts: first, modeling how much noise will be produced by a given Who child with given presents, and second, how to optimize that value.
I declined to use the names of the Who children, since my intuition said that those shouldn't be predictive of anything. Also, there were Who children with the same name and same ID who lived years apart, which seemed like a bug.
I tried several models (random forest, gradient boosted forest) but got the best cross-validation accuracy when I used a ridge regression with product features. I ended up using the following features:
['Age', 'BlumBlooper__Age', 'BlumBlooper', 'FumFoozler__Age', 'FumFoozler__BlumBlooper', 'FumFoozler', 'GahGinka__Age', 'GahGinka__BlumBlooper', 'GahGinka__FumFoozler', 'GahGinka', 'SlooSlonker__Age', 'SlooSlonker__BlumBlooper', 'SlooSlonker__FumFoozler', 'SlooSlonker__GahGinka', 'SlooSlonker', 'SlooSlonker__GenderDummy_F', 'SlooSlonker__GenderDummy_M', 'TrumTroopa__Age', 'TrumTroopa__BlumBlooper', 'TrumTroopa__FumFoozler', 'TrumTroopa__GahGinka', 'TrumTroopa__SlooSlonker', 'TrumTroopa', 'TrumTroopa__GenderDummy_F', 'TrumTroopa__GenderDummy_M', 'WhoWhonker__Age', 'WhoWhonker__BlumBlooper', 'WhoWhonker__FumFoozler', 'WhoWhonker__GahGinka', 'WhoWhonker__SlooSlonker', 'WhoWhonker__TrumTroopa', 'WhoWhonker', 'WhoWhonker__GenderDummy_F', 'WhoWhonker__GenderDummy_M', 'GenderDummy_F__Age', 'GenderDummy_F__BlumBlooper', 'GenderDummy_F__FumFoozler', 'GenderDummy_F__GahGinka', 'GenderDummy_F', 'GenderDummy_M__Age', 'GenderDummy_M__BlumBlooper', 'GenderDummy_M__FumFoozler', 'GenderDummy_M__GahGinka', 'GenderDummy_M__GenderDummy_F', 'GenderDummy_M']
To optimize the noise, I assigned the presents randomly, checking that each was unique. Then I did a Markov chain optimization procedure where I swapped presents if it improved the score or made it worse by less than a random threshold. This procedure could probably be improved. I'm thinking about applying a quadratic programming library to the optimization procedure, but that seems kind of difficult.
Maximum noise proposal
Estimated noise: 195.72749659660874
Andy Sue Who WhoWhonker SlooSlonker
Betty Drew Who FumFoozler SlooSlonker
Sally Sue Who FumFoozler SlooSlonker
Phoebe Drew Who BlumBlooper FumFoozler
Freddie Lou Who TrumTroopa WhoWhonker
Eddie Sue Who TrumTroopa WhoWhonker
Cindy Drew Who GahGinka FumFoozler
Mary Lou Who BlumBlooper GahGinka
Ollie Lou Who BlumBlooper WhoWhonker
Johnny Drew Who TrumTroopa BlumBlooper
Minimum noise proposal
Estimated noise: 129.9544674398252
Andy Sue Who TrumTroopa GahGinka
Betty Drew Who BlumBlooper WhoWhonker
Sally Sue Who BlumBlooper WhoWhonker
Phoebe Drew Who BlumBlooper WhoWhonker
Freddie Lou Who FumFoozler GahGinka
Eddie Sue Who FumFoozler TrumTroopa
Cindy Drew Who SlooSlonker WhoWhonker
Mary Lou Who BlumBlooper SlooSlonker
Ollie Lou Who FumFoozler TrumTroopa
Johnny Drew Who FumFoozler SlooSlonker
After I posted my first post, but before reading the other answers, it occurred to me that I was probably leaving noise on the table by not modeling the individual Who children. Reading the other answers, it seems like doing that is key.
Revised results below when taking individual idosyncrasies into account in the ridge regression:
MIN SOLUTION
130.6603587239382
Andy Sue Who TrumTroopa FumFoozler
Betty Drew Who WhoWhonker BlumBlooper
Sally Sue Who BlumBlooper WhoWhonker
Phoebe Drew Who WhoWhonker BlumBlooper
Freddie Lou Who TrumTroopa GahGinka
Eddie Sue Who GahGin