One thing that I’ve commented on a few times so far, and that I repeatedly bemoan to myself as I work my way through various courses, is that none of the experiences I’m having are really tailored to me. All of these approaches to teaching deep learning are tied up in, first, rebuilding neural networks from the ground up and, second, applying very specific techniques to toy problem sets.
After quite some time of finding this frustrating (and about a year after I wrote my own exercises for a different course where… I did basically the exact same thing), I took a minute to think about why this would be and what the challenges are in teaching this material.
Before I get into this too much, I want to share another anecdote: while working on an earlier exercise, I ran some code whose output didn’t match what the instructions expected. They listed a loss value for a specific network and problem, and I got a different loss value. My code still ran, and it ended up getting significantly better performance than expected: something like 80% versus 73%.
The difference was probably that some of my hyperparameters were slightly off, or that I computed a different number of things between setting the random seed and starting the training. But it turns out that if you have enough data, and your hyperparameters aren’t too far off, the network does most of the work of convergence for you, especially if you’re using the right kind of learning rate decay and Adam optimization.
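For what it’s worth, here is roughly the kind of setup I mean, as a minimal sketch in Keras; the seed and schedule values are placeholders of my own, not anything from the course.

```python
import numpy as np
import tensorflow as tf

# Anything computed between seeding and training shifts the RNG stream,
# which is exactly the kind of harmless divergence I suspect happened.
np.random.seed(42)
tf.random.set_seed(42)

# Exponential learning rate decay feeding into Adam: the combination that
# makes convergence fairly forgiving of small hyperparameter differences.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```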
This is one of the big reasons that I don’t really care about hyperparameter tuning—given a specific application it can improve performance, but it isn’t going to bridge any theoretical gaps.
What I do care about is being able to set up a new dataset on my own computer, and to turn the ins and outs of that dataset into a reinforcement learning setup with the appropriate features, like an agent that can record some memories of its past environment.
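To be concrete about the ‘memories’ part: the standard mechanism is an experience replay buffer. Here is a minimal sketch of my own, not any particular library’s implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories fall off the end

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive
        # transitions before they reach the network.
        return random.sample(self.buffer, batch_size)
```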
If I wanted to design an automatically graded online coding exercise to teach this skill, what would I need?
First, I would need a dataset or simulation environment large enough to be used for a deep reinforcement learning problem.
Second, I would need a specific reinforcement learning framework, whether Q-learning, policy gradients, or something else (the core Q-learning update is sketched just after this list).
Third, I would need a way to map the dataset or environment to the framework that can be easily explained so that students can use the same mapping consistently.
Fourth, I would need a specific agent architecture to produce consistent results that I could evaluate.
Fifth, I would need to be sure the student had enough computational resources to actually execute the agent.
Sixth, I would need a reporting format that would reliably summarize the student’s actual results and send them to my evaluation server.
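As a concrete anchor for that second requirement, here is the tabular Q-learning update that the deep variants approximate; a minimal sketch with generic names of my own choosing, not any course’s reference implementation.

```python
from collections import defaultdict

Q = defaultdict(lambda: defaultdict(float))  # Q[state][action] -> estimated value

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # One Q-learning step: move Q(s, a) toward the bootstrapped target
    # r + gamma * max over a' of Q(s', a').
    best_next = max(Q[s_next][a2] for a2 in actions)
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
```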
Not all of these tasks are difficult, but none of them are particularly easy, and they are difficult in a wide variety of ways.
The first task, getting an environment, can be solved with things like the ATARI project or TORCS, but I have personally given up on setting those up in the past, which means that students would be likely to give up on setting them up too. Getting that working reliably across many different computers, and writing a better installation tutorial, is a substantial task: not theoretically difficult, but I don’t have the time in my day to do it myself.
The second task adds no difficulty on its own: introducing a specific reinforcement learning framework in technical detail is something I’d expect to do anyway. Though I will note that while supervised and unsupervised learning frameworks are very well documented, and most libraries follow the scikit-learn API for those tasks, reinforcement learning models haven’t been standardized in the same way. Searching for ‘keras reinforcement learning’ on Google gives me a bunch of personal GitHub accounts as results, which isn’t exactly inspiring if I’m trying to create a product that will be useful to students going forward. That means I’ll either have to write a framework myself, have students write one themselves, use something ad hoc, or change which libraries I’m using. None of those are clean options that easily lead me to the results I want.
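To illustrate the contrast: the first snippet below is the supervised convention nearly every library follows, while the second is the kind of agent interface I would have to invent myself; every name in it is hypothetical.

```python
# The convention virtually every supervised-learning library follows:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
# clf.fit(X_train, y_train)
# predictions = clf.predict(X_test)

# There is no equivalent convention for agents, so I would be defining
# the interface myself. These method names are my own invention.
class Agent:
    def act(self, observation):
        """Choose an action given the current observation."""
        raise NotImplementedError

    def observe(self, observation, action, reward, done):
        """Record the outcome of the last action (e.g. into a replay buffer)."""
        raise NotImplementedError

    def learn(self):
        """Run one training step on the recorded experience."""
        raise NotImplementedError
```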
The third task isn’t so hard in theory, but it may be hard to evaluate. Since I can’t run every deep learning experiment on AWS (at least not if I want students to work with something actually interesting), they’ll need to run the evaluation on their own installation of the simulator, at its intersection with their own installation of the framework. If that’s too difficult, again, most students will be lost (even very smart ones like me, who just hate coming back to IT issues instead of deep learning issues!).
The fourth is possibly the hardest from the actual deep learning perspective. If I want to automatically evaluate submissions, I want students to produce consistent results. But if I want students to have an authentic experience, I want them to be able to experiment with hyperparameters, training times, and architectures in an unstructured way. I could just set a benchmark, but I’d have to set it high enough that it couldn’t be passed without proper tuning, which might mean requiring either a lot of tuning or a lot of training, making the process slow. I could require that the agent run within a certain amount of time, to prevent students from just building a larger network and training it much harder, but that would be difficult to monitor across the different computers students may be working from. And the answers to any of these concerns may depend heavily on the task I’m asking them to attend to: if they’re playing Pong the answers might be easy, but if they’re playing Tetris the answers could be impossible.
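To make the tension concrete, here is a sketch of the benchmark-plus-time-budget check I’m imagining; the threshold, budget, and run_agent function are all hypothetical placeholders.

```python
import time

SCORE_THRESHOLD = 18.0    # e.g. a mean Pong score; would need tuning per task
WALL_CLOCK_BUDGET = 3600  # seconds; hard to enforce fairly across machines

def grade(run_agent, episodes=100):
    """run_agent is a hypothetical student-supplied function returning one episode's score."""
    start = time.time()
    scores = [run_agent() for _ in range(episodes)]
    elapsed = time.time() - start
    mean_score = sum(scores) / len(scores)
    passed = mean_score >= SCORE_THRESHOLD and elapsed <= WALL_CLOCK_BUDGET
    return passed, mean_score, elapsed
```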
The fifth issue could probably be solved by giving them minimum specs for a virtual environment that I provide a download link for… but those minimum specs will probably end up requiring more than 8 GB of RAM, which could be limiting (it certainly excludes students who work from a library, and may also exclude, say, kids working on a parent’s computer).
And the sixth and final issue is a combined technical and pedagogical problem that can’t really be addressed without answers to numbers 2, 3, and 4.
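That said, the rough shape of the report itself is easy to guess at. Here is a sketch; every field name, value, and the endpoint are invented for illustration.

```python
import json
import urllib.request

report = {
    "student_id": "anonymized-id",     # all values here are made up
    "environment": "Pong-v0",
    "episodes": 100,
    "mean_score": 17.4,
    "wall_clock_seconds": 2913.0,
}

req = urllib.request.Request(
    "https://example.com/submit",      # hypothetical evaluation server
    data=json.dumps(report).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # would actually send the submission
```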
It’s not very interesting to look at a PowerPoint slide asking “if you’ve seen humans get performance with 0.5% error, and your algorithm has 1% test set error and 2% dev set error, what sorts of tuning should you be doing?” I would much rather actually have that error myself and try different tuning methods to see what helps.
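To be fair, the arithmetic behind that slide question is simple; it’s acting on it that’s the hard part. A sketch of the standard avoidable-bias versus variance decomposition, with the training error as a hypothetical extra input the slide doesn’t give:

```python
def diagnose(human_error, train_error, dev_error):
    avoidable_bias = train_error - human_error  # gap a bigger model could close
    variance = dev_error - train_error          # gap more data could close
    if avoidable_bias > variance:
        return "focus on bias: bigger network, longer training"
    return "focus on variance: more data, regularization"

# With the slide's numbers and a guessed 1% training error:
print(diagnose(human_error=0.005, train_error=0.01, dev_error=0.02))
# variance gap (1%) exceeds the bias gap (0.5%), so tune for variance
```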
But when I actually step back and look at what the instructor would need to do in order to reliably put me in that situation so that they can predict the consequences of my actions and make sure I’m learning something that will be applicable outside of that specific problem…
I guess I can understand why the thing I want doesn’t already exist.