gwern comments on The Absent-Minded Driver - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (139)
While I was at it, I thought I'd give some fancier algorithms a spin using Karpathy's <code>Reinforce.js</code> RL library (Github). The DQN may be able to do much better since it can potentially observe the similarity of payoffs and infer the underlying function to maximize.
First a JS implementation of the Absent-Minded Driver simulation:
Now we can use Reinforce.js to set up and run a deep Q-learning agent (but not too deep since this runs in-browser and there's no need for hundreds or thousands of neurons for a tiny problem like this):
Overall, I am not too impressed by the DQN. It looks like it is doing worse than the MAB was, despite having a much more flexible model to use. I don't know if this reflects the better exploration of Thompson sampling compared to the DQN's epsilon-greedy, or if my fiddling with the hyperparameters failed to help. It's a small enough problem that a Bayesian NN is probably computationally feasible, but I dunno how to use those.