TheOtherDave comments on The Power of Reinforcement - Less Wrong
I can't, but if you find anything concise and useful, I'd love to hear about it myself.
My rule of thumb is to set the threshold so as to reinforce the top 20% or so of performances, and arrange performance frequencies so I'm reinforcing 2-3 times/minute during active training periods. But that's not based on anything.
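To make that rule of thumb concrete, here is a minimal sketch, assuming performances can be scored numerically (higher is better). The function names and the sample scores are hypothetical, and the 20% and 2-3/minute figures are just the rule-of-thumb numbers above, not validated values.

```python
def top_fraction_threshold(scores, fraction=0.2):
    """Return the lowest score that still falls in the top `fraction` of `scores`."""
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    # Number of performances to reinforce; always at least one.
    k = max(1, round(len(ranked) * fraction))
    return ranked[k - 1]


def trials_per_minute(reinforcements_per_minute, fraction=0.2):
    """Trials per minute needed to hit a target reinforcement rate."""
    return reinforcements_per_minute / fraction


# Hypothetical scores from ten recent performances.
recent_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threshold = top_fraction_threshold(recent_scores)
reinforced = [s for s in recent_scores if s >= threshold]
```

With a 20% threshold, hitting 2-3 reinforcements per minute works out to roughly 10-15 performances per minute, which is what `trials_per_minute` computes.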
I'll also note that reinforcing higher-tier performances more strongly works really well (though it's hard to do consistently by hand), as do very intermittent "jackpots" (disproportionate and unpredictable mega-rewards).
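Since graded rewards are hard to do consistently by hand, here is one way they could be computed, as a rough sketch. Everything in it is an assumption for illustration: the function name, the linear scaling from 1x to 2x, the 2% jackpot probability, and the 10x jackpot multiplier are all made-up numbers.

```python
import random


def reward_size(score, threshold, best, base=1.0,
                jackpot_prob=0.02, jackpot_multiplier=10.0, rng=random):
    """Return a reward magnitude for one performance (0.0 means no reward)."""
    if score < threshold:
        return 0.0
    # Scale linearly from 1x at the threshold to 2x at the best score seen.
    span = max(best - threshold, 1e-9)
    tier = min((score - threshold) / span, 1.0)
    size = base * (1.0 + tier)
    # Rare, unpredictable jackpot on top of the graded reward.
    if rng.random() < jackpot_prob:
        size *= jackpot_multiplier
    return size
```

Passing `jackpot_prob=0.0` turns jackpots off, which leaves only the graded part.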