Hello,
Thanks for the great article. A general question: what happens if the action space in the environment is state-dependent? In this case, if I use an "Atari-like" neural network to approximate the Q function, it will also produce values Q(s, a) for infeasible state-action pairs. In practice, I could just ignore these pairs, but will this create any problems, theoretically speaking? If so, could you give a quick suggestion on how to fix this, or where to look for a solution?
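For concreteness, the "ignore these pairs" idea is often implemented as action masking: the network still outputs a Q-value for every action in the full action space, but infeasible actions are masked out (e.g. set to minus infinity) before taking the argmax. A minimal sketch, assuming the environment can tell you which actions are valid in the current state (the `masked_greedy_action` helper and the `valid_actions` input are illustrative, not from the article):

```python
import numpy as np

def masked_greedy_action(q_values, valid_actions):
    """Greedy action selection that ignores infeasible actions.

    q_values: Q(s, a) for every action in the full action space.
    valid_actions: indices of the actions feasible in the current state
    (how you obtain these depends on your environment).
    """
    # Start with -inf everywhere so infeasible actions can never win argmax.
    masked = np.full_like(q_values, -np.inf)
    masked[valid_actions] = q_values[valid_actions]
    return int(np.argmax(masked))

q = np.array([0.2, 1.5, -0.3, 0.9])
# Action 1 has the highest raw Q-value but is infeasible here,
# so the feasible action 3 is chosen instead.
print(masked_greedy_action(q, [0, 2, 3]))  # -> 3
```

Note that the same mask would also need to be applied to the max over next-state actions when forming the TD target, so the bootstrap never uses an infeasible action's value.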
Thanks!