Right. Imagine an agent picking actions in a discrete-time game. Each time-advancing decision is a step. (E.g. for a debate, submitting one argument is a step.) But you don't just leave it running forever: typically, you occasionally reset the environment to a (potentially random) starting state and let the agent try again. Each run from one reset to the next is an episode.
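To make the step/episode distinction concrete, here is a minimal sketch of the usual interaction loop. The environment (ToyEnv, a random walk on a line that ends at position +3 or -3, or after max_steps steps) and the random policy are hypothetical stand-ins chosen for illustration; the `reset`/`step` naming loosely follows common RL conventions but is not any particular library's API.

```python
import random


class ToyEnv:
    """Hypothetical toy environment: the agent walks on a line, and the
    episode ends when it reaches +3 or -3, or after max_steps steps."""

    def __init__(self, max_steps=20, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # Begin a new episode from a (potentially random) starting state.
        self.pos = self.rng.randint(-1, 1)
        self.t = 0
        return self.pos

    def step(self, action):
        # One time-advancing decision is one step.
        self.pos += action
        self.t += 1
        reward = 1.0 if self.pos == 3 else 0.0
        done = abs(self.pos) >= 3 or self.t >= self.max_steps
        return self.pos, reward, done


def run(num_episodes=5, seed=0):
    env = ToyEnv(seed=seed)
    policy_rng = random.Random(seed + 1)
    steps_per_episode = []
    for _ in range(num_episodes):  # outer loop: one iteration per episode
        env.reset()
        done = False
        n_steps = 0
        while not done:  # inner loop: one iteration per step
            # A random policy stands in for the agent here.
            action = policy_rng.choice([-1, 1])
            _obs, _reward, done = env.step(action)
            n_steps += 1
        steps_per_episode.append(n_steps)
    return steps_per_episode


print(run())  # number of steps taken in each of the 5 episodes
```

Each episode contains a variable number of steps, bounded here by the hypothetical max_steps limit.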
This is correct, but at least for the quote above, the most important distinction is that most RL algorithms propagate credit for rewards backward across steps within an episode, but not across episode boundaries.
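One common way this shows up is in how returns are computed. The sketch below assumes a discounted Monte Carlo return with discount factor gamma (a standard choice, not the only one). It walks backward through a stream of transitions that may span several episodes, and resets the running sum at each episode boundary, so credit for a reward flows back to earlier steps of the same episode but never into a previous episode.

```python
def discounted_returns(rewards, dones, gamma=0.9):
    """Compute the discounted return G_t for every step in a stream of
    transitions that may span several episodes.

    rewards[t] is the reward received at step t; dones[t] is True if
    step t ended its episode.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Episode boundary: rewards from the following episode
            # must not be credited to steps in this one.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Two episodes: steps 0-2 (reward 1 at the end), then steps 3-4 (reward 5).
print(discounted_returns([0, 0, 1, 0, 5], [False, False, True, False, True], gamma=0.5))
# → [0.25, 0.5, 1.0, 2.5, 5.0]
```

Without the reset, step 2 would get 1 + 0.5 * 2.5 = 2.25 instead of 1.0, i.e. it would be credited for the second episode's reward.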
I agree, except I want to add a caveat.
Sometimes 'step' refers to such atomic interactions with the environment. In that case, this is right.
Other times, 'step' (especially 'training step' or 'gradient step', though it is not always qualified) refers to a step in a training algorithm. For example, a classic pattern in RL is to collect many episodes or sub-episode trajectory fragments and use them to compute a gradient update. That update is also called a 'step'. Outside of RL, this is probably the only (or at least the main) use of the word 'step'.
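The difference between environment steps and training steps can be sketched with a minimal REINFORCE-style loop. Everything here is a hypothetical illustration: a one-parameter policy that picks action 1 with probability sigmoid(theta), a toy episodic task where action 1 earns reward 1 at every step, and made-up values for the batch size and learning rate. Each training step collects several episodes (many environment steps) and then makes a single gradient update.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def collect_episode(theta, rng, episode_len=5):
    """Roll out one episode of a hypothetical toy task: at each
    environment step the agent picks action 1 or 0; action 1 earns reward 1."""
    p = sigmoid(theta)  # probability of choosing action 1
    actions, rewards = [], []
    for _ in range(episode_len):  # environment steps
        a = 1 if rng.random() < p else 0
        actions.append(a)
        rewards.append(float(a))
    return actions, rewards


def train(num_training_steps=50, episodes_per_step=8, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    env_steps = 0
    for _ in range(num_training_steps):  # training (gradient) steps
        grad = 0.0
        p = sigmoid(theta)
        for _ in range(episodes_per_step):  # a batch of whole episodes
            actions, rewards = collect_episode(theta, rng)
            env_steps += len(actions)
            episode_return = sum(rewards)  # undiscounted return of the episode
            for a in actions:
                # d/dtheta log pi(a) for a Bernoulli(sigmoid(theta)) policy is (a - p)
                grad += (a - p) * episode_return
        theta += lr * grad / episodes_per_step  # one gradient step
    return theta, env_steps


theta, env_steps = train()
print(f"50 training steps used {env_steps} environment steps; theta = {theta:.2f}")
```

With these settings, 50 training steps consume 50 × 8 × 5 = 2000 environment steps, which is why it matters which kind of 'step' a given text means.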
Thanks Charlie, Evan H. and Oliver. Your comments definitely help to give me a clearer picture.
My impression is that steps and episodes are both time periods in a training process, and that these terms are somewhat common in RL. An episode is larger than a step and usually contains many steps.
Is this correct?
Some related questions:
I'd appreciate answers to any of these questions if you know them. (It's ok if you don't know all the answers or don't have time to include them all in a single answer.)
What's the context?
I have seen these terms come up in discussions about myopia in AI.
For example, in Evan Hubinger's post about AI safety via market making, he invokes the terms "per-step" as well as "per-episode" myopia:
Understanding these terms more clearly will help me understand proposals about myopia, which is relevant to some research I'm doing right now. These also seem like basic or generally important concepts in ML which I'd like to understand better.