I wrote a "how to" guide for picking a reference class here. Would be interested in any feedback from experienced forecasters. I haven't posted it because the example I work through is part of an upcoming, but as yet unpublished, piece of research I'm working on.
I just wrote a different post which discusses this issue in slightly different terms, with a few links which might be helpful: https://www.lesswrong.com/posts/SxpNpaiTnZcyZwBGL/multitudinous-outside-views
The reference class problem arises when you have a singular phenomenon (e.g. your friend Josh) and, to extrapolate from data and make predictions about it, you have to put it in a reference class of similar phenomena. The question becomes how you quantify similarity. Everything has an indefinite number of properties that could be used as the basis for selecting a reference class (Josh is male, likes jazz, is an animal, was born in Germany, has a freckle on his toe, etc.). You can almost always select a reference class in such a way that you get the results you want to see. So how do you judge a reference class?
EDIT: Put up a $100 bounty for anyone who can solve it before 2022
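To make the problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the eight-person population, the trait names, the `outcome` field and the `base_rate` helper are all made up, and the numbers carry no empirical meaning. It only shows that the same person gets a different base rate depending on which of his properties defines the reference class.

```python
from fractions import Fraction

# Hypothetical toy population: every record below is invented for illustration.
# "outcome" stands for whatever we are trying to predict about Josh.
PEOPLE = [
    {"male": True,  "likes_jazz": True,  "born_in_germany": True,  "outcome": True},
    {"male": True,  "likes_jazz": True,  "born_in_germany": False, "outcome": True},
    {"male": True,  "likes_jazz": False, "born_in_germany": True,  "outcome": False},
    {"male": True,  "likes_jazz": False, "born_in_germany": False, "outcome": False},
    {"male": False, "likes_jazz": True,  "born_in_germany": True,  "outcome": False},
    {"male": False, "likes_jazz": False, "born_in_germany": True,  "outcome": False},
    {"male": False, "likes_jazz": False, "born_in_germany": False, "outcome": False},
    {"male": False, "likes_jazz": True,  "born_in_germany": False, "outcome": True},
]


def base_rate(population, **traits):
    """Return the share of people matching all given traits whose outcome is True.

    Returns None when the reference class is empty.
    """
    members = [p for p in population if all(p[k] == v for k, v in traits.items())]
    if not members:
        return None
    return Fraction(sum(p["outcome"] for p in members), len(members))


# Josh is male, likes jazz and was born in Germany, so he belongs to all of these classes.
candidate_classes = [
    {},                                    # everyone
    {"male": True},
    {"likes_jazz": True},
    {"born_in_germany": True},
    {"male": True, "likes_jazz": True, "born_in_germany": True},
]

for traits in candidate_classes:
    print(traits or "everyone", "->", base_rate(PEOPLE, **traits))
```

In this toy data the estimate for Josh ranges from 1/4 (born in Germany) to 3/4 (likes jazz), and the narrowest class contains a single person, so its rate of 1 tells you almost nothing. Nothing in the code says which class is the right one, which is exactly the question.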