And it would probably want to have its own copies be maximized as well [...] This means it would have to consider itself a form of paperclip
That's the problematic step. If maximizing copies of itself is what maximizes paperclips, it happens automatically. It doesn't have to decide that "paperclips" stands for "paperclips and the 837 things I've found maximize them". It notices that "making copies leads to more paperclips than self-destructing into paperclips", and moves on. Likewise, you're not afraid that, if you don't believe growing cocoa beans is inherently virtuous, you might try to disassemble farms and build chocolate from their atoms.
I think I see what you're getting at. It's more in the vein of solving a logic/physics problem at that point. The only reason it would make the consideration I referred to is if doing so let it make more paperclips, so it would come down to which type of replication code allowed for less effort to be spent on maximizers and more effort to be spent on paperclips over the time period considered.
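To make that trade-off concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: time comes in discrete steps, each maximizer spends a step either building one copy of itself or making one paperclip, and the strategy is "replicate for the first k steps, then only make paperclips". The names `paperclips_made` and `best_switch_point` are hypothetical.

```python
def paperclips_made(horizon: int, replicate_steps: int, clips_per_step: int = 1) -> int:
    """Paperclips produced over `horizon` steps under the toy model.

    Assumption: the population doubles on every replication step, then every
    maximizer makes `clips_per_step` paperclips on each remaining step.
    """
    maximizers = 2 ** replicate_steps
    production_steps = horizon - replicate_steps
    return maximizers * production_steps * clips_per_step


def best_switch_point(horizon: int) -> int:
    """Number of replication steps that yields the most paperclips.

    Ties are broken in favour of the smaller number of replication steps.
    """
    return max(range(horizon + 1), key=lambda k: paperclips_made(horizon, k))


if __name__ == "__main__":
    horizon = 10
    k = best_switch_point(horizon)
    print(k, paperclips_made(horizon, k))
```

In this model the maximizer never needs to count itself as a paperclip. Replication shows up only because it raises the final paperclip total, and it stops as soon as another copy would cost more paperclips than it adds within the time period considered.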
(Why? Because it's fun.)
1) Do paperclip maximizers care about paperclip mass, paperclip count, or both? More concretely, if you have a large, finite amount of metal, you can make it into N paperclips or N+1 smaller paperclips. If all that matters is paperclip mass, then it doesn't matter what size the paperclips are, as long as they can still hold paper. If all that matters is paperclip count, then, all else being equal, it seems better to prefer smaller paperclips.
2) It's not hard to understand how to maximize the number of paperclips in space, but how about in time? Once it's made, does it matter how long a paperclip continues to exist? Is it better to have one paperclip that lasts for 10,000 years and is then destroyed, or 10,000 paperclips that are all destroyed after 1 year? Do discount rates apply to paperclip maximization? In other words, is it better to make a paperclip now than it is to make it ten years from now?
3) Some paperclip maximizers claim to want to maximize paperclip <i>production</i>. This is not the same as maximizing paperclip count. Given a fixed amount of metal, a paperclip count maximizer would make the maximum number of paperclips possible, and then stop. A paperclip production maximizer that didn't care about paperclip count would find it useful to recycle existing paperclips, melting them down so that new ones could be made. Which approach is better?
4) More generally, are there any conditions under which the paperclip-maximizing thing to do involves destroying existing paperclips? It's easy to imagine scenarios in which destroying some paperclips causes there to be more paperclips in the future. (For example, one could melt down existing paperclips and use the metal to make smaller ones.)
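The four questions above amount to asking which objective function a maximizer actually has. A minimal sketch, with every name and number invented for illustration (the `Paperclip` record, whole-year time steps, and the per-year `discount` factor are all assumptions), shows how the candidate objectives disagree on the same scenarios:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paperclip:
    mass: float     # grams of metal in the clip
    created: int    # year the clip is made
    destroyed: int  # year the clip stops existing (exclusive)


def total_production(clips):
    """Production objective: every clip ever made counts, even if melted later."""
    return len(clips)


def count_at(clips, year):
    """Count objective: number of clips that exist in the given year."""
    return sum(1 for c in clips if c.created <= year < c.destroyed)


def mass_at(clips, year):
    """Mass objective: total metal in paperclip form in the given year."""
    return sum(c.mass for c in clips if c.created <= year < c.destroyed)


def discounted_clip_years(clips, discount=1.0):
    """Time objective: one unit per clip per year, weighted by discount**year.

    discount=1.0 means no time preference; discount<1.0 favours earlier years.
    """
    return sum(discount ** year
               for c in clips
               for year in range(c.created, c.destroyed))


# Question 1: same metal, one large clip or two small ones.
one_big = [Paperclip(1.0, 0, 10)]
two_small = [Paperclip(0.5, 0, 10), Paperclip(0.5, 0, 10)]

# Question 2: one clip for 10,000 years, or 10,000 clips for one year.
one_long = [Paperclip(1.0, 0, 10_000)]
many_short = [Paperclip(1.0, 0, 1)] * 10_000

# Questions 3 and 4: keep one clip for five years, or melt and remake it yearly.
kept = [Paperclip(1.0, 0, 5)]
recycled = [Paperclip(1.0, year, year + 1) for year in range(5)]

if __name__ == "__main__":
    print(mass_at(one_big, 0), mass_at(two_small, 0))
    print(count_at(one_big, 0), count_at(two_small, 0))
    print(discounted_clip_years(one_long), discounted_clip_years(many_short))
    print(discounted_clip_years(one_long, 0.99) < discounted_clip_years(many_short, 0.99))
    print(total_production(kept), total_production(recycled))
    print(count_at(kept, 4), count_at(recycled, 4))
```

Under these assumptions, mass is indifferent between the big clip and the two small ones while count prefers the small ones. Undiscounted clip-years are indifferent between the long-lived clip and the 10,000 short-lived ones, but any discount below 1 prefers the short-lived batch. Recycling raises production without ever raising the count.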