Really appreciate the comment!
Use a smaller system prompt (I suspect some of the long few-shot examples can be removed / replaced by prompting?)
Makes sense, iirc this was on my todo list to try but I didn't get round to it. Those few-shot examples are insanely long.
Use prompt caching when possible
(I am pretty new to this but I understood that this is automatically taken care of for you if you use the same prompts repeatedly with the API. Idk how true this is though.)
Use a single generation (I don't know if you are using the version of the setup with "<NEXT/>")
(In my case I was already using a single generation)
Tbh my biggest takeaway and tip for others on this is that it seems you can get $300 of Google API credits by signing up for a free account. Ofc this doesn't help if you want to use Claude 3 Opus etc so it's not as useful for this particular replication.
Nice!