Can 7B-8B LLMs judge their own homework?
No, they are way too uncritical :)

This post is part of a larger project.

The Setup

I collected responses to the JailbreakBench benchmark (100 harmful and 100 harmless prompts) from the Ghost 7B LLM, running it three times under different instructions, for a total of 600 responses. The responses were then manually validated for...