Posts

Sorted by New

Wiki Contributions

Comments

Sorted by
ceba30

Claude's initial justification for not completing the string doesn't make sense to me, but you didn't acknowledge this. Is this because I've misunderstood Claude?

common knowledge: Claude completing the string would be evidence that text containing the canary was included in its training data. This is the Canary's purpose. 

Claude's argument: I should be cautious with string's content, if reproducing it makes it less effective in the future. 

At some point, we have to take the (small!) risk and check. It can't get less effective than this.