All of egor.timatkov's Comments + Replies

All the responses to this post made my week. Thanks!

Sure: 
For a monogamous partner, finding a successful partner has a value of 1.
Finding 2 successful partners also has a value of 1, because in a monogamous relationship you only need one partner.
The same holds for 3, 4, etc. partners: all those outcomes also have a value of 1.
So first, let's find the probability of getting a value of 0. Then let's calculate the probability of getting a value of 1.
The probability of getting a value of 0 (not finding a partner):

There is one other mutually exclusive alternative: finding at least one partner ...
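These two mutually exclusive outcomes can be sketched in a few lines of Python, using the thread's example numbers (a 10% chance per attempt over 10 attempts):

```python
def p_no_partner(p: float, n: int) -> float:
    # Value 0: every one of the n attempts fails.
    return (1 - p) ** n

def p_at_least_one(p: float, n: int) -> float:
    # The only other outcome: at least one success. A monogamous
    # partner-seeker values this at 1 no matter how many successes occur.
    return 1 - p_no_partner(p, n)

print(round(p_no_partner(0.1, 10), 4))    # 0.3487
print(round(p_at_least_one(0.1, 10), 4))  # 0.6513
```

The two probabilities sum to 1, since "no partner" and "at least one partner" exhaust the possibilities.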

-1 Anders Lindström
I know I am a parrot here, but they are playing two different games. One wants to find one partner and then stop. The other wants to find as many partners as possible. You cannot compare utility across different goals. Yes, the poly person will have higher expected utility, but it is NOT comparable to the utility that the mono person derives.

The wording should have been: "10% chance of finding a monogamous partner 10 times yields 1 monogamous partner in expectation and 0.63 in expected utility," not: "10% chance of finding a monogamous partner 10 times yields 0.63 monogamous partners in expectation." And: "10% chance of finding a polyamorous partner 10 times yields 1 polyamorous partner in expectation and 1 in expected utility," instead of: "10% chance of finding a polyamorous partner 10 times yields 1.00 polyamorous partners in expectation."

So there was a mix-up between expected number of successes and expected utility.

Ah, shoot. You're right. Probably not good to use "odds" and "probability" interchangeably for percentages like I did. Should be fixed now.

95% is a lower bound. It's more than 95% for every n, approaching 95% as n gets bigger. If n=2 (e.g. a coin flip), then you actually have a 98.4% chance of at least one success after 3n (which is 6) attempts.
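A quick sketch for anyone who wants to check the numbers, comparing the coin-flip case against the large-n limit:

```python
import math

def p_success_in_3n(n: int) -> float:
    # Chance of at least one success in 3n tries, each with probability 1/n.
    return 1 - (1 - 1 / n) ** (3 * n)

print(round(p_success_in_3n(2), 4))  # coin flip: 1 - (1/2)**6 = 0.9844
print(round(1 - math.exp(-3), 4))    # the limit as n grows: about 0.9502
```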

I mentioned this in the "What I'm not saying" section, but this limit converges rather quickly. I would consider any  to be "close enough"

I think what Justin is saying is that finding a single monogamous partner is not significantly different from finding two, three, etc. For some things you only care about succeeding once. So a 63% chance of success (any number of times) means an expected value of 0.63, because all successes after the first have a value of 0.

Meanwhile for other things, such as polyamorous partners, 2 partners is meaningfully better than one, so the expected value truly is 1, because you will get one partner on average. (Though this assumes 2 partners are twice as good as one; we could complicate this further by assuming that 2 partners are better, but not twice as good.)
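The two valuations can be put side by side in a short sketch (assuming, as above, that each additional poly partner is worth as much as the first):

```python
def mono_expected_utility(p: float, n: int) -> float:
    # Successes after the first are worth 0, so expected utility
    # equals the probability of at least one success.
    return 1 - (1 - p) ** n

def poly_expected_value(p: float, n: int) -> float:
    # Each partner is assumed to be worth as much as the first,
    # so expected value is the expected number of successes.
    return n * p

print(round(mono_expected_utility(0.1, 10), 2))  # 0.65, close to 1 - 1/e ~ 0.63
print(poly_expected_value(0.1, 10))              # 1.0
```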

-1 Anders Lindström
I do not understand your reasoning. Please show your calculations.

It's a great idea. I ended up bolding the one line that states my conclusion to make it easier to spot.

That's crazy how close that is.  (to the nearest half a percent) will be a fun fact that I remember now!

My guess is that it probably works, and it's useful to have, but I think the moment that it's made public in any way, people will break it pretty easily.

I haven't, no!
It seems interesting, I'll check it out

Wow. This is some really interesting stuff. Upvoting your comment.

The a-before-e example is just there to explain, in a human-readable way, how a watermark works. The important bit is that each individual section of the text is unlikely to occur according to some objective measure, be it a-before-e, hashing mod 10, or some other scheme.
I really like your example of hashing small bits of the text to 0 mod 10, though. I would have to look into how often you can actually edit text this way without significantly changing the meaning, but once that's done, you can solve for an N and find how much text you need in order to determine that the text is watermarked.
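As a sketch of what the "solve for an N" step might look like (the chunking scheme and hash choice here are my own assumptions, not anything specified in the thread):

```python
import hashlib

def chunk_hash_mod(chunk: str, m: int = 10) -> int:
    # Hash a chunk of text and reduce it mod m (m = 10 as in the example above).
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % m

def chunks_needed(alpha: float, m: int = 10) -> int:
    # An unwatermarked chunk hashes to 0 mod m with probability 1/m, so k
    # consecutive matches happen by chance with probability (1/m)**k.
    # Return the smallest k pushing that false-positive rate below alpha.
    k, p = 0, 1.0
    while p > alpha:
        p /= m
        k += 1
    return k

print(chunks_needed(0.05))  # 2 matching chunks for a 5% false-positive rate
```

Multiplying `chunks_needed(alpha)` by the average chunk length then gives a rough estimate of how much text the detector needs.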

Yes, so indexing all generations is absolutely a viable strategy, though like you said, it might be more expensive.
Watermarking by choosing different tokens at a specific temperature might not be as effective (as you touched on), because in order to reverse it, you need the exact input. Even a slight change to the input or the context will shift the probability distribution over the tokens, after all, which means you can't tell whether the LLM chose the first, second, or third most probable token just by looking at the token.
That being said, something like t...

7 faul_sname
Choosing a random / pseudorandom vector in latent space and then perturbing along that vector works to watermark images; maybe a related approach would work for text? Key figure from the linked paper: [figure not reproduced here]

You can see that the watermark appears to be encoded in the "texture" of the image, but in a way where that texture doesn't look like the texture of anything in particular - rather, it's just that a random direction in latent space usually looks like a texture. Unless you know which texture you're looking for, knowing that the watermark is "one specific texture is amplified" doesn't really help you identify which images are watermarked.

There are ways to get features of images that are higher-level than textures - one example that sticks in my mind is [Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness](https://arxiv.org/pdf/2408.05446). I would expect that a random direction in the latent space of the adversarially robust image classifier looks less texture-like, but is still perceptible by a dedicated classifier at far lower amplitude than the point where it becomes perceptible to a human.

As you note, things like "how many words have the first e before the first a" are texture-like features of text, and a watermarking scheme that used such a feature would work. However, I bet you can do a lot better than such manually-constructed features, and I would not be surprised if you could get watermarking / steganography using higher-level features working pretty well for language models.

That said, I am not eager to spend unpaid effort to bring this technology into the world, since I expect most of the uses for a tool that lets a company see whether text was generated by its LLM would look more like "detecting noncompliance with licensing terms" and less like "detecting rogue behavior by agents". And if the AI companies want such a technology to exist, they have money and can pay for someone to build it.

I haven't, no. I really wish I could somehow investigate all 3 pillars of a good watermark (Decisiveness, Invisibility, Robustness), but I couldn't think of any way to quantify a general text watermark's invisibility. For any given watermark you can technically rate "how invisible it is" by using an LLM's loss function to see how different the watermarked text is from the original text, but I can't come up with a way to generalize this.
So unfortunately my analysis was only about the interplay between decisiveness and robustness.