Comments

I'd say "in many contexts" in practice refers to when you are already working with relatively blessed information. It's just that while most domains are overwhelmingly filled with garbage information (e.g. if you put up a camera at a random position on the earth, what it records will be ~useless), the fact that they are so filled with garbage means that we don't naturally think of them as being "real domains".

Basically, I don't mean that blessed information is some obscure thing that you wouldn't expect to encounter, I mean that people try to work with as much blessed information as possible. Logs were sort of a special case of being unusually-garbage.

You can't distill information in advance of a bug (or anomaly, or attack) because a bug by definition is going to be breaking all of the past behavior & invariants governing normal behavior that any distillation was based on.

Depends. If the system is very buggy, there's gonna be lots of bugs to distill from. Which brings us to the second part...

The logs are for the exceptions - which are precisely what any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw out information about to compress as residuals to ignore in favor of the 'signal'.

Even if lossy compression threw out the exceptions we were interested in as being noise, that would actually still be useful as a form of outlier detection. One could just zoom in on the biggest residuals and check what was going on there.
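
A minimal sketch of what I mean, with synthetic data and a plain SVD standing in for whatever factor model you'd actually fit; the point is just that sorting by reconstruction residual surfaces the rows the compression "threw out":

```python
import numpy as np

# Toy stand-in for featurized log lines (made-up data, not a real pipeline).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[123] += 15                              # plant one "exceptional" line

# Lossy compression: keep only the top-k factors of the centered data.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
X_hat = U[:, :k] * S[:k] @ Vt[:k]         # low-rank reconstruction

# The biggest residuals are the outliers to zoom in on.
residual = np.linalg.norm(Xc - X_hat, axis=1)
print(residual.argsort()[::-1][:10])      # row 123 should top this list
```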

Issue is, the logs end up containing ~all the exceptions, including exceptional user behavior, exceptional user setups, and exceptionally error-throwing but non-buggy code, yet the logs are only useful for bugs/attacks/etc., since the former behaviors are fine and should be supported.

It's also putting the attribute on the wrong thing - it's not garbage data, it's data that's useful for other purposes than the one at hand.

Mostly it's not useful for anything. Like, the logs contain lots of different types of information, and every type is almost always useless for every purpose, but each type has a small number of purposes for which a very small fraction of that information is useful.

Blessed and cursed are much worse as descriptors. In most cases there's nobody doing the blessing or cursing, and it focuses the mind on the perception/sanctity of the data, not the use of it.

This is somewhat intentional. One thing one can do with information is give it to others who would not have seen it. Here one sometimes needs to be careful to preserve and highlight the blessed information and eliminate the cursed information.

Actually I guess that's kind of trivial (the belief state geometry should be tensored together). Maybe a more interesting question is what happens if you use a Markov chain to transduce it.
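
Here's a minimal numpy sketch of what "tensored together" means concretely (toy HMMs, made-up numbers): for two independent hidden Markov models observed side by side, the filtered belief state of the joint process stays the Kronecker product of the two individual belief states, so the joint belief-state geometry is the tensor product of the two geometries.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hmm(n_states, n_obs):
    """Random T[i, j] = P(next=j | cur=i) and E[i, o] = P(obs=o | state=i)."""
    T = rng.random((n_states, n_states)); T /= T.sum(1, keepdims=True)
    E = rng.random((n_states, n_obs));    E /= E.sum(1, keepdims=True)
    return T, E

def belief_update(b, T, E, o):
    """One step of Bayesian filtering: predict, then condition on observation o."""
    b = (b @ T) * E[:, o]
    return b / b.sum()

T1, E1 = random_hmm(3, 2)
T2, E2 = random_hmm(4, 3)

# Joint process of the two independent chains: transitions and emissions factorize.
Tj = np.kron(T1, T2)
Ej = np.kron(E1, E2)          # joint observation index = o1 * 3 + o2

b1 = np.ones(3) / 3
b2 = np.ones(4) / 4
bj = np.kron(b1, b2)

for _ in range(20):
    # The identity holds for any observation sequence, so just pick arbitrary ones.
    o1, o2 = rng.integers(2), rng.integers(3)
    b1 = belief_update(b1, T1, E1, o1)
    b2 = belief_update(b2, T2, E2, o2)
    bj = belief_update(bj, Tj, Ej, o1 * 3 + o2)
    assert np.allclose(bj, np.kron(b1, b2))   # joint belief = tensor product of beliefs
```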

One thing I'm concerned about is that this seems most likely to work for rigid structures like CNNs and RNNs, rather than dynamic structures like Transformers. Obviously the original proof of concept was done in a transformer, but it was done in a transformer that was modelling a Markov model, whereas in the general case, transformers can model non-Markov processes.

Well, sort of - obviously they ultimately still have a fixed context window, but the difficulty in solving the quadratic bottleneck suggests that this context window is an important distorting factor in how Transformers work - though maybe Mamba will save us, idk.

What do the fractals look like if you tensor two independent variables together?

Market:

Reminder that making multiple similar markets is encouraged so we get different angles on it.

with "one outcome is real" axiom

How would you formulate this axiom?

The point is that I can understand the argument for why true stochasticity may be coherent, but I don't get why it would be better.

I find your post hard to respond to because it asks me to give my opinion on "the" mathematical model of true stochasticity, yet I argued that classical math is deterministic and the usual way you'd model true stochasticity in it is as many-worlds, which I don't think is what you mean (?).
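
To spell out the picture I have in mind (this is just the textbook Kolmogorov formalism, nothing new): a "truly stochastic" quantity gets modelled as a deterministic function on a space of all possible outcomes, and the only place left to put a "one outcome is real" axiom is as a distinguished outcome the model itself says nothing further about.

```latex
% A random variable in classical math: a probability space plus a
% measurable (deterministic) map on it.
(\Omega, \mathcal{F}, P), \qquad X : \Omega \to E

% Every \omega \in \Omega is a complete "world", and the formalism treats
% them all symmetrically; this is the many-worlds reading.

% "One outcome is real" then amounts to positing a distinguished
\omega_{\mathrm{real}} \in \Omega
% about which the model gives nothing beyond its P-probabilities.
```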

Like, are you saying the mathematical model of true stochasticity (with some "one thing is real" formalization) is somehow incomplete or imprecise or wrong, because mathematics is deterministic?

Which model are you talking about here?

Another issue with "ontologising" the Wigner function is that you need some kind of idea of what those negatives "really mean". I spent some time thinking about "If the many worlds interpretation comes from ontologising the wavefunction, what comes from doing that to the Wigner function?" a few years ago. I never got anywhere.

Wouldn't it also be many worlds, just with a richer set of worlds? Because with wavefunctions, your basis has to pick between conjugate pairs of variables, so your "worlds" can't e.g. have both positions and momentums, whereas Wigner functions tensor the conjugate pairs together, so their worlds contain both positions and momentums in one.

Neat!

I'd expect Wigner functions to be less ontologically fundamental than wavefunctions because converting a wavefunction into a real function in this way introduces a ton of redundant parameters, since now it's a function of phase space instead of configuration space. But they're still pretty cool.
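
For reference, the standard pure-state Wigner transform (one degree of freedom; nothing here beyond the textbook definition):

```latex
W(x, p) \;=\; \frac{1}{\pi \hbar} \int_{-\infty}^{\infty}
  \psi^{*}(x + y)\, \psi(x - y)\, e^{2 i p y / \hbar}\, dy
```

So for n degrees of freedom you go from a complex function on n-dimensional configuration space to a real function on 2n-dimensional phase space, and only a thin slice of those real functions actually arise this way; that's the redundancy I mean.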
