LESSWRONG
LW

All of Matthias G. Mayer's Comments + Replies

Microsoft Research Paper Claims Sparks of Artificial Intelligence in GPT-4

Matthias G. Mayer

2y41

It is correct, but I agree the reasoning is not written out properly. I'll write y for $y^{*}$

We are in the case $g (x) > x^{2}$ . So if $g (y) >= y^{2}$ , then $g (x) + g (y) > x^{2} + y^{2} >= 2 x y$ , where the last inequality is true because it's equivalent to $(x - y)^{2} >= 0$ . So we have $g (x) + g (y) > 2 x y$ , which is a contradiction to the choice of y.

Using Finite Factored Sets for Causal Representation Learning?

Matthias G. Mayer

2y21

A direct application would need that you have an uncountable variable. You might want to do this if you have enough evidence to say this confidently. As a simple example imagine a real-valued graph where all your data points lie almost on the identity diagonal. You might then want to infer a variable which is the identity.

As a more general application, we want to model infinities because the world is probably infinite in some aspects. We then want a theorem that tells us, that even if the underlying model is infinite, if you have enough data points then you are close enough, like with the Strong law of Large numbers, for example.

Using Finite Factored Sets for Causal Representation Learning?

Answer by Matthias G. Mayer

Jan 11, 202330

I'm working on the FFS framework in general. I'm currently writing up decidability of finite temporal inference. After this I will probably start working on efficient finite temporal inference which is what you're referencing if I understood correctly.

I'm also working on extending the framework to the infinite setting and am almost finished except for conditional orthogonality for uncountable sets.

I quite like the name Logical Time Theory, under which I will probably publish those results in a month or so.

1David Reber2y

Hmm, what would be the intuition/application behind the uncountable setting? Like, when would one want that (I don't mind if it's niche, I'm just struggling to come up with anything)?

Finite Factored Sets in Pictures

Matthias G. Mayer

2y80

Here, we need to find a variable W such that
P(W|X,Y) is deterministic, because X and Y already fully describe our sample space. This means P(W|X,Y) is either 0 or 1
Z and W are independent
X and W are dependent
Y and W are dependent

I think your arguments in Section 3 to rule out Graph 3 can't be correct if you accept Graph 2.

To see this, note that there is a symmetry between $X$ and $Z$ . Namely, if we use FFS temporal inference, then we know that $X$ and $Z$ are both before $Y$ (and $\neg Y$ ).(here we even have&nb... (read more)

3Magdalena Wache2y

Good point, you are right! I just edited the post and now say that graph 2, 3, and 4 only have instantiations that are equivalent to graph 1. fixed, thanks!

Basin broadness depends on the size and number of orthogonal features

Matthias G. Mayer

3y*40

Two features are orthogonal if their $L_{2}$ norm is zero

Just as a side note about terminology: It is a bit imprecise that you use innerproduct and norm interchangeably.

Innerproduct is the function $< ., . >: H \times H \to R$ and the norm is $| | . | | : H \to R, x \mapsto \sqrt{< x, x >}$

Trying to find the underlying structure of computational systems

Matthias G. Mayer

3y40

The internals of a system of course determine its functional behavior. But there might be different systems that differ only in what they actually do. E.g. different sort algorithms all end up with a sorted list but sort it differently. Likewise, a pathfinding algorithm like Dijkstra is different than checking every possible path and picking the best one.

Looking only at functional behavior strips you of your ability to make predictions. You only know what has already happened. You can't generalize to new inputs.

This is the actual crux of why we care ... (read more)

1Ilio3y

Thanks, that clarifies your aims a lot. Did you gave some thoughts on how your approach would deal with cases of embodied cognition and uses of external memories?

Matthias G. Mayer

3y3-1

What if you had some computation that could be interpreted (e.g. decrypted with two different keys) as either a simulation full of happy people, or a simulation full of depressed people? I think an adequate theory of experience is able to look at the encrypted computation (or any computation) and decide directly if there is suffering happening there.

Also, what is the difference between normal computation and encrypted computation? I feel like looking at a process that you haven't programmed yourself is not really that different than looking at an encrypted... (read more)

1Tamsin Leake3y

i suspect that those two kinds of computation in fact have a profoundly different shape, such that you can't have something that can convert into either in a simple manner. if i am wrong about this, then alignment is harder than i thought, and i don't know what to think about encrypted computations in such a situation — i guess nuke them just to be safe? we can figure out some parts of normal programs, and sometimes possibly even large parts. whereas, an encrypted computation should be guaranteed to be un-figureout-able without exponential compute. the same way i could figure out some of the meaning of a large text that's been translated into dutch, but i'd likely be completely unable to figure out the meaning of a large text that's been encrypted through say AES.

Trying to find the underlying structure of computational systems

Matthias G. Mayer

3y40

This is not what I meant (I've edited the post to make that clearer). I am looking for a way to naturally express that a result of a computation changes how the computation progresses. In a*(b+c) + (1-a)*(b-c) you compute both (b+c)and (b-c) . This is not what actually happens in the program.

The curried node is an interesting idea but breaks down if we move away from this toy example. If both branches contain subgraphs with a different amount of nodes and different connections between them then currying does not work (or is very unnatural).

(Currying is a nice idea so yes)

Trying to find the underlying structure of computational systems

Matthias G. Mayer

3y50

Turing completeness regards only the functional behavior of a class of computational systems. I want to look at the internals, what the system is actually doing, and find abstractions in there: Modularity, search processes, and steering mechanisms for instance.

So it’s not about finding yet another framework whose expressiveness is equivalent to Turing completeness. It’s about finding a framework to express the actual computation.

4Ilio3y

In what sense is the functional behavior different from the internals/actual computations? Could you provide a few toy examples?