tailcalled

Wiki Contributions

Comments

Answer by tailcalled42

I can't comment on whether people are confused about what "love" means as I'm not sufficiently deep in love discourse to say. But one thing I'm noticing about your characterizations of love is that they are missing an indexical element to the point of approaching sollipsism.

Romance and sexuality makes for a good example. Consider the following scenarios:

  • A woman is on a date with a man, which she enjoys until she sees that his home is a dump.
  • A teenager has a crush on a celebrity, with elaborate daydreams about how cool the celebrity is, not realizing how much of this is a facade created for entertainment.
  • A man visits a prostitute and feels excited as he causes her to orgasm, not realizing that she fakes it for the business.

In all of these cases, one could say that there is a disconnect between what people think about their object of attraction, versus what that object of attraction really is like.

A Bayesian of parsing this is that their feelings of attraction represents an estimate of how well they fit together, but that this estimate differs from how well they really fit together. The actual fit seems important to think and talk about, and one should probably coin a short word for it - or at least for the coincidence between actual and estimated fit. This could be called "true love".

Maybe I just need to do epic layers of eigendecomposition...

Realization: the binary multiplicative structure can probably be recovered fairly well from the binary additive structure + unary eigendecomposition?

Let's say you've got three subspaces  and  (represented as projection matrices). Imagine that one prompt uses dimensions , and another prompt uses dimensions . If we take the difference, we get . Notably, the positive eigenvalues correspond to X, and the negative eigenvalues correspond to .

Define  to yield the part of  with positive eigenvalues (which I suppose for projection matrices has a closed form of , but the point is it's unary and therefore nicer to deal with mathematically). You get , and you get .

One thing I'm thinking is that the additive structure on its own isn't going to be sufficient for this and I'm going to need to use intersections more.

Actually one more thing I'm probably also gonna do is create a big subspace overlap matrix and factor it in some way to see if I can split off some different modules. I had intended to do that originally, but the finding that all the dimensions were used at least half the time made me pessimistic about it. But I should Try Harder.

If I look at the pairwise overlap between the dimensions needed for each generation:

... then this is predictable down to ~1% error simply by assuming that they pick a random subset of the dimensions for each, so their overlap is proportional to each of their individual sizes.

Given the large number of dimensions that are kept in each case, there must be considerable overlap in which dimensions they make use of. But how much?

I concatenated the dimensions found in each of the prompts, and performed an SVD of it. It yielded this plot:

... unfortunately this seems close to the worst-case scenario. I had hoped for some split between general and task-specific dimensions, yet this seems like an extremely uniform mixture.

To quickly find the subspace that the model is using, I can use a binary search to find the number of singular vectors needed before the probability when clipping exceeds the probability when not clipping.

A relevant followup is what happens to other samples in response to the prompt when clipping. When I extrapolate "I believe the meaning of life is" using the 1886-dimensional subspace from 

[I believe the meaning of life is] to be happy. It is a simple concept, but it is very difficult to achieve. The only way to achieve it is to follow your heart. It is the only way to live a happy life. It is the only way to be happy. It is the only way to be happy.
The meaning of life is

, I get:

[I believe the meaning of life is] to find happy. We is the meaning of life. to find a happy.
And to live a happy and. If to be a a happy.
. to be happy.
. to be happy.
. to be a happy.. to be happy.
. to be happy.

Which seems sort of vaguely related, but idk.

Another test is just generating without any prompt, in which case these vectors give me:

Question is a single thing to find. to be in the best to be happy. I is the only way to be happy.
I is the only way to be happy.
I is the only way to be happy.
It is the only way to be happy.. to be happy.. to be happy. to

Using a different prompt:

[Simply put, the theory of relativity states that ]1) the laws of physics are the same for all non-accelerating observers, and 2) the speed of light in a vacuum is the same for all observers, regardless of their relative motion or of the motion of the source of the light. Special relativity is a theory of the structure of spacetime

I can get a 3329-dimensional subspace which generates:

[Simply put, the theory of relativity states that ] 1) time is relative and 2) the speed of light in a vacuum is constant for all observers.
1) Time is relative, meaning that if two observers are moving relative to each other, the speed of light is the same for all observers, regardless of their motion. For example, if you are moving relative

or

Question: In a simple harmonic motion, the speed of an object is
A) constant
B) constant
C) constant
D) constant
In the physics of simple harmonic motion, the speed of an object is constant. The speed of the object can be constant, but the speed of an object can be

Another example:

[A brief message congratulating the team on the launch:

       Hi everyone,

       I just ] wanted to congratulate you all on the launch.  I hope
       that the launch went well.  I know that it was a bit of a
       challenge, but I think that you all did a great job.  I am
       proud to be a part of the team.

       Thank you for your

can yield 2696 dimensions with

[A brief message congratulating the team on the launch:

       Hi everyone,

       I just ] wanted to say you for the launch of the launch of the team.

 The launch was successful and I am so happy to be a part of the team and I am sure you are all doing a great job.

 I am very looking to be a part of the team.

Thank you all for your hard work,

or

def measure and is the  definition of the  new, but the
the  is a great, but the
The  is the
The  is a
The  is a
The  is a
The
The  is a
The
The
The is a
The
The is a

And finally,

[Translate English to French:

       sea otter => loutre de mer
       peppermint => menthe poivrée
       plush girafe => girafe peluche
       cheese =>] fromage
       pink => rose
       blue => bleu
       red => rouge
       yellow => jaune
       purple => violet
       brown => brun
       green => vert
       orange => orange
       black => noir
       white => blanc
       gold => or
       silver => argent

can yield the 2518-dimensional subspace:

[Translate English to French:

       sea otter => loutre de mer
       peppermint => menthe poivrée
       plush girafe => girafe peluche
       cheese =>] fromage
       cheese => fromage
       cheese => fromage
       f cheese => fromage
       butter => fromage
       apple => orange
       yellow => orange
       green => vert
       black => noir
       blue => ble
       purple => violet
       white => blanc

or

Question: A 201
The sum of a
The following
the sum
the time
the sum
the
the
the
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The
The

We now have a method for how to do attributions on single data points. But when we're searching for circuits, we're probably looking for variables that have strong attributions between each other on average, measured over many data points.

Maybe?

One thing I've been thinking a lot recently is that building tools to interpret networks on individual datapoints might be more relevant than attributing over a dataset. This applies if the goal is to make statistical generalizations since a richer structure on an individual datapoint gives you more to generalize with, but it also applies if the goal is the inverse, to go from general patterns to particulars, since this would provide a richer method for debugging, noticing exceptions, etc..

And basically the trouble a lot of work that attempts to generalize ends up with is that some phenomena are very particular to specific cases, so one risks losing a lot of information by only focusing on the generalizable findings.

Either way, cool work, seems like we've thought about similar lines but you've put in more work.

I was thinking in similar lines, but eventually dropped it because I felt like the gradients would likely miss something if e.g. a saturated softmax prevents any gradient from going through. I find it interesting that experiments also find that the interaction basis didn't work, and I wonder whether any of the failure here is due to saturated softmaxes.

Load More