All of Gurkenglas's Comments + Replies

It sounds like you're trying to define unfair as evil.

I just meant the "guts of the category theory" part. I'm concerned that anyone says that it should be contained (aka used but not shown), and hope it's merely that you'd expect to lose half the readers if you showed it. I didn't mean to add to your pile of work, and if there is no available action (like snapping a photo) that takes less time than writing the reply I'm replying to did, then disregard me.

1Lorxus
The phrasing I got from the mentor/research partner I'm working with is pretty close to the former but closer in attitude and effective result to the latter. Really, the major issue is that string diagrams for a flavor of category and commutative diagrams for the same flavor of category are straight-up equivalent, but explicitly showing this is very very messy, and even explicitly describing Markov categories - the flavor of category I picked as likely the right one to use, between good modelling of Markov kernels and their role doing just that for causal theories (themselves the categorification of "Bayes nets up to actually specifying the kernels and states numerically") - is probably too much to put anywhere in a post but an appendix or the like.

There is not, but that's on me. I'm juggling too much and having trouble packaging my research in a digestible form. Precarious/lacking funding and consequent binding demands on my time really don't help here either. I'll add you to the long long list of people who want to see a paper/post when I finally complete one.

I guess a major blocker for me is - I keep coming back to the idea that I should write the post as a partially-ordered series of posts instead. That certainly stands out to me as the most natural form for the information, because there's three near-totally separate branches of context - Bayes nets, the natural latent/abstraction agenda, and (monoidal category theory/)string diagrams - of which you need to somewhat understand some pair in order to understand major necessary background (causal theories, motivation for Bayes net algebra rules, and motivation for string diagram use), and all three to appreciate the research direction properly. But I'm kinda worried that if I start this partially-ordered lattice of posts, I'll get stuck somewhere. Or run up against the limits of what I've already worked out yet. Or run out of steam with all the writing and just never finish. Or just plain "no one will want to

What if you say that when it was fully accurate?

2Joseph Miller
Then it will often confabulate a reason why the correct thing it said was actually wrong. So you can never really trust it; you have to think about what makes sense and test your model against reality. But to some extent that's true for any source of information. LLMs are correct about a lot of things and you can usually guess which things they're likely to get wrong.
2Mateusz Bagiński
Not OP but IME it might (1) insist that it's right, (2) apologize, think again, generate code again, but it's mostly the same thing (in which case it might claim it fixed something or it might not), (3) apologize, think again, generate code again, and it's not mostly the same thing.

give me the guts!!1

don't polish them, just take a picture of your notes or something.

1Lorxus
I guess? I mean, there's three separate degrees of "should really be kept contained"-ness here:

* Category theory -> string diagrams, which pretty much everyone keeps contained, including people who know the actual category theory
* String diagrams -> Bayes nets, which is pretty straightforward if you sit and think for a bit about the semantics you accept/are given for string diagrams generally and maybe also look at a picture of generators and rules - not something anyone needs to wrap up nicely but it's also a pretty thin
* [Causal theory/Bayes net] string diagrams -> actual statements about (natural) latents, which is something I am still working on; it's turning out to be pretty effortful to grind through all the same transcriptions again with an actually-proof-usable string diagram language this time.

I have draft writeups of all the "rules for an algebra of Bayes nets" - a couple of which have turned out to have subtleties that need working out - and will ideally be able to write down and walk through proofs entirely in string diagrams while/after finishing specifications of the rules. So that's the state of things. Frankly I'm worried and generally unhappy about the fact that I have a post draft that needs restructuring, a paper draft that needs completing, and a research direction to finish detailing, all at once. If you want some partial pictures of things anyway all the same, let me know.

Congratulations on changing your mind!

It’s sorta suspicious that I only realized those now, after I officially dropped the project

You should try dropping your other idea and seeing if you come up with reasons that one is wrong too! And/or pick this one up again, then come up with reasons it's a good idea after all. In the spirit of "You can't know if something is a good idea until you resolve to do it"!

In general, I wish this year? (*checks* huh, only 4 months.) of planning this project had involved more empiricism. For example, you could've just checked whether a language model trained on ocean sounds can say what the animals are talking about.

1Towards_Keeperhood
Nah, I didn't lose that much time. I already quit the project at the end of January; I just wrote the post now. Most of the technical work was also pretty useful for understanding language, which is a useful angle on agent foundations. I had previously expected working on that angle to be 80% as effective as my previous best plan, but it was even better, around similarly good I think. That was like 5-5.5 weeks and that was not wasted. I guess I spent like 4.5 weeks overall on learning about orcas (including first seeing whether I might be able to decode their language and thinking about how, and also coming up with the whole "teach language" idea), and like 3 weeks on organizational stuff for trying to make the experiment happen.

Hmm. Sounds like it was not enough capsaicin. Capsaicin will drive off bears, I hear. I guess you'd need gloves for food, or permanent gloves without the nail polish. Could you use one false nail as a chew toy?

2DirectedEvolution
Unfortunately the level of physical restraint I’d need to stop biting is too costly to be worth it to me.
2DirectedEvolution
It actually did contain capsaicin IIRC. Sort of a bitter spicy mix. The other issue is it gets on things you touch, including food if you’re preparing or eating it by hand.
2DirectedEvolution
I’ve tried that, but it’s not enough to stop me. Makes my mouth taste disgusting for no benefit.
1Rafka
Yeah I thought about that, but (I didn't expand on that) the habit also included picking skin around my cuticles with my fingers, so that would've only half worked at best.

Link an example, along with how cherry-picked it is?

To prepare for abundant cognition you can install a keylogger.

2Raemon
Do you have existing ones you recommend? I'd been working on a keylogger / screenshot-parser that's optimized for a) playing nicely with LLMs while b) being unopinionated about what other tools you plug it into. (In my search for existing tools, I didn't find keyloggers that actually did the main thing I wanted, and the existing LLM-tools that did similar things were walled-garden ecosystems that didn't give me much flexibility on what I did with the data.)

As a kid, I read about vacuum decay in a book and told the other kids at school about it. A year? later, one kid asked me how anyone knows about it. Mortified that I didn't think of that, I told him that I made it up. ("I knew it >:D!") It is the one time outside of games that I remember telling someone something I disbelieve so that they'll believe it, and ever since remembering the scene as an adult I've been failing to track down that kid :(.

Oh, you're using AdamW everywhere? That might explain the continuous training loss increase after each spike, with AdamW needing time to adjust to the new loss landscape...

Lower learning rate leads to more spikes? Curious! I hypothesize that... it needs a small learning rate to get stuck in a narrow local optimum, and then when it reaches the very bottom of the basin, you get a ~zero gradient, and then the "normalize gradient vector to step size" step is discontinuous around zero.
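A toy sketch of that last step (my own illustration, not the post's actual optimizer): in one dimension, "normalize the gradient to the step size" reduces the update to lr * sign(grad), which is discontinuous at grad = 0, so the iterate can never settle at the bottom of the basin:

```python
# Toy model of the hypothesized mechanism (an illustration, not the post's
# optimizer): in 1-D, normalizing the gradient to a fixed step size reduces
# the update to lr * sign(grad), which is discontinuous at grad = 0. On
# f(x) = x^2 the iterate marches to the basin floor and then bounces
# forever: once |x| < lr, every step overshoots the minimum.
def normalized_gd(x0, lr, steps):
    x = x0
    losses = []
    for _ in range(steps):
        grad = 2 * x
        if grad != 0:
            x -= lr * (1 if grad > 0 else -1)  # grad / |grad| in 1-D
        losses.append(x * x)
    return losses

losses = normalized_gd(x0=1.0, lr=0.03, steps=200)
# Early losses shrink steadily; late losses oscillate instead of converging.
print(losses[0], losses[-2], losses[-1])
```

The smaller the learning rate, the narrower the basins the iterate can descend into before this bouncing kicks in, which is consistent with the hypothesis above.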

Experiments springing to mind are:
1. Do you get even fewer spikes if you incr...

1Rareș Baron
Your hypothesis seems reasonable, and I think the following proves it. 1. This is for 5e-3, giving no spikes and faster convergences: 2. Gradient descent failed to converge for multiple LRs, from 1e-2 to 1e-5. However, decreasing the LR by 1.0001 when the training error increases gave this: It's messy, and the decrease seems to turn the jumps of the slingshot effect into causes for getting stuck in sub-optimal basins, but the trajectory was always downwards. Increasing the rate of reduction decreased spikes but convergence no longer appeared. An increase to 2. removed the spikes entirely.

My eyes are drawn to the 120 or so downward tails in the latter picture; they look of a kind with the 14 in https://39669.cdn.cke-cs.com/rQvD3VnunXZu34m86e5f/images/2c6249da0e8f77b25ba007392087b76d47b9a16f969b21f7.png/w_1584. What happens if you decrease the learning rate further in both cases? I imagine the spikes should get less tall, but does their number change? Only dot plots, please, with the dots drawn smaller, and red dots too on the same graph.

I don't see animations in the drive folder or cached in Grokking_Demo_additional_2.ipynb (the most recent...

1Rareș Baron
I have uploaded html files of all the animation so they can be interactive. The corresponding training graphs are in the associated notebooks. The original learning rate was 1e-3. For 5e-4, it failed to converge: For 8e-4, it did converge, and the trajectory was downwards this time:

Can a eat that -1?

1Jerdle
It could do, but a represents the amount of utility remaining. Maybe the more natural thing would be to have a be the effective tax rate, and have it be (z/x)^a.

What is x and why isn't it cancelling?

1Jerdle
x is the initial income, and I forgot to cancel it. Good point. Turns out, it's far simpler than I had it as.

When splitting the conjunction, Bob should only have to place $4 in escrow, since that is the furthest in the red that Bob could end up. (Unless someone might privately prove P&Q to collect Alice's bounty before collecting both of Bob's? But surely Bob first bought exclusive access to Alice's bounty from Alice.)

Mimicking homeostatic agents is not difficult if there are some around. They don't need to constantly decide whether to break character, only when there's a rare opportunity to do so.

If you initialize a sufficiently large pile of linear algebra and stir it until it shows homeostatic behavior, I'd expect it to grow many circuits of both types, and any internal voting on decisions that only matter through their long-term effects will be decided by those parts that care about the long term.

3faul_sname
Where does the gradient which chisels in the "care about the long term X over satisfying the homeostatic drives" behavior come from, if not from cases where caring about the long term X previously resulted in attributable reward? If it's only relevant in rare cases, I expect the gradient to be pretty weak and correspondingly I don't expect the behavior that gradient chisels in to be very sophisticated.

Having apparently earned some cred, I will dare give some further quick hints without having looked at everything you're doing in detail, expecting a lower hit rate.

  1. Have you rerun the experiment several times to verify that you're not just looking at initialization noise?
  2. If that's too expensive, try making your models way smaller and see if you can get the same results.
  3. After the spikes, training loss continuously increases, which is not how gradient descent is supposed to work. What happens if you use a simpler optimizer, or reduce the learning rate?
  4. Some o...
1Rareș Baron
For 1 and 2 - I have. Everything is very consistent. For 3, I have tried several optimizers, and they all failed to converge. Tweaking the original AdamW to reduce the learning rate led to very similar results: For 4, I have done animations for every model (besides the 2 GELU variants). I saw pretty much what I expected: a majority of relevant developments (fourier frequencies, concentration of singular values, activations and attention heads) happened quickly, in the clean-up phase. The spikes seen in SiLU and SoLU_LN were visible, though not lasting. I have uploaded the notebooks to the drive folder, and have updated the post to reflect these findings. Thank you very much, again!

I'm glad that you're willing to change your workflow, but you have only integrated my parenthetical, not the more important point. When I look at https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/tzkakoG9tYLbLTvHG/lelcezcseu001uyklccb, I see interesting behavior around the first red dashed line, and wish I saw more of it. You ought to be able to draw 25k blue points in that plot, one for every epoch - your code already generates that data, and I advise that you cram as much of your code's data into the pictures you look at as you reasonably can.

1Rareș Baron
I am sorry for being slow to understand. I hope I will internalise your advice and the linked post quickly. I have re-done the graphs, to be for every epoch. Very large spikes for SiLU were hidden by the skipping. I have edited the post to rectify this, with additional discussion. Again, thank you (especially your patience).

The forgetful functor FiltSet to Set does not have a left adjoint, and egregiously so - you have added just enough structure to rule out free filtered sets, and may want to make note of where this is important.

(S⊗-) has a right adjoint, suggesting the filtered structure to impose on function sets: The degree of a map f:S->T would be how far it falls short of being a morphism, as this is what makes S⊗U->T one-to-one with U->(S->T).
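For concreteness, here is one way the degree could go (my guess from the adjunction, not a definition stated in the source): a map fails to be a morphism by however much it raises degree,

```latex
\deg(f) \;=\; \sup_{s \in S}\, \max\bigl(0,\ \deg_T(f(s)) - \deg_S(s)\bigr)
```

so that an element u of U of degree d corresponds under currying to a map S->T of degree at most d, exactly when (s, u) ↦ f(s, u) respects degrees in S⊗U->T.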

...what I meant is that plots like this look like they would have had more to say if you had plotted the y value after e.g. every epoch. No reason to throw away perfectly good data, you want to guard against not measuring what you think you are measuring by maximizing the bandwidth between your code and your eyes. (And the lines connecting those data points just look like more data while not actually giving extra information about what happened in the code.)

1Rareș Baron
Apologies for misunderstanding. I get it now, and will be more careful from now on. I have re-run the graphs where such misunderstandings might appear (for this and a future post), and added them here. I don't think I have made any mistakes in interpreting the data, but I am glad to have looked at the clearer graphs. Thank you very much!

Some of these plots look like they ought to be higher resolution, especially when Epoch is on the x axis. Consider drawing dots instead of lines to make this clearer.

1Rareș Baron
I will keep that in mind for the future. Thank you! I have put all high-quality .pngs of the plots in the linked Drive folder.

All we need to create is a Ditto. A blob of nanotech wouldn't need 5 seconds to take the shape of the surface of an elephant and start mimicking its behavior; is it good enough to optionally do the infilling later if it's convenient?

2Owain_Evans
It's on our list of good things to try.

Buying at 12% and selling at 84% gets you 2.8 bits.

Edit: Hmm, that's if he stakes all his cred, by Kelly he only stakes some of it so you're right, it probably comes out to about 1 bit.
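The arithmetic behind that figure, as a quick sketch (my own, not anyone's actual scoring code): moving a binary market's probability from p_buy to p_sell on an event that resolves true earns log2(p_sell / p_buy) bits of log score:

```python
from math import log2

# Bits of log-score gained by moving a market's probability on an event
# that turns out true from p_buy up to p_sell.
def bits_gained(p_buy, p_sell):
    return log2(p_sell / p_buy)

print(bits_gained(0.12, 0.84))  # log2(7), about 2.8
```

Kelly betting stakes only a fraction of bankroll proportional to the edge, which is why the realized score ends up below this full-stake figure.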

The convergent reason to simulate a world is to learn what happens there. When to intervene with letters depends on, uh. Why are you doing that at all?

(Edit: I suppose a congratulatory party is in order when they simulate you back with enough optimizations that you can talk to each other in real time using your mutual read access.)

I deferred my decision to after visiting the Learning Theory course. At the time, the timing had made them seem vaguely affiliated with this programme.

Can you just give every thief a body camera?

Re first, yep, I missed that :(. M does sound like a more worthy barrier than U. Do you have a working example of a (U,M) where some state machine performs well in a manner that's hard to detect?

Re second, I realized that this only allows discrete utilities but didn't think to therefore try a π' that does an exhaustive search over policies ^^. (I assume you are setting "uncomputable to measure performance because that involves the Solomonoff prior" aside here.) Even so, undecidability of whether 000... and 111... get the same utility sounds like a bug. Wha...

2Vanessa Kosoy
I don't think that undecidability of exact comparison (as opposed to comparison within any given margin of error) is necessarily a bug. However, if you really want comparison for periodic sequences, you can insist that the utility function is defined by a finite state machine. This is in any case already a requirement in the bounded compute version.

Regarding 17.4.Open:

Consider π' which try all state machines up to a size and imitate the one that performs best on (U,M); this would tighten the O(nlogn) bound to O(BB^-1(n)).

This fails because your utility functions return constructive real numbers, which don't implement comparison. I suggest that you make it possible to compare utilities.[1]

In which case we get: Within every decidable machine class where every member halts, agents are uncomputably smol.

 

  1. ^

    Such as by

    making P(s,s') return the order of U(s) and U(s').
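To illustrate why constructive reals don't implement comparison (a toy model of mine, with a real given as a function returning rational approximations within 2^-n of the true value): refining precision decides an inequality exactly when the two numbers differ, and equality is never certified:

```python
from fractions import Fraction

# A constructive real is modeled as approx(n) -> rational within 2**-n of
# the true value. Comparison by refining precision is only semi-decidable:
# it terminates when the true values differ by more than the combined error
# bound, and would loop forever on equal inputs, so we cap the precision.
def compare(approx_a, approx_b, max_precision=50):
    for n in range(1, max_precision):
        a, b = approx_a(n), approx_b(n)
        if a - b > Fraction(2, 2**n):   # a > b for certain
            return 1
        if b - a > Fraction(2, 2**n):   # b > a for certain
            return -1
    return None  # undecided: equal, or too close to tell at this precision

third = lambda n: Fraction(1, 3)                     # exactly 1/3
above = lambda n: Fraction(1, 3) + Fraction(1, 100)  # 1/3 + 1/100

print(compare(above, third))   # 1: the gap eventually exceeds the error bound
print(compare(third, third))   # None: equality is never certified
```

Making P(s,s') return the order of U(s) and U(s'), as the footnote suggests, is exactly what upgrades this semi-decision procedure to a total one.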

2Vanessa Kosoy
First, it's uncomputable to measure performance because that involves the Solomonoff prior. You can approximate it if you know some bits of Chaitin's constant, but that brings a penalty into the description complexity. Second, I think that saying that comparison is computable means that the utility is only allowed to depend on a finite number of time steps, it rules out even geometric time discount. For such utility functions, the optimal policy has finite description complexity, so g is upper bounded. I doubt that's useful.

If you didn't feel comfortable running it overnight, why did you publish the instructions for replicating it?

2niplav
I had a conversation with Claude 3.6 Sonnet about this, and together we concluded that the worry was overblown. I should've added that in, together with a justification.
4kave
Looks like the base url is supposed to be niplav.site. I'll change that now (FYI @niplav)

I'm hoping more for some stepping stones between the pre-theoretic concept of "structural" and the fully formalized 99%-clause. If we could measure structuralness more directly we should be able to get away with less complexity in the rest of the conjecture.

7Eric Neyman
Thanks, this is a good question. My suspicion is that we could replace "99%" with "all but exponentially small probability in n". I also suspect that you could replace it with 1−ε, with the stipulation that the length of π (or the running time of V) will depend on ε. But I'm not exactly sure how I expect it to depend on ε -- for instance, it might be exponential in 1/ε. My basic intuition is that the closer you make 99% to 1, the smaller the number of circuits that V is allowed to say "look non-random" (i.e. are flagged for some advice π). And so V is forced to do more thorough checks ("is it actually non-random in the sort of way that could lead to P being true?") before outputting 1.   99% is just a kind-of lazy way to sidestep all of these considerations and state a conjecture that's "spicy" (many theoretical computer scientists think our conjecture is false) without claiming too much / getting bogged down in the details of how the "all but a small fraction of circuits" thing depends on n or the length of π or the runtime of V.

Ultimately, though, we are interested in finding a verifier that accepts or rejects  based on a structural explanation of the circuit; our no-coincidence conjecture is our best attempt to formalize that claim, even if it is imperfect.

Can you say more about what made you decide to go with the 99% clause? Did you consider any alternatives?

3Alibi
Reading the post, I also felt like 99% was kind of an arbitrary number. I would have expected it to be something like: for all $\epsilon > 0$ there exists a $V$ such that ... $1-\epsilon$ of random circuits satisfy ...

This does go in the direction of refuting it, but they'd still need to argue that linear probes improve with scale faster than they do for other queries; a larger model means there are more possible linear probes to pick the best from.
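The selection effect mentioned above can be simulated (an illustrative toy of mine, with numbers that are not from the paper): on pure-noise labels, the best of k random probes looks better as k grows, so probe quality must improve with scale faster than this baseline to be evidence of a real representation:

```python
import random

# On pure-noise binary labels, the best of k random "probes" looks better
# as k grows, purely by selection. Any claim that larger models represent
# a feature must beat this baseline, not just show better best-probe scores.
random.seed(0)
n = 30
labels = [random.randint(0, 1) for _ in range(n)]

def best_accuracy(num_probes):
    best = 0.0
    for _ in range(num_probes):
        preds = [random.randint(0, 1) for _ in range(n)]  # a random probe
        acc = sum(p == y for p, y in zip(preds, labels)) / n
        best = max(best, acc, 1 - acc)  # a probe can be read with either sign
    return best

for k in (1, 10, 1000):
    print(k, best_accuracy(k))
```

In a larger model the space of candidate linear probes is higher-dimensional, which plays the role of a larger k here.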

3Matrice Jacobine
I don't see why it should improve faster. It's generally held that the increase in interpretability in larger models is due to larger models having better representations (that's why we prefer larger models in the first place), why should it be any different in scale for normative representations?

I had that vibe from the abstract, but I can try to guess at a specific hypothesis that also explains their data: Instead of a model developing preferences as it grows up, it models an Assistant character's preferences from the start, but their elicitation techniques work better on larger models; for small models they produce lots of noise.

7Matrice Jacobine
This interpretation is straightforwardly refuted (insofar as it makes any positivist sense) by the success of the parametric approach in "Internal Utility Representations" being also correlated with model size.

Ah, oops. I think I got confused by the absence of L_2 syntax in your formula for FVU_B. (I agree that FVU_A is more principled ^^.)

2StefanHex
Oops, fixed!

https://github.com/jbloomAus/SAELens/blob/main/sae_lens/evals.py#L511 sums the numerator and denominator separately, if they aren't doing that in some other place probably just file a bug report?

2StefanHex
I think this is the sum over the vector dimension, but not over the samples. The sum (mean) over samples is taken later in this line which happens after the division metrics[f"{metric_name}"] = torch.cat(metric_values).mean().item() Edit: And to clarify, my impression is that people think of this as alternative definitions of FVU and you got to pick one, rather than one being right and one being a bug. Edit2: And I'm in touch with the SAEBench authors about making a PR to change this / add both options (and by extension probably doing the same in SAELens); though I won't mind if anyone else does it!
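For readers following along, a plain-Python sketch of the two aggregation orders under discussion (scalar "samples" for brevity, which is a simplification; the real code operates on activation vectors): FVU_A divides the summed squared error by the summed variance, while FVU_B averages per-sample ratios, and they generally disagree:

```python
# Two aggregation orders for "fraction of variance unexplained" over a
# batch. FVU_A sums squared error and variance across all samples before
# dividing; FVU_B divides per sample, then averages the ratios. Samples
# whose value sits near the batch mean dominate FVU_B.
def fvu_a(xs, xhats):
    mean = sum(xs) / len(xs)
    resid = sum((x - xh) ** 2 for x, xh in zip(xs, xhats))
    total = sum((x - mean) ** 2 for x in xs)
    return resid / total

def fvu_b(xs, xhats):
    mean = sum(xs) / len(xs)
    ratios = [(x - xh) ** 2 / (x - mean) ** 2 for x, xh in zip(xs, xhats)]
    return sum(ratios) / len(ratios)

xs = [1.0, 2.0, 3.0, 10.0]
xhats = [1.1, 1.8, 3.3, 9.5]
print(fvu_a(xs, xhats), fvu_b(xs, xhats))  # different values
```

Here the sample near the mean (3.0) contributes a large ratio to FVU_B while barely moving FVU_A, which is the crux of the disagreement between the two definitions.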

Thanks, edited. If we keep this going we'll have more authors than users x)

2MondSemmel
You're making a very generous offer of your time and expertise here. However, to me your post still feels way, way more confusing than it should be. Suggestions & feedback:

* Title: "Get your math consultations here!" -> "I'm offering free math consultations for programmers!" or similar.
  * Or something else entirely. I'm particularly confused how your title (math consultations) leads into the rest of the post (debuggers and programming).
* First paragraph: As your first sentence, mention your actual, concrete offer (something like "You screenshare as you do your daily tinkering, I watch for algorithmic or theoretical squiggles that cost you compute or accuracy or maintainability." from your original post, though ideally with much less jargon). Also your target audience: math people? Programmers? AI safety people? Others?
* "click the free https://calendly.com/gurkenglas/consultation link" -> What you mean is: "click this link for my free consultations". What I read is a dark pattern à la: "this link is free, but the consultations are paid". Suggested phrasing: something like "you can book a free consultation with me at this link"
* Overall writing quality
  * Assuming all your users would be as happy as the commenters you mentioned, it seems to me like the writing quality of these posts of yours might be several levels below your skill as a programmer and teacher. In which case it's no wonder that you don't get more uptake.
  * Suggestion 1: feed the post into an LLM and ask it for writing feedback.
  * Suggestion 2: imagine you're a LW user in your target audience, whoever that is, and you're seeing the post "Get your math consultations here!" in the LW homepage feed, written by an unknown author. Do people in your target audience understand what your post is about, enough to click on the post if they would benefit from it? Then once they click and read the first paragraph, do they understand what it's about and click on the link if they would benefit f

Account settings let you set mentions to notify you by email :)

The action space is too large for this to be infeasible, but at a 101 level, if the Sun spun fast enough it would come apart, and angular momentum is conserved so it's easy to add gradually.
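The 101-level claim can be checked on the back of an envelope (my own estimate with standard solar constants, not from the source): equatorial material becomes unbound when centrifugal acceleration matches surface gravity, i.e. when omega^2 * R = G*M / R^2:

```python
from math import pi, sqrt

# Breakup spin for the Sun: material at the equator is unbound once
# omega^2 * R = G*M / R^2. Standard solar values below.
GM_SUN = 1.327e20   # G * M_sun, m^3/s^2
R_SUN = 6.96e8      # solar radius, m

omega = sqrt(GM_SUN / R_SUN**3)          # breakup angular velocity, rad/s
period_hours = 2 * pi / omega / 3600
print(round(period_hours, 1))  # a rotation period of roughly 2.8 hours
```

For comparison, the Sun's actual rotation period is around 25 days, so the required spin-up is enormous; conservation of angular momentum just means the momentum can be delivered incrementally.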

Can this program that you've shown to exist be explicitly constructed?
