Maybe this is the right place to ask/discuss this, and maybe not - if it's not, say so and I'll stop.
IIRC you (or maybe someone else) once mentioned hearing about people who try to [experience the first jhana][1] and then feel pain as a result, and that you didn't really understand why that happened. There was maybe also a comment to the effect of "don't do that, that sounds like you were doing it wrong".
After some time spent prodding at myself and pulling threads and seeing where they lead... I am not convinced that they were doing it wrong at all. There's a kind ...
Here's a game-theory game I don't think I've ever seen explicitly described before: Vicious Stag Hunt, a two-player non-zero-sum game elaborating on both Stag Hunt and Prisoner's Dilemma. (Or maybe Chicken? It depends on the obvious dials to turn. This is frankly probably a whole family of possible games.)
The two players can pick from among 3 moves: Stag, Hare, and Attack.
Hunting stag is great, if you can coordinate on it. Playing Stag costs you 5 coins, but if the other player also played Stag, you make your 5 coins back plus another 10.
Hunting hare is fi...
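Only the Stag payoffs are fully pinned down above (the Hare and Attack descriptions are cut off), so here's a minimal sketch of just that slice of the payoff table; nothing beyond those two sentences is assumed:

```python
# Net payoff for playing Stag in the game described above:
# playing Stag costs 5 coins; if the other player also played Stag,
# you make your 5 coins back plus another 10 (net +10).
STAG_COST = 5
STAG_RETURN = 15  # your 5 coins back, plus another 10

def stag_payoff(other_move: str) -> int:
    """Net coins for a player who played Stag, given the other player's move."""
    payoff = -STAG_COST
    if other_move == "Stag":
        payoff += STAG_RETURN
    return payoff

print(stag_payoff("Stag"))  # 10
print(stag_payoff("Hare"))  # -5
```

The Hare and Attack payoffs would slot in as sibling functions once their numbers are fixed; those are exactly the "obvious dials to turn" that decide whether the game leans Prisoner's Dilemma or Chicken.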
A snowclone summarizing a handful of baseline important questions-to-self: "What is the state of your X, and why is that what your X's state is?" Obviously there are also versions that are less general and more naturally phrased; that's just the most obviously parametrized form of the snowclone.
Classic(?) examples:
"What do you (think you) know, and why do you (think you) know it?" (X = knowledge/belief)
"What are you doing, and why are you doing it?" (X = action(-direction?)/motivation?)
Less classic examples that I recognized or just made up:
"How do you feel, and w...
"What is the state and progress of your soul, and what is the path upon which your feet are set?" (X = alignment with yourself) I affected a quasi-religious vocabulary, but I think this has general application.
"What are you trying not to know, and why are you trying not to know it?" (X = self-deceptions)
The phrasing I got from the mentor/research partner I'm working with is pretty close to the former, but closer in attitude and effective result to the latter. Really, the major issue is that string diagrams for a flavor of category and commutative diagrams for that same flavor are straight-up equivalent, but showing this explicitly is very, very messy - and even explicitly describing Markov categories - the flavor of category I picked as likely the right one to use, between good modelling of Markov kernels and their role doing just that for causal ...
I guess? I mean, there's three separate degrees of "should really be kept contained"-ness here:
I promise I am still working on working out all the consequences of the string diagram notation for latential Bayes nets, since the guts of the category theory are all fixed (and can, as a mentor advises me, be kept out of the public eye as they should be). Things can be kept (basically) purely in terms of string diagrams. In whatever post I write, they certainly will be.
I want to be able to show that isomorphism of natural latents is the categorical property I'm ~97% sure it is (and likewise for minimal and maximal latents). I need to sit myself down and ...
Because RLHF works, we shouldn't be surprised when AI models output wrong answers which are specifically hard for humans to distinguish from a right answer.
This observably (or so it seems) generalizes to all humans, instead of (say) it being totally trivial somehow to train an AI on feedback from only some strict and distinguished subset of humanity, such that any wrong answers it produced could be easily spotted by the excluded humans.
Such wrong answers which look right (on first glance) also observably exist, and we should thus expect that if there's anyth...
(Random thought I had and figured this was the right place to set it down:) Given how centrally important token-based word embeddings are to the current LLM paradigm, how plausible is it that (put loosely) "doing it all in Chinese" (instead of English) is actually just plain a more powerful/less error-prone/generally better background assumption?
Associated helpful intuition pump: LLM word tokenization is like a logographic writing system, where each word corresponds to a character of the logography. There need be no particular correspondence between the form...
As someone who does both data analysis and algebraic topology, my take is that TDA showed promise but ultimately there's something missing such that it's not at full capacity. Either the formalism isn't developed enough or it's being consistently used on the wrong kinds of datasets. Which is kind of a shame, because it's the kind of thing that should work beautifully and in some cases even does!
Do you know when you started experiencing having an internal music player? I recall that that started for me when I was about 6. Also, do you know whether you can deliberately pick a piece of music, or other nonmusical sonic experiences, to playback internally? Can you make them start up from internal silence? Under what conditions can you make them stop? Do you ever experience long stretches where you have no internal music at all?
Sure - I can believe that that's one way a person's internal quorum can be set up. In other cases, or for other reasons, they might be instead set up to demand results, and evaluate primarily based on results. And that's not great or necessarily psychologically healthy, but then the question becomes "why do some people end up one way and other people the other way?" Also, there's the question of just how big/significant the effort was, and thus how big of an effective risk the one predictor took. Be it internal to one person or relevant to a group of humans, a sufficiently grand-scale noble failure will not generally be seen as all that noble (IME).
This makes some interesting predictions re: some types of trauma: namely, that they can happen when someone was (probably even correctly!) pushing very hard towards some important goal, and then either they ran out of fuel just before finishing and collapsed, or they achieved that goal and then - because of circumstances, just plain bad luck, or something else - that goal failed to pay off in the way that it usually does, societally speaking. In either case, the predictor/pusher that burned down lots of savings in investment doesn't get paid off. This is maybe part of why "if trauma, and help, you get stronger; if trauma, and no help, you get weaker".
I didn't enjoy this one as much, but that's likely down to not having had the time/energy to spend on thinking this through deeply. That said... I did not in fact enjoy it as much, and I mostly feel like garbage for having done literally worse than chance; it probably would have been better if I hadn't participated at all.
Let me see if I've understood point 3 correctly here. (I am not convinced I have actually found a flaw, I'm just trying to reconcile two things in my head here that look to conflict, so I can write down a clean definition elsewhere of something that matters to me.)
$P$ factors over $G_2$. In $G_1$, the $X_i$ were conditionally independent of each other, given $\Lambda$. Because $P$ factors over $G_2$, and because in $G_1$ the $X_i$ were conditionally independent of each other, given $\Lambda$, we can very straightforwar...
We’ll refer to these as “Bookkeeping Rules”, since they feel pretty minor if you’re already comfortable working with Bayes nets. Some examples:
- We can always add an arrow to a diagram (assuming it doesn’t introduce a loop), and the approximation will get no worse.
Here's something that's kept bothering me on and off for the last few months: This graphical rule immediately breaks Markov equivalence. Specifically, two DAGs are Markov-equivalent only if they share an (undirected) skeleton. (Lemma 6.1 at the link.)
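To make the conflict concrete, here's a toy check (hypothetical variables, not from the post) showing that adding an arrow to a chain changes the undirected skeleton, so the before/after DAGs fail the cited necessary condition for Markov equivalence:

```python
# Hypothetical three-variable example: start from the chain X -> Y -> Z,
# then apply the bookkeeping rule by adding the arrow X -> Z.
def skeleton(edges):
    """Undirected skeleton of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}

chain = {("X", "Y"), ("Y", "Z")}
after_bookkeeping = chain | {("X", "Z")}

# The skeletons differ, so by Lemma 6.1 the two DAGs
# cannot be Markov-equivalent.
print(skeleton(chain) == skeleton(after_bookkeeping))  # False
```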
If the major/only thing we care about here regar...
I'm going to start by attacking this a little on my own before I even look much at what other people have done.
Some initial observations from the SQL+Python practice this gave me a good excuse to do:
I'm gonna leave my thoughts on the ramifications for academia, where a major career step is to repeatedly join and leave different large bureaucratic organizations for a decade, as an exercise to the reader.
Like, in a world where the median person is John Wentworth (“Wentworld”), I’m pretty sure there just aren’t large organizations of the sort our world has.
I have numerous thoughts on how Lorxusverse Polity handles this problem but none of it is well-worked out enough to share. In sum though: Probably cybernetics (in the Beer sense) got discovered way...
[2.] maybe one could go faster by trying to more directly cleave to the core philosophical problems.
...
An underemphasized point that I should maybe elaborate more on: a main claim is that there's untapped guidance to be gotten from our partial understanding--at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there's a lot of progress to be made by having them talk to each other, so to speak.
OK but what would this even look like?
Toss away ...
Clearly academia has some blind spots, but how big? Do I just have a knack for finding ideas that academia hates, or are the blind spots actually enormous?
From someone who left a corner of it: the blind spots could be arbitrarily large as far as I know, because there seemed to me to be no real explicit culture of Hamming questions/meta-level looking for anything neglected. You worked on something vaguely similar/related to your advisor's work, because otherwise you couldn't get connections to people who knew how to attack the problem.
As my reacts hopefully implied, this is exactly the kind of clarification I needed - thanks!
Like, bro, I'm saying it can't think. That's the tweet. What thinking is isn't clear, but that thinking is should be presumed, pending a forceful philosophical conceptual replacement!
Sure, but you're not preaching to the choir at that point. So surely the next step in that particular dance is to stick a knife in the crack and twist?
That is -
..."OK, buddy:
Here's property P (and if you're good, Q and R and...) that [would have to]/[is/are obviously natural and des
> https://www.lesswrong.com/posts/r7nBaKy5Ry3JWhnJT/announcing-iliad-theoretical-ai-alignment-conference#whqf4oJoYbz5szxWc
you didn't invite me so you don't get to have all the nice things, but I did leave several good artifacts and books I recommend lying around. I invite you to make good use of them!
(Minor quibble: I'd be careful about using "should" here, as in "the heart should pump blood", because "should" is often used in a moral sense. For instance, the COVID-19 spike protein presumably has some function involving sneaking into cells - it "should" do that in the teleological sense, but in the moral sense COVID-19 "should" just die out. I think that ambiguity makes a sentence like "but it might be another thing to say, that the heart should pump blood" sound deeper/more substantive than it is, in this context.)
This puts me in mind of what ...
To paraphrase:
Want and have. See and take. Run and chase. Thirst and slake. And if you're thwarted in pursuit of your desire… so what? That's just the way of things, not always getting what you hunger for. The desire itself is still yours, still pure, still real, so long as you don't deny it or seek to snuff it out.
@habryka Forgot to comment on the changes you implemented for the soundscape at LH during the mixer - you may want to put a speaker in the Bayes window overlooking the courtyard firepit. People started congregating/pooling there (and notably not at the other firepit next to it!) because it was the locally-quietest location, and then the usual failure modes of an attempted 12-person conversation ensued.
any finite-entropy function
Uh...
most of them are small and probably don’t have the mental complexity required to really grasp three dimensions
Foxes and ferrets strike me as two obvious exceptions here, and indeed, we see both being incredibly good at getting into, out of, and around spaces, sometimes in ways that humans might find unexpected.
As someone who's spent meaningful amounts of time at LH during parties, absolutely yes. You successfully made it architecturally awkward to have large conversations, but that's often cashed out as "there's a giant conversation group in and totally blocking [the Entry Hallway Room of Aumann]/[the lawn between A&B]/[one or another firepit and its surrounding walkways]; that conversation group is suffering from the obvious described failure modes, but no one in it is sufficiently confident or agentic or charismatic to successfully break out into a subgrou...
Why do you need to be certain? Say there's a screen showing a nice "high-level" interface that provides substantial functionality (without directly revealing the inner workings, e.g. there's no shell). Something like that should be practically convincing.
Then whatever that's doing is a constraint in itself, and I can start off by going looking for patterns of activation that correspond to e.g. simple-but-specific mathematical operations that I can actuate in the computer.
...I'm unsure about that, but the more pertinent questions are along the lines of "i
I don't have much to say except that this seems broadly correct and very important in my professional opinion. Generating definitions is hard, and often depends subtly/finely on the kinds of theorems you want to be able to prove (while still having definitions that describe the kind of object you set out to describe, and not have them be totally determined by the theorem you want - that would make the objects meaningless!). Generating frameworks out of whole cloth is harder yet; understanding them is sometimes easiest of all.
Thinking about it more, I want to poke at the foundations of the koan. Why are we so sure that this is a computer at all? What permits us this certainty, that this is a computer, and that it is also running actual computation rather than glitching out?
B: Are you basically saying that it's a really hard science problem?
From a different and more conceit-cooperative angle: it's not just that this is a really hard science problem, it might be a maximally hard science problem. Maybe too hard for existing science to science at! After all, hash functions are ...
You maybe got stuck in some of the many local optima that Nurmela 1995 runs into. Genuinely, the best sphere code for 9 points in 4 dimensions is known to have a minimum angular separation of ~1.408 radians, for a worst-case cosine similarity of about 0.162.
You got a lot further than I did with my own initial attempts at random search, but you didn't quite find it, either.
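For reference, here's the kind of perturbation-based hill climb I have in mind for this problem (maximize the minimum pairwise angle among 9 unit vectors in R^4). This is an illustrative sketch with arbitrary hyperparameters, not the method used in the thread, and it gets trapped in exactly the kind of local optima Nurmela describes:

```python
import numpy as np

def min_separation(points):
    """Smallest pairwise angle (radians) among the given unit vectors."""
    gram = points @ points.T
    np.fill_diagonal(gram, -1.0)  # ignore self-similarity
    return float(np.arccos(np.clip(gram.max(), -1.0, 1.0)))

def random_search(n=9, d=4, iters=20000, seed=0):
    """Hill-climb with shrinking random perturbations; easily trapped in local optima."""
    rng = np.random.default_rng(seed)
    pts = rng.standard_normal((n, d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)
    best = min_separation(pts)
    step = 0.3
    for _ in range(iters):
        cand = pts + step * rng.standard_normal((n, d))
        cand /= np.linalg.norm(cand, axis=1, keepdims=True)
        sep = min_separation(cand)
        if sep > best:
            pts, best = cand, sep
        step *= 0.9997  # slowly shrink the perturbation size
    return pts, best
```

Runs like this may or may not get near the known ~1.408 rad optimum for 9 points in 4 dimensions; random restarts (varying `seed`) are the cheap way to probe how many distinct local optima it stalls in.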
On @TsviBT's recommendation, I'm writing this up quickly here.
re: the famous graph from https://transformer-circuits.pub/2022/toy_model/index.html#geometry with all the colored bands, plotting "dimensions per feature in a model with superposition", there look to be 3 obvious clusters outside of any colored band and between 2/5 and 1/2, the third of which is directly below the third inset image from the right. All three of these clusters are at 1/(1-S) ~ 4.
A picture of the plot, plus a summary of my thought processes for about the first 30 seconds of lookin...
Probabilities of zero are extremely load-bearing for natural latents in the exact case...
Dumb question: Can you sketch out an argument for why this is the case and/or why this has to be the case? I agree that ideally/morally this should be true, but if we're already accepting a bounded degree of error elsewhere, what explodes if we accept it here?
I'm in a weird situation here: I'm not entirely sure whether the community considers the Learning Theory Agenda to be the same alignment plan as The Plan (which is arguably not a plan at all but he sure thinks about value learning!), and whether I can count things like the class of scalable oversight plans which take as read that "human values" are a specific natural object. Would you at least agree that those first two (or one???) rely on that?
I guess in that case I'd worry that you go and look at the features and come away with some impression of what those features represent and it turns out you're totally wrong? I keep coming back to the example of a text-classifier where you find """the French activation directions""" except it turns out that only one of them is for French (if any at all) and the others are things like "words ending in x and z" or "words spoken by fancy people in these novels and quotes pages".
Like, you might think the more things you know about smart AIs, the easier it would be to build them - where does this argument break?
I mean... it doesn't? I guess I mostly think that either what I'm working on is totally off the capabilities pathway, or, if it's somehow on one, then whatever minor framework improvement or suggestion for a mental frame I come up with isn't going to push things all that far. Which I agree is kind of a depressing thing to expect of your work, but I'd argue those are the two most likely outcomes here. Does that address that?
Almost certainly this is way too ambitious for me to do, but I don't know what "starting a framework" would look like. I guess I don't have as full an understanding as I'd like of what MATS expects me to come up with/what's in-bounds? I'd want to come up with a paper or something out of this but I'm also not confident in my ability to (for instance) fully specify the missing pieces of John's model. Or even one of his missing pieces.
I had thought that that would be implicit in why I'm picking up those skills/that knowledge? I agree that it's not great that some of my initial ideas for things to do are turning out to be infeasible or unhelpful, such that I don't feel like I have concrete theorems I want to try to prove here, or specific experiments I expect to want to run. I think a lot of next week is going to be reading up on natural latents/abstractions even more deeply than when I first learned about them, and trying to find where a proof needs to go.
Got it, thanks. I'll see if I can figure out who that was or where to find that claim. Cheers.