All of DaemonicSigil's Comments + Replies

We probably don't disagree that much. What "original seeing" means is just going and investigating things you're interested in. So doing lengthy research is actually a much more central example of this than coming up with a bold new idea is.

As I say above: "There's not any principled reason why an AI system, even an LLM in particular, couldn't do this."

Thanks for the clarification! I think part of it is that I find the term "original seeing" off-putting. I'm not sure I got the point of the corresponding blog post. In general, going forward, I'd recommend people try to be very precise about what they mean here. I suspect that "original seeing" will mean different things to different people. I'd expect that clarifying more precisely which tasks or skills are involved would make it easier to pinpoint which parts of it are good/bad for LLMs.

Some experimental data:

There's not really anything wrong with ChatGPT's attempt here, but it happens to have picked the same topic as a recent Numberphile video, and I think it's instructive to compare how they present the same topic:

Answer by DaemonicSigil

My view on this is that writing a worthwhile blog post is not only a writing task, but also an original seeing task. You first have to go and find something out in the world and learn about it before you can write about it. So the obstacle is not necessarily reasoning ("look at this weird rock I found" doesn't involve much reasoning, but could make a good blog post), but a lack of things to say.

There's not any principled reason why an AI system, even an LLM in particular, couldn't do this. There is plenty going on in the world to go and find out, even if yo...

Seth Herd
I think this is correct. Blogging isn't the easy end of the spectrum; it actually involves solving novel problems of finding new useful viewpoints. But this answer leaves out the core question: what advances will allow LLMs to produce original seeing? If you think about how humans typically produce original seeing, I think there are relatively straightforward ways that an LLM-based cognitive architecture that can direct its own investigations, "think" about what it has found, and remember what it has found (using continuous learning of some sort) can do the same thing. Or of course, sometimes they just pop out a perspective that counts as original seeing relative to most readers. Then the challenge is identifying whether it is really of interest to enough of the target audience, which usually involves some elaborate thinking, including chains of thought, goal-directed agency, and "executive function" or metacognition: noticing whether the current CoT is on-task, redirecting it if not, then aggregating all of the on-task CoTs to draw conclusions. It's not clear when this will happen, because there isn't much economic payoff for improving original seeing. But this is very much the same process as solving novel problems, which LLMs rarely if ever do, and there the economic payoff is clearer: people want agents that do useful work without constant human intervention, by solving minor novel problems of how to get past dead ends in their current approaches. So I'd predict we get human-level blogging in 1-2 years (that's human-level, which is, on average, terrible; it just has some notable standouts). For more detail, see almost all of my other posts. :)

Even comment-writing is to a large extent an original seeing task, even though all the relevant context would seem more straightforward to assemble than when writing a post unprompted. A good comment on a post is not a review of the whole post. It finds some point in it, looks at it in a particular way, and finds that there is a relevant observation to be made about it.

Crucially, if no such relevant observation is found, the comment shouldn't be made at all, or else you get slop even when the commenter is human ("Great post!"). I think chatbots have already achieved parity with a significant fraction of Reddit-level comments (but not posts), which are also not worth reading.

I feel like I've heard this before, and can sympathize, but I'm skeptical. I feel like this attributes something almost magical to how many blog posts are produced. The phrase "original seeing" sounds much more profound than I'm comfortable with for such a discussion. Let's go through some examples:
  • Lots of Zvi's posts are summaries of content, done in a way that's fairly formulaic.
  • A lot of Scott Alexander's posts read to me like, "Here's an interesting area that blog readers like but haven't investigated much. I read a few things about it, and have some takes that make a lot of sense upon some level of reflection."
  • A lot of my own posts seem like things that some search process could produce without too much difficulty.
Broadly, I think that "coming up with bold new ideas" gets too much attention, and more basic things like "doing lengthy research" or "explaining to people the next incremental set of information that they would be comfortable with, in a way that's very well expressed" get too little.
I expect that future AI systems will get good at going from a long list of [hypotheses of what might make for interesting topics] and [some great areas, where a bit of research provides surprising insights] and similar. We don't really have this yet, but it seems doable to me.
(I similarly didn't agree with the related post.)

Thanks for the reply & link. I definitely missed that paragraph, whoops.

IMO even just simple gamete selection would be pretty great for avoiding the worst genetic diseases. I guess tracking nuclei with a microscope is way more feasible than the microwell thing, given how hard it looks to make IVS work at all.


I think the numbers just kind of suck. I didn't go into them much because gamete selection seems largely hypothetical. Like, the procedure here seems kinda expensive per-spermatocyte (guy who divides into 4 sperm). I gesture at how to compute it here:

To give some numbers, consider 10k sperm. We look in the embryo selection powers:

For 10k, i.e. e4.0, you get about 2.7 raw SDs for embryos; for sperm you multiply by the sperm SD...

Re the "Appendix: Cheap DNA segment sensing" section, just going to throw out a thought that occurred to me (very much a non-expert). Let's say we're doing IVS, and assume we can separate spermatocytes into separate microwells before they undergo meiosis. The starting cells all have a known genome. Then the cell in each microwell divides into 4 cells. If we sequence 3 of them, then we know by process of elimination what the sequence on the 4th cell is, at a very high level of detail, including crossovers, etc. So we kill 3 cells and look at their DNA, and ...
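The process-of-elimination step can be sketched locus by locus. This is a toy model under loudly stated assumptions: every locus is biallelic and heterozygous in the starting cell, and there is no gene conversion or new mutation, so each parental allele ends up in exactly two of the four products of meiosis. The function and variable names are hypothetical.

```python
def infer_fourth(seen_products, parent_alleles):
    """Infer the fourth product of meiosis from the other three, locus by locus.

    Toy assumptions (hypothetical): every locus is biallelic and heterozygous
    in the starting cell, with no gene conversion or new mutation, so each of
    the two parental alleles appears in exactly two of the four products.
    """
    fourth = []
    for locus, (a, b) in enumerate(parent_alleles):
        alleles = [product[locus] for product in seen_products]
        if sorted((alleles.count(a), alleles.count(b))) != [1, 2]:
            raise ValueError(f"locus {locus} violates 2:2 segregation")
        # Whichever allele has only been seen once must be in the fourth cell.
        fourth.append(a if alleles.count(a) == 1 else b)
    return fourth


parent = [("A", "a"), ("B", "b"), ("C", "c")]
# A tetrad with one crossover between the second and third loci:
# ABC, ABc, abC, abc. Sequence the first three and infer the last.
sequenced = [["A", "B", "C"], ["A", "B", "c"], ["a", "b", "C"]]
print(infer_fourth(sequenced, parent))  # ['a', 'b', 'c']
```

Because the crossover shows up in the sequenced cells, the inferred fourth genome also carries the recombination breakpoint.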


Yep, this is the standard first "clever person starts thinking about nondestructive sequencing" thing. E.g. I wrote about it here: and Gwern mentions it. It's addressed in the present article here:

In theory one could do "entanglement sequencing", wherein you capture the four meiotic "grandchildren" of a single gametogonium (progenitor stem cell that produces gametes). This is impractical as far as I know, because both oogenesis and spermatogenesis are complex and all but req...

This was a fun little exercise. We get many "theory of rationality" posts on this site, so it's very good to also have some chances mixed in to practice figuring out confusing things. The various coins each teach good lessons about ways the world can surprise you.

Anyway, I think this was an underrated post, and we need more posts in this general category.

Running parallel to the spin axis would be fine, though.

mako yass
I guess since it sounds like they're going to be about a km long and 20 stories deep there'll be enough room for a nice running track with minimal upspin/downspin sections.

Anthropic shadow isn't a real thing, check this post:

Also, you should care about worlds proportional to the square of their amplitude.

Christopher King
It's actually interesting to consider why this must be the case. Without it, I concede that maybe some sort of Quantum Anthropic Shadow could be true. I'm thinking it would lead to lots of wacky consequences.
I know this post and have two problems with it. First, what they call "anthropic shadow" is not the proper term, as Bostrom defined anthropic shadow as the underestimation of past risks based on the fact of survival, in his article of the same name. But that's OK. The more serious problem is that quantum immortality and angel immortality eventually merge: for example, if we survive 10 LHC failures because of QI, we most likely survive only on those timelines where some alien stops the LHC. So both QI and angel immortality can be true and support one another, and there is no contradiction.

Thanks for making the game! I also played it, just didn't leave a comment on the original post. Scored 2751. I played each location for an entire day after building an initial food stockpile, and so figured out the timing of Tiger Forest and Dog Valley. But I also did some fairly dumb stuff, like assuming a time dependence for other biomes. And I underestimated Horse Hills, since when I foraged it for a full day, I got unlucky and only rolled a single large number. For what it's worth, I find these applet things more accessible than a full-on D&D.Sci (...

Have to divide by number of airships, which probably makes them less safe than planes, if not cars. I think the difficulty is mostly with having a large surface area exposed to the wind, making the ships difficult to control. (Edit: looking at the list on Wikipedia, this is maybe not totally true. A lot of the crashes seem to be caused by equipment failures too.)

Are those things that you care about working towards?

No, and I don't work on airships and have no plans to do so. I mainly just think it's an interesting demonstration of how weak electrostatic forces can be.

I think this is mostly about how weak air is against dielectric breakdown.
On the other hand, the hydrogen pushing against the airship membrane is also an electrostatic force.

Yep, Claude sure is a pretty good coder: Wang Tile Pattern Generator

This took 1 initial write and 5 change requests to produce. The most manual effort I had to do was look at unicode ranges and see which ones had distinctive-looking glyphs in them. (Sorry if any of these aren't in your computer's glyph library.)

I've begun worshipping the sun for a number of reasons. First of all, unlike some other gods I could mention, I can see the sun. It's there for me every day. And the things it brings me are quite apparent all the time: heat, light, food, and a lovely day. There's no mystery, no one asks for money, I don't have to dress up, and there's no boring pageantry. And interestingly enough, I have found that the prayers I offer to the sun and the prayers I formerly offered to 'God' are all answered at about the same 50% rate.

-- George Carlin

Everyone who earns money exerts some control by buying food or whatever else they buy. This directs society to work on producing those goods and services. There's also political/military control, but that kind of control is also held by humans (a much narrower set of them).

Actually, this is the sun controlling the world, not humans. The sun exerts control by permitting plants to grow, and their fruit creates an excess of organic energy, which permits animals like humans to live. Humans have rather limited choice here; we can redirect the food by harvesting it and guarding against adversaries, but the best means to do so are heavily constrained by instrumental matters. Locally, there is some control in that people can stop eating food and die, or overeat and become obese. Or they can choose what kinds of food to eat. But this seems more like "control yourself" than "control the world". The farmers can choose how much food to supply, but if a farmer doesn't supply what is needed, then some other farmer elsewhere will supply it, so that's more "control your farm" than "control the world". The world revolves around the sun.

Okay, I'll be the idiot who gives the obvious answer: Yeah, pretty much.

Who, by what metric, in what way?

Very nice post, thanks for writing it.

Your options are numbered when you refer to them in the text, but are listed as bullet points originally. Probably they should also be numbered there!

Now we can get down to the actual physics discussion. I have a bag of fairly unrelated statements to make.

  • The "center of mass moves at constant velocity" thing is actually just as solid as, say, conservation of angular momentum. It's just less famous. Both are consequences of Noether's theorem, angular momentum conservation arising from symmetry under rotations and the...
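For the first point, a minimal non-relativistic version of the boost conservation law, assuming point particles whose interaction forces depend only on relative positions (sign conventions differ between sources). Here M is the total mass, X_cm the center of mass, and P the total momentum; conservation of P itself comes from translation symmetry, and M times the center-of-mass velocity equals P by definition:

```latex
\mathbf{K} = \sum_i m_i \mathbf{x}_i - t \sum_i m_i \mathbf{v}_i
           = M\,\mathbf{X}_{\mathrm{cm}} - t\,\mathbf{P}

\frac{d\mathbf{K}}{dt} = M\,\dot{\mathbf{X}}_{\mathrm{cm}} - \mathbf{P} - t\,\dot{\mathbf{P}}
                       = \mathbf{P} - \mathbf{P} - 0 = 0

\Longrightarrow\quad \mathbf{X}_{\mathrm{cm}}(t) = \mathbf{X}_{\mathrm{cm}}(0) + \frac{\mathbf{P}}{M}\,t
```

The conserved boost charge carries an explicit factor of t, and its constancy is exactly the statement that the center of mass moves in a straight line at constant velocity.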
Numbering the options properly is a good idea, done. To answer your points:
  • This is interesting. Symmetry under rotations gives us conservation of angular momentum. Symmetry under translations gives conservation of linear momentum. You are saying symmetry under boosts gives conservation of centre of mass velocity. In "normal" situations (billiard balls colliding), though, conservation of centre of mass velocity is a special case of conservation of linear momentum, which I suppose is why I have not heard of it before. I need to look at this more, as I find I am still confused. Intuitively I feel like if translation symmetry is doing momentum for us, boost symmetry should relate to a quantity with an extra time-derivative in it somewhere. There is no symmetry under angular boosts, which I imagine is why flywheels (or gyroscopes) allow for an "internal reaction drive" for angular velocity.
  • I did not know that the kinetic and canonical momentum had different values in other fields. That makes option (1) more believable.
  • Yes, the k-vector (wavevector) certainly extends by a factor of n. So if you want your definition of "momentum" to be linear in wavevector, then you are stuck with Minkowski.
  • I believe that at the interface between the water and the air we will have a partial reflection of the light. The reflected component of the light has an evanescent tail associated with it that tunnels into the air gap. If we had more water on the other side of the air gap, then the evanescent tail would be converted back into a propagating wave, and the light would not reflect from the first water interface in the first place. As the evanescent tail has a length of the order of a wavelength, random gaps between the atoms in water or glass don't mess with the propagating light wave: the wavelength is so much longer than those tiny gaps that they do not contribute. Applying this picture to your question, I think we would expect to interpolate smooth...

Good point, the whole "model treats tokens it previously produced and tokens that are part of the input exactly the same" thing and the whole "model doesn't learn across usages" thing are also very important.

When generating each token, they "re-read" everything in the context window before predicting. None of their internal calculations are preserved when predicting the next token, everything is forgotten and the entire context window is re-read again.

Given that KV caching is a thing, the way I chose to phrase this is very misleading / outright wrong in retrospect. While of course inference could be done in this way, it's not the most efficient, and one could even make a similar statement about certain inefficient ways of simulating a person's thoughts.

If I...
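To illustrate that the KV cache changes efficiency but not the result, here is a small numpy sketch of single-head causal attention (one layer, no positional encodings, no batching; all names and sizes are illustrative assumptions). Recomputing attention over the whole prefix, and processing one token at a time against cached keys and values, give the same outputs:

```python
import numpy as np


def causal_attention(q, k, v):
    """Single-head causal self-attention over a whole sequence at once."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Mask future positions so token i only attends to tokens 0..i.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


def attend_one(q_new, k_cache, v_cache):
    """Attention output for one new query against all cached keys/values."""
    scores = k_cache @ q_new / np.sqrt(q_new.shape[0])
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ v_cache


rng = np.random.default_rng(0)
T, D = 6, 4
x = rng.normal(size=(T, D))  # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

# "Re-read everything": attention over the full context in one pass.
full = causal_attention(x @ Wq, x @ Wk, x @ Wv)

# KV cache: append one key/value per step and never recompute old ones.
k_cache, v_cache = np.empty((0, D)), np.empty((0, D))
steps = []
for t in range(T):
    k_cache = np.vstack([k_cache, x[t] @ Wk])
    v_cache = np.vstack([v_cache, x[t] @ Wv])
    steps.append(attend_one(x[t] @ Wq, k_cache, v_cache))
incremental = np.array(steps)

print(np.allclose(full, incremental))  # True
```

In a full multi-layer decoder-only Transformer the same holds layer by layer, since the causal mask means keys and values at earlier positions never depend on later tokens.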

I think it's a very important thing to know about Transformers, as our intuition about these models is that there must be some sort of hidden state or on the fly adaptation, and this is at least potentially true of other models. (For example, in RNNs, it's a useful trick to run the RNN through the 'context window' and then loop back around and input the final hidden state at the beginning, and 'reread' the 'context window' before dealing with new input. Or there's dynamic evaluation, where the RNN is trained on the fly, for much better results, and that is very unlike almost all Transformer uses. And of course, RNNs have long had various kinds of adaptive computation where they can update the hidden state repeatedly on repeated or null inputs to 'ponder'.) But I don't think your rewrite is better, because it's focused on a different thing entirely, and loses the Memento-like aspect of how Transformers work - that there is nothing 'outside' the context window. The KV cache strikes me as quibbling: the KV cache is more efficient, but it works only because it is mathematically identical and is caching the computations which are identical every time. I would just rewrite that as something like,

Let's say we have a bunch of datapoints in that are expected to lie on some lattice, with some noise in the measured positions. We'd like to fit a lattice to these points that hopefully matches the ground truth lattice well. Since just by choosing a very fine lattice we can get an arbitrarily small error without doing anything interesting, there also needs to be some penalty on excessively fine lattices. This is a bit of a strange problem, and an algorithm for it will be presented here.


Since this is a lattice problem, the first question to jump...
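To make the problem statement concrete, here is a toy 1D version of the objective. It is an illustration only, not the algorithm from the post: the lattice origin is assumed fixed at 0, the fit error is the mean squared distance from each point to the nearest lattice point, and a hypothetical penalty term `penalty / spacing` discourages fine lattices. Without the penalty, a much finer lattice beats the true spacing; with it, a brute-force search lands near the true spacing of 1.

```python
import numpy as np


def lattice_cost(points, spacing, penalty):
    """Toy 1D objective: mean squared distance from each point to the nearest
    multiple of `spacing`, plus `penalty / spacing` to punish fine lattices."""
    residuals = points - spacing * np.round(points / spacing)
    return np.mean(residuals ** 2) + penalty / spacing


# Noisy measurements of points that truly lie on the integer lattice.
points = np.array([0.02, 1.01, 1.98, 3.03, 4.00, 4.97])

# Without a penalty, a much finer lattice fits better than the true one.
print(lattice_cost(points, 0.05, 0.0) < lattice_cost(points, 1.0, 0.0))  # True

# With a penalty, a brute-force search over candidate spacings lands near 1.
spacings = np.linspace(0.05, 3.0, 2951)
costs = [lattice_cost(points, d, penalty=0.01) for d in spacings]
best = spacings[int(np.argmin(costs))]
print(best)
```

A brute-force grid only makes sense in this 1D toy; in higher dimensions the space of candidate lattices is much larger, which is presumably why a dedicated algorithm is needed.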

This might be worth pinning as a top-level post.

Answer by DaemonicSigil

The amount of entropy in a given organism stays about the same, though I guess you could argue it increases as the organism grows in size. Reason: The organism isn't mutating over time to become made of increasingly high entropy stuff, nor is it heating up. The entropy has to stay within an upper and lower bound. So over time the organism will increase entropy external to itself, while the internal entropy doesn't change very much, maybe just fluctuates within the bounds a bit.

It's probably better to talk about entropy per unit mass, rather than entropy de...

I mean, actually it is. Plus accumulation of various kinds of damage, experiences, etc., which makes it differ from other organisms. Looking it up, apparently people drop very slightly in temperature as they age, which I guess might dominate the entropy considerations (though I guess that is due to slowly dying, so it also seems compatible with entropy being related to life, if reduction in life is related to reduction in entropy). Couldn't it be reasonable to say that entropy increases as a sign of increased vitality associated with growing up to adulthood, and afterwards has a mixture of an infinitesimal increasing effect from life experience and a moderate decreasing effect associated with vitality breakdown? But if we go by unit mass, shouldn't we count both the entropy in the air and the entropy in the organic matter, since they're both related to the original mass that goes into life, meaning that life still increases entropy?
I think the correct unit is "per particle" or "per mole".

Speaking of which, I wonder if multi-modal transformers have started being used by blind people yet. Since we have models that can describe images, I wonder if it would be useful for blind people to have a device with a camera and a microphone and a little button one can press to get it to describe what the camera is seeing. Surely there are startups working on this?

Cool, Facebook is also on this apparently:

Yes. See Be my AI.

a device with a camera and a microphone and a little button

Why the retrofuturistic description of a smartphone?

Found this paper on insecticide costs:

It's from 2000, so anything listed here would be out of patent today.

Here are the costs from the above link: It's worth noting that countries (such as India) have the option of simply not respecting a patent when the use is important and the fees requested are unreasonable. Also, patents aren't international; it's often possible to get around them by simply manufacturing and using a chemical in a different country.

hardening voltage transformers against ionising radiation

Is ionization really the mechanism by which transformers fail in a solar storm? I thought it was that changes in the Earth's magnetic field induced large currents in long transmission lines, overloading the transformers.

Good question!  Will look into it / check more if I have the time. 

Sorry for the self promotion, but some folks may find this post relevant: (Ctrl-F for "Application: Conditional prediction markets")

tldr: Gives a general framework that would allow people to make this kind of trade with only $N in capital, just as a natural consequence of the trading rules of the market.

Anyway, I definitely agree that Manifold should add the feature you describe! (As for general logical share splitting, well, it would be nice, but probably far too much work to convert the existing codebase over.)

IMO, a very good response, which Eliezer doesn't seem to be interested in making as far as I can tell, is that we should not be making the analogy natural selection <--> gradient descent, but rather, human brain learning algorithm <--> gradient descent ; natural selection <--> us trying to build AI.

So here, the striking thing is that evolution failed to solve the alignment problem for humans. I.e. we have a prior example of strongish general intelligence being created, but no prior examples of strongish general intelligence being aligne...

Yeah you can kind of stop at "we are already doing natural selection." The devs give us random variation. The conferences and the market give us selection. The population is large, the mutation rate is high, the competition is fierce, and replicating costs $0.25 + 10 minutes.

People here might find this post interesting:

The author argues that search algorithms will play a much larger role in AI in the future than they do today.

Answer by DaemonicSigil

I remember reading the EJT post and leaving some comments there. The basic conclusions I arrived at are:

  • The transitivity property is actually important and necessary, one can construct money-pump-like situations if it isn't satisfied. See this comment
  • If we keep transitivity, but not completeness, and follow a strategy of not making choices inconsistent with our previous choices, as EJT suggests, then we no longer have a single consistent utility function. However, it looks like the behaviour can still be roughly described as "picking a utility function at...
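A minimal sketch of the money-pump construction for intransitive preferences (the items, fee, and helper names are hypothetical): an agent that strictly prefers B to A, C to B, and A to C will pay a small fee for each trade, and at the end of every round it holds what it started with, only poorer.

```python
def run_money_pump(prefers, start, offers, fee, rounds):
    """Repeatedly offer trades; the agent accepts any offer it strictly
    prefers to what it currently holds, paying `fee` per trade."""
    holding, paid = start, 0.0
    for _ in range(rounds):
        for offer in offers:
            if prefers(offer, holding):
                holding, paid = offer, paid + fee
    return holding, paid


# Intransitive (cyclic) strict preferences: B over A, C over B, A over C.
CYCLE = {("B", "A"), ("C", "B"), ("A", "C")}


def prefers(x, y):
    return (x, y) in CYCLE


print(run_money_pump(prefers, "A", ["B", "C", "A"], fee=1.0, rounds=3))
# ('A', 9.0): back where it started, 9 units poorer
```

With transitive preferences the same loop stops trading once the agent holds its favourite item, so the losses are bounded.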
Thomas Kwa
Note that if the distribution of utility under the prior is heavy-tailed, you can get infinite utility even with arbitrarily low relative entropy, so the optimal policy is undefined. In the case of goal misspecification, optimization with a KL penalty may be unsafe or get no better utility than the prior.

If you're working with multidimensional tensors (eg. in numpy or pytorch), a helpful pattern is often to use pattern matching to get the sizes of various dimensions. Like this: batch, chan, w, h = x.shape. And sometimes you already know some of these dimensions, and want to assert that they have the correct values. Here is a convenient way to do that. Define the following class and single instance of it:

```python
class _MustBe:
  """ class for asserting that a dimension must have a certain value.
      the class itself is private, one should import a particular obj...
```
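The original class body is cut off, so the following is only a guess at how such a helper could work, not the author's code. It relies on the fact that a subscription like `must_be[3]` is a valid assignment target in tuple unpacking, so Python calls `__setitem__(3, value)` with the unpacked dimension; the instance name `must_be` is an assumption.

```python
class _MustBe:
    """Hypothetical sketch: assert that an unpacked dimension has a given value.

    Used as an assignment target, `must_be[3]` makes Python call
    `__setitem__(3, value)` with the dimension being unpacked.
    """

    def __setitem__(self, expected, actual):
        if actual != expected:
            raise ValueError(f"dimension must be {expected}, got {actual}")


must_be = _MustBe()  # the single shared instance (name assumed)

shape = (8, 3, 32, 32)  # stands in for x.shape of a batch of RGB images
batch, must_be[3], w, h = shape  # unpacks batch/w/h and checks chan == 3
print(batch, w, h)  # 8 32 32
```

This works for any shape that is a tuple, such as a NumPy `ndarray.shape` or a PyTorch `torch.Size` (a tuple subclass).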

Oh, very cool, thanks! Spoiler tag in markdown is:

text here

Heh, sure.

Promote from a function to a linear operator on the space of functions, . The action of this operator is just "multiply by ". We'll similarly define meaning to multiply by the first, second integral of , etc.


Now we can calculate what we get when applying times. The calculation simplifies when we note that all terms are of the form . Result:

Now we apply the above operator to :

The sum terminates b...
Very nice! Notice that if you write r = j − k and I as D^{−1}, and play around with binomial coefficients a bit, we can rewrite this as: (By the way, how do you spoiler tag?)

Use integration by parts:

Then  is another polynomial (of smaller degree), and  is another "nice" function, so we recurse.

This is true, but I'm looking for an explicit, non-recursive formula that needs to handle the general case of the kth anti-derivative (instead of just the first). The solution involves doing something funny with formal power series, like in this post.
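For reference, one standard explicit, non-recursive formula of this kind comes from iterating integration by parts. Writing I for antidifferentiation (integration constants dropped): I^k(p·f) = Σ_j (−1)^j C(k+j−1, j) · p^(j) · I^(k+j) f, where p^(j) is the j-th derivative of the polynomial p, and the sum stops once those derivatives vanish. This may not be the exact form derived in the thread. A quick check in pure Python, taking f = e^x (all of whose antiderivatives are e^x) so that everything reduces to polynomial arithmetic on coefficient lists [a0, a1, a2, ...]:

```python
from math import comb


def derivative(p):
    """Derivative of a polynomial stored as coefficients [a0, a1, a2, ...]."""
    return [i * c for i, c in enumerate(p)][1:]


def add(p, q):
    """Sum of two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    return [a + b for a, b in zip(p + [0] * (n - len(p)), q + [0] * (n - len(q)))]


def kth_antiderivative_coeffs(p, k):
    """Return q with I^k(p(x) e^x) = q(x) e^x, via the explicit sum
    q = sum_j (-1)^j C(k+j-1, j) p^(j), which stops once p^(j) is zero."""
    q, dp, j = [0], p, 0
    while dp:
        q = add(q, [(-1) ** j * comb(k + j - 1, j) * c for c in dp])
        dp, j = derivative(dp), j + 1
    return q


def times_exp_derivative(q):
    """If F(x) = q(x) e^x then F'(x) = (q + q')(x) e^x; return q + q'."""
    return add(q, derivative(q))


print(kth_antiderivative_coeffs([0, 0, 1], 1))  # [2, -2, 1], i.e. (x^2 - 2x + 2) e^x
q = kth_antiderivative_coeffs([1, 0, 2], 3)
for _ in range(3):  # differentiate three times...
    q = times_exp_derivative(q)
print(q)  # [1, 0, 2]: ...and recover 1 + 2x^2
```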

Other people have mentioned sites like Mechanical Turk. Just to add another thing in the same category, apparently now people will pay you for helping train language models:

Haven't tried it yet myself, but a roommate of mine has and he seems to have had a good experience. He's mentioned that sometimes people find it hard to get assigned work by their algorithm, though. I did a quick search to see what their reputation was, and it seemed pretty okay:

...
Thanks, I already did DataAnnotation a few months back. It's true that it's difficult to get assignments there after you finish the first one or two. They supposedly have tons of work for people who specialize in certain tech roles, but that obviously won't apply to most people. There is also virtually no way to contact anyone who works at DataAnnotation if you have questions. But I have made a few dollars there.

Linkpost for:

Today's interesting number is 961.

Say you're writing a CUDA program and you need to accomplish some task for every element of a long array. Well, the classical way to do this is to divide up the job amongst several different threads and let each thread do a part of the array. (We'll ignore blocks for simplicity, maybe each block has its own array to work on or something.) The method here is as follows:

```cuda
// Each of the 32 threads handles elements threadIdx.x, threadIdx.x + 32, ...
for (int i = threadIdx.x; i < array_len; i += 32) {
    arr[i] = ...;
}
```

So the threads make the foll...
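A quick way to see which elements each thread handles is to simulate the loop's index arithmetic in Python (a sketch; it assumes exactly 32 threads, matching the stride of 32 in the loop). Thread t visits t, t + 32, t + 64, ..., so on any given iteration the threads touch consecutive elements. A common reason for this layout is that neighbouring threads in a warp then access neighbouring addresses, which allows coalesced memory access.

```python
def stride_assignment(array_len, num_threads=32):
    """Indices visited by each thread in
    for (int i = threadIdx.x; i < array_len; i += 32)."""
    return {t: list(range(t, array_len, num_threads)) for t in range(num_threads)}


assignment = stride_assignment(100)
print(assignment[0])  # [0, 32, 64, 96]
print(assignment[5])  # [5, 37, 69]

# On the first iteration the 32 threads touch indices 0..31 exactly.
first_iteration = sorted(idxs[0] for idxs in assignment.values())
print(first_iteration == list(range(32)))  # True
```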

So once that research is finished, assuming it is successful, you'd agree that many worlds would end up using fewer bits in that case? That seems like a reasonable position to me, then! (I find the partial-trace kinds of arguments that people make pretty convincing already, but it's reasonable not to.)

The other problem is that MWI is up against various subjective and non-realist interpretations, so it's not the case that you can build an ontological model of every interpretation.

MW theories have to specify when and how decoherence occurs. Decoherence isn't simple.

They don't actually. One could equally well say: "Fundamental theories of physics have to specify when and how increases in entropy occur. Thermal randomness isn't simple." This is wrong because once you've described the fundamental laws and they happen to be reversible, and also aren't too simple, increasing entropy from a low entropy initial state is a natural consequence of those laws. Similarly, decoherence is a natural consequence of the laws of quantum mechanics (with a not-too-simple Hamiltonian) applied to a low entropy initial state.

MW has to show that decoherence is a natural consequence, which is the same thing. It can't be taken on faith, any more than entropy should be. Proofs of entropy increase were supplied a long time ago; proofs of decoherence of a suitable kind are a work in progress.

Good post, and I basically agree with this. I do think it's good to mostly focus on the experimental implications when talking about these things. When I say "many worlds", what I primarily mean is that I predict that we should never observe a spontaneous collapse, even if we do crazy things like putting conscious observers into superposition, or putting large chunks of the gravitational field into superposition. So if we ever did observe such a spontaneous collapse, that would falsify many worlds.

Amount of calculation isn't so much the concern here as the amount of bits used to implement that calculation. And there's no law that forces the amount of bits encoding the computation to be equal. Copenhagen can just waste bits on computations that MWI doesn't have to do.

In particular, I mentioned earlier that Copenhagen has to have rules for when measurements occur and what basis they occur in. How does MWI incur a similar cost? What does MWI have to compute that Copenhagen doesn't that uses up the same number of bits of source code?

Like, yes, an expect...

And vice versa. You can do unnecessary calculation under any interpretation, so that's an uninteresting observation. The important thing is that the minimum amount of calculation you have to do to get an empirically adequate theory is the same under any interpretation, because interpretations don't change the maths, they just... interpret it... differently. In particular, a many-worlder has to discard unobserved results in the same way as a Copenhagenist; it's just that they interpret doing so as the unobserved results existing in another branch, rather than being snipped off by collapse. The maths is the same, the interpretation is different. You can also do the maths without interpreting it, as in Shut Up And Calculate. This gets back to a long-standing confusion between Copenhagen and objective collapse theories (here, I mean, not in the actual physics community). Copenhagen, properly speaking, only claims that collapse occurs on or before measurement. It also claims that nothing is known about the ontology of the system before collapse; it's not the case that anything "is" a wave function. An interpretation of QM doesn't have to have an ontology, and many don't. Which, of course, is another factor that renders the whole Kolmogorov complexity approach inoperable. Objective collapse theories like GRW do have to specify when and how collapse occurs... but MW theories have to specify when and how decoherence occurs. Decoherence isn't simple.

Right, so we both agree that the randomness used to determine the result of a measurement in Copenhagen, and the information required to locate yourself in MWI is the same number of bits. But the argument for MWI was never that it had an advantage on this front, but rather that Copenhagen used up some extra bits in the machine that generates the output tape in order to implement the wavefunction collapse procedure. (Not to decide the outcome of the collapse, those random bits are already spoken for. Just the source code of the procedure that collapses the ...

Again: that's less calculation that the reader of the tape has to do.


If you're talking about the code complexity of "interleaving": If the Turing machine simulates quantum mechanics at all, it already has to "interleave" the representations of states for tiny things like electrons being in a superposition of spin states or whatever. This must be done in order to agree with experimental results. And then at that point not having to put in extra rules to "collapse the wavefunction" makes things simpler.

If you're talking about the complexity of locating yourself in the computation: Inferring which world you're in is...

I'm not talking about the code complexity of interleaving the SI's output. I am talking about interpreting the serial output of the SI, as it were. If you account for that, then the total complexity is exactly the same as Copenhagen, and that's the point. I'm not a dogmatic Copenhagenist, so that's not a gotcha. Basically, the amount of calculation you have to do to get an empirically adequate theory is the same under any interpretation, because interpretations don't change the maths, they just... interpret it... differently. The SI argument for MWI only seems to work because it encourages the reader to neglect the complexity implicit in interpreting the output tape.

This notion of faith seems like an interesting idea, but I'm not 100% sure I understand it well enough to actually apply it in an example.

Suppose Descartes were to say: "Y'know, even if there were an evil Daemon fooling every one of my senses for every hour of the day, I can still know what specific illusions the Daemon is choosing to show me. And hey, actually, it sure does seem like there are some clear regularities and patterns in those illusions, so I can sometimes predict what the Daemon will show me next. So in that sense it doesn't matter whether my...

To be clear, I'm definitely pretty sympathetic to TurnTrout's type error objection. (Namely: "If the agent gets a high reward for ingesting superdrug X, but did not ingest it during training, then we shouldn't particularly expect the agent to want to ingest superdrug X during deployment, even if it realizes this would produce high reward.") But just rereading what Zack has written, it seems quite different from what TurnTrout is saying and I still stand by my interpretation of it.

  • E.g. Zack writes: "obviously the line itself does not somehow contain a repr...

It's the same thing for piecewise-linear functions defined by multi-layer parameterized graphical function approximators: the model is the dataset. It's just not meaningful to talk about what a loss function implies, independently of the training data. (Mean squared error of what? Negative log likelihood of what? Finish the sentence!)

This confusion about loss functions...

I don't think this is a confusion, but rather a mere difference in terminology. Eliezer's notion of "loss function" is equivalent to Zack's notion of "loss function" curried with the...
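One way to picture the terminology point, as a sketch: Zack's "loss function" takes both a model and a dataset, and fixing the dataset with `functools.partial` yields a function of the model alone, closer to how Eliezer uses the term. That the fixed argument is the training data is an assumption here (suggested by the "Mean squared error of what?" passage, since the sentence above is cut off). The linear model, `mse_loss`, and the data are hypothetical.

```python
from functools import partial


def mse_loss(params, data):
    """Zack-style loss: needs both parameters and a dataset to produce a number."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # hypothetical (x, y) pairs

# Curry the dataset in: the result scores parameters on their own.
curried_loss = partial(mse_loss, data=training_data)

print(curried_loss((2.0, 1.0)))  # prints 0.0: this data lies exactly on y = 2x + 1
print(curried_loss((1.0, 0.0)))
```

Neither form is more "correct"; the curried version just hides the data argument, which is why statements about "the loss function" can mean different things depending on whether the data is already baked in.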

The issue seems more complex and subtle to me.

It is fair to say that the loss function (when combined with the data) is a stochastic environment (stochastic due to sampling the data), and the effect of gradient descent is to select a policy (a function out of the function space) which performs very well in this stochastic environment (achieves low average loss).

If we assume the function-approximation achieves the minimum possible loss, then it must be the case that the function chosen is an optimal control policy where the loss function (understood as incl...
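The "stochastic environment" framing can be illustrated with a toy: minibatch SGD on a one-dimensional linear model, where each step samples a few points from the dataset, so each gradient is a noisy estimate of the gradient of the average loss. The dataset, learning rate, batch size, and step count are arbitrary assumptions. With noiseless data lying on a line, the function that gradient descent selects converges to that line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: noiseless points on y = 2x + 1.
xs = rng.uniform(-1.0, 1.0, size=256)
ys = 2.0 * xs + 1.0

w, b = 0.0, 0.0
lr, batch_size = 0.1, 8

for step in range(2000):
    # Sampling the minibatch is what makes the "environment" stochastic.
    idx = rng.integers(0, len(xs), size=batch_size)
    x, y = xs[idx], ys[idx]
    residual = w * x + b - y
    # Gradient of the minibatch mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(residual * x)
    grad_b = 2.0 * np.mean(residual)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
```

Because the data here is noiseless, the minimum average loss is zero and the selected (w, b) approaches (2, 1); with noisy data it would instead hover near the least-squares fit.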

Could you give an example of knowledge and skills not being value neutral?

(No need to do so if you're just talking about the value of information depending on the values one has, which is unsurprising. But it sounds like you might be making a more substantial point?)

Fair enough for the alignment comparison, I was just hoping you could maybe correct the quoted paragraph to say "performance on the hold-out data" or something similar.

(The reason to expect more spread would be that training performance can't detect overfitting but performance on the hold-out data can. I'm guessing some of the nets trained in Miller et al did indeed overfit (specifically the ones with lower performance).)
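A small illustration of why hold-out performance can detect overfitting that training performance cannot. It assumes a hypothetical noisy sine-wave dataset and uses NumPy's `Polynomial.fit`; this is a toy regression, not the neural-network setting of Miller et al. A degree-9 polynomial through 10 training points interpolates them, so its training error is essentially zero and says nothing about overfitting; comparing hold-out errors across degrees is what can.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_targets(x):
    """Hypothetical target: a sine wave plus Gaussian noise."""
    return np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)


x_train = np.linspace(0.0, 1.0, 10)
y_train = make_targets(x_train)
x_holdout = rng.uniform(0.0, 1.0, size=200)  # i.i.d. draws from the same input range
y_holdout = make_targets(x_holdout)


def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))


for degree in (3, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    print(degree, mse(model, x_train, y_train), mse(model, x_holdout, y_holdout))
```

Since the degree-3 fit is nested inside the degree-9 family, the degree-9 training error can only be lower; only the hold-out column can show whether that extra flexibility generalizes.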

More generally, John Miller and colleagues have found training performance is an excellent predictor of test performance, even when the test set looks fairly different from the training set, across a wide variety of tasks and architectures.

Seems like figure 1 from Miller et al is a plot of test performance vs. "out of distribution" test performance. One might expect plots of training performance vs. "out of distribution" test performance to have more spread.

Nora Belrose
I doubt there would be much difference, and I think the alignment-relevant comparison is to compare in-distribution but out-of-sample performance to out-of-distribution performance. We can easily do i.i.d. splits of our data; that's not a problem. You might think it's a problem to directly test the model in scenarios where it could legitimately execute a takeover if it wanted to.

In this context, we're imitating some probability distribution, and the perturbation means we're slightly adjusting the probabilities, making some of them higher and some of them lower. The adjustment is small in a multiplicative sense, not an additive sense, hence the use of exponentials. Just as a silly example, maybe I'm training on MNIST digits, but I want the 2's to make up 30% of the distribution rather than just 10%. The math described above would let me train a GAN that generates 2's 30% of the time.
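The MNIST example can be worked through as arithmetic. This sketch assumes the perturbation acts as an exponential tilt of the class probabilities, p'(c) ∝ p(c)·exp(δ(c)), with δ nonzero only for the digit 2; that specific form and the helper names `tilt` and `delta_for_target` are assumptions for illustration, not necessarily the post's exact construction. Solving 0.1·e^δ / (0.1·e^δ + 0.9) = 0.3 gives e^δ = 27/7 ≈ 3.86.

```python
import math


def tilt(probs, deltas):
    """Exponentially tilt a discrete distribution: p'(c) ∝ p(c) * exp(delta(c))."""
    weights = {c: p * math.exp(deltas.get(c, 0.0)) for c, p in probs.items()}
    z = sum(weights.values())  # normalizing constant
    return {c: w / z for c, w in weights.items()}


def delta_for_target(p_class, target):
    """Log-factor on one class that moves its share from p_class to target,
    leaving the relative proportions of the other classes unchanged."""
    return math.log((target / (1.0 - target)) * ((1.0 - p_class) / p_class))


base = {digit: 0.1 for digit in range(10)}  # MNIST-like: each digit is 10% of the data
delta_2 = delta_for_target(base[2], 0.3)
perturbed = tilt(base, {2: delta_2})

print(math.exp(delta_2))  # multiplicative factor applied to the 2's
print(perturbed[2], perturbed[3])
```

The other nine digits each drop from 10% to 70%/9 ≈ 7.8%, all scaled by the same factor, which is what "multiplicative rather than additive" buys.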

I'm not sure what is meant by "the difference from...

Thanks. Your ΔH looked like ∇Q from gradient descent, but you don't intend to take derivatives, nor maximize x, so I was mistaken.  

Perturbation Theory in Machine Learning

Linkpost for:

In quantum mechanics there is this idea of perturbation theory, where a Hamiltonian H₀ is perturbed by some change ΔH to become H₀ + ΔH. As long as the perturbation is small, we can use the technique of perturbation theory to find out facts about the perturbed Hamiltonian, like what its eigenvalues should be.
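For readers who haven't seen the quantum-mechanical version, here is the textbook first-order result in a few lines of NumPy: for a Hamiltonian with non-degenerate eigenvalues and a small perturbation ΔH, each eigenvalue E_n shifts by approximately ⟨n|ΔH|n⟩, with an error of order ‖ΔH‖². The matrices are random, hypothetical examples, and H₀ is taken diagonal so its eigenvectors are the standard basis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unperturbed Hamiltonian: diagonal, with well-separated (non-degenerate) eigenvalues.
H0 = np.diag([0.0, 1.0, 2.0, 3.0])

# Small Hermitian (here real symmetric) perturbation.
A = rng.normal(size=(4, 4))
eps = 1e-3
dH = eps * (A + A.T) / 2

exact = np.linalg.eigvalsh(H0 + dH)  # eigenvalues sorted ascending
# First order: E_n ≈ E_n^(0) + <n|ΔH|n>; with H0 diagonal, <n|ΔH|n> is just dH[n, n].
first_order = np.diag(H0) + np.diag(dH)

print(exact)
print(first_order)
print(np.max(np.abs(exact - first_order)))  # residual error is second order, on the order of eps**2
```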

An interesting question is whether we can also do perturbation theory in machine learning. Suppose I am training a GAN, a diffuser, or some other machine lea...

What is the difference between a perturbation and the difference from a gradient in SGD? Both seem to do the same thing. I'm just pattern-matching and don't know too much of either.

I don't think we should consider the centroid important in describing the LLM's "ontology". In my view, the centroid just points in the direction of highest density of words in the LLM's space of concepts. Let me explain:

The reason that embeddings are spread out is to allow the model to distinguish between words. So intuitively, tokens with largeish dot product between them correspond to similar words. Distinguishability of tokens is a limited resource, so the training process should generally result in a distribution of tokens that uses this resource in a...
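A toy version of the density argument, using made-up embeddings: put 100 unit vectors near one direction (a concept the vocabulary has many tokens for) and 10 near an orthogonal direction. The normalized centroid then points almost entirely toward the crowded direction, which is the sense in which the centroid reflects where tokens are dense. The dimension, counts, noise level, and the `cluster` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Two hypothetical concept directions, orthogonal to each other.
crowded = np.eye(dim)[0]
sparse = np.eye(dim)[1]


def cluster(direction, n, noise=0.05):
    """n unit-norm token embeddings scattered around a direction."""
    vecs = direction + noise * rng.normal(size=(n, dim))
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)


embeddings = np.vstack([cluster(crowded, 100), cluster(sparse, 10)])
centroid = embeddings.mean(axis=0)
centroid /= np.linalg.norm(centroid)

# Cosine similarity of the centroid with each concept direction.
print(centroid @ crowded, centroid @ sparse)
```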

If there are 100 tokens for snow, it probably indicates that it's a particularly important concept for that language.