Tahp - LessWrong

Renormalization Redux: QFT Techniques for AI Interpretability

I consider the lattice to be a regulator as well, but, semantics aside, thank you for the example.

Renormalization Redux: QFT Techniques for AI Interpretability

Field theorist here. You talk about renormalization as a thing which can smooth over unimportant noise, which basically matches my understanding, but you haven't explicitly named your regulator. A regulator may be a useful concept to have in interpretability, but I have no idea if it is common in the literature.

In QFT, our issue is that we go to calculate things that are measurable and finite, but we calculate horrible infinities. Obviously those horrible infinities don't match reality, and they often seem to be coming from some particular thing we don't care about that much in our theory, so we find a way to poke it out of the theory. (To be clear, this means that our theories are wrong, and we're going to modify them until they work.) The tool by which you remove irrelevant things which cause divergences is called a regulator. A typical regulator is a momentum cutoff. You go to do the integral over all real momenta which your Feynman diagram demands, and you find that it's infinite, but if you only integrate the momenta up to a certain value, the integral is finite. Of course, now you have a bunch of weird constants sitting around which depend on the value of the cutoff. This is where renormalization comes in. You notice that there are a bunch of parameters, which are generally coupling constants, and these parameters have unknown values which you have to go out into the universe and measure. If you cleverly redefine those constants to be some "bare constant" added to a "correction" which depends on the cutoff, you can do your cutoff integral and set the "correction" to be equal to whatever it needs to be to get rid of all the terms which depend on your cutoff. (edit for clarity: This is the thing that I refer to when I say "renormalization." Cleverly redefining bare parameters to get rid of unphysical effects of a regulator.) By this two-step dance, you have taken your theoretical uncertainty about what happens at high momenta and found a way to wrap it up in the values of your coupling constants, which are the free parameters which you go and measure in the universe anyway. Of course, now your coupling constants are different if you choose a different regulator or a different renormalization scheme to remove it, but physicists have gotten used to that.
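To make the two-step dance concrete, here is a minimal numerical sketch of a toy model, not a real QFT calculation. Everything in it is an assumption chosen for illustration: the log-divergent integral `loop_integral`, the toy amplitude A(p) = g0 - g0^2 * I(p, cutoff), the reference scale `mu`, and all function names. The sketch works strictly to second order in the coupling.

```python
import math

def loop_integral(p, cutoff):
    """Toy regulated integral: int_0^cutoff dk k / (k^2 + p^2).

    It grows like log(cutoff) and would diverge without the cutoff.
    """
    return 0.5 * math.log1p((cutoff / p) ** 2)

def bare_coupling(g_measured, mu, cutoff):
    """Bare coupling g0 fixed by demanding A(mu) = g_measured (to order g^2).

    With the toy amplitude A(p) = g0 - g0^2 * I(p, cutoff), this gives
    g0 = g_measured + g_measured^2 * I(mu, cutoff), which depends on the cutoff.
    """
    return g_measured + g_measured ** 2 * loop_integral(mu, cutoff)

def predicted_amplitude(p, g_measured, mu, cutoff):
    """Amplitude at momentum p after renormalizing at scale mu (to order g^2).

    The cutoff dependence of the bare coupling cancels the cutoff dependence
    of the loop integral, up to corrections that vanish as the cutoff grows.
    """
    return g_measured - g_measured ** 2 * (
        loop_integral(p, cutoff) - loop_integral(mu, cutoff)
    )

if __name__ == "__main__":
    g_measured, mu, p = 0.1, 1.0, 2.0
    for cutoff in (1e3, 1e6):
        print(
            f"cutoff={cutoff:.0e}",
            f"bare={bare_coupling(g_measured, mu, cutoff):.6f}",
            f"prediction={predicted_amplitude(p, g_measured, mu, cutoff):.9f}",
        )
```

The bare coupling changes as the cutoff moves, but the prediction at `p` settles to `g_measured + g_measured**2 * log(p / mu)` in this toy model, which depends only on the measured coupling and the scale at which it was measured. Choosing a different `mu` changes the value assigned to the coupling, which is the scheme dependence mentioned above.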

So you can't just renormalize, you need to define a regulator first. You can even justify your regulator. It is a typical justification for a momentum cutoff that you're using a perturbative theory which is only valid at low energy scales. So what's the regulator for AI interpretability? Why are you justified in regulating in this way? It seems like you might be pointing at regulators when you talk about 1/w and d/w, but you might also be talking about orders in a perturbation expansion, which is a different thing entirely.

Tahp's Shortform

Tahp3mo20

A decision-theoretic case for a land value tax.

You can basically only take income tax by threatening people. "Give me 40% of your earnings or I put you in prison." It is the nicest type of threatening! Stable governments have a stellar reputation for only doing it once per year and otherwise not escalating the extortion. You gain benefit from the stable civilization supported by such stable governments because they use your taxes to pay for it. But there's no reason for the government to put you in prison except for the fact that they expect you to give them money not to. By participating, you are showing that you will respond to threats, which is an incentive to extract more wealth from you. If enough people understood decision theory and were dissatisfied by the uses the government put their money to, they could refuse to pay and the prison system wouldn't be big enough to deal with it. Oops, it's time to overthrow the government.

Under a better land value tax, the consequence for not paying your taxes is that the government takes the land away and gives it to someone else. They aren't threatening you, they're just reassigning their commitment to protect the interests of the person who uses the land over to a user who will pay them for the service. Of course, people can still all refuse to do it if they don't like the uses to which government puts their money, and from the point of view of the person paying taxes, it's still pretty much a case of "pay up or something bad will happen to you," so some would argue that the difference is mostly academic. That said, I really prefer to have a government which does not have "devise ways to make people miserable for the purpose of making them miserable" (you know, prison as a threat) as a load-bearing element of its mechanisms of perpetuating itself.

This argument flagrantly stolen from planecrash: https://www.projectlawful.com/replies/1721794#reply-1721794 Of course planecrash also offers an argument for what gives a hypothetical government the right to claim ownership for the land: https://www.projectlawful.com/replies/1773744#reply-1773744 I was inspired to write this by Richard Ngo's definition of unconditional love at https://x.com/richardmcngo/status/1872107000479568321 and the context of that post.

The Intelligence Curse

Tahp3mo51

I think your point has some merit in the world where AI is useful and intelligent enough to overcome the sticky social pressure to employ humans but hasn't killed us all yet. That said, I think AI will most likely kill us all in that 1-5 year window after becoming cheaper, faster, and more reliable than humans at most economic activity, and I think you have to convince me that I'm wrong about that before I start worrying about humans not hiring me because AI is smarter than I am. However, I want to complain about this particular point you made because I don't think it's literally true:

Powerful actors don’t care about you out of the goodness of their heart.

One of the reasons AI alignment is harder than people think is that they say stuff like this and think AI doesn't care about people in the way that powerful actors don't care about people. This is generally not true. You cannot in general pay a legislator $400 to kill a person who pays no taxes and doesn't vote. That is impressive when you think about it. You can argue that they fear reputational damage or going to prison, but I truly think that if you took away the consequences, $400 would not be enough money to make most legislators overcome their distaste for killing another human being with their bare hands. Some of them really truly want to make society better, even if they aren't very effective at it. Call it noblesse oblige if you want, but it's in their utility function to do things other than just giving the state more money or gaining more personal power. The people who steer large organizations have goodness in their hearts, however little, and thus the organizations they steer do too, even if only a little. Moloch hasn't won yet. America the state is willing to let a lot of elderly people rot, but America wasn't in fact willing to let Covid rip, even though that might have stopped the collapse of many tax-generating businesses, and most people who generate taxes would have survived. I don't think that's because the elderly people who overwhelmingly would have been killed by that are an important voting constituency for the party which pushed hardest for lockdown.

AI which knows it won't get caught and literally only cares about tax revenue and power will absolutely kill anyone who isn't useful to them for $400. That's $399 worth of power they didn't have before if killing someone costs $1 of attention. I don't particularly want to live in a world where 1% of people are very wealthy and everyone else is dying of poverty because they've been replaced by AI, but that's a better world than the one I expect where literally every human is killed because, for example, those so-called "reliable" AIs doing all of the work humans used to do as of yesterday liked paperclips more than we thought and start making them today.

Lucius Bushnaq's Shortform

Tahp3mo1713

Thank you. As a physicist, I wish I had an easy way to find papers which say "I tried this kind of obvious thing you might be considering and nothing interesting happened."

The Field of AI Alignment: A Postmortem, and What To Do About It

Tahp4mo10

My current job is only offered to me on the condition that I am doing physics research. I have some flexibility to do other things at the same time though. The insights and resources you list seem useful to me, so thank you.

The Field of AI Alignment: A Postmortem, and What To Do About It

Tahp4mo411

I am a physics PhD student. I study field theory. I have a list of projects I've thrown myself at with inadequate technical background (to start with) and figured out. I've convinced a bunch of people at a research institute that they should keep giving me money to solve physics problems. I've been following LessWrong with interest for years. I think that AI is going to kill us all, and would prefer to live for longer if I can pull it off. So what do I do to see if I have anything to contribute to alignment research? Maybe I'm flattering myself here, but I sound like I might be a person of interest for people who care about the pipeline. I don't feel like a great candidate because I don't have any concrete ideas for AI research topics to chase down, but it sure seems like I might start having ideas if I worked on the problem with somebody for a bit. I'm apparently very ok with being an underpaid gofer to someone with grand theoretical ambitions while I learn the material necessary to come up with my own ideas. My only lead to go on is "go look for something interesting in MATS and apply to it" but that sounds like a great way to end up doing streetlight research because I don't understand the field. Ideally, I guess I would have whatever spark makes people dive into technical research in a pretty low-status field for no money for long enough to produce research good enough to convince people to pay their rent while they keep doing more, but apparently the field can't find enough of those people to be unwilling to look for other options.

I know what to do to keep doing physics research. My TA assignment effectively means that I have a part-time job teaching teenagers how to use Newton's laws so I can spend twenty or thirty hours a week coding up quark models. I did well on a bunch of exams to convince an institution that I am capable of the technical work required to do research (and, to be fair, I provide them with 15 hours per week of below-market-rate intellectual labor which they can leverage into tuition that more than pays my salary), so now I have a lot of flexibility to just drift around learning about physics I find interesting while they pay my rent. If someone else is willing to throw 30,000 dollars per year at me to think deeply about AI and get nowhere instead of thinking deeply about field theory to get nowhere, I am not aware of them. Obviously the incentives are perverse to just go around throwing money at people who might be good at AI research, so I'm not surprised that I've only found one potential money spigot for AI research, but I had so many to choose from for physics.

Everything you care about is in the map

Tahp4mo10

The simulation is not reality, so it can have hidden variables, it just can't simulate in-system observers knowing about the hidden variables. I think quantum mechanics experiments should still have the same observed results within the system as long as you use the right probability distributions over on-site interactions. You could track Everett branches if you want to have many possible worlds, but the idea is just to get one plausible world, so it's not relevant to the thought experiment.

The point is that I have every reason to believe that a single-level ruleset could produce a map which all of our other maps could align with to the same degree as the actual territory. I agree that my approach is reductionist. I'm not ready to comment on LDSL.

Everything you care about is in the map

Tahp4mo10

From the inside, it feels like I want to know what's going on as a terminal value. I have often compared my desire to study physics to my desire to understand how computers work. I was never satisfied by the "it's just ones and zeros" explanation, which is not incorrect, but also doesn't help me understand why this object is able to turn code into programs. I needed to have examples of how you can build logic gates into adders and so on and have the tiers of abstraction that go from adders, etc to CPU instructions to compilers to applications, and I had a nagging confusion about using computers for years until I understood that chain at least a little bit. There is a satisfaction which comes with the dissolution of that nagging confusion which I refer to as joy.

There's a lot to complain about when it comes to public education in the United States, but I at least felt like I got a good set of abstractions with which to explain my existence, which was a chain that went roughly Newtonian mechanics on top of organs on top of cells on top of proteins on top of DNA on top of chemistry on top of electromagnetism and quantum mechanics, the latter of which wasn't explained at all. I studied physics in college, and the only things I got out of it were a new toolset and an intuitive understanding of how magnets work. In graduate school, I actually completed the chain of atoms on top of standard model on top of field theory on top of quantum mechanics in a way that felt satisfying. Now I have a few hanging threads, which include that I understand how matter is built out of fields on top of spacetime, but I don't understand what spacetime actually is, and also the universe is full of dark matter which I don't have an explanation for.

Everything you care about is in the map

Tahp4mo30

I'm putting in my reaction to your original comment as I remember it in case it provides useful data for you. Please do not search for subtext or take this as a request for any sort of response; I'm just giving data at the risk of oversharing because I wonder if my reaction is at all indicative of the people downvoting.

I thought about downvoting because your comment seemed mean-spirited. I think the copypasta format and possibly the flippant use of an LLM made me defensive. I mostly decided I was mistaken about it being mean-spirited because I don't think that you would post a mean comment on a post like this based on my limited-but-nonzero interaction with you. At that point, I either couldn't see what mixing epistemology in with the pale blue dot speech added to the discussion, or it didn't resonate with me, so I stopped thinking about it and left the comment alone.
