All of Tahp's Comments + Replies

1Lauren Greenspan
For me, this question of the relevant scale(s) is the main point of introducing this work. d/w is one example of a cutoff, and one associated with the data distribution is another, but more work needs to be done to understand how to relate potentially different theoretical descriptions (for example, how these two cutoffs work together). We also mention the 'lattice as regulator' as a natural cutoff for physical systems, and hope to find similarly natural scales in real-world AI systems.

Field theorist here. You talk about renormalization as a thing which can smooth over unimportant noise, which basically matches my understanding, but you haven't explicitly named your regulator. A regulator may be a useful concept to have in interpretability, but I have no idea if it is common in the literature.

In QFT, our issue is that we go to calculate things that are measurable and finite, but we calculate horrible infinities. Obviously those horrible infinities don't match reality, and they often seem to be coming from some particular thing we don't c... (read more)
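To make that concrete, here is the standard textbook example of a regulator (nothing specific to this post, and sweeping renormalization-scheme details under the rug): a logarithmically divergent one-loop integral, tamed by a hard cutoff $\Lambda$ on the Euclidean momentum,

$$\int \frac{d^4k}{(2\pi)^4}\,\frac{1}{(k^2+m^2)^2}\;\longrightarrow\;\int_{|k|\le\Lambda} \frac{d^4k}{(2\pi)^4}\,\frac{1}{(k^2+m^2)^2}\;=\;\frac{1}{16\pi^2}\left[\ln\frac{\Lambda^2+m^2}{m^2}-\frac{\Lambda^2}{\Lambda^2+m^2}\right].$$

The unregulated integral is infinite; with the cutoff in place everything is finite, and renormalization is the bookkeeping that keeps physical predictions from depending on $\Lambda$ as you take it away. My question is what plays the role of $\Lambda$ in the interpretability story.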

2Dmitry Vaintrob
This is where this question of "scale" comes in. I want to add that (at least morally/intuitively) we are also thinking about discrete systems like lattices, and then instead of a regulator you have a coarse-graining or a "blocking transformation", which you have a lot of freedom to choose. For example, in PDLT, the object that plays the role of coarse-graining is the operation that takes a probability distribution on neurons and applies a single-layer NN to it.
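As a toy sketch of the kind of coarse-graining map I mean (made-up shapes and a random tanh layer, purely illustrative; this is the shape of the idea, not the actual PDLT construction):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a "probability distribution on neurons": samples of
# preactivations in a width-8 layer.
samples = rng.normal(size=(10_000, 8))

def block_average(x, block=2):
    """Lattice-style blocking: average neighbouring sites, halving the
    number of degrees of freedom (the classic block-spin move)."""
    return x.reshape(x.shape[0], -1, block).mean(axis=-1)

def single_layer(x, w, b):
    """One NN layer as the coarse-graining step: push the distribution of
    activations through weights, a bias, and a nonlinearity."""
    return np.tanh(x @ w + b)

w = rng.normal(size=(8, 8)) / np.sqrt(8)
b = 0.1 * rng.normal(size=8)

blocked = block_average(samples)      # (10000, 4): lattice blocking
pushed = single_layer(samples, w, b)  # (10000, 8): layer-as-coarse-graining

# In both cases the interesting object is the new distribution, e.g. its
# two-point statistics, and how they change as you iterate the map.
print(np.cov(blocked, rowvar=False).round(2))
print(np.cov(pushed, rowvar=False).round(2))
```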

A decision-theoretic case for a land value tax.

You can basically only take income tax by threatening people. "Give me 40% of your earnings or I put you in prison." It is the nicest type of threatening! Stable governments have a stellar reputation for only doing it once per year and otherwise not escalating the extortion. You gain benefit from the stable civilization supported by such stable governments because they use your taxes to pay for it. But there's no reason for the government to put you in prison except for the fact that they expect you to give th... (read more)

I think your point has some merit in the world where AI is useful and intelligent enough to overcome the sticky social pressure to employ humans but hasn't killed us all yet. That said, I think AI will most likely kill us all in that 1-5 year window after becoming cheaper, faster, and more reliable than humans at most economic activity, and I think you have to convince me that I'm wrong about that before I start worrying about humans not hiring me because AI is smarter than I am. However, I want to complain about this particular point you made because I don... (read more)

4tangerine
Indeed not directly, but when the inferential distance increases it quickly becomes more palatable. For example, most people would rather buy a $5 T-shirt that was made by a child for starvation wages on the other side of the world than a $100 T-shirt made locally by someone who can afford to buy a house with their salary. And many of those same T-shirt buyers would bury their heads in the sand when made aware of such a fact. If I can tell an AI to increase profits, incidentally causing the AI to ultimately kill a bunch of people, I can at least claim a clean conscience by saying that wasn't what I intended, even though it happened just the same. In practice, legislators do this sort of thing routinely. They pass legislation that causes harm—sometimes a lot of harm—and sleep soundly.

Thank you. As a physicist, I wish I had an easy way to find papers which say "I tried this kind of obvious thing you might be considering and nothing interesting happened."

My current job is only offered to me on the condition that I am doing physics research. I have some flexibility to do other things at the same time though. The insights and resources you list seem useful to me, so thank you.

4Linda Linsefors
Ok, in that case I want to give you this post as inspiration. Changing the world through slack & hobbies — LessWrong

I am a physics PhD student. I study field theory. I have a list of projects I've thrown myself at with inadequate technical background (to start with) and figured out. I've convinced a bunch of people at a research institute that they should keep giving me money to solve physics problems. I've been following LessWrong with interest for years. I think that AI is going to kill us all, and would prefer to live for longer if I can pull it off. So what do I do to see if I have anything to contribute to alignment research? Maybe I'm flattering myself here, but I... (read more)

5Linda Linsefors
Is it an option to keep your current job but spend your research hours on AI safety instead of quarks? Is this something that would appeal to you and be acceptable to your employer? Given the current AI safety funding situation, I would strongly recommend not giving up your current income. I think a lot of the pressure towards streetlight research comes from the funding situation: the grants are short, and to stay in the game you need visible results quickly.

I think MATS could be good if you can treat it as exploration, but not so good if you're in a hurry to get a job or a grant directly afterwards. Since MATS is 3 months of full-time work, it might not fit into your schedule (without quitting your job). Maybe instead try SPAR. Or see here for more options.

Or you can skip the training-program route and just start reading on your own. There are lots and lots of AI safety reading lists. I recommend this one for you. @Lucius Bushnaq, who created and maintains it, also did quark physics before switching to AI safety. But if you don't like it, there are more options here under "Self studies".

In general, the funding situation in AI safety is pretty shit right now, but other than that, there are so many resources to help people get started. It's just a matter of choosing where to start.
2nc
I am surprised that you find theoretical physics research less tight funding-wise than AI alignment [is this because the paths to funding in physics are well-worn, rather than better resourced?]. This whole post was a little discouraging. I hope that the research community can find a way forward.
6plex
If you're mobile (able to be in the UK) and willing to try a different lifestyle, consider going to the EA hotel aka CEEALAR; they offer free food and accommodation for a bunch of people, including many people working on AI safety. Alternatively, taking a quick look at https://www.aisafety.com/funders, the current best options are maybe LTFF, OpenPhil, CLR, or maybe AE Studios?

You could consider doing MATS as "I don't know what to do, so I'll try my hand at something a decent number of apparent experts consider worthwhile and meanwhile bootstrap a deep understanding of this subfield and a shallow understanding of a dozen other subfields pursued by my peers." This seems like a common MATS experience and I think this is a good thing.

The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you're familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you're studying aliens anthropologically.

9yams
[was a manager at MATS until recently and want to flesh out the thing Buck said a bit more]

It's common for researchers to switch subfields, and extremely common for MATS scholars to get work doing something different from what they did at MATS. (Kosoy has had scholars go on to ARC, Neel scholars have ended up in scalable oversight, Evan's scholars have a massive spread in their trajectories; there are many more examples but it's 3 AM.)

Also, I wouldn't advise applying only to something that seems interesting; I'd advise applying for literally everything (unless you know for sure you don't want to work with Neel, since his app is very time intensive). The acceptance rate is ~4 percent, so better to maximize your odds (again, for most scholars, the bulk of the value is not in their specific research output over the 10-week period, but in having the experience at all).

Also please see Ryan's replies to Tsvi on the talent needs report for more notes on the streetlighting concern as it pertains to MATS. There's a pretty big back and forth there (I don't cleanly agree with one side or the other, but it might be useful to you).

It sounds like you should apply for the PIBBSS Fellowship! (https://pibbss.ai/fellowship/)

Going to MATS is also an opportunity to learn a lot more about the space of AI safety research, e.g. considering the arguments for different research directions and learning about different opportunities to contribute. Even if the "streetlight research" project you do is kind of useless (entirely possible), doing MATS is plausibly a pretty good option.

The simulation is not reality, so it can have hidden variables; it just can't simulate in-system observers knowing about the hidden variables. I think quantum mechanics experiments should still have the same observed results within the system as long as you use the right probability distributions over on-site interactions. You could track Everett branches if you want to have many possible worlds, but the idea is just to get one plausible world, so it's not relevant to the thought experiment.

The point is that I have every reason to believe that a single-lev... (read more)

From the inside, it feels like I want to know what's going on as a terminal value. I have often compared my desire to study physics to my desire to understand how computers work. I was never satisfied by the "it's just ones and zeros" explanation, which is not incorrect, but also doesn't help me understand why this object is able to turn code into programs. I needed to have examples of how you can build logic gates into adders and so on and have the tiers of abstraction that go from adders, etc to CPU instructions to compilers to applications, and I had a ... (read more)
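For a taste of what those tiers look like at the bottom, here is a toy sketch (a hypothetical illustration, not how real hardware is specified) of the gates-to-adders rung:

```python
# Basic gates as bit operations.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b

def half_adder(a, b):
    """Add two bits; returns (sum, carry)."""
    return XOR(a, b), AND(a, b)

def full_adder(a, b, carry_in):
    """Add two bits plus a carry; returns (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, OR(c1, c2)

def ripple_add(x, y, width=8):
    """Chain full adders to add two unsigned integers bit by bit."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

assert ripple_add(23, 42) == 65  # gates -> adders -> arithmetic
```

Everything above this rung (instructions, compilers, applications) is more of the same trick, stacked.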

2tailcalled
How about geology, ecology and history? It seems like you are focused on mechanisms rather than contents.

I'm putting in my reaction to your original comment as I remember it in case it provides useful data for you. Please do not search for subtext or take this as a request for any sort of response; I'm just giving data at the risk of oversharing because I wonder if my reaction is at all indicative of the people downvoting.

I thought about downvoting because your comment seemed mean-spirited. I think the copypasta format and possibly the flippant use of an LLM made me defensive. I mostly decided I was mistaken about it being mean-spirited because I don't think ... (read more)

3Raemon
Nod. Fwiw I mostly just thought it was funny in a way that was sort of neutral on "is this a reasonable frame or not?". It was the first thing I thought of as soon as I read your post title. (I think it's both true in an important sense that everything we care about is in the Map, and also true in an important sense that it's not. Insofar as it was true, it felt like a legitimately poignant rewrite that helped me appreciate your post, and insofar as it was false it seemed hilarious (non-mean-spiritedly, just in a "it's funny that so many lines from the original remain reasonable sentences when you reframe it as about epistemology" way).)

I think I see what you're saying; let me try to restate it:

If the result you are predicting is coarse-grained enough, then there exist models which give a single prediction with probability so close to one that you might as well just take the model as truth.
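A toy example of the kind of thing I have in mind (made-up numbers, purely illustrative):

```python
from math import comb

# 10,000 fair coin flips: the fine-grained outcome (the exact sequence) is
# hopelessly unpredictable, but the coarse-grained prediction "between 45%
# and 55% of the flips land heads" is essentially certain.
n = 10_000
outside = sum(comb(n, k) for k in range(n + 1) if k < 0.45 * n or k > 0.55 * n)
print(outside / 2**n)  # ~1e-23: the chance the coarse-grained prediction fails
```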

2Noosphere89
Yes, and as a contrapositive, if you had enough computing power, you could narrow down the set of models to 1 for even arbitrarily fine-grained predictions.

I appreciate your link to your posts on Linear Diffusion of Sparse Lognormals. I'll take a look later. My responses to your other points are essentially reductionist arguments, so I suspect that's a crux.

That said, I'm using "quantum mechanics" to mean "some generalization of the standard model" in many places. In practice, the actual experimental predictions of the standard model are something like probability distributions over the starting and ending momentum states of particles before and after they interact at the same place at the same time, so I don... (read more)

2tailcalled
I think this still has the ambiguity that I am complaining about. As an analogy, consider the distinction between:

* Some population of rabbits that is growing over time due to reproduction
* The Fibonacci sequence as a model of the growth dynamics of this population
* A computer program computing, or a mathematician deriving, the numbers in or properties of this sequence

The first item in this list is meant to be analogous to quantum mechanics qua the universe, as in it is some real-world entity that one might hypothesize acts according to certain rules, but exists regardless. The second is a Platonic mathematical object that one might hypothesize matches the rules of the real-world entity. And the third are actual instantiations of this Platonic mathematical object in reality. I would maybe call these "the territory", "the hypothetical map", and "the actual map", respectively.

Wouldn't this fail for metals, quantum computing, the double-slit experiment, etc.? By switching back and forth between quantum and classical, it seems like you forbid any superpositions/entanglement/etc. on a scale larger than your classical lattice size. The standard LessWrongian approach is to just bite the bullet on the many-worlds interpretation (which I have some philosophical quibbles with, but those quibbles aren't so relevant to this discussion, I think, so I'm willing to grant the many-worlds interpretation if you want). Anyway, more to the point, this clearly cannot be done with the actual map, and the hypothetical map does not actually exist, so my position is that while this may help one understand the notion that there is a rule that perfectly constrains the world, the thought experiment does not actually work out.

Somewhat adjacently, your approach to this is reductionistic, viewing large entities as being composed of unfathomably many small entities. As part of LDSL I'm trying to wean myself off of reductionism, and instead take large entities to be more fundamental.
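Going back to the rabbit analogy, here is a toy sketch of the three-way distinction (the breeding rule and numbers are made up purely for illustration):

```python
import random

random.seed(1)

def simulate_population(months, p_breed=0.9, start=1):
    """Stand-in for the territory: a messy, stochastic rabbit population.
    (Of course a simulation is itself just another map, but it plays the
    role of the thing that exists regardless of our models of it.)"""
    pairs = [start, start]
    for _ in range(months):
        newborns = sum(1 for _ in range(pairs[-2]) if random.random() < p_breed)
        pairs.append(pairs[-1] + newborns)
    return pairs

def fibonacci(months, start=1):
    """The actual map: a program instantiating the Fibonacci model of that
    growth (the hypothetical map being the Platonic sequence itself)."""
    seq = [start, start]
    for _ in range(months):
        seq.append(seq[-1] + seq[-2])
    return seq

print(simulate_population(10))  # one run of what "actually happened"
print(fibonacci(10))            # what the idealized model says
```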

In what way? I find myself disagreeing vehemently, so I would appreciate an example.

Maps are territory in the sense that the territory is the substrate on which minds with maps run, but one of my main points here is that our experience is all map, and I don't think any human has ever had a map which remotely resembles the substrate on which we all run.

2Noosphere89
I'm making a general comment, but yes what I mean is that in some idealized cases, you can model the territory under consideration well enough to make the map-territory distinction illusory. Of course, this requires a lot, lot more compute than we usually have.

This is tangential to what I'm saying, but it points at something that inspired me to write this post. Eliezer Yudkowsky says things like the universe is just quarks, and people say "ah, but this one detail of the quark model is wrong/incomplete" as if it changes his argument when it doesn't. His point, so far as I understand it, is that the universe runs on a single layer somewhere, and higher-level abstractions are useful to the extent that they reflect reality. Maybe you change your theories later so that you need to replace all of his "quark" and "quan... (read more)

2tailcalled
My in-depth response to the rationalist-reductionist-empiricist worldview is Linear Diffusion of Sparse Lognormals. Though there are still some parts of it I need to write.

The main objection I have here is that the "single layer" is not so much the true rules of reality as it is the subset of rules that are unobjectionable due to applying everywhere and every time. It's like the minimal conceivable set of rules.

I'd argue the practical rules of the world are determined not just by the idealized rules, but also by the big entities within the world. The simplest example is outer space; it acts as a negentropy source and is the reason we can assume that e.g. electrons go into the lowest orbitals (whereas if e.g. outer space were full of hydrogen, it would undergo fusion, bombard us with light, and turn the earth into a plasma instead). More elaborate examples would be e.g. atmospheric oxygen, whose strong reactivity leads to a lot of chemical reactions, or e.g. thinking of people as economic agents, which means that economic trade opportunities get exploited.

It's sort of conceivable that quantum mechanics describes the dynamics as a function of the big entities, but we only really have strong reasons to believe so with respect to the big entities we know about, rather than all big entities in general. (Maybe there are some entities that are sufficiently constant that they are ~impossible to observe.)

But in the context of your original post, everything you care about is large scale, and in particular the territory itself is large scale. It's not a statement about quantum mechanics if you view quantum mechanics as a Platonic mathematical ideal, or if you use "quantum mechanics" to refer to the universe as it really is, but it is a statement about quantum mechanics if you view it as a collection of models that are actually used. Maybe we should have three different terms to distinguish the three?

It may be that generating horrible counterfactual lines of thought for the purpose of rejecting them is necessary for getting better outcomes. To the extent that you have a real dichotomy here, I would say that the input/output mapping is the thing that matters. I want all humans to not end up worse off for inventing AI.

That said, humans may end up worse off by our own metrics if we make AI that is itself suffering terribly based off of its internal computation or it is generating ancestor torture simulations or something. Technically that is an alignment ... (read more)

I'm doing a physics PhD, and you're making me feel better about my coding practices. I appreciate your explicit example as well, as I'm interested in trying my hand at ML research and curious about what it looks like in terms of toolsets and typical sort-of-thing-one-works-on. I want to chime in down here in the comments to assure people that at least one horrible coder in a field which has nothing to do with machine learning (most of the time) thinks that the sentiment of this post is true. I admit that I'm biased by having very little formal CS training,... (read more)

1Oliver Daniels
thanks for the detailed (non-ML) example!  exactly the kind of thing I'm trying to get at

I'd agree with you, because I'm a full-time student, but I'm doing research part-time in practice because I'm losing half my time to working as a TA to pay my rent. Part of me wonders if I could find a real job and slow-roll the PhD.

1Mo Putera
What about just not pursuing a PhD and instead doing what OP did? With the PhD you potentially lose #1 in  which is where much of the impact comes from, especially if you subscribe to a multiplicative view of impact.

I think we're both saying the same thing here, except that the thing I'm saying implies that I would bet for Eliezer being pessimistic about this. My point was that I have a lot of pessimism that people would code something wrong even if we knew what we were trying to code, and this is where a lot of my doom comes from. Beyond that, I think we don't know what it is we're trying to code up, and you give some evidence for that. I'm not saying that if we knew how to make good AI, it would still fail if we coded it perfectly. I'm saying we don't know how to ma... (read more)

I might as well check out the panel discussion. I didn't know about it.

I think I listened to the Hotz debate. The highlight of that one was when Hotz implied that he was using an LLM to drive a car, Yudkowsky freaked out a bit, and Hotz clarified that he meant the architecture for his learning algorithm is basically the same as an LLM's.

I suspect the Destiny discussion is qualitatively similar to the Dwarkesh one.

At this point, maybe I should just read old MIRI papers.

I think that our laws of physics are in part a product of our perception, but I need to clarify what I mean by that. I doubt space or time are fundamental pieces in whatever machine code runs our universe, but that doesn't mean that you can take perception-altering drugs and travel through time. I think that somehow the fact that human intelligence was built on the evolutionary platform of DNA means that any physics we come up with has to build up to atoms which have the chemical properties that make DNA work. Physics doesn't have to describe everything, i... (read more)

This is unoriginal, but any argument that smart AI is dangerous by default is also an argument that aliens are dangerous by default. If you want to trade with aliens, you should preemptively make it hard enough to steal all of your stuff so that gains from trade are worthwhile even if you meet aliens that don't abstractly care about other sentient beings.

I don't think you're being creative enough about solving the problem cheaply, but I also don't think this particular detail is relevant to my main point. Now you've made me think more about the problem, here's me making a few more steps toward trying to resolve my confusion:

The idea with instrumental convergence is that smart things with goals predictably go hard with things like gathering resources and increasing odds of survival before the goal is complete which are relevant to any goal. As a directionally-correct example for why this could be lethal, hu... (read more)

Oops, I meant cellular, and not molecular. I'm going to edit that.

I can come up with a story in which AI takes over the world. I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing? I see a path where instrumental convergence leads anything that goes hard enough to want to put all of the atoms on the most predictable path it can dictate. I think the thing that I don't get is what principle it is that makes anything useful g... (read more)

2localdeity
Erm... For preventing nuclear war on the scale of decades... I don't know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn't going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn't trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it's reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?

But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too. And then there's the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?

No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.

Be careful. Physics seems to be translation invariant, but space is not. You can drop the ball in and out of the cave and its displacement over time will be the same, but you can definitely tell whether it is in the cave or out of the cave. You can set your zero point anywhere, but that doesn’t mean that objects in space move when you change your zero point. Space is isotropic. There’s no discernible difference between upward, sideways, or diagonal, but if you measure the sideways distance between two houses to be 40 meters, a person who called your “sidew... (read more)

2tailcalled
Formally, I mean that translation commutes with time-evolution. (Maybe "translation-equivariant" would be a better term? Idk, am not a physicist.) I guess my story could have been better written to emphasize the commutativity aspect.
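In symbols (assuming I've set this up right), with $U_t$ the map that evolves a state forward by time $t$ and $T_a$ the map that shifts it in space by $a$, the claim is

$$U_t \circ T_a \;=\; T_a \circ U_t \quad \text{for all } a, t.$$

For a ball in free fall, $\vec{x}(t) = \vec{x}_0 + \vec{v}_0\, t + \tfrac{1}{2}\vec{g}\, t^2$: shifting the drop point $\vec{x}_0$ by $\vec{a}$ and then evolving lands you in the same state as evolving and then shifting, which is the sense in which the dynamics don't care where the cave is, even though the two drop points remain distinguishable locations.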

The main idea seems good: if you're in a situation where you think you might be in the process of being deceived by an AI, do not relax when the AI provides great evidence that it is not deceiving you. The primary expected outputs of something really good at deception should be things which don't look like deception.

Some of the things in the post don't seem general enough to me, so I want to try to restate them.

Test 1 I like. If you understand all of the gears, you should understand the machine.

Test 2 I like. Tweak the model in a way that should make it wo... (read more)

The ad market amounts to an auction for societal control. An advertisement is an instrument by which an entity attempts to change the future behavior of many other entities. Generally it is an instrument for a company to make people buy their stuff. There is also political advertising, which is an instrument to make people take actions in support of a cause or person seeking power. Advertising of any type is not known for making reason-based arguments. I recall in an interview with the author that this influence/prediction market was a major objection to t... (read more)

There’s also timeless decision theory to consider. A rational agent should take other rational agents into consideration when choosing actions. If I choose to go vegan, it stands to reason that similarly acting moral agents would also choose that course. If many (but importantly not all) people want to be vegan, then demand for vegan foods goes up. If demand for vegan food goes up, then suppliers make more vegan food and have an incentive to make it cheaper and tastier. If vegan food is cheaper and tastier, then more people who were on the fence about vega... (read more)