Field theorist here. You talk about renormalization as a thing which can smooth over unimportant noise, which basically matches my understanding, but you haven't explicitly named your regulator. A regulator may be a useful concept to have in interpretability, but I have no idea if it is common in the literature.
In QFT, our issue is that we go to calculate things that are measurable and finite, but we calculate horrible infinities. Obviously those horrible infinities don't match reality, and they often seem to be coming from some particular thing we don't c...
A decision-theoretic case for a land value tax.
You can basically only take income tax by threatening people. "Give me 40% of your earnings or I put you in prison." It is the nicest type of threatening! Stable governments have a stellar reputation for only doing it once per year and otherwise not escalating the extortion. You gain benefit from the stable civilization supported by such stable governments because they use your taxes to pay for it. But there's no reason for the government to put you in prison except for the fact that they expect you to give th...
I think your point has some merit in the world where AI is useful and intelligent enough to overcome the sticky social pressure to employ humans but hasn't killed us all yet. That said, I think AI will most likely kill us all in that 1-5 year window after becoming cheaper, faster, and more reliable than humans at most economic activity, and I think you have to convince me that I'm wrong about that before I start worrying about humans not hiring me because AI is smarter than I am. However I want to complain about this particular point you made because I don...
I am a physics PhD student. I study field theory. I have a list of projects I've thrown myself at with inadequate technical background (to start with) and figured out. I've convinced a bunch of people at a research institute that they should keep giving me money to solve physics problems. I've been following LessWrong with interest for years. I think that AI is going to kill us all, and would prefer to live for longer if I can pull it off. So what do I do to see if I have anything to contribute to alignment research? Maybe I'm flattering myself here, but I...
You could consider doing MATS as "I don't know what to do, so I'll try my hand at something a decent number of apparent experts consider worthwhile and meanwhile bootstrap a deep understanding of this subfield and a shallow understanding of a dozen other subfields pursued by my peers." This seems like a common MATS experience and I think this is a good thing.
The first step would probably be to avoid letting the existing field influence you too much. Instead, consider from scratch what the problems of minds and AI are, how they relate to reality and to other problems, and try to grab them with intellectual tools you're familiar with. Talk to other physicists and try to get into exploratory conversation that does not rely on existing knowledge. If you look at the existing field, look at it like you're studying aliens anthropologically.
Going to MATS is also an opportunity to learn a lot more about the space of AI safety research, e.g. considering the arguments for different research directions and learning about different opportunities to contribute. Even if the "streetlight research" project you do is kind of useless (entirely possible), doing MATS is plausibly a pretty good option.
The simulation is not reality, so it can have hidden variables, it just can't simulate in-system observers knowing about the hidden variables. I think quantum mechanics experiments should still have the same observed results within the system as long as you use the right probability distributions over on-site interactions. You could track Everett branches if you want to have many possible worlds, but the idea is just to get one plausible world, so it's not relevant to the thought experiment.
The point is that I have every reason to believe that a single-lev...
From the inside, it feels like I want to know what's going on as a terminal value. I have often compared my desire to study physics to my desire to understand how computers work. I was never satisfied by the "it's just ones and zeros" explanation, which is not incorrect, but also doesn't help me understand why this object is able to turn code into programs. I needed to have examples of how you can build logic gates into adders and so on and have the tiers of abstraction that go from adders, etc. to CPU instructions to compilers to applications, and I had a ...
I'm putting in my reaction to your original comment as I remember it in case it provides useful data for you. Please do not search for subtext or take this as a request for any sort of response; I'm just giving data at the risk of oversharing because I wonder if my reaction is at all indicative of the people downvoting.
I thought about downvoting because your comment seemed mean-spirited. I think the copypasta format and possibly the flippant use of an LLM made me defensive. I mostly decided I was mistaken about it being mean-spirited because I don't think ...
I appreciate your link to your posts on Linear Diffusion of Sparse Lognormals. I'll take a look later. My responses to your other points are essentially reductionist arguments, so I suspect that's a crux.
That said, I'm using "quantum mechanics" to mean "some generalization of the standard model" in many places. In practice, the actual experimental predictions of the standard model are something like probability distributions over the starting and ending momentum states of particles before and after they interact at the same place at the same time, so I don...
In what way? I find myself disagreeing vehemently, so I would appreciate an example.
Maps are territory in the sense that the territory is the substrate on which minds with maps run, but one of my main points here is that our experience is all map, and I don't think any human has ever had a map which remotely resembles the substrate on which we all run.
This is tangential to what I'm saying, but it points at something that inspired me to write this post. Eliezer Yudkowsky says things like the universe is just quarks, and people say "ah, but this one detail of the quark model is wrong/incomplete" as if it changes his argument when it doesn't. His point, so far as I understand it, is that the universe runs on a single layer somewhere, and higher-level abstractions are useful to the extent that they reflect reality. Maybe you change your theories later so that you need to replace all of his "quark" and "quan...
It may be that generating horrible counterfactual lines of thought for the purpose of rejecting them is necessary for getting better outcomes. To the extent that you have a real dichotomy here, I would say that the input/output mapping is the thing that matters. I want all humans to not end up worse off for inventing AI.
That said, humans may end up worse off by our own metrics if we make AI that is itself suffering terribly based off of its internal computation or it is generating ancestor torture simulations or something. Technically that is an alignment ...
I'm doing a physics PhD, and you're making me feel better about my coding practices. I appreciate your explicit example as well, as I'm interested in trying my hand at ML research and curious about what it looks like in terms of toolsets and typical sort-of-thing-one-works-on. I want to chime in down here in the comments to assure people that at least one horrible coder in a field which has nothing to do with machine learning (most of the time) thinks that the sentiment of this post is true. I admit that I'm biased by having very little formal CS training,...
I think we're both saying the same thing here, except that the thing I'm saying implies that I would bet on Eliezer being pessimistic about this. My point was that I have a lot of pessimism that people would code something wrong even if we knew what we were trying to code, and this is where a lot of my doom comes from. Beyond that, I think we don't know what it is we're trying to code up, and you give some evidence for that. I'm not saying that if we knew how to make good AI, it would still fail if we coded it perfectly. I'm saying we don't know how to ma...
I might as well check out the panel discussion. I didn't know about it.
I think I listened to the Hotz debate. The highlight of that one was when Hotz implied that he was using an LLM to drive a car, Yudkowsky freaked out a bit, and Hotz clarified that he meant the architecture for his learning algorithm is basically the same as an LLM.
I suspect the Destiny discussion is qualitatively similar to the Dwarkesh one.
At this point, maybe I should just read old MIRI papers.
I think that our laws of physics are in part a product of our perception, but I need to clarify what I mean by that. I doubt space or time are fundamental pieces in whatever machine code runs our universe, but that doesn't mean that you can take perception-altering drugs and travel through time. I think that somehow the fact that human intelligence was built on the evolutionary platform of DNA means that any physics we come up with has to build up to atoms which have the chemical properties that make DNA work. Physics doesn't have to describe everything, i...
This is unoriginal, but any argument that smart AI is dangerous by default is also an argument that aliens are dangerous by default. If you want to trade with aliens, you should preemptively make it hard enough to steal all of your stuff so that gains from trade are worthwhile even if you meet aliens that don't abstractly care about other sentient beings.
I don't think you're being creative enough about solving the problem cheaply, but I also don't think this particular detail is relevant to my main point. Now that you've made me think more about the problem, here are a few more steps toward resolving my confusion:
The idea with instrumental convergence is that smart things with goals predictably go hard on things like gathering resources and increasing their odds of survival before the goal is complete, since those are relevant to any goal. As a directionally-correct example for why this could be lethal, hu...
Oops, I meant cellular, and not molecular. I'm going to edit that.
I can come up with a story in which AI takes over the world. I can also come up with a story where obviously it's cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing? I see a path where instrumental convergence leads anything going hard enough to want to put all of the atoms on the most predictable path it can dictate. I think the thing that I don't get is what principle it is that makes anything useful g...
Be careful. Physics seems to be translation invariant, but space is not. You can drop the ball in and out of the cave and its displacement over time will be the same, but you can definitely tell whether it is in the cave or out of the cave. You can set your zero point anywhere, but that doesn’t mean that objects in space move when you change your zero point. Space is isotropic. There’s no discernible difference between upward, sideways, or diagonal, but if you measure the sideways distance between two houses to be 40 meters, a person who called your “sidew...
The main idea seems good: if you're in a situation where you think you might be in the process of being deceived by an AI, do not relax when the AI provides great evidence that it is not deceiving you. The primary expected outputs of something really good at deception should be things which don't look like deception.
Some of the things in the post don't seem general enough to me, so I want to try to restate them.
Test 1 I like. If you understand all of the gears, you should understand the machine.
Test 2 I like. Tweak the model in a way that should make it wo...
The ad market amounts to an auction for societal control. An advertisement is an instrument by which an entity attempts to change the future behavior of many other entities. Generally it is an instrument for a company to make people buy their stuff. There is also political advertising, which is an instrument to make people take actions in support of a cause or person seeking power. Advertising of any type is not known for making reason-based arguments. I recall in an interview with the author that this influence/prediction market was a major objection to t...
There’s also timeless decision theory to consider. A rational agent should take other rational agents into consideration when choosing actions. If I choose to go vegan, it stands to reason that similarly acting moral agents would also choose that course. If many (but importantly not all) people want to be vegan, then demand for vegan foods goes up. If demand for vegan food goes up, then suppliers make more vegan food and have an incentive to make it cheaper and tastier. If vegan food is cheaper and tastier, then more people who were on the fence about vega...
I consider the lattice to be a regulator as well, but, semantics aside, thank you for the example.
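For anyone following along who hasn't met a regulator before, here is a minimal sketch of the idea, assuming a hard momentum cutoff rather than a lattice. It uses the textbook Euclidean one-loop integral ∫ d⁴k/(2π)⁴ 1/(k² + m²)² restricted to |k| < Λ, which has a closed form. The function names (`loop_integral`, `mass_difference`) and the specific masses are my own illustrative choices, not anything from the discussion above. The regulated integral grows like log Λ, but the difference between two masses settles to a cutoff-independent number, which is the kind of quantity you would actually compare to a measurement.

```python
import math


def loop_integral(cutoff: float, mass: float) -> float:
    """Closed form of the Euclidean integral of 1/(k^2 + m^2)^2 over
    d^4k/(2*pi)^4, restricted to |k| < cutoff (a hard-cutoff regulator).

    Result: [ln(1 + x) - x / (1 + x)] / (16 * pi^2), with x = cutoff^2 / mass^2.
    """
    x = cutoff**2 / mass**2
    return (math.log1p(x) - x / (1.0 + x)) / (16 * math.pi**2)


def mass_difference(cutoff: float, m1: float, m2: float) -> float:
    """Difference of the regulated integral at two masses.

    The log(cutoff) pieces cancel, so this tends to
    ln(m2^2 / m1^2) / (16 * pi^2) as the cutoff is removed.
    """
    return loop_integral(cutoff, m1) - loop_integral(cutoff, m2)


# Hypothetical masses in arbitrary units, chosen only for illustration.
for cutoff in (1e2, 1e4, 1e6):
    print(cutoff, loop_integral(cutoff, 1.0), mass_difference(cutoff, 1.0, 2.0))
```

The first printed column keeps growing as the cutoff increases, while the second stops changing. A lattice spacing would play the same role as 1/Λ here, just with a different functional form for the cutoff-dependent pieces.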