I have such a strong intuitive opposition to the Internal Reaction Drive that I agree with your conclusion that we should update away from any theory which allows it. Then again, perhaps it is impossible to build such a drive for the merely practical reason that any material with a positive or negative index of refraction will absorb enough light to turn the drive into an expensive radiator.
Especially given the recent Nobel prize announcement, I think the most concerning piece of information is that there are cultural forces from within the physics community discouraging people from trying to answer the question at all.
You need abstractions to think and plan at all with limited compute, not just to speak. I would guess that plenty of animals which are incapable of speaking also mentally rely on abstractions. For instance, I suspect an animal foraging for apples has a mental category for apples and treats them as the same kind of thing rather than as completely unrelated configurations of atoms.
The planet Mercury is a pretty good source of material:
Mass: ~3.3×10^23 kg (which is about 70% iron)
Radius: ~2.4×10^6 m
Volume: ~6.1×10^19 m^3
Density: ~5,400 kg/m^3
Orbital radius: ~5.8×10^10 m
A spherical shell around the sun at roughly the same radius as Mercury's orbit would have a surface area of ~4.2×10^22 m^2, and spreading out Mercury's volume over this area gives a thickness of about 1.4 mm. This means Mercury alone provides ample material for collecting all of the Sun's energy via reflect...
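For anyone who wants to sanity-check that thickness figure, here is a quick back-of-the-envelope sketch using the values above (the code and variable names are mine, purely illustrative):

```python
import math

# Rough check of the shell-thickness estimate (standard values for Mercury)
mercury_volume = 6.1e19    # m^3
orbital_radius = 5.8e10    # m, roughly Mercury's distance from the Sun

shell_area = 4 * math.pi * orbital_radius ** 2   # ~4.2e22 m^2
thickness = mercury_volume / shell_area          # ~1.4e-3 m

print(f"shell area: {shell_area:.1e} m^2")
print(f"thickness:  {thickness * 1000:.2f} mm")  # ~1.4 mm
```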
A better way to do the memory overwrite experiment would be to prepare a list of box contents, one for each of ten possible numbers, then have someone provide a random number while your short-term memory doesn’t work and see if you can successfully overwrite the memory corresponding to that number (as measured by correctly guessing the number much later).
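To make the bookkeeping concrete, here is a toy sketch of the mapping and scoring step (the item list and names are made up for illustration; the interesting part of the experiment is obviously the human recall, not the code):

```python
import random

# Each possible digit 0-9 is assigned a distinct "what's in the box" memory in advance.
items = ["apple", "key", "coin", "die", "ring",
         "shell", "cork", "button", "marble", "stamp"]

# Someone draws the random number while your short-term memory is disrupted...
target_digit = random.randrange(10)
item_to_imagine = items[target_digit]   # ...and this is the memory you try to overwrite with.

# Much later, you report what you remember being in the box and decode it back to a digit.
recalled_item = item_to_imagine          # stand-in for your later recollection
guessed_digit = items.index(recalled_item)

print("Correct recall:", guessed_digit == target_digit)
```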
I’m confused. I know that it is like something to be me (this is in some sense the only thing I know for sure). It seems like there are rules which shape the things I experience, and some of those rules can be studied (like the laws of physics). We are good enough at understanding some of these rules to predict certain systems with a high degree of accuracy, like how an asteroid will orbit a star or how electrons will be pushed through a wire by a particular voltage in a circuit. But I have no way to know or predict if it is like something to be a fish or GPT-...
I am not so sure it will be possible to extract useful work towards solving alignment out of systems we do not already know how to carefully steer. I think that substantial progress on alignment is necessary before we know how to build things that actually want to help us advance the science. Even if we built something tomorrow that was in principle smart enough to do good alignment research, I am concerned we don’t know how to make it actually do that rather than, say, imitate more plausible-sounding but incorrect ideas. The fact that appending silly phra...
Please note that the graph of per capita war deaths is on a log scale. The number moves over several orders of magnitude. One could certainly make the case that local spikes were sometimes caused by significant shifts in the offense-defense balance (like tanks and planes making offense easier for a while at the beginning of WWII). These shifts are pushed back to equilibrium over time, but personally I would be pretty unhappy about, say, deaths from pandemics spiking 4 orders of magnitude before returning to equilibrium.
This random Twitter person says that it can't. Disclaimer: haven't actually checked for myself.
https://chat.openai.com/share/36c09b9d-cc2e-4cfd-ab07-6e45fb695bb1
Here is me playing against GPT-4, no vision required. It does just fine at normal tic-tac-toe, and figures out anti-tic-tac-toe with a little bit of extra prompting.
GPT-4 can follow the rules of tic-tac-toe, but it cannot play optimally. In fact it often passes up opportunities for wins. I've spent about an hour trying to get GPT-4 to play optimal tic-tac-toe without any success.
Here's an example of GPT-4 playing sub-optimally: https://chat.openai.com/share/c14a3280-084f-4155-aa57-72279b3ea241
Here's an example of GPT-4 suggesting a bad move for me to play: https://chat.openai.com/share/db84abdb-04fa-41ab-a0c0-542bd4ae6fa1
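For anyone who wants to check games like these systematically rather than by eye, a brute-force minimax is plenty fast for tic-tac-toe. Here's a rough sketch (my own throwaway code, not something from the linked chats):

```python
# Brute-force minimax for tic-tac-toe: board is a list of 9 cells, each 'X', 'O', or ' '.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Value of the position for X (+1 win, 0 draw, -1 loss), with `player` to move."""
    w = winner(board)
    if w:
        return 1 if w == 'X' else -1
    if ' ' not in board:
        return 0
    values = []
    for i in range(9):
        if board[i] == ' ':
            board[i] = player
            values.append(minimax(board, 'O' if player == 'X' else 'X'))
            board[i] = ' '
    return max(values) if player == 'X' else min(values)

def optimal_moves(board, player):
    """All moves achieving the best minimax value for `player`."""
    scored = []
    for i in range(9):
        if board[i] == ' ':
            board[i] = player
            scored.append((i, minimax(board, 'O' if player == 'X' else 'X')))
            board[i] = ' '
    best = max(v for _, v in scored) if player == 'X' else min(v for _, v in scored)
    return [i for i, v in scored if v == best]

# Example: after X takes the center, O's only non-losing replies are the corners.
print(optimal_moves(list("    X    "), 'O'))  # [0, 2, 6, 8]
```

Comparing GPT-4's suggested move against `optimal_moves` at each turn makes the sub-optimal play easy to spot.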
I suppose what I'm trying to point to is some form of the outer alignment problem. I think we may end up with AIs that are aligned with human organizations like corporations more than individual humans. The reason for this is that corporations or militaries which employ more ruthless AIs will, over time, accrue more power and resources. It's not so much explicit (i.e. violent) competition, but rather the gradual tendency for systems which are power-seeking and resource-maximizing to end up with more power and resources over time. If we allow for the creati...
Yeah. I think a key point that is often overlooked is that even if powerful AI is technically controllable, i.e. we solve inner alignment, that doesn't mean society will handle it safely. I think by default it looks like every company and military is forced to start using a ton of AI agents (or they will be outcompeted by someone else who does). Competition between a bunch of superhuman AIs that are trying to maximize profits or military tech seems really bad for us. We might not lose control all at once, but rather just be gradually outcompeted by machines, where "gradually" might actually be pretty quick. Basically, we die by Moloch.
Yeah, in general, we are pretty compute limited and should stick to good heuristics for most kinds of problems. I do think that most people rely too much on heuristics, so for the average person the useful lesson is "actually stop and think about things once in a while", but I can see how the opposite problem may sometimes arise in this community.
I find it useful to distinguish between epistemic and instrumental rationality. You're talking about instrumental rationality – and it could be instrumentally useful to convince someone of your beliefs, to teach them to think clearly, or to actively mislead them.
Epistemic rationality, on the other hand, means trying to have true beliefs, and in this case it's better to teach someone to fish than to force them to accept your fish.
In the doomsday argument, we are treated as the random runner. If the runner with only 10 people behind him assumed his position was randomly selected, and tried to estimate the total number of runners, he would be very wrong. We could very well be that runner near the back of the race; we weren't randomly selected to be at the back, we just are, and the fact that there are ten people behind us doesn't give us meaningful information about the total number of runners.
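To put toy numbers on the runner example (these are made up, just to show how badly the estimate can fail):

```python
# A runner near the back of a large race estimates the field size by assuming
# their position is a random sample.
true_field_size = 10_000
runners_behind = 10
position_from_back = runners_behind + 1

# "If I'm at a random position, about half the field should be behind me,
# so the total is roughly twice my position from the back."
estimated_field_size = 2 * position_from_back

print(f"true field size:      {true_field_size}")
print(f"estimated field size: {estimated_field_size}")  # ~22 -- wildly wrong
```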
Okay, suppose I was born in Teenytown, a little village on the island nation of Nibblenest. The little one-room schoolhouse in Teenytown isn't very advanced, so no one ever teaches me that there are billions of other people living in all the places I've never heard of. Now, I might think to myself, the world must be very small – surely, if there were billions of people living in millions of towns and cities besides Teenytown, it would be very unlikely for me to have been born in Teenytown; therefore, Teenytown must be one of the only villages on Earth.
Clearly, this is a...
I think the claim that we basically understand the universe is misleading. I'm especially unconvinced by your vague explanation of consciousness; I don't think we have anything close to an empirically supported mechanistic model that makes good predictions. I personally have significant uncertainty regarding what kinds of things can have subjective experiences, or why they do.
This also feels like a good opportunity to say that the Doomsday argument has never made much sense to me; it has always felt wrong to me to treat being “me” as a random sample of obs...
Planes would not be required for stratospheric injection of SO2. It could in theory be done much more cheaply with balloons: https://caseyhandmer.wordpress.com/2023/06/06/we-should-not-let-the-earth-overheat/
Exactly, it has always felt wrong to me to treat being “me” as a random sample of observers. I couldn’t be anyone except me. If the future has trillions of humans or no humans, the person who is me will feel the same way in either case. I find the doomsday argument absurd because it treats my perspective as a random sample, which feels like a type error.
Indeed. I think about this type of thing often when I consider the concept of superhuman AI - when I spend hours stuck on a problem with a simple solution or forget something important, it’s not hard to imagine an algorithm much smarter than me that just doesn’t make those mistakes. I think the bar really isn’t that high for improving substantially on human cognition. Our brains have to operate under very strict energy constraints, but I can easily imagine a machine which performs a lot better than me by applying more costly but effective algorithms and us...
We are quite similar! I was also accepted to Harvard REA – exactly one year ago – and was too lazy (or rather, mentally drained by the application process) to apply to MIT after that. I arrived intending to study physics, but I've since realized AI safety is a much more important and exciting problem to work on. Seems like you got there a bit sooner than I did! HAIST is a wonderful community, and also a great resource for finding upskilling and research opportunities.
I've only been here for a semester, so take this with a grain of salt, but I don't think you shou...
Kudos for releasing a concept of a plan! Some thoughts:
Regarding the first safety case:
- The assumed level of mech interp progress required to make the first safety case suitable seems overly optimistic to me; I basically think that most of the "limitations" are in fact pretty serious. However, I appreciate the attempt to include concrete requirements.
- I believe that getting good results from the following experiments might be particularly unrealistic:
- "In order to robustify our evals against sandbagging, we ran an experiment where we steered one or more trut
...