I have such a strong intuitive opposition to the Internal Reaction Drive that I agree with your conclusion that we should update away from any theory which allows it. Then again, perhaps it is impossible to build such a drive for the merely practical reason that any material with a positive or negative index of refraction will absorb enough light to turn the drive into an expensive radiator.
Especially given the recent Nobel prize announcement, I think the most concerning piece of information is that there are cultural forces from within the physics community discouraging people from trying to answer the question at all.
You need abstractions to think and plan at all with limited compute, not just to speak. I would guess that plenty of animals which are incapable of speaking also mentally rely on abstractions. For instance, when foraging for apples, I suspect an animal has a mental category for apples, and treats them as the same kind of thing rather than as completely unrelated configurations of atoms.
The planet Mercury is a pretty good source of material:
Mass: 3.3 × 10^23 kg (which is about 70% iron)
Radius: 2.44 × 10^6 m
Volume: 6.1 × 10^19 m^3
Density: 5.4 × 10^3 kg/m^3
Orbital radius: 5.8 × 10^10 m
A spherical shell around the sun at roughly the same radius as Mercury's orbit would have a surface area of about 4.2 × 10^22 m^2, and spreading out Mercury's volume over this area gives a thickness of about 1.4 mm. This means Mercury alone provides ample material for collecting all of the Sun's energy via reflecting light – very thin spinning sheets could act as a swarm of orbiting reflectors that focus sunlight onto large power plants or mirrors that direct it elsewhere in the solar system. Spinning sheets could be made somewhere between 1-100 μm thick, with thicker cables or supports for additional strength, perhaps 1-10 km wide, and navigate using radiation pressure (using cables that bend the sheet, perhaps). Something like 10^14 to 10^17 mirrors, depending on their width, would be enough to intercept and redirect all of the sun's light.
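The shell-thickness arithmetic above can be checked with a short sketch. The radius and orbital radius below are approximate reference values that I am assuming rather than quoting from the comment, and treating each mirror as a square sheet is also my simplification.

```python
import math

# Assumed approximate reference values (not quoted from the comment).
MERCURY_RADIUS_M = 2.44e6        # mean radius of Mercury
MERCURY_ORBIT_RADIUS_M = 5.79e10 # semi-major axis of Mercury's orbit


def sphere_volume(radius_m: float) -> float:
    """Volume of a sphere in cubic meters."""
    return 4.0 / 3.0 * math.pi * radius_m ** 3


def sphere_area(radius_m: float) -> float:
    """Surface area of a sphere in square meters."""
    return 4.0 * math.pi * radius_m ** 2


def mirror_count(total_area_m2: float, width_m: float) -> float:
    """Number of square sheets of the given width needed to tile the area."""
    return total_area_m2 / width_m ** 2


mercury_volume_m3 = sphere_volume(MERCURY_RADIUS_M)
shell_area_m2 = sphere_area(MERCURY_ORBIT_RADIUS_M)

# Thickness if all of Mercury were spread evenly over the shell.
shell_thickness_m = mercury_volume_m3 / shell_area_m2

# Mirror counts for the widest (10 km) and narrowest (1 km) sheets mentioned.
count_10km = mirror_count(shell_area_m2, 10_000.0)
count_1km = mirror_count(shell_area_m2, 1_000.0)

if __name__ == "__main__":
    print(f"shell area:      {shell_area_m2:.2e} m^2")
    print(f"shell thickness: {shell_thickness_m * 1e3:.2f} mm")
    print(f"mirrors (10 km): {count_10km:.1e}")
    print(f"mirrors (1 km):  {count_1km:.1e}")
```

With these inputs the thickness comes out close to the 1.4 mm figure quoted above. Since the sheets would be 1-100 μm thick rather than millimeters, only a small fraction of the planet would be needed for a full covering.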
The gravitational binding energy of Mercury is on the order of 10^30 J, or on the order of an hour of the Sun's output. This means the time it takes for a new mirror to pay its own manufacturing energy cost is in principle quite small; if each kg of material from Mercury is enough to make on the order of 1-100 square meters of mirror, then it will pay for itself in somewhere between minutes and hours (there are roughly 10,000 W/m^2 of solar energy at Mercury's orbit, and each kg of material on average requires on the order of 10^7 J to remove). Only 40-80 doublings are required to consume the whole planet, depending on how thick the mirrors are and how much material is used to start the process. Even with many orders of magnitude of overhead to account for inefficiency and heat dissipation, I believe Mercury could be disassembled to cover the entire sun with reflectors on the order of years and perhaps as quickly as months; certainly within decades.
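Here is a rough sketch of the energy bookkeeping in that paragraph. It assumes Mercury is a uniform-density sphere (so the binding energy is 3GM²/5R), uses approximate reference values for the mass, radius, and solar luminosity that I am supplying myself, and assumes a mirror captures the full 10,000 W/m^2 with no losses. The function names are just illustrative.

```python
import math

# Assumed approximate reference values (not quoted from the comment).
G = 6.674e-11                # gravitational constant, m^3 kg^-1 s^-2
MERCURY_MASS_KG = 3.30e23
MERCURY_RADIUS_M = 2.44e6
SOLAR_LUMINOSITY_W = 3.83e26

# Figure taken from the comment: solar flux at Mercury's orbit.
FLUX_AT_MERCURY_W_M2 = 1.0e4


def binding_energy_uniform_sphere(mass_kg: float, radius_m: float) -> float:
    """Gravitational binding energy of a uniform-density sphere, in joules."""
    return 3.0 * G * mass_kg ** 2 / (5.0 * radius_m)


def payback_seconds(energy_per_kg_j: float, mirror_area_per_kg_m2: float) -> float:
    """Idealized time for one kg of mirror to collect its own removal energy."""
    return energy_per_kg_j / (FLUX_AT_MERCURY_W_M2 * mirror_area_per_kg_m2)


def doublings_needed(seed_mass_kg: float) -> float:
    """Doublings required to grow from the seed mass to Mercury's full mass."""
    return math.log2(MERCURY_MASS_KG / seed_mass_kg)


binding_energy_j = binding_energy_uniform_sphere(MERCURY_MASS_KG, MERCURY_RADIUS_M)
hours_of_sun_output = binding_energy_j / SOLAR_LUMINOSITY_W / 3600.0
energy_per_kg_j = binding_energy_j / MERCURY_MASS_KG

if __name__ == "__main__":
    print(f"binding energy:      {binding_energy_j:.2e} J")
    print(f"hours of Sun output: {hours_of_sun_output:.2f}")
    print(f"average J per kg:    {energy_per_kg_j:.2e}")
    for area in (1.0, 100.0):
        print(f"payback at {area:g} m^2/kg: {payback_seconds(energy_per_kg_j, area):.0f} s")
    for seed in (1.0, 1.0e12):
        print(f"doublings from {seed:.0e} kg: {doublings_needed(seed):.1f}")
```

Under these idealized assumptions the payback times land at or below the low end of the "minutes to hours" range, which leaves room for the overhead the paragraph allows for. The 40-80 doubling range corresponds to seed masses between roughly 1 kg and 10^12 kg.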
A better way to do the memory overwrite experiment is to prepare a list of what’s in the box to match each of ten possible numbers, then have someone provide a random number while your short term memory doesn’t work and see if you can successfully overwrite the memory that corresponds to that number (as measured by correctly guessing the number much later).
I’m confused. I know that it is like something to be me (this is in some sense the only thing I know for sure). It seems like there are rules which shape the things I experience, and some of those rules can be studied (like the laws of physics). We are good enough at understanding some of these rules to predict certain systems with a high degree of accuracy, like how an asteroid will orbit a star or how electrons will be pushed through a wire by a particular voltage in a circuit. But I have no way to know or predict if it is like something to be a fish or GPT-4. I know that physical alterations to my brain seem to affect my experience, so it seems like there is a mapping from physical matter to experiences. I do not know precisely what this mapping is, and this indeed seems like a hard problem. In what sense do you disagree with my framing here?
Oh good catch, I missed that. Thanks!
I am not so sure it will be possible to extract useful work towards solving alignment out of systems we do not already know how to carefully steer. I think that substantial progress on alignment is necessary before we know how to build things that actually want to help us advance the science. Even if we built something tomorrow that was in principle smart enough to do good alignment research, I am concerned we don’t know how to make it actually do that rather than, say, imitate more plausible-sounding but incorrect ideas. The fact that appending silly phrases like “I’ll tip $200” improves the probability of receiving correct code from current LLMs indicates to me that we haven’t succeeded at aligning them to maximally want to produce correct code when they are capable of doing so.
How does Harry know the name “Lucius Malfoy”?
Kudos for releasing a concept of a plan! Some thoughts:
Regarding the first safety case:
I'm more excited about the control safety case, but have a few nitpicks:
I haven't read the last safety case yet, but may have some thoughts on it later. I am most excited about control at the moment, in part due to concerns that interpretability won't advance far enough to suffice for a safety case by the time we develop transformative AI.