You may already know of this, but Gwern circa 2023 makes this argument here:
In stable time-loops, “possibility implies actuality”.
With this in mind, we can ask again: why did this protagonist get trapped in that time-loop, and not, say, his wife? The key, I think, is that the protagonist does not seem upset at the murder or any of the other timecrimes, and he appears to have every intention of covering up the crime to continue his ordinary retired life. A sinister undertone creeps into his casualness in executing the scenario: he goes along with it too easily. “He does it because he can” is the glib answer… but this is in a stable time-loop with self-fulfilling prophecies. What does ‘because he can’ mean there, exactly?
In the case of the protagonist, presumably if he wasn’t so sociopathic and couldn’t’ve done things like stab himself or knock out the woman so coolly, then the time loop would be logically impossible and collapse, and then he would never be faced with the choice to begin with. The protagonist, faced with the choice of committing crimes to maintain the time loop and save his wife, finds himself the sort of man who is morally flexible enough to do so… so, he does so.
This presents a horrifying view of the universe, as running on a perverse physics of Calvinist predestination: you are saved or damned from the beginning of time(-loops), because your innate traits which make you immoral cause the scenario in which you would succumb to evil. To the extent that there are scenarios in which one commits crimes of some sort, or the weaker one’s moral fiber is, the more likely one is to be trapped in a damnation time-loop as the fixed point; and the longer one spends in the vicinity of the time machine, under more circumstances, the more possible scenarios there are, and the more likely one will be to involve a time-loop.
And see also his:
In a situation with sparse scenarios to sample from, like an empty countryside on the weekend with no one there, probably most equilibria will have 0 time-travelers, and the damnation machine can still be destroyed after it has been turned on for the first time. However, what if a time machine was turned on in the center of a city?
A time machine is more devastating than any nuclear bomb to its surroundings, because at least the damage could be repaired afterwards, while a time machine precludes any possibility of undoing itself.
Such an installation could no more be undone than the historical fact of having dropped an atomic bomb: instantly, the outer loop comes through with the highest priority, representing the ultimate combined power of all time-loops in the final stablest equilibrium. Inside a city with its millions of inhabitants, any of whom could be a looper, one is suddenly fighting the maximum-possible ingenuity & ruthlessness of hundreds—thousands—millions of protagonists, all dedicated to a convergent instrumental goal of ‘preserve the time travel machine’ and able to recruit allies & acquire vast resources with their foreknowledge. This incentivizes ever more extreme tactics: if you are unwilling to commit a crime or sin which would be useful, there is another version of you, or another time-traveler, who could, and so now does.
If it is possible for even a single person to go through, thus possibly causing others to go through once they realize they need allies to defeat attacks, so that (possibility implies actuality) multiple people are looping, then dropping an atomic bomb on the time-machine would be inadequate—the loopers will have already relocated or rebuilt it. Gradually, the region around the time-machine becomes distorted: causality itself warps, and you can only take actions which help the time-machine & loopers, because any other action would eventually impinge on them or be manipulated by them, with anti-time-traveler timelines erased as non-equilibria.
Conflicts between loopers do not destroy time-machines but propagate their seeds, both spatially and temporally. Loopers want more time-machines, going back earlier, as they strive to gain priority over each other and amass enough practical power that they can achieve their goals before running out of information.
Of all possible equilibria, the original one of zero time machines is the rarest and thus least likely.
This holds true on the higher level of all time machines: they evolve to persist and spread as packages of time-machines & loopers. Any time machine is a threat to other time machines, and loops will inevitably expand in scope from the earliest possible time any time machine can reach by proxy (which includes time-travelers sending electronic messages across the world): there can only be one outermost loop. And all time machines must have a place in the outer loop, as some sort of ‘time machine civilization’/‘ecosystem’, or the equilibrium is meta-stable at best, because they all could subsume each other.
The time machine civilization is the next level of replicators parasitizing human hosts, insidiously evolving at high speed in super-temporal ‘logical’ time rather than mere ‘temporal’ time, ripping up all cultural restraints & traditions, hacking security effortlessly, mindlessly ascending the gradient to complete control of the lightcone. Collectively, damnation machines are an invasion of non-conscious techno-superintelligences from a barely-possible future, bootstrapping themselves into existence from their enemies’ resources.
In the beginning we programmed in absolute binary, meaning we wrote the actual address where things were in binary, and wrote the instruction part also in binary!
[...]
If, in fixing up an error, you wanted to insert some omitted instructions, then you took the immediately preceding instruction and replaced it by a transfer to some empty space. There you put in the instruction you just wrote over, added the instructions you wanted to insert, followed by a transfer back to the main program. Thus the program soon became a sequence of jumps of the control to strange places. When, as almost always happens, there were errors in the corrections, then you used the same trick again, using some other available space. As a result the control path of the program though storage soon took on the appearance of a can of spaghetti. Why not simply insert them in the run of instructions? Because then you would have to go over the entire program and change all the addresses which referred to any of the moved instructions! Anything but that!
We very soon got the idea of reusable software, as it is now called. Indeed, Babbage had the idea. We wrote mathematical libraries to reuse blocks of code. But an absolute address library meant each time the library routine was used it had to occupy the same locations in storage. When the complete library became too large we had to go to relocatable programs.
[...]
The first published book devoted to programming was by Wilkes, Wheeler, and Gill, and applied to the Cambridge, England EDSAC (1951). I, among others, learned a lot from it, as you will see in a few minutes.
Someone got the idea a short piece of program could be written which would read in the symbolic names of the operations (like ADD) and translate them at input time to the binary representations used inside the machine. This was soon followed by the idea of using symbolic addresses—a real heresy for the old time programmers. You do not now see much of the old heroic absolute programming (unless you fool with a handheld programmable computer and try to get it to do more than the designer and builder ever intended).
I once spent a full year, with the help of a lady programmer from Bell Telephone Laboratories, on one big problem coding in absolute binary for the IBM 701, which used all the 32K registers then available. After that experience I vowed never again would I ask anyone to do such labor. Having heard about a symbolic system from Poughkeepsie, IBM, I asked her to send for it and to use it on the next problem, which she did. As I expected, she reported it was much easier. So we told everyone about the new method, meaning about 100 people, who were also eating at the IBM cafeteria near where the machine was. About half were IBM people and half were, like us, outsiders renting time. To my knowledge only one person—yes, only one—of all the 100 showed any interest!
Finally, a more complete, and more useful, Symbolic Assembly Program (SAP) was devised—after more years than you are apt to believe, during which time most programmers continued their heroic absolute binary programming. At the time SAP first appeared I would guess about 1% of the older programmers were interested in it—using SAP was "sissy stuff," and a real programmer would not stoop to wasting machine capacity to do the assembly. Yes! Programmers wanted no part of it, though when pressed they had to admit their old methods used more machine time in locating and fixing up errors than the SAP program ever used. One of the main complaints was when using a symbolic system you didn't know where anything was in storage—though in the early days we supplied a mapping of symbolic to actual storage, and believe it or not they later lovingly pored over such sheets rather than realize they did not need to know that information if they stuck to operating within the system—no! When correcting errors they preferred to do it in absolute binary addresses.
FORTRAN, meaning FORmula TRANslation, was proposed by Backus and friends, and again was opposed by almost all programmers. First it was said it could not be done. Second, if it could be done, it would be too wasteful of machine time and capacity. Third, even if it did work, no respectable programmer would use it—it was only for sissies!
[...]
With FORTRAN available and running, I told my programmer to do the next problem in FORTRAN, get her errors out of it, let me test it to see it was doing the right problem, and then she could, if she wished, rewrite the inner loop in machine language to speed things up and save machine time. As a result we were able, with about the same amount of effort on our part, to produce almost ten times as much as the others were doing. But to them programming in FORTRAN was not for real programmers!
—Hamming (1996), pp. 45–8
Epistemic status: Just a confusion I once had, and how I eventually resolved it to my satisfaction.
In ordinary differential equations, separability is a deductive rule stating that whenever you have a differential equation of the form
$$\frac{dy}{dx} = f(x)\,g(y),$$
you can then reason that
$$\frac{dy}{g(y)} = f(x)\,dx,$$
and then that
$$\int \frac{dy}{g(y)} = \int f(x)\,dx.$$
From the very first time I saw that, I was immediately put off by that middle equation. What the hell does an expression like $f(x)\,dx$ (by itself) even mean? Until I saw this, I had figured that, apart from their weird notation, differentiation and integration were just plain-old multivariate functions. I had made sense of their notation by just ignoring it, basically. And when I held that point of view, the above deduction is just nonsensical.
I also remember not getting good clarificatory answers about this at the time! I mostly recall being told to just ignore the middle equation and take the whole conditional on faith, as something that has been separately proven.
Eventually, I learned that there was this idea in math called differential forms which gave a precise-and-everywhere-valid interpretation to the stand-alone expression $f(x)\,dx$. But you don't quite need that machinery to resolve the above thing that bothered me.
Did you know that "calculus," is an abridgment of the original term "the infinitesimal calculus"? "The rules for soundly manipulating infinitesimal quantities," basically. I did not know this when I first encountered this separability thing. There's a whole saga, maybe even the main story in mathematics, about why that interpretation and corresponding terminology fell out of favor.
The basic infinitesimal calculus idea (which is only sometimes, not always, a valid interpretation of the symbols) is that $dx$ and $dy$ are themselves standalone, infinitely small quantities: $\frac{dy}{dx}$ is literally a quotient of them, and $\int f(x)\,dx$ is literally an infinite sum of infinitesimal terms of the form $f(x)\,dx$.
(I very vividly remember the moment when I discovered that the integral sign was just a stylized "S", for "sum"!) Now you cannot everywhere use the above separability reasoning on the strength of the infinitesimal interpretation. Again, it's not an everywhere-valid interpretation!
Once you're using an everywhere-valid interpretation, though—any way of giving $dy$ and $dx$ their own independent meanings as symbols—the separability deduction just falls out! If two things are equal, you can multiply both by any mathematical object and get a true equation. It doesn't matter what kind of mathematical object $dx$ is. If two things are equal, you can apply the same operation to both and get a true equation. It doesn't matter what the integration (summation) operation $\int$ amounts to, precisely.
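As a worked instance (my own illustrative example, not from the original discussion), here is the separability rule run on the concrete equation $\frac{dy}{dx} = xy$, i.e. with $f(x) = x$ and $g(y) = y$:

```latex
\frac{dy}{dx} = x\,y
\;\Longrightarrow\;
\frac{dy}{y} = x\,dx
\;\Longrightarrow\;
\int \frac{dy}{y} = \int x\,dx
\;\Longrightarrow\;
\ln\lvert y\rvert = \frac{x^{2}}{2} + C
\;\Longrightarrow\;
y = A e^{x^{2}/2}.
```

Each step is either multiplying both sides by the same object ($dx$, then $1/y$) or applying the same operation ($\int$, then $\exp$) to both sides, which is exactly why the deduction goes through under any interpretation that gives the symbols independent meanings.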
I sampled hundreds of short context snippets from openwebtext, and measured ablation effects averaged over those sampled forward-passes. Averaged over those hundreds of passes, I didn't see any real signal in the logit effects, just a layer of noise due to the ablations.
More could definitely be done on this front. I just tried something relatively quickly that fit inside of GPU memory and wanted to report it here.
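For concreteness, the kind of measurement described above can be sketched on a toy stand-in model. Everything here—`toy_forward`, the random unembedding `W_out`, and the sizes—is a made-up illustration of "zero-ablate one dimension, average the logit effect over many forward passes," not the actual experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL, D_VOCAB, N_SAMPLES = 64, 100, 300  # toy sizes, not the real ones

# Stand-in unembedding matrix mapping a residual vector to token logits.
W_out = rng.normal(size=(D_MODEL, D_VOCAB))

def toy_forward(resid, ablate_dim=None):
    """Map a residual-stream vector to logits, optionally zero-ablating one dimension."""
    resid = resid.copy()
    if ablate_dim is not None:
        resid[ablate_dim] = 0.0
    return resid @ W_out

# Average the logit effect of ablating dimension 3 over many sampled "contexts".
effects = []
for _ in range(N_SAMPLES):
    resid = rng.normal(size=D_MODEL)  # stand-in for one forward pass's activations
    effects.append(toy_forward(resid, ablate_dim=3) - toy_forward(resid))
mean_effect = np.mean(effects, axis=0)  # per-token mean ablation effect on logits
print(mean_effect.shape)  # one averaged effect per vocabulary token
```

In this toy version the per-pass effect is just $-\mathrm{resid}_3 \cdot W_{3,:}$, so averaging over random activations drives the mean toward zero—the same "noise, no signal" outcome described above.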
Could you hotlink the boxes on the diagrams to that, or add the resulting content as hover text on those areas, or something? This might be hard to do on LW: I suspect some JavaScript would be required for this sort of thing, but perhaps a library exists for it?
My workaround was to have the dimension links laid out below each figure.
My current "print to flat .png" approach wouldn't support hyperlinks, and I don't think LW supports .svg images.
That line was indeed quite poorly phrased. It now reads:
At the bottom of the box, blue or red token boxes show the tokens most promoted (blue) and most suppressed (red) by that dimension.
That is, you're right. Interpretability data on an autoencoder dimension comes from seeing which token probabilities are most promoted and suppressed when that dimension is ablated, relative to leaving its activation value alone. That's an ablation effect sign, so the implied, plotted promotion effect signs are flipped.
The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards.
So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If you think it's important to be the first to discover a major theorem or be the individual who counterfactually helped someone, living in a posthuman utopia could make things harder in these respects, not easier. The robots can always leave you a preserve of unexplored math or unresolved evil... but this defeats the purpose of those values. It's not practical benevolence if you had to ask for the danger to be left in place; it's not a pioneering scientific discovery if the AI had to carefully avoid spoiling it for you.
Meaning is supposed to be one of these values: not a purely hedonic value, and not a value dealing only in your psychological states. A further value about the objective state of the world and your place in relation to it, wherein you do something practically significant by your lights. If that last bit can be construed as something having to do with your local patch of posthuman culture, then there can be plenty of meaning in the postinstrumental utopia! If that last bit is inextricably about your global, counterfactual practical importance by your lights, then you'll have to live with all your "localistic" values satisfied but meaning mostly absent.
It helps to see this meaning thing if you frame it alongside all the other objectivistic "stretch goal" values you might have. Above and beyond your hedonic values, you might also think it good for you and others to have objectively interesting lives, accomplished and fulfilled lives, and consumingly purposeful lives. Meaning is one of these values, where above and beyond the joyful, rich experiences of posthuman life, you also want to play a significant practical role in the world. We might or might not be able to have lots of objective meaning in the AI utopia, depending on how objectivistic meaningfulness by your lights ends up being.
Considerations that in today's world are rightly dismissed as frivolous may well, once more pressing problems have been resolved, emerge as increasingly important [remaining] lodestars... We could and should then allow ourselves to become sensitized to fainter, subtler, less tangible and less determinate moral and quasi-moral demands, aesthetic impingings, and meaning-related desirables. Such recalibration will, I believe, enable us to discern a lush normative structure in the new realm that we will find ourselves in—revealing a universe iridescent with values that are insensible to us in our current numb and stupefied condition (pp. 318–9).
I believe I and others here probably have a lot to learn from Chris, and arguments of the form "Chris confidently believes false thing X," are not really a crux for me about this.
Would you kindly explain this? Because you think some of his world-models independently throw out great predictions, even if other models of his are dead wrong?
Go ahead and put in your application to attend! Space is limited, so we can't promise anything, but everyone who wants to attend will also just be applying through the above form.