The main thing I got out of reading Bostrom's Deep Utopia is a better appreciation of this "meaning of life" thing. I had never really understood what people meant by this, and always just rounded it off to people using lofty words for their given projects in life.
The book's premise is that, after the aligned singularity, the robots will not just be better at doing all your work but also be better at doing all your leisure for you. E.g., you'd never study for fun in posthuman utopia, because you could instead just ask the local benevolent god to painlessly, seamlessly put all that wisdom in your head. In that regime, studying with books and problems for the purpose of learning and accomplishment is just masochism. If you're into learning, just ask! And similarly for any psychological state you're thinking of working towards.
So, in that regime, it's effortless to get a hedonically optimal world, without any unendorsed suffering and with all the happiness anyone could want. Those things can just be put into everyone and everything's heads directly—again, by the local benevolent-god authority. The only challenging values to satisfy are those that deal with being practically useful. If...
Crucially: notice if your environment is suppressing you feeling your actual morals, leaving you only able to use your model of your morals.
That's a good line; it captures a lot of what I often feel is happening when talking to people about utilitarianism and a bunch of adjacent stuff (people replacing their morals with their models of their morals).
...The human brain does not start out as an efficient reasoning machine, plausible or deductive. This is something which we require years to learn, and a person who is an expert in one field of knowledge may do only rather poor plausible reasoning in another. What is happening in the brain during this learning process?
Education could be defined as the process of becoming aware of more and more propositions, and of more and more logical relationships between them. Then it seems natural to conjecture that a small child reasons on a lattice of very open structure: large parts of it are not interconnected at all. For example, the association of historical events with a time sequence is not automatic; the writer has had the experience of seeing a child, who knew about ancient Egypt and had studied pictures of the treasures from the tomb of Tutankhamen, nevertheless coming home from school with a puzzled expression and asking: ‘Was Abraham Lincoln the first person?’
It had been explained to him that the Egyptian artifacts were over 3000 years old, and that Abraham Lincoln was alive 120 years ago; but the meaning of those statements had not registered in his mind. This makes us wonder whether
Minor spoilers for planecrash (Book 3).
...Keltham was supposed to start by telling them all to use their presumably-Civilization-trained skill of 'perspective-taking-of-ignorance' to envision a hypothetical world where nothing resembling Coordination had started to happen yet. Since, after all, you wouldn't want your thoughts about the best possible forms of Civilization to 'cognitively-anchor' on what already existed.
You can imagine starting in a world where all the same stuff and technology from present Civilization exists, since the question faced is what form of Governance is best-suited to a world like that one. Alternatively, imagine an alternative form of the exercise involving people fresh-born into a fresh world where nothing has yet been built, and everybody's just wandering around over a grassy plain.
Either way, you should assume that everybody knows all about decision theory and cooperation-defection dilemmas. The question being asked is not 'What form of Governance would we invent if we were stupid?'
Civilization could then begin - maybe it wouldn't actually happen exactly that way, but it is nonetheless said as though in stori
A decent handle for rationalism is 'apolitical consequentialism.'
'Apolitical' here means avoiding playing the whole status game of signaling fealty to a political tribe and winning/losing status as that political tribe wins/loses status competitions. 'Consequentialism' means getting more of what you want, whatever that is.
Minor spoilers for mad investor chaos and the woman of asmodeus (planecrash Book 1) and Peter Watts's Echopraxia.
..."Suppose everybody in a dath ilani city woke up one day with the knowledge mysteriously inserted into their heads, that their city had a pharaoh who was entitled to order random women off the street into his - cuddling chambers? - whether they liked that or not. Suppose that they had the false sense that things had always been like this for decades. It wouldn't even take until whenever the pharaoh first ordered a woman, for her to go "Wait why am I obeying this order when I'd rather not obey it?" Somebody would be thinking about city politics first thing when they woke up in the morning and they'd go "Wait why we do we have a pharaoh in the first place" and within an hour, not only would they not have a pharaoh, they'd have deduced the existence of the memory modification because their previous history would have made no sense, and then the problem would escalate to Exception Handling and half the Keepers on the planet would arrive to figure out what kind of alien invasion was going on. Is the source of my confusion - at all clear here?"
"You think
We've all met people who are acting as if "Acquire Money" is a terminal goal, never noticing that money is almost entirely instrumental in nature. When you ask them "but what would you do if money was no issue and you had a lot of time", all you get is a blank stare.
Even the LessWrong Wiki entry on terminal values describes a college student for whom university is instrumental, and getting a job is terminal. This seems like a clear-cut case of a Lost Purpose: a job seems clearly instrumental. And yet, we've all met people who act as if "Have a Job" is a terminal value, and who then seem aimless and undirected after finding employment …
You can argue that Acquire Money and Have a Job aren't "really" terminal goals, to which I counter that many people don't know their ass from their elbow when it comes to their own goals.
--Nate Soares, "Dark Arts of Rationality"
Why does politics strike rationalists as so strangely shaped? Why does rationalism come across as aggressively apolitical to smart non-rationalists?
Part of the answer: Politics is absolutely rife with people mixing their ends with their means and vice versa. It's pants-on-head confused, from a rationalist perspective, to be ul...
Yudkowsky has sometimes used the phrase "genre savvy" to mean "knowing all the tropes of reality."
For example, we live in a world where academia falls victim to publishing incentives/Goodharting, and so academic journals fall short of what people with different incentives would be capable of producing. You'd be failing to be genre savvy if you expected that, when a serious problem like AGI alignment rolled around, academia would suddenly get its act together with a relatively small amount of prodding/effort. Genre savvy actors in our world know what academia is like, and predict that academia will continue to do its thing in the future as well.
Genre savviness is the same kind of thing as hard-to-communicate-but-empirically-validated expert intuitions. When domain experts have some feel for what projects might pan out and what projects certainly won't, but struggle to explain their reasoning in depth, the most they might be able to do is claim that a given project is just incompatible with the tropes of their corner of reality, and point to some other cases.
“What is the world trying to tell you?”
I've found that this prompt helps me think clearly about the evidence shed by the generator of my observations.
There's a rationality-improving internal ping I use on myself, which goes, "what do I expect to actually happen, for real?"
This ping moves my brain from a mode where it's playing with ideas in a way detached from the inferred genre of reality, over to a mode where I'm actually confident enough to bet about some outcomes. The latter mode leans heavily on my priors about reality, and, unlike the former mode, looks askance at significantly considering long, conjunctive, tenuous possible worlds.
God dammit people, "cringe" and "based" aren't truth values! "Progressive" is not a truth value! Say true things!
Having been there twice, I've decided that the Lightcone offices are my favorite place in the world. They're certainly the most rationalist-shaped space I've ever been in.
Academic philosophers are better than average at evaluating object-level arguments for some claim. They don't seem to be very good at thinking about what rationalization in search implies about the arguments that come up. Compared to academic philosophers, rationalists strike me as especially appreciating filtered evidence and its significance to your world model.
If you find an argument for a claim easily, then even if that argument is strong, this (depending on some other things) implies that similarly strong arguments on the other side may turn up with n...
Modest spoilers for planecrash (Book 9 -- null action act II).
...Nex and Geb had each INT 30 by the end of their mutual war. They didn't solve the puzzle of Azlant's IOUN stones... partially because they did not find and prioritize enough diamonds to also gain Wisdom 27. And partially because there is more to thinkoomph than Intelligence and Wisdom and Splendour, such as Golarion's spells readily do enhance; there is a spark to inventing notions like probability theory or computation or logical decision theory from scratch, that is not directly me
Epistemic status: politics, known mindkiller; not very serious or considered.
People seem to have a God-shaped hole in their psyche: just as people banded around religious tribal affiliations, they now, in the contemporary West, band together around political tribal affiliations. Intertribal conflict can be, at its worst, violent, on top of mindkilling. Religious persecution in the UK was one of the instigating causes of British settlers migrating to the American colonies; religious conflict in Europe generally was severe.
In the US, the 1st Amendment legall...
If you take each of the digits of 153, cube them, and then sum those cubes, you get 153:
1 + 125 + 27 = 153.
For many naturals, if you iteratively apply this function, you'll return to the 153 fixed point. Start with, say, 516:
125 + 1 + 216 = 342
27 + 64 + 8 = 99
729 + 729 = 1,458
1 + 64 + 125 + 512 = 702
343 + 0 + 8 = 351
27 + 125 + 1 = 153
1 + 125 + 27 = 153
1 + 125 + 27 = 153...
These nine fixed points or cycles occur with the following frequencies (1 <= n <= 10e9):
33.3% : (153 → )
29.5% : (371 → )
17.8% : (370 → )
5.0% : (55 → 250 → 133 → )
4.1% : (160 → 217 → 352 → )
3.8% : (407 → )
3.1% : (919 → 1459 → )
1.8% : (1 → )
1.5% : (136 → 244 → )
No other fixed points or cycles are possible (except 0 → 0, which isn't reachable from any nonzero input): any number with more than four digits has fewer digits in the sum of its cubed digits, so every orbit eventually falls below 10,000, and checking those finitely many starting points exhausts the possibilities.
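A quick way to check the attractors and their rough frequencies yourself. This is a sketch I'm adding (plain Python, and a smaller range than the one quoted above, for speed); the function and variable names are mine.

```python
from collections import Counter

def cube_digit_sum(n: int) -> int:
    """Sum of the cubes of n's decimal digits."""
    return sum(int(d) ** 3 for d in str(n))

def attractor(n: int) -> tuple:
    """Iterate cube_digit_sum from n until a value repeats, then
    return the cycle it lands in, rotated to a canonical form."""
    seen = []
    while n not in seen:
        seen.append(n)
        n = cube_digit_sum(n)
    cycle = seen[seen.index(n):]
    i = cycle.index(min(cycle))
    return tuple(cycle[i:] + cycle[:i])

# Tally which attractor each starting point 1..N falls into.
N = 100_000
counts = Counter(attractor(n) for n in range(1, N + 1))
for cycle, count in counts.most_common():
    print(f"{100 * count / N:5.1f}% : {' -> '.join(map(str, cycle))}")
```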
A model I picked up from Eric Schwitzgebel.
The humanities used to be highest-status in the intellectual world!
But then, scientists quite visibly exploded fission weapons and put someone on the moon. It's easy to coordinate to ignore some unwelcome evidence, but not evidence that blatant. So, begrudgingly, science has been steadily accorded more and more status, from the postwar period on.
When the sanity waterline is so low, it's easy to develop a potent sense of misanthropy.
Bryan Caplan's writing about how many people hate stupid people really affected me on this point. Don't hate, or even resent, stupid people; trade with them! This is a straightforward consequence of Ricardo's law of comparative advantage. Population averages are overrated; what matters is whether the individual interactions between agents in a population are positive-sum, not where those individual agents fall relative to the population average.
"Ignorant people do not exist."
It's really easy to spend a lot of cognitive cycles churning through bad, misleading ideas generated by the hopelessly confused. Don't do that!
The argument that being more knowledgeable leaves you strictly better off than being ignorant relies on you simply ignoring bad ideas when you spend your cognitive cycles searching for improvements on your working plans. Sometimes, you'll need to actually exercise this "simply ignore it" skill. You'll end up needing to do so more and more, to approach bounded instrumental rationality, the more inadequate the civilization around you is and the lower its sanity waterline.
You sometimes misspeak... and you sometimes misthink. That is, sometimes your cognitive algorithm slips, and the thought that seemed so unimpeachably obvious in your head... is nevertheless false on a second glance.
Your brain is a messy probabilistic system, so you shouldn't expect its cognitive state to ever perfectly track the state of a distant entity.
Policy experiments I might care about if we weren't all due to die in 7 years:
A shard is a contextually activated behavior-steering computation. Think of it as a circuit of neurons in your brain that is reinforced by the subcortex, gaining more staying power when positively reinforced and withering away in the face of negative reinforcement. In fact, whatever modulates shard strength in this way is reinforcement/reward. Shards are born when a computation that is currently steering behavior happens to steer into some reinforcement. So shards can only accrete around the concepts currently in a system's world model (presumably, the world model is shared ...
My favorite books, ranked!
Non-fiction:
1. Rationality, Eliezer Yudkowsky
2. Superintelligence, Nick Bostrom
3. The Age of Em, Robin Hanson
Fiction:
1. Permutation City, Greg Egan
2. Blindsight, Peter Watts
3. A Deepness in the Sky, Vernor Vinge
4. Ra, Sam Hughes/qntm
Epistemic status: Half-baked thought.
Say you wanted to formalize the concepts of "inside and outside views" to some degree. You might say that your inside view is a Bayes net or joint conditional probability distribution—this mathematical object formalizes your prior.
Unlike your inside view, your outside view consists of forms of deferring to outside experts. The Bayes nets that inform their thinking are sealed away, and you can't inspect these. You can ask outside experts to explain their arguments, but there's an interaction cost associated with inspecti...
Because your utility function is your utility function, the one true political ideology is clearly Extrapolated Volitionism.
Extrapolated Volitionist institutions are all characteristically "meta": they take as input what you currently want and then optimize for the outcomes a more epistemically idealized you would want, after more reflection and/or study.
Institutions that merely optimize for what you currently want the way you would with an idealized world-model are old hat by comparison!
...Bogus nondifferentiable functions
The case most often cited as an example of a nondifferentiable function is derived from a sequence {f_n(x)}, each of which is a string of isosceles right triangles whose hypotenuses lie on the real axis and have length 1/n. As n → ∞, the triangles shrink to zero size. For any finite n, the slope of f_n(x) is ±1 almost everywhere. Then what happens as n → ∞? The limit f(x) is often cited carelessly as a nondifferentiable function. Now it is clear that the limit of the derivativ
Only make choices that you would not make in reverse, if things were the other way around. Drop out of school if and only if you wouldn't enroll in school from out of the workforce. Continue school if and only if you'd switch over from work to that level of schooling.
Flitting back and forth between both possible worlds can make you less cagey about doing what's overdetermined by your world model + utility function already. It's also part of the exciting rationalist journey of acausally cooperating with your selves in other possible worlds.
...Science fiction books have to tell interesting stories, and interesting stories are about humans or human-like entities. We can enjoy stories about aliens or robots as long as those aliens and robots are still approximately human-sized, human-shaped, human-intelligence, and doing human-type things. A Star Wars in which all of the X-Wings were combat drones wouldn’t have done anything for us. So when I accuse something of being science-fiction-ish, I mean bending over backwards – and ignoring the evidence – in order to give basically human-shaped beings a c
Spoilers for planecrash (Book 2).
...Keltham will now, striding back and forth and rather widely gesturing, hold forth upon the central principle of all dath ilani project management, the ability to identify who is responsible for something. If there is not one person responsible for something, it means nobody is responsible for it. This is the proverb of dath ilani management. Are t
...What would it mean for a society to have real intellectual integrity? For one, people would be expected to follow their stated beliefs to wherever they led. Unprincipled exceptions and an inability or unwillingness to correlate beliefs among different domains would be subject to social sanction. Valid attempts to persuade would be expected to be based on solid argumentation, meaning that what passes for typical salesmanship nowadays would be considered a grave affront. Probably something along the lines of punching someone
You can usually save a lot of time by skimming texts or just reading pieces of them. But reading a work all the way through uniquely lets you make negative existential claims about its content: only now can you authoritatively say that the work never mentions something.
Past historical experience and brainstorming about human social orders probably barely scratch the possibility space. If the CEV were to weigh in on possible posthuman social orders,[1] optimizing in part for how cool that social order is, I'd bet what it describes blows what we've seen out of the water in terms of cool factor.
(Presumably posthumans will end up reflectively endorsing interactions with one another of some description.)
One important idea I've picked up from reading Zvi is that, in communication, it's important to buy out the status cost imposed by your claims.
If you're fielding a theory of the world that, as a side effect, dunks on your interlocutor and diminishes their social status, you can only get that person to think in terms of Bayesian epistemology, rather than decision theory, if you make sure you aren't hurting their social image. You have to put in the unreasonable-feeling work of framing all your claims such that their social status is preserved or fairly increas...
I regret to inform you, you are an em inside an inconsistent simulated world. By this, I mean: your world is a slapdash thing put together out of off-the-shelf assets in the near future (presumably right before a singularity eats that simulator Earth).
Your world doesn't bother emulating far-away events in great detail, and indeed, may be messing up even things you can closely observe. Your simulators are probably not tampering with your thoughts, though even that is something worth considering carefully.
What are the flaws you...
When another article of equal argumentative caliber could have just as easily been written for the negation of a claim, that writeup is no evidence for its claim.
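In Bayesian terms (my gloss, not part of the original point): if an argument E of that caliber would have turned up whether or not the claim H were true, then P(E|H) = P(E|¬H), so

P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|¬H)P(¬H)] = P(H),

and your posterior just equals your prior.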
...The explicit definition of an ordered pair is frequently relegated to pathological set theory...
It is easy to locate the source of the mistrust and suspicion that many mathematicians feel toward the explicit definition of ordered pair given above. The trouble is not that there is anything wrong or anything missing; the relevant properties of the concept we have defined are all correct (that is, in accord with the demands of intuition) and all the correct properties are present. The trouble is that the concept has some irreleva
...Now, whatever A may assert, the fact that A can be deduced from the axioms cannot prove that there is no contradiction in them, since, if there were a contradiction, A could certainly be deduced from them!
This is the essence of the Gödel theorem, as it pertains to our problems. As noted by Fisher (1956), it shows us the intuitive reason why Gödel’s result is true. We do not suppose that any logician would accept Fisher’s simple argument as a proof of the full Gödel theorem; yet for most of us it is more convincing than Gödel’s
What're the odds that we're anywhere close to optimal in any theoretical domain? Where are our current models basically completed, boundedly optimal representations of some part of the universe?
The arguments for theoretical completion are stronger for some domains than others, but in general the odds that we have the best model in any domain are pretty poor, and are outright abysmal in the mindkilling domains.
Is the concept of "duty" the fuzzy shadow cast by the simple mathematical structure of 'corrigibility'?
It's only modestly difficult to train biological general intelligences to defer to even potentially dumber agents. We call these deferential agents "dutybound" -- the sergeants who carry out the lieutenant's direct orders, even when they think they know better; the bureaucrats who never take local opportunities to get rich at the expense of their bureau, even when their higher-ups won't notice; the employees who work hard in the absence of effective overs...
Minor spoilers for planecrash (Book 3).
...So! On a few moments' 'first-reflection', it seems to Keltham that estimating the probability of Civilization being run by a Dark Conspiracy boils down to (1) the question of whether Civilization's apparently huge efforts to build anti-Dark-Conspiracy citizens constitute sincere work that makes the Dark Conspiracy's life harder, or fake work designed to only look like that; and (2) the prior probability that the Keepers and Governance would have arrived on the scene already corrupted, during the last major reorg
Non-spoiler quote from planecrash (Book 3).
Nonconformity is something trained in dath ilan and we could not be Law-shaped without that. If you're conforming to what you were taught, to what other people seem to believe, to what other people seem to want you to believe, to what you think everyone believes, you're not conforming to the Law.
--Eliezer, planecrash
...A great symbolic moment for the Enlightenment, and for its project of freeing humanity from needless terrors, occurred in 1752 in Philadelphia. During a thunderstorm, Benjamin Franklin flew a kite with a pointed wire at the end and succeeded in drawing electric sparks from a cloud. He thus proved that lightning was an electrical phenomenon and made possible the invention of the lightning-rod, which, mounted on a high building, diverted the lightning and drew it harmlessly to the ground by means of a wire. Humanity no longer needed to fear fire from heaven.
"You don't need to follow anybody! You've got to think for yourselves. You're all individuals!"
"Yes, we're all individuals!"
…
"You've all got to work it out for yourselves!"
"Yes! We've got to work it out for ourselves!"
"Exactly!"
"Tell us more!"
Building your own world model is hard work. It can be good intellectual fun, sometimes, but it's often more fun to just plug into the crowd around you and borrow their collective world model for your decision making. Why risk embarrassing yourself going off and doing weird things on your ...
In another world, in which people hold utterly alien values, I would be thrilled to find a rationalist movement with similar infrastructure and memes. If rationalism/Bayescraft as we know it is on to something about instrumental reasoning, then we should see that kind of instrumental reasoning in effective people with alien values.
Agents that explicitly represent their utility function are potentially vulnerable to sign flips.
What sorts of AI designs could not be made to pursue a flipped utility function via perturbation in one spot? One quick guess: an AI that represents its utility function in several places and uses all of those representations to do error correction, only pursuing the error corrected utility function.
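A toy sketch of that guess, which I'm adding for concreteness; the sign-bit encoding and the majority vote are illustrative assumptions, not a real AI design.

```python
import random
from statistics import mode

def utility(outcome: float) -> float:
    """Stand-in 'true' utility: prefer larger outcomes."""
    return outcome

# The utility function's sign is stored redundantly in several places.
signs = [+1, +1, +1]

def corrupt_one_copy() -> None:
    """Model a one-spot perturbation: flip the sign in a single location."""
    signs[random.randrange(len(signs))] *= -1

def error_corrected_utility(outcome: float) -> float:
    """Pursue the majority vote across all copies, not any single copy."""
    return mode(signs) * utility(outcome)

corrupt_one_copy()
print(signs)                          # one copy flipped, e.g. [1, -1, 1]
print(error_corrected_utility(10.0))  # still 10.0: the flipped copy is outvoted
```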
A huge range of utility functions should care about alignment! It's in the interest of just about everyone to survive AGI.
I'm going to worry less about hammering out value disagreement with people in the here and now, and push this argument on them instead. We'll hammer out our value disagreements in our CEV, and in our future (should we save it).
One of the things that rationalism has noticeably done for me (that I see very sharply when I look at high-verbal-ability, non-rationalist peers) is that it's given me the ability to perform socially unorthodox actions on reflection. People generally have mental walls that preclude ever actually doing socially weird things. If someone's goals would be best served by doing something socially unorthodox (like, e.g., signing up for cryonics or dropping out of a degree), they will usually rationalize that option away in order to stay on script. So for th...
Two moments of growing in mathematical maturity I remember vividly:
...2. The anchor of a major news network donates lots of money to organizations fighting against gay marriage, and in his spare time he writes editorials arguing that homosexuals are weakening the moral fabric of the country. The news network decides they disagree with this kind of behavior and fire the anchor.
a) This is acceptable; the news network is acting within their rights and according to their principles
b) This is outrageous; people should be judged on the quality of their work and not their political beliefs…
12. The principal of a private school is a
What sequence of characters could I possibly, actually type out into a computer that would appreciably reduce the probability that everything dies?
Framed like this, writing to save the world sounds impossibly hard! Almost everything written has no appreciable effect on our world's AI trajectory. I'm sure the "savior sequence" exists mathematically, but finding it is a whole different ballgame.
Don't translate your values into just a loss function. Rather, translate them into a loss function and all the rest of a training story. Use all the tools at your disposal in your impossible task; don't tie one hand behind your back by assuming the loss function is your only lever over the AGI's learned values.
"Calling babble and prune the True Name of text generation is like calling bogosort the True Name of search."
...In the 1920s when λ-calculus and CL began, logicians did not automatically think of functions as sets of ordered pairs, with domain and range given, as mathematicians are trained to do today. Throughout mathematical history, right through to computer science, there has run another concept of function, less precise at first but strongly influential always; that of a function as an operation-process (in some sense) which may be applied to certain objects to produce other objects. Such a process can be defined by giving a set of rules describing how it acts
Complex analysis is the study of functions of a complex variable, i.e., functions f(z) where z and f(z) lie in ℂ. Complex analysis is the good twin and real analysis the evil one: beautiful formulas and elegant theorems seem to blossom spontaneously in the complex domain, while toil and pathology rule in the reals. Nevertheless, complex analysis relies more on real analysis than the other way around.
--Pugh, Real Mathematical Analysis (p. 28)
Switching costs between different kinds of work can be significant. Give yourself permission to focus entirely on one kind of work per Schelling unit of time (per day), if that would help. Don't spend cognitive cycles feeling guilty about letting some projects sit on the backburner; the point is to get where you're going as quickly as possible, not to look like you're juggling a lot of projects at once.
This can be hard, because there's a conventional social expectation that you'll juggle a lot of projects simultaneously, maybe because that's more legible t...
Social niceties and professionalism act as a kind of 'communications handshake' in ordinary society -- maybe because they're still a credible correlate of having your act together enough to be worth considering your outputs in the first place?
Large single markets are (pretty good) consequentialist engines. Run one of these for a while, and you can expect significantly improving outcomes inside of that bloc, by the lights of the entities participating in that single market.
Reflexively check both sides of the proposed probability of an event:
"What do I think about P(DOOM) = 81%?"
and
"What do I think about P(~DOOM) = 19%?"
This can often elicit feedback from parts of you that would stay silent if you only considered one way of stating the probability in question.
I've noticed that part of me likes to dedicate disproportionate cognitive cycles to the question: "If you surgically excised all powerful AI from the world, what political policies would be best to decree, by your own lights?"
The thing is, we live in a world with looming powerful AI. It's at least not consequentialist to spend a bunch of cognitive cycles honing your political views for a world we're not in. I further notice that my default justification for thinking about sans-AI politics a lot is consequentialist... so something's up here. I think some pa...
Fancy epistemic tools won't override the basics of good epistemics:
You are embedded in a 3D spatial world, progressing in a time dimension. You want to get better at predicting events in advance, so you want to find the underlying generator for this 3D world's events. This means that you're rooting around in math space, trying to find the mathematical object that your observational trajectory is embedded in.
Some observations of yours are differentially more likely in some math objects than in others, and so it's more likely that your world is the former ma...
Minor spoilers for planecrash (Book 3.1).
..."Does the distinction between understanding and improving correspond to the distinction between the Law of Probability and the Law of Utility? It sounds like it should."
"Sensible question, but no, not exactly. Probability is something like a separable core that lies at the heart of Probable Utility. The process of updating our beliefs, once we have the evidence, is something that in principle doesn't depend at all on what we want - the way reality is is something defined independently of anyth
Minor spoilers for planecrash (Book 1) and the dath-ilani-verse generally.
When people write novels about aliens attacking dath ilan and trying to kill all humans everywhere, the most common rationale for why they'd do that is that they want our resources and don't otherwise care who's using them, but, if you want the aliens to have a sympathetic reason, the most common reason is that they're worried a human might break an oath again at some point, or spawn the kind of society that betrays the alien hypercivilization in the future.
--Eliezer, planecrash
What is rationalism about?
Rationalism is about the real world. It may or may not strike you as an especially internally consistent, philosophically interesting worldview -- this is not what rationality is about. Rationality is about seeing things happen in the real world and then, when what you see surprises you, updating your understanding of the world so that it wouldn't surprise you again.
Why care about predicting things in the world well?
Almost no matter what you ultimately care about, being able to predict ahead of time what's going to happen next will make you better at planning for your goal.
...Gebron and Eleazar define kabbalah as “hidden unity made manifest through patterns of symbols”, and this certainly fits the bill. There is a hidden unity between the structures of natural history, human history, American history, Biblical history, etc: at an important transition point in each, the symbols MSS make an appearance and lead to the imposition of new laws. Anyone who dismisses this as coincidence will soon find the coincidences adding up to an implausible level.
The kabbalistic perspective is that nothing is a coincidence. We believe that the uni
An implication of AI risk is that we, right now, stand at the fulcrum of human history.
Lots of historical people also claimed that they stood at that unique point in history … and were just wrong about it. But my world model also makes that self-important implication (in a specific form), and the meta-level argument for epistemic modesty isn't enough to nudge me off of the fulcrum-of-history view.
If you buy that, it's our overriding imperative to do what we can about it, right now. If we miss this one, ~all of future value evaporates.
For me, the implication of standing at the fulcrum of human history is to…read a lot of textbooks and think about hairy computer science problems.
That seems an odd enough conclusion to make it quite distinct from what most other people in human history have concluded.
If the conclusion were "go over to those people, hit them on the head with a big rock, and take their women & children as slaves" or "acquire a lot of power", I'd be way more careful.
There exist both merely clever and effectively smarter people.
Merely clever people are good with words and good at rapidly assimilating complex instructions and ideas, but don't seem to maintain and update an explicit world-model, an explicit best current theory-of-everything. The feeling I get watching these people respond to topics and questions is that they respond reflexively, either (1) raising related topics and ideas they've encountered as something similar comes up, or (2) expressing their gut reactions to the topic or idea, or expressing the gut r...
In the game of chicken, an agent can do better by being the first to precommit to never swerve (say, by conspicuously tossing the steering wheel out of the window). So long as the other agent was slower on the trigger, and sees the first agent's precommitment being credibly made, the first agent will climb up to his best outcome! A smart (and quick) agent can thus shunt that car crash out of his actual future and into some counterfactual future such that the counterfactual crash's shadow favorably influences the way events actually unfold.
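A toy payoff matrix (my numbers, added for illustration) showing the mechanism: once the precommitment visibly removes "swerve" from the first agent's options, the other agent's best response is to swerve.

```python
# Payoffs as (row player, column player); each picks "swerve" or "straight".
payoffs = {
    ("swerve",   "swerve"):   ( 0,   0),
    ("swerve",   "straight"): (-1,   1),
    ("straight", "swerve"):   ( 1,  -1),
    ("straight", "straight"): (-10, -10),  # head-on crash: worst for both
}

def column_best_response(row_action: str) -> str:
    """Column player's best reply once the row player's action is known."""
    return max(("swerve", "straight"), key=lambda c: payoffs[(row_action, c)][1])

# Row credibly precommits to "straight" (tosses the steering wheel out).
# Column, seeing this, best-responds by swerving; row gets its best payoff.
row_action = "straight"
col_action = column_best_response(row_action)
print(col_action, payoffs[(row_action, col_action)])  # swerve (1, -1)
```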
A deceptively ali...
...In the beginning God created four dimensions. They were all alike and indistinguishable from one another. And then God embedded atoms of energy (photons, leptons, etc.) in the four dimensions. By virtue of their energy, these atoms moved through the four dimensions at the speed of light, the only spacetime speed. Thus, as perceived by any one of these atoms, space contracted in, and only in, the direction of that particular atom's motion. As the atoms moved at the speed of light, space contracted so much in the direction of the atom's motion that the dimen
You can think of chain-of-thought interpretability as the combination of process-based methods with adversarial training.
When you supervised-train an ML model on an i.i.d. dataset that doesn't contain any agent modeling problems, you never strongly incentivize the emergence of mesa-optimizers. You do weakly incentivize the emergence of mesa-optimizers, because mesa-optimizers are generally capable algorithms that might outperform brittle bundles of rote heuristics on many simple tasks.
When you train a model in a path-dependent setting, you do strongly incentivize mesa-optimization. This is because algorithms trained in a path-dependent setting have the opportunity to defend ...
Unreasonably effective rationality-improving technique:
Spend an hour and a half refactoring your standing political views, by temporarily rolling those political views back to a childhood state from before your first encounter with highly communicable and adaptive memeplexes. Query your then-values, and reason instrumentally from the values you introspect. Finally, take or leave the new views you generate.
If your current political views are well supported, then they should regenerate under this procedure. But if you've mostly been recycling cached thoughts...
The unlovely neologism "agenty" means strategic.
"Agenty" might carry less connotational baggage in exchange for its unsightliness, however. Just like "rational" is understood by a lot of people to mean, in part, stoical, "strategic" might mean manipulative to a lot of people.
Because of deception, we don't know how to put a given utility function into a smart agent that has grokked the overall picture of its training environment. Once training finds a smart-enough agent, the model's utility function ceases to be malleable to us. This suggests that powerful greedy search will find agents with essentially random utility functions.
But, evolution managed to push human values in the rough direction of its own values: inclusive genetic fitness. We don't care about maximizing inclusive genetic fitness, but we do care about having sex...
The theoretical case for open borders is pretty good. But you might worry a lot about the downside risk of implementing such a big, effectively irreversible (it'd be nigh impossible to deport millions and millions of immigrants) policy change. What if the theory's wrong and the result is catastrophe?
Just like with futarchy, we might first try out a promising policy like open borders at the state level, to see how it goes. E.g., let people immigrate to just one US state with only minimal conditions. Scaling up a tested policy if it works and abandoning it i...
A semantic externalist once said,
"Meaning just ain't in the head.
Hence a brain-in-a-vat
Just couldn't think that
'Might it all be illusion instead?'"
I thought that having studied philosophy (instead of math or CS) made me an outlier for a rationalist.
But, milling about the Lightcone offices, fully half of the people I've encountered hold some kind of philosophy degree. "LessWrong: the best philosophy site on the internet."
Some mantras I recall a lot, to help keep on the rationalist straight-and-narrow and not let anxiety get the better of me:
Humans, "teetering bulbs of dream and dread," evolved as a generally intelligent patina around the Earth. We're all the general intelligence the planet has to throw around. What fraction of that generally intelligent skin is dedicated to defusing looming existential risks? What fraction is dedicated towards immanentizing the eschaton?