This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the rocket alignment problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Elizabeth (14h):
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117… Enovid is a nasal spray that produces NO. I had the damnedest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h, and I think that dose is amortized, so the true dose is 0.88ppm. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…

So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, then those patients would be in agony.

I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD; @Elicit brought me a bunch of studies about honey. Better keywords got Google Scholar to bring me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.

Where I'm most likely wrong:

* I misinterpreted the dosage in the RCT.
* The dosage in the RCT is lower than in Enovid.
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
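A quick sketch of that arithmetic (all figures are the ones quoted above; the 8-hour amortization is the quick take's own unverified assumption, flagged as such in the code):

```python
# Back-of-envelope check of the Enovid-vs-humming numbers quoted above.
# Figures come from the quick take itself, not independently verified.

baseline_ppm = {"women": 0.14, "men": 0.18}  # normal nasal NO levels
trial_rate = 0.11        # ppm NO/hour, from the trial registration
dosing_interval = 8      # hours between Enovid doses

# Two readings of the trial number: taken at face value (0.11 ppm), or
# amortized over the 8h dosing interval (0.88 ppm) -- an assumption.
doses = {"face value": trial_rate, "amortized": trial_rate * dosing_interval}

for reading, dose in doses.items():
    for sex, base in baseline_ppm.items():
        print(f"{reading}, {sex}: +{dose / base:.0%} over baseline")

# Output spans roughly +61% to +629%, i.e. the "between 75% and 600%"
# range above. Humming, for comparison, reportedly adds 1500-2000%.
```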
Neil (3h):
Poetry and practicality

I was staring up at the moon a few days ago and thought about how deeply I loved my family, and wished to one day start my own (I'm just over 18 now). It was a nice moment. Then, I whipped out my laptop and felt constrained to get back to work; i.e. read papers for my AI governance course, write up LW posts, and trade emails with EA France. (These I believe to be my best shots at increasing everyone's odds of survival.) It felt almost like sacrilege to wrench myself away from the moon and my wonder. Like I was ruining a moment of poetry and stillwatered peace by slamming against reality and its mundane things again.

But... the reason I wrenched myself away is directly downstream from the spirit that animated me in the first place. Whether I feel the poetry now that I felt then is irrelevant: it's still there, and its value and truth persist. Pulling away from the moon was evidence I cared about my musings enough to act on them. The poetic is not a separate magisterium from the practical; rather, the practical is a particular facet of the poetic. Feeling "something to protect" in my bones naturally extends to acting it out. In other words, poetry doesn't just stop. Feel no guilt in pulling away. Because you're not really pulling away.
A tension that keeps recurring when I think about philosophy is between the "view from nowhere" and the "view from somewhere", i.e. a third-person versus first-person perspective—especially when thinking about anthropics.

One version of the view from nowhere says that there's some "objective" way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience.

One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations have. I'll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here.

In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they're computationally expensive to run), which seems to be true. Meanwhile the ADT approach "predicts" that we find ourselves at an unusually pivotal point in history, which also seems true.

Intuitively I want to say "yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified". But... on a personal level, this hasn't actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It's a St Petersburg paradox, basically.

Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St Petersburg paradoxes. And more generally, it feels like there's some kind of multi-agent perspective which says I shouldn't model all these copies of myself as acting in unison, but rather as optimizing for some compromise between all their different goals (which can differ even if they're identical, because of indexicality). No strong conclusions here, but I want to keep playing around with some of these ideas (which were inspired by a call with @zhukeepa).

This was all kinda rambly, but I think I can summarize it as: "Isn't it weird that ADT tells us that we should act as if we'll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don't have a story for why these things are related, but it does seem like a suspicious coincidence."
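As a concrete illustration of the St Petersburg point above (a standard toy calculation, not something from the original take): the gamble pays 2^k with probability 2^-k, so its arithmetic expectation diverges, while its expectation under log utility (the quantity geometric rationality cares about) converges.

```python
import math

# St Petersburg gamble: win 2^k with probability 2^-k, for k = 1, 2, 3, ...
# Arithmetic expectation: sum over k of 2^-k * 2^k = 1 + 1 + 1 + ... diverges.
# Expected log payoff:    sum over k of 2^-k * k*log(2) -> 2*log(2), converges.

for n_terms in (10, 20, 40):
    arithmetic = sum(2**-k * 2**k for k in range(1, n_terms + 1))
    log_payoff = sum(2**-k * k * math.log(2) for k in range(1, n_terms + 1))
    print(f"{n_terms} terms: EV = {arithmetic:.0f}, E[log payoff] = {log_payoff:.4f}")

# Certainty equivalent under log utility: exp(2*log 2), i.e. about 4.
print(round(math.exp(2 * math.log(2)), 6))
```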
There was this voice inside my head that told me that since I had Something to protect, relaxing was never OK above the strict minimum: the goal was paramount, and I should just work as hard as I could all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased.

I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, that working felt harder and harder. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation myself again.


Recent Discussion

The history of science has tons of examples of the same thing being discovered multiple times independently; Wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly afterwards anyway, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement Wikipedia's list of multiple discoveries.

To...

I don't buy this: the curvature of the sea is obvious to sailors (you see the tops of islands long before you see the beach), and indeed to anyone who has ever swum across a bay! Inland peoples might be able to believe the world is flat, but not anyone with boats.

Alexander Gietelink Oldenziel (30m):
Singular Learning Theory is another way of "talking about the breadth of optima" in the same sense that Newton's Universal Law of Gravitation is another way of "talking about Things Falling Down". 
Alexander Gietelink Oldenziel (1h):
Don't forget Wallace!
Alexander Gietelink Oldenziel (1h):
Yes, beautiful example! Van Leeuwenhoek was the one-man ASML of the 17th century. In this case, we actually have evidence of the counterfactual impact, as other lensmakers trailed van Leeuwenhoek by many decades. It's plausible that high-precision measurement and fabrication is the key bottleneck in most technological and scientific progress; it's difficult to oversell the importance of van Leeuwenhoek.

I refuse to join any club that would have me as a member.

— Groucho Marx

Alice and Carol are walking on the sidewalk in a large city, and end up together for a while.

"Hi, I'm Alice! What's your name?"

Carol thinks:

If Alice is trying to meet people this way, that means she doesn't have a much better option for meeting people, which reduces my estimate of the value of knowing Alice. That makes me skeptical of this whole interaction, which reduces the value of approaching me like this, and Alice should know this, which further reduces my estimate of Alice's other social options, which makes me even less interested in meeting Alice like this.

Carol might not think all of that consciously, but that's how human social reasoning tends to...

gjm (37m):

It looks to me as if, of the four "root causes of social relationships becoming more of a lemon market" listed in the OP, only one is actually anything to do with lemon-market-ness as such.

The dynamic in a lemon market is that you have some initial fraction of lemons but it hardly matters what that is because the fraction of lemons quickly increases until there's nothing else, because buyers can't tell what they're getting. It's that last feature that makes the lemon market, not the initial fraction of lemons. And I think three of the four proposed "root c...
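A toy simulation of that unraveling dynamic (the uniform value distribution and the "buyers offer the market average" rule are illustrative assumptions, not anything from the OP or this comment):

```python
import random

# Adverse-selection unraveling: buyers can't tell quality, so they offer
# the average value of cars still on the market; sellers whose cars are
# worth more than the offer withdraw, dragging the average down further.
random.seed(0)
values = [random.uniform(0, 100) for _ in range(10_000)]  # true car values

for round_number in range(10):
    if not values:
        break
    offer = sum(values) / len(values)            # buyers pay expected value
    values = [v for v in values if v <= offer]   # above-average sellers exit
    print(f"round {round_number}: offer = {offer:5.1f}, cars left = {len(values)}")

# The offer roughly halves each round regardless of the starting mix:
# the market converges on the worst lemons.
```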

Epistemic status: party trick

Why remove the prior

One famed feature of Bayesian inference is that it involves prior probability distributions. Given an exhaustive collection of mutually exclusive ways the world could be (hereafter called ‘hypotheses’), one starts with a sense of how likely the world is to be described by each hypothesis, in the absence of any contingent relevant evidence. One then combines this prior with a likelihood distribution, which for each hypothesis gives the probability that one would see any particular set of evidence, to get a posterior distribution of how likely each hypothesis is to be true given observed evidence. The prior and the likelihood seem pretty different: the prior is looking at the probability of the hypotheses in question, whereas the likelihood is looking at...
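For reference, here is the decomposition just described written out in standard notation, with the hypotheses \(H_i\) mutually exclusive and exhaustive and \(E\) the observed evidence:

```latex
\underbrace{P(H_i \mid E)}_{\text{posterior}}
  = \frac{\overbrace{P(E \mid H_i)}^{\text{likelihood}}\;
          \overbrace{P(H_i)}^{\text{prior}}}
         {\sum_j P(E \mid H_j)\, P(H_j)}
```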

Razied (1h):

Most of the weird stuff involving priors comes into being when you want posteriors over a continuous hypothesis space, where you get in trouble because reparametrizing your space changes the form of your prior, so a uniform "natural" prior is really a particular choice of parametrization. Using a discrete hypothesis space avoids big parts of the problem.
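Spelled out (this is the standard change-of-variables rule for densities, not anything specific to this thread): if \(\varphi = g(\theta)\) is an invertible reparametrization, a prior density picks up a Jacobian factor, so a prior that is uniform in \(\theta\) is generally non-uniform in \(\varphi\):

```latex
p_\Phi(\varphi) = p_\Theta\!\big(g^{-1}(\varphi)\big)
                  \left| \frac{d\, g^{-1}(\varphi)}{d\varphi} \right|
```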

Jsevillamol (5h):
I've been tempted to do this sometime, but I fear the prior is performing one very important role you are not making explicit: defining the universe of possible hypotheses you consider. In turn, defining that universe of probabilities defines what Bayesian updates look like. Here is a problem that arises when you ignore this: https://www.lesswrong.com/posts/R28ppqby8zftndDAM/a-bayesian-aggregation-paradox

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide are discussed. This is all highly speculative; I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in, e.g., cybersecurity Twitter, mathy AI alignment, non-scam crypto Twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

[Two images omitted.]

My theory...

I don't understand why you need to invoke testosterone. The transgender brain is special: for example, transgender women have immunity to visual illusions. Anecdotally, I have friends with gender identity problems who have not transitioned because it's costly and they don't have it that hard; they are STEM-level smart and they are not susceptible to visual illusions. So, assuming that this phenomenon exists (I don't quite believe your Twitter statistics), it's likely explainable by trans women's innate brain structure.

The other weirdness in your hypothesi...

Ebenezer Dukakis (3h):
Some other possibilities:

* Pretty people self-select towards interests and occupations that reward beauty. If you're pretty, you're more likely to be popular in high school, which interferes with the dedication necessary to become a great programmer.
* A big reason people are prettier in LA is they put significant effort into their appearance: hair, makeup, orthodontics, weight loss, etc.

Perhaps hunter/gatherer tribes had gender-based specialization of labor. If men are handling the hunting and tribe defense, which requires the big muscles, there's less need for women to pay the big-muscle metabolic cost.
AprilSR (4h):
All the smart trans girls I know were also smart prior to HRT.
Sabiola (4h):
LOL! I don't think women's clothing is less itchy (my husband's isn't any itchier than mine), but even if it were, that advantage would be totally negated by most women having to wear a bra.
Ben (1h):

Very interesting. It sounds like your "third person view from nowhere" vs the "first person view from somewhere" is very similar to something I was thinking about recently. I called them "objectively distinct situations" in contrast with "subjectively distinct situations". My view is that most of the anthropic arguments that "feel wrong" to me are built on trying to make me assign equal probability to all subjectively distinct scenarios, rather than objective ones. eg. A replication machine makes it so there are two of me, then "I" could be either of them,...

Wei Dai (11h):
This is particularly weird because your indexical probability then depends on what kind of bet you're offered. In other words, our marginal utility of money differs from our marginal utility of other things, and which one do you use to set your indexical probability? So this seems like a non-starter to me... (ETA: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we're just contemplating a philosophical problem and not trying to make any specific decisions?) Yes, didn't want to just say "acausal trade" in case threats/war is also a big thing.
Richard_Ngo (10h):
It seems pretty weird to me too, but to steelman: why shouldn't it depend on the type of bet you're offered? Your indexical probabilities can depend on any other type of observation you have when you open your eyes. E.g. maybe you see blue carpets, and you know that world A is 2x more likely to have blue carpets. And hearing someone say "and the bet is denominated in money not time" could maybe update you in an analogous way. I mostly offer this in the spirit of "here's the only way I can see to reconcile subjective anticipation with UDT at all", not "here's something which makes any sense mechanistically or which I can justify on intuitive grounds".
Wei Dai (10h):
I added this to my comment just before I saw your reply: Maybe it changes moment by moment as we consider different decisions, or something like that? But what about when we're just contemplating a philosophical problem and not trying to make any specific decisions? Ah I see. I think this is incomplete even for that purpose, because "subjective anticipation" to me also includes "I currently see X, what should I expect to see in the future?" and not just "What should I expect to see, unconditionally?" (See the link earlier about UDASSA not dealing with subjective anticipation.) ETA: Currently I'm basically thinking: use UDT for making decisions, use UDASSA for unconditional subjective anticipation, am confused about conditional subjective anticipation as well as how UDT and UDASSA are disconnected from each other (i.e., the subjective anticipation from UDASSA not feeding into decision making). Would love to improve upon this, but your idea currently feels worse than this...

I took the Reading the Mind in the Eyes Test today. I got 27/36. Jessica Livingston got 36/36.

Reading expressions is almost mind reading. Practicing reading expressions should be easy with the right software. All you need is software that shows a random photo from a large database, asks the user to guess what it is, and then informs the user what the correct answer is. I felt myself getting noticeably better just from the 36 images on the test.

Short standardized tests exist to test this skill, but is there good software for training it? It needs to have lots of examples, so the user learns to recognize expressions instead of overfitting on specific pictures.
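A minimal sketch of the kind of trainer described above, assuming a hypothetical local folder of face photos whose filenames encode the correct label (the folder name, file-naming scheme, and labels are all made up for illustration; no such dataset is referenced in the post):

```python
import random
from pathlib import Path

# Hypothetical dataset: a folder of face photos whose filenames start with
# the emotion label, e.g. "happy_01.jpg" or "skeptical_07.jpg".
DATASET = Path("expressions")

def drill(rounds: int = 10) -> None:
    images = list(DATASET.glob("*.jpg"))
    score = 0
    for _ in range(rounds):
        image = random.choice(images)           # random photo each round
        label = image.stem.split("_")[0]        # label encoded in filename
        print(f"\nLook at: {image}")            # open it in any image viewer
        guess = input("What emotion is this? ").strip().lower()
        if guess == label:
            score += 1
            print("Correct!")
        else:
            print(f"Answer: {label}")
    print(f"\nScore: {score}/{rounds}")

if __name__ == "__main__":
    drill()
```

The large-database requirement is the important part: with enough distinct images, the user learns the expressions rather than memorizing specific pictures.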

Paul Ekman has a product, but I don't know how good it is.

I (to my own surprise) got an "above average" score when I took this test a few years back, which I attribute mostly to the lack of emotional and circumstantial 'noise' in the images. I don't think being able to tell what is being emoted by a professional actor told to display exactly one (1) emotion, with no mediating factors, has much connection with being able to read actual people.

(. . . though a level-2 version with tags like "excited but hesitant" or "proud and angry" or "cheerful; unrelatedly, lowkey seasick" could actually be extremely useful, now I think on it.)

ö (14h):
I kind of implicitly assumed we are not talking about missing the obvious stuff (like someone staring at you angrily in a one-to-one conversation). That would probably best be explicitly learned by flashcards. Everything but basic emotions has a lot of hidden state, and the tracking becomes much more of a thing. But that state is not all that hidden: you actually know a lot about the people in your life.

The hard part is coming up with enough hypotheses, not separating true from false. I call it, to myself, "generating social conspiracy theories", to get rid of my inhibition against stating a bad theory. Whatever you come up with usually will not be too bad. Evaluating the truth of "my colleague is stressed" is usually easy. But it will make you aware of whether they are or aren't and how that influences their behavior. That is what you actually learn and what will make you aware of their stress in the future.

I never felt like there is a lack of "obvious" things to become aware of. Either things are so interconnected that everything is kind of accessible with enough layers of such perceptions, or I am playing on too basic a level of this game to get to interesting cases. I feel like I am learning some deep art, so I am probably a total beginner at something most are much more capable of just by using their intuition.

The disappointing part, of course, is that reading strangers' minds is hard, with huge error bars, while reading huge parts of the minds of close people is basically expected. I might be arguing something totally beside lsusr's original point, but I do not think that facial expressions carry very far, and this (cognitive empathy) does the thing he seems to be after.
lsusr (13h):
To clarify: I am looking specifically for a tool that trains me to read facial expressions—especially eye expressions—better. This is exactly what I am after.
Answer by joec (16h):
I think with a decent training set, this could make a pretty nice Anki deck. The difficulty in this would be getting the data and accurate emotional expression labels. A few ideas:

1. Pay high school/college drama students to fake expressions. The quality of the data would be limited by their acting skill, but you could get honest labels.
2. Gather up some participants and expose them to a variety of things, taking pictures of them under different emotional states. This could run into the problem of people misreporting their actual emotional state. Learning with these might make the user more susceptible to deception.
3. Screenshot expressions from movies/videos where the emotional state of the subjects is clear from context.
This is a linkpost for https://dynomight.net/seed-oil/

A friend has spent the last three years hounding me about seed oils. Every time I thought I was safe, he’d wait a couple months and renew his attack:

“When are you going to write about seed oils?”

“Did you know that seed oils are why there’s so much {obesity, heart disease, diabetes, inflammation, cancer, dementia}?”

“Why did you write about {meth, the death penalty, consciousness, nukes, ethylene, abortion, AI, aliens, colonoscopies, Tunnel Man, Bourdieu, Assange} when you could have written about seed oils?”

“Isn’t it time to quit your silly navel-gazing and use your weird obsessive personality to make a dent in the world—by writing about seed oils?”

He’d often send screenshots of people reminding each other that Corn Oil is Murder and that it’s critical that we overturn our lives...

EGI (1h):

Yeah, I'd be willing to bet that too.

David Cato (2h):
I wish you the best and look forward to hearing how it goes.
romeostevensit (5h):
If some pre-modern hominids ate high-animal diets, and some populations of humans did, and that continued through history, I wouldn't call that relatively recent. I'm not the same person making the claim that there is overwhelming evidence that saturated fats can't possibly be bad for you. I'm making a much more restricted claim.
denkenberger (6h):
I don't have a strong opinion, because I think there's huge uncertainty in what is healthy. But for instance, my intuition is that a plant-based meat with very similar nutritional characteristics to animal meat would be about as healthy (or unhealthy) as the meat itself. The plant-based meat would be ultra-processed. But one could think of the animal meat as being ultra-processed plants, so I guess one could think that that is the reason animal meat is unhealthy?

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water", David Foster Wallace's 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...

Nathan Young (3h):
Can I check that I've understood it? Roughly, the essay urges one to be conscious of each passing thought, to see it and kind of head it off at the tracks: "feeling angry?" "don't!". But the comment argues this is against what CBT says about feeling our feelings. What about Sam Harris's practice of meditation, which seems focused on seeing and noticing thoughts, turning attention back on itself? I had a period last night of sort of "intense consciousness" where I felt very focused on the fact that I was conscious. It wasn't super pleasant, but it was profound. I can see why one would want to focus on that, but also why it might be a bad idea.
cousin_it (2h):
To me it's less about thoughts and more about emotions. And not about doing it all the time, but only when I'm having some intense emotion and need to do something about it. For example, let's say I'm angry about something. I imagine there's a knob in my mind: make the emotion stronger or weaker. (Or between feeling it less, and feeling it more.) What I usually do is turn the knob up. Try to feel the emotion more completely and in more detail, without trying to push any of it away. What usually happens next is the emotion kinda decides that it's been heard and goes away: a few minutes later I realize that whatever I was feeling is no longer as intense or urgent. Or I might even forget it entirely and find my mind thinking of something else. It's counterintuitive but it's really how it works for me; been doing it for over a decade now. It's the closest thing to a mental cheat code that I know.
Nathan Young (2h):
Do you find it dampens good emotions? Like, if you are deeply in love and feel it, does it diminish the experience?

I think for good emotions the feel-it-completely thing happens naturally anyway.

This post brings together various questions about the college application process, as well as practical considerations of where to apply and go. We are seeing some encouraging developments, but mostly the situation remains rather terrible for all concerned.

Application Strategy and Difficulty

Paul Graham: Colleges that weren’t hard to get into when I was in HS are hard to get into now. The population has increased by 43%, but competition for elite colleges seems to have increased more. I think the reason is that there are more smart kids. If so that’s fortunate for America.

Are college applications getting more competitive over time?

Yes and no.

  1. The population size is up, but the cohort size is roughly the same.
  2. The standard ‘effort level’ of putting in work and sacrificing one’s childhood and gaming
...
xpym (2h):

Indeed, from what I see there is consensus that academic standards on elite campuses are dramatically down; likely this has a lot to do with the need to sustain holistic admissions.

As in, the academic requirements, the ‘being smarter’ requirement, has actually weakened substantially. You need to be less smart, because the process does not care so much if you are smart, past a minimum. The process cares about… other things.

So, the signalling value of their degrees should be decreasing accordingly, unless one mainly intends to take advantage of the proces...

Wei Dai (8h):
Some of my considerations for college choice for my kid, that I suspect others may also want to think more about or discuss:

1. status/signaling benefits for the parents (This is probably a major consideration for many parents to push their kids into elite schools. How much do you endorse it?)
2. sex ratio at the school and its effect on the local "dating culture"
3. political/ideological indoctrination by professors/peers
4. workload (having more/less time/energy to pursue one's own interests)
Jacob G-W (10h):
I'm assuming the recent protests about the Gaza war: https://www.nytimes.com/live/2024/04/24/us/columbia-protests-mike-johnson
Wei Dai (10h):
Is this actually true? China has (1) (affirmative action via "express and objective (i.e., points and quotas)" criteria) for its minorities and different regions, and FWICT the college admissions "eating your whole childhood" problem over there is way worse. Of course that could be despite (1), not because of it, but it does make me question whether (3) ("implied and subjective ('we look at the whole person')") is actually far worse than (1) for this.
