This post is a not-so-secret analogy for the AI alignment problem. Via a fictional dialogue, Eliezer explores and counters common objections to the Rocket Alignment Problem as approached by the Mathematics of Intentional Rocketry Institute.

MIRI researchers will tell you they're worried that "right now, nobody can tell you how to point your rocket’s nose such that it goes to the moon, nor indeed any prespecified celestial destination."

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

* By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
* A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI, we can and will use it to figure out what we should use it for, and then we will use it in that way." To what extent is aligned AI going well contingent on something like this happening, and how likely do you think it is to happen? Why?
* To what extent is your belief that aligned AI would go well contingent on some sort of assumption like: my idealized values are the same as the idealized values of the people or coalition who will control the aligned AI?
* Do you care about AI welfare? Does your answer depend on whether the AI is aligned? If we built an aligned AI, how likely is it that we will create a world that treats AI welfare as an important consideration? What if we build a misaligned AI?
* Do you think that, to a first approximation, most of the possible value of the future happens in worlds that are optimized for something that resembles your current or idealized values? How bad is it to mostly sacrifice each of these? (What if the future world's values are similar to yours, but it is only kinda effectual at pursuing them? What if the world is optimized for something that's only slightly correlated with your values?) How likely are these various options under an aligned AI future vs. an unaligned AI future?
Elizabeth
Check my math: how does Enovid compare to humming? Nitric oxide is an antimicrobial and immune booster. Normal nasal nitric oxide is 0.14ppm for women and 0.18ppm for men (sinus levels are 100x higher). journals.sagepub.com/doi/pdf/10.117…

Enovid is a nasal spray that produces NO. I had the damnedest time quantifying Enovid, but this trial registration says 0.11ppm NO/hour. They deliver every 8h and I think that dose is amortized, so the true dose is 0.88. But maybe it's more complicated. I've got an email out to the PI but am not hopeful about a response. clinicaltrials.gov/study/NCT05109…

So Enovid increases nasal NO levels somewhere between 75% and 600% compared to baseline, which is not shabby. Except humming increases nasal NO levels by 1500-2000%. atsjournals.org/doi/pdf/10.116… Enovid stings and humming doesn't, so it seems like Enovid should have the larger dose. But the spray doesn't contain NO itself, only compounds that react to form NO. Maybe that's where the sting comes from? Cystic fibrosis and burn patients are sometimes given stratospheric levels of NO for hours or days; if the burn from Enovid came from the NO itself, those patients would be in agony.

I'm not finding any data on humming and respiratory infections. Google Scholar gives me information on CF and COPD, and @Elicit brought me a bunch of studies about honey. Better keywords got Google Scholar to bring me a bunch of descriptions of yogic breathing with no empirical backing. There are some very circumstantial studies on illness in mouth breathers vs. nasal breathers, but that design has too many confounders for me to take seriously.

Where I'm most likely wrong:
* I misinterpreted the dosage in the RCT.
* The dosage in the RCT is lower than in Enovid.
* Enovid's dose per spray is 0.5ml, so pretty close to the new study. But it recommends two sprays per nostril, so the real dose is 2x that. Which is still not quite as powerful as a single hum.
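For anyone who wants to check the arithmetic, here's a minimal sketch of the comparison above (Python, numbers copied from this note; treating the trial's 0.11ppm/hour figure as either a single dose or an 8-hour amortized dose are both my assumptions, per the uncertainty described above):

```python
# Back-of-the-envelope check of the Enovid vs. baseline NO comparison above.
# All inputs are the numbers quoted in the text; nothing here is a pharmacology claim.
baseline_women, baseline_men = 0.14, 0.18   # ppm nasal NO at baseline
trial_rate = 0.11                           # ppm NO per hour, from the trial registration
hours_per_dose = 8                          # dosing interval

low_dose = trial_rate                       # if 0.11 is the whole dose
high_dose = trial_rate * hours_per_dose     # if the rate is amortized over 8h -> 0.88

for label, dose in [("low reading", low_dose), ("high reading", high_dose)]:
    print(f"{label}: +{dose / baseline_men:.0%} (men) to +{dose / baseline_women:.0%} (women) over baseline")
# -> low reading: roughly +61% to +79%; high reading: roughly +489% to +629%.
# Humming, per the cited paper, raises nasal NO by ~1500-2000%.
```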
A tension that keeps recurring when I think about philosophy is between the "view from nowhere" and the "view from somewhere", i.e. a third-person versus first-person perspective—especially when thinking about anthropics.

One version of the view from nowhere says that there's some "objective" way of assigning measure to universes (or people within those universes, or person-moments). You should expect to end up in different possible situations in proportion to how much measure your instances in those situations have. For example, UDASSA ascribes measure based on the simplicity of the computation that outputs your experience.

One version of the view from somewhere says that the way you assign measure across different instances should depend on your values. You should act as if you expect to end up in different possible future situations in proportion to how much power to implement your values the instances in each of those situations have. I'll call this the ADT approach, because that seems like the core insight of Anthropic Decision Theory. Wei Dai also discusses it here.

In some sense each of these views makes a prediction. UDASSA predicts that we live in a universe with laws of physics that are very simple to specify (even if they're computationally expensive to run), which seems to be true. Meanwhile the ADT approach "predicts" that we find ourselves at an unusually pivotal point in history, which also seems true.

Intuitively I want to say "yeah, but if I keep predicting that I will end up in more and more pivotal places, eventually that will be falsified". But... on a personal level, this hasn't actually been falsified yet. And more generally, acting on those predictions can still be positive in expectation even if they almost surely end up being falsified. It's a St Petersburg paradox, basically.

Very speculatively, then, maybe a way to reconcile the view from somewhere and the view from nowhere is via something like geometric rationality, which avoids St Petersburg paradoxes. And more generally, it feels like there's some kind of multi-agent perspective which says I shouldn't model all these copies of myself as acting in unison, but rather as optimizing for some compromise between all their different goals (which can differ even if they're identical, because of indexicality). No strong conclusions here but I want to keep playing around with some of these ideas (which were inspired by a call with @zhukeepa).

This was all kinda rambly but I think I can summarize it as "Isn't it weird that ADT tells us that we should act as if we'll end up in unusually important places, and also we do seem to be in an incredibly unusually important place in the universe? I don't have a story for why these things are related but it does seem like a suspicious coincidence."
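A minimal sketch of the St Petersburg point (my own toy example, using the standard double-or-nothing formulation of the gamble): the arithmetic expectation grows without bound as you include more branches, while a geometric (log-space) valuation of the same gamble converges, which is roughly the property that lets geometric rationality sidestep the paradox.

```python
import math

# St Petersburg-style gamble: with probability 2**-n you win 2**n (n = 1, 2, ...).
N = 60  # number of branches to include

arithmetic = sum(2**-n * 2**n for n in range(1, N + 1))                     # equals N, diverges as N grows
geometric = math.exp(sum(2**-n * math.log(2**n) for n in range(1, N + 1)))  # converges to 4

print(f"arithmetic value of first {N} branches: {arithmetic:.0f}")
print(f"geometric value of the same gamble:     {geometric:.2f}")
```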
Neil
Poetry and practicality

I was staring up at the moon a few days ago and thought about how deeply I loved my family, and wished to one day start my own (I'm just over 18 now). It was a nice moment.

Then, I whipped out my laptop and felt constrained to get back to work; i.e. read papers for my AI governance course, write up LW posts, and trade emails with EA France. (These I believe to be my best shots at increasing everyone's odds of survival.)

It felt almost like sacrilege to wrench myself away from the moon and my wonder. Like I was ruining a moment of poetry and stillwatered peace by slamming against reality and its mundane things again.

But... the reason I wrenched myself away is directly downstream from the spirit that animated me in the first place. Whether I feel the poetry now that I felt then is irrelevant: it's still there, and its value and truth persist. Pulling away from the moon was evidence I cared about my musings enough to act on them. The poetic is not a separate magisterium from the practical; rather, the practical is a particular facet of the poetic. Feeling "something to protect" in my bones naturally extends to acting it out.

In other words, poetry doesn't just stop. Feel no guilt in pulling away. Because you're not.
There was this voice inside my head that told me that since I have Something to protect, relaxing is never OK beyond the strict minimum, the goal is paramount, and I should just work as hard as I can all the time. This led to me breaking down and being incapable of working on my AI governance job for a week, as I had just piled up too much stress. And then I decided to follow what motivated me in the moment, instead of coercing myself into working on what I thought was most important, and lo and behold! My total output increased, while my time spent working decreased.

I'm so angry and sad at the inadequacy of my role models, cultural norms, rationality advice, and my model of the good EA who does not burn out, which still led me to smash into the wall despite their best intentions. I became so estranged from my own body and perceptions, ignoring my core motivations, that it felt harder and harder to work. I dug myself such a deep hole. I'm terrified at the prospect of having to rebuild my motivation myself again.

Popular Comments

Recent Discussion

Warning: This post might be depressing to read for everyone except trans women. Gender identity and suicide are discussed. This is all highly speculative. I know near-zero about biology, chemistry, or physiology. I do not recommend anyone take hormones to try to increase their intelligence; mood & identity are more important.

Why are trans women so intellectually successful? They seem to be overrepresented 5-100x in eg cybersecurity twitter, mathy AI alignment, non-scam crypto twitter, math PhD programs, etc.

To explain this, let's first ask: Why aren't males way smarter than females on average? Males have ~13% higher cortical neuron density and 11% heavier brains (implying more area?). One might expect males to have mean IQ far above females then, but instead the means and medians are similar:

[Two figures omitted: left and right panels.]

My theory...

Why is nobody in San Francisco pretty? Hormones make you pretty but dumb (pretty faces don't usually pay rent in SF). Why is nobody in Los Angeles smart? Hormones make you pretty but dumb. (Sincere apologies to all residents of SF & LA.)

Some other possibilities:

  • Pretty people self-select towards interests and occupations that reward beauty. If you're pretty, you're more likely to be popular in high school, which interferes with the dedication necessary to become a great programmer.

  • A big reason people are prettier in LA is they put significant

... (read more)
AprilSR
All the smart trans girls I know were also smart prior to HRT.
Sabiola
LOL! I don't think women's clothing is less itchy (my husband's isn't any itchier than mine), but even if it were, that advantage would be totally negated by most women having to wear a bra.
kromem
Your hypothesis is ignoring environmental factors. I'd recommend reading over the following paper: https://journals.sagepub.com/doi/10.1177/2332858416673617

A few highlights:

In a lot of ways the way you are looking at the topic perpetuates a rather unhealthy assumption of underlying biological differences in competency that avoids consideration of contributing environmental privileges and harms. You can't just hand wave aside the inherent privilege of presenting male during early childhood education in evaluating later STEM performance.

Rather than seeing the performance gap of trans women over women presenting that way from birth as a result of a hormonal advantage, it may be that what you are actually ending up measuring is the performance gap resulting from the disadvantage placed upon women due to early education experiences being treated differently from the many trans women who had been presenting as boys during those grades.

i.e. Perhaps all women could have been doing quite a lot better in STEM fields if the world treated them the way it treated boys during kindergarten through the early grades, and what we need socially isn't hormone prescriptions but serious adjustments to presumptions around gender and biologically driven competencies.

Note: It seems like great essays should go here and be fed through the standard LessWrong algorithm. There is possibly a copyright issue here, but we aren't making any money off it either. What follows is a full copy of "This is Water" by David Foster Wallace, his 2005 commencement speech to the graduating class at Kenyon College.

Greetings parents and congratulations to Kenyon’s graduating class of 2005. There are these two young fish swimming along and they happen to meet an older fish swimming the other way, who nods at them and says “Morning, boys. How’s the water?” And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes “What the hell is water?”

This is...

Nathan Young
Thanks. And I appreciate that LessWrong is a space where mods feel empowered to do this, since it's the right call.
cousin_it
There's an amazing HN comment that I mention every time someone links to this essay. It says: don't do what the essay says, you'll make yourself depressed. Instead do something a bit different, and maybe even opposite.

Let's say for example you feel annoyed by the fat checkout lady. DFW advises you to step over your annoyance, imagine the checkout lady is caring for her sick husband, and so on. But that kind of approach to your own feelings will hurt you in the long run, and maybe even seriously hurt you. Instead, the right thing is to simply feel annoyed at the checkout lady. Let the feeling come and be heard. After it's heard, it'll be gone by itself soon enough.

Here's the whole comment, to save people the click:

And another comment from a different person making the same point:

Wow, well I'm glad I posted it here.

yanni
If you're into podcasts, the Very Bad Wizards guys did an ep on this essay, which I enjoyed: https://verybadwizards.com/episode/episode-227-a-terrible-master-david-foster-wallaces-this-is-water

The history of science has tons of examples of the same thing being discovered multiple times independently; wikipedia has a whole list of examples here. If your goal in studying the history of science is to extract the predictable/overdetermined component of humanity's trajectory, then it makes sense to focus on such examples.

But if your goal is to achieve high counterfactual impact in your own research, then you should probably draw inspiration from the opposite: "singular" discoveries, i.e. discoveries which nobody else was anywhere close to figuring out. After all, if someone else would have figured it out shortly after anyways, then the discovery probably wasn't very counterfactually impactful.

Alas, nobody seems to have made a list of highly counterfactual scientific discoveries, to complement wikipedia's list of multiple discoveries.

To...

dr_s

Maybe it's the other way around, and it's the Chinese elite who was unusually and stubbornly conservative on this, trusting the wisdom of their ancestors over foreign devilry (would be a pretty Confucian thing to do). The Greeks realised the Earth was round from things like seeing sails appear over the horizon. Any sailing peoples thinking about this would have noticed sooner or later.

Kind of a long shot, but did Polynesian people have ideas on this, for example?

dr_s
Democritus also has a decent claim to that for being the first to imagine atoms and materialism altogether.
tailcalled
I would say "the thing that contains the inheritance particles" rather than "the inheritance particle". "Particulate inheritance" is a technical term within genetics and it refers to how children don't end up precisely with the mean of their parents' traits (blending inheritance), but rather with some noise around that mean, which particulate inheritance asserts is due to the genetic influence being separated into discrete particles with the children receiving random subsets of their parent's genes. The significance of this is that under blending inheritance, the genetic variation between organisms within a species would be averaged away in a small number of generations, which would make evolution by natural selection ~impossible (as natural selection doesn't work without genetic variation).
tailcalled
Isn't singular learning theory basically just another way of talking about the breadth of optima?

TL;DR

Tacit knowledge is extremely valuable. Unfortunately, developing tacit knowledge is usually bottlenecked by apprentice-master relationships. Tacit Knowledge Videos could widen this bottleneck. This post is a Schelling point for aggregating these videos—aiming to be The Best Textbooks on Every Subject for Tacit Knowledge Videos. Scroll down to the list if that's what you're here for. Post videos that highlight tacit knowledge in the comments and I’ll add them to the post. Experts in the videos include Stephen Wolfram, Holden Karnofsky, Andy Matuschak, Jonathan Blow, Tyler Cowen, George Hotz, and others. 

What are Tacit Knowledge Videos?

Samo Burja claims YouTube has opened the gates for a revolution in tacit knowledge transfer. Burja defines tacit knowledge as follows:

Tacit knowledge is knowledge that can’t properly be transmitted via verbal or written instruction, like the ability to create

...
quailia

Networking and relationship building, both professional and personal; I'm sure there are overlaps. And echoing another request: Sales.

Amadeus Pagel
You've already mentioned cooking as an example, and this is definitely something I'd like to improve in. I looked up:

How to crack eggs:

How to clip nails: https://www.tiktok.com/@jonijawne/video/7212337177772952838?q=cut%20nails&t=1713988543560

How to improve posture:

About a year ago I decided to try using one of those apps where you tie your goals to some kind of financial penalty. The specific one I tried is Forfeit, which I liked the look of because it’s relatively simple, you set single tasks which you have to verify you have completed with a photo.

I’m generally pretty sceptical of productivity systems, tools for thought, mindset shifts, life hacks and so on. But this one I have found to be really shockingly effective, it has been about the biggest positive change to my life that I can remember. I feel like the category of things which benefit from careful planning and execution over time has completely opened up to me, whereas previously things like this would be largely down to the...

CronoDAS
I have a bad history of not being responsive to the threat of punishment. When I have an aversive task, and the consequences for not doing that task suddenly get much worse, I start acting like the punishment is inevitable and am even less likely to actually do the task. In other words, I fail the "gun to the head test" quite dramatically.

Guy with a gun: I'm going to shoot you if you haven't changed the sheets on your bed by tomorrow.

Me: AAH I'M GOING TO DIE IT'S NO GOOD I MIGHT AS WELL SPEND THE DAY LYING IN BED PLAYING VIDEO GAMES BECAUSE I'M GOING TO GET SHOT TOMORROW SOMEONE CALL THE FUNERAL HOME AND MAKE PLANS TELL MY FAMILY I LOVE THEM

Guy with a gun: You know, you could always just... change the sheets?

Me: THE THOUGHT HAS OCCURRED TO ME BUT I'M TOO UPSET RIGHT NOW ABOUT THE FACT THAT I'M GOING TO DIE TOMORROW BECAUSE THE SHEETS WEREN'T CHANGED TO ACTUALLY GO AND CHANGE THEM

Also, I have a bad history with this kind of thing in general: one thing that always bothered me in school and college was that the only motivation I really had for doing my work was to avoid bad consequences. I was so sick of spending my life making myself miserable in order to avoid things that ought to be even worse. I also have a hard time being motivated by money: bad consequences for having insufficient money have the problem I've already described, and, well, video games are cheap. (In case you're wondering, no, I don't work, and my parents still support me financially.)

So when I think of commitment apps, I tend to react to them as entirely downside: I don't expect my behavior to change very much, and I do expect to predictably lose money. :(
dreeves

I think this is a persuasive case that commitment devices aren't good for you. I'm very interested in how common this is, and in whether there's a way to reframe commitment devices that avoids this psychological reaction to them. One idea is to focus on incentive alignment that avoids the far end of the spectrum. With Beeminder in particular, you could set a low pledge cap and then focus on the positive reinforcement of keeping your graph pretty by keeping the datapoints on the right side of the red line.

Elizabeth
I curated this post because:
1. This is a rare productivity-system post that made me consider actually implementing it. Right now I can't because my energy levels are too variable, but if that weren't true I would definitely be trying it.
2. Lots of details, on lots of levels. Things like "I fail 5% of the time" and then translating that to "therefore I price things such that if I could pay 5% of the failure fee to just have it done, I would do so."
3. Practical advice like "yes, verification sometimes takes a stupid amount of time; the habit is nonetheless worth it" or "arrange things to verify the day after".
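As a toy illustration of the pricing heuristic in point 2 (the numbers here are invented, not from the post):

```python
# Toy illustration of the "5% of the failure fee" pricing heuristic; figures are made up.
failure_rate = 0.05   # you miss ~5% of commitments
fee = 40.0            # forfeit attached to a task, in dollars

expected_cost = failure_rate * fee  # what the commitment costs you on average
print(f"A ${fee:.0f} forfeit costs about ${expected_cost:.2f} per task in expectation,")
print("so only set it that high if you'd happily pay that much to just have the task done.")
```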
Josh Mitchell
Thanks, Elizabeth! Really has helped us out :)

Before we get started, this is your quarterly reminder that I have no medical credentials and my highest academic credential is a BA in a different part of biology (with a double major in computer science). In a world with a functional medical system no one would listen to me. 

Tl;dr povidone iodine probably reduces viral load when used in the mouth or nose, with corresponding decreases in symptoms and infectivity. The effect size could be as high as 90% for prophylactic use (and as low as 0% when used in late illness), but is probably much smaller. There is a long tail of side-effects. No study I read reported side effects at clinically significant levels, but I don’t think they looked hard enough. There are other gargle...

Another possible risk: Accidentally swallowing the iodine. This happened to me. I was using a squeezable nasal irrigation device, I squirted some of the mixture into my mouth, and it went right down my throat. I called Poison Control, followed their instructions (IIRC they told me to consume a lot of starchy food, I think maybe I took some activated charcoal too), and ended up being fine.

Sergii
What about estimating LLM capabilities from the length of a sequence of numbers that it can reverse? I used prompts like:

"please reverse 4 5 8 1 1 8 1 4 4 9 3 9 3 3 3 5 5 2 7 8"
"please reverse 1 9 4 8 6 1 3 2 2 5"
etc.

Some results:
- Llama2 starts making mistakes after 5 numbers
- Llama3 can do 10, but fails at 20
- GPT-4 can do 20 but fails at 40

The follow-up questions are:
- what should be the name of this metric?
- are the other top-scoring models like Claude similar? (I don't have access)
- any bets on how many numbers GPT-5 will be able to reverse?
- how many numbers should AGI be able to reverse? ASI? Can this be a Turing test of sorts?
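If someone wants to run this more systematically, here is a rough sketch of how the probe could be automated. `ask_model` is a placeholder for whatever chat-API wrapper you use, and the doubling search plus the "every trial must pass" criterion are my own arbitrary choices rather than anything from the comment above.

```python
import random

def reverse_span(ask_model, max_len=64, trials_per_len=5):
    """Return the longest digit-sequence length the model reverses reliably.

    ask_model(prompt) -> reply text is a placeholder for your own API wrapper.
    """
    length, best = 2, 0
    while length <= max_len:
        passed = 0
        for _ in range(trials_per_len):
            digits = [str(random.randint(0, 9)) for _ in range(length)]
            prompt = "please reverse " + " ".join(digits)
            expected = " ".join(reversed(digits))
            if expected in ask_model(prompt):  # lenient check: reversed sequence appears anywhere in the reply
                passed += 1
        if passed < trials_per_len:            # first length with any failure ends the search
            break
        best = length
        length *= 2                            # coarse doubling: 2, 4, 8, 16, ...
    return best
```

A binary search between the last passing and the first failing length would give a tighter estimate than the doubling alone.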
p.b.

In psychometrics this is called "backward digit span".

It seems to me worth trying to slow down AI development to steer successfully around the shoals of extinction and out to utopia.

But I was thinking lately: even if I didn’t think there was any chance of extinction risk, it might still be worth prioritizing a lot of care over moving at maximal speed. Because there are many different possible AI futures, and I think there’s a good chance that the initial direction affects the long-term path, and different long-term paths go to different places. The systems we build now will shape the next systems, and so forth. If the first human-level-ish AI is brain emulations, I expect a quite different sequence of events than if it is GPT-ish.

People genuinely pushing for AI speed over care (rather than just feeling impotent) apparently think there is negligible risk of bad outcomes, but they are also effectively asking to take the first future to which there is a path. Yet possible futures are a large space, and arguably we are in a rare plateau where we could climb very different hills, and get to much better futures.
