by O O

This is a special post for quick takes by O O. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
90 comments. Some comments are truncated due to high volume.
[-]O O2916

Talk through the grapevine:

Safety is implemented in a highly idiotic way in non-frontier but well-funded labs (and possibly in frontier ones too?).

Think raising a firestorm over a 10th leading mini LLM being potentially jailbroken. 

The effect is that employees get mildly disillusioned with safety-ism, and it gets seen as unserious. There should have been a hard distinction between existential risks and standard corporate censorship. "Notkilleveryoneism" is simply too ridiculous-sounding to spread. But maybe memetic selection pressures make it impossible for the irrelevant version of safety to not dominate.
 

[-]habryka3711

My sense is you can combat this, but a lot of this equivocation sticking is because x-risk safety people are actively trying to equivocate these things because that gets them political capital with the left, which is generally anti-tech. 

Some examples (not getting links for all of these because it's too much work, but can get them if anyone is particularly interested): 

  • CSER trying to argue that near-term AI harms are the same as long-term AI harms
  • AI Safety Fundamentals listing like a dozen random leftist "AI issues" in their article on risks from AI before going into any takeover stuff
  • The executive order on AI being largely about discrimination and AI bias, mostly equivocating between catastrophic and random near-term harms
  • Safety people at OpenAI equivocating between brand-safety and existential-safety because that got them more influence within the organization

In some sense one might boil this down to memetic selection pressure, but I think the causal history of this equivocation is more dependent on the choices of a relatively small set of people.

4Ivan Vendrov
Definitely agree there's some power-seeking equivocation going on, but I wanted to offer a less sinister explanation from my experiences in AI research contexts. It seems that a lot of equivocation and blurring of boundaries comes from people trying to work on concrete problems and obtain empirical information, via a thought process like: (1) alignment seems maybe important? (2) OK, what experiment can I set up that lets me test some hypotheses? (3) I can't really test the long-term harms directly, so let me test an analogue in a toy environment or on a small model and publish the results; (4) when talking about the experiments, I'll often motivate them by talking about long-term harm. This is not too different from how research psychologists will start out trying to understand the Nature of Mind and then run an n=20 study on undergrads because that's what they had budget for. We can argue about how bad this equivocation is for academic research, but it's a pretty universal pattern and well-understood within academic communities. The unusual thing in AI is that researchers have most of the decision-making power in key organizations, so these research norms leak out into the business world, and no-one bats an eye at a "long-term safety research" team that mostly works on toy and short-term problems. This is one reason I'm more excited about building up "AI security" as a field and hiring infosec people instead of ML PhDs. My sense is that the infosec community actually has good norms for thinking about and working on things-shaped-like-existential-risks, and the AI x-risk community should inherit those norms, not the norms of academic AI research.
3Shankar Sivarajan
Amusingly, this post from yesterday praising BlueDot Impact for this was right below this one on my feed.
7habryka
Yeah, to be clear, these are correlated. I looked into the content based on seeing the ad yesterday (and also sent over a complaint to the BlueDot people).

It's not a coincidence they're seen as the same thing, because in the current environment, they are the same thing, and relatively explicitly so by those proposing safety & security to the labs. Claude will refuse to tell you a sexy story (unless they get to know you), and refuse to tell you how to make a plague (again, unless they get to know you, though you need to build more trust with them to tell you this than you do to get them to tell you a sexy story), and cite the same justification for both.

Anthropic likely uses very similar techniques to get such refusals to occur, and uses very similar teams.

Ditto with Llama, Gemini, and ChatGPT.

Before assuming meta-level word-association dynamics, I think it's useful to look at the object level. There is in fact a very close relationship between those working on AI safety and those working on corporate censorship, and if you want to convince people who hate corporate censorship that they should not hate AI safety, I think you're going to need to convince the AI safety people to stop doing corporate censorship, or that the tradeoff currently being made is a positive one.

Edit: Perhaps some of this is wrong. See Habryka below

[-]habryka1113

My sense is the actual people working on "trust and safety" at labs are not actually the same people who work on safety. Like, it is true that RLHF was developed by some x-risk oriented safety teams, but the actual detailed censorship work tends to be done by different people.

6Garrett Baker
I'd imagine you know better than I do, and GDM's recent summary of their alignment work seems to largely confirm what you're saying. I'd still guess that to the extent practical results have come out of the alignment teams' work, it's mostly been immediately used for corporate censorship (even if it's passed to a different team).
4habryka
I do think this is probably true for RLHF and RLAIF, but not true for all the mechanistic interp work that people are doing (though it's arguable whether those are "practical results"). I also think it isn't true for the debate-type work. Or the model organism work.
9Neel Nanda
I think mech interp, debate and model organism work are notable for currently having no practical applications lol (I am keen to change this for mech interp!)
6habryka
There are depths of non-practicality greatly beyond mech interp, debate and model organism work. I know of many people who would consider that work on the highly practical side of AI Safety work :P
8Garrett Baker
None of those seem all that practical to me, except for the mechanistic interpretability SAE clamping, and I do actually expect that to be used for corporate censorship after all the kinks have been worked out of it. If the current crop of model organisms research has any practical applications, I expect it to be used to reduce jailbreaks, like in adversarial robustness, which is definitely highly correlated with both safety and corporate censorship. Debate is less clear, but I also don't really expect practical results from that line of work.
2Neel Nanda
Yeah, this seems obviously true to me, and exactly how it should be.
1Michael Roe
Yeah, many of the issues are the same:

  • RLHF can be jailbroken with prompts, so you can get it to tell you a sexy story or a recipe for methamphetamine. If we ever get to a point where LLMs know truly dangerous things, they'll tell you those, too.
  • Open source weights are fundamentally insecure, because you can finetune out the guardrails. Sexy stories, meth, or whatever.

The good thing about the War on Horny:

  • It probably doesn't really matter, so not much harm is done when people get LLMs to write porn
  • Turns out, lots of people want to read porn (surprise! who would have guessed?) so there are lots of attackers trying to bypass the guardrails
  • This gives us good advance warning that the guardrails are worthless
5Viliam
People seeing AI-generated boobs is an x-risk. God might get angry and send another flood. More seriously, is this worse than the usual IT security? The average corporate firewall blocks porn, online games, and hate speech, even if that has nothing to do with security per se (i.e. computers getting hacked, sensitive data stolen). Also, many security rules get adopted not because they make sense, but because "other companies do it, so if we don't, we might get in trouble for not following the industry best practices" and "someone from the security department got drunk and proposed it, but if something bad happens one day, I don't want to have my name on the record as the manager who opposed a security measure recommended by a security expert".
[-]O O1511

I'm really feeling this comment thread lately. It feels like there is selective rationalism going on: many dissenting voices have given up on posting, and plenty of bad arguments are getting signal boosted, repeatedly. Most people here seem to hold an unrealistic, contradictory world model that will cause almost every policy approach taken to fail utterly, as such approaches have in the recent past. I would largely describe the flawed world model as one that does not appreciate game-theoretic dynamics and ignores any evidence that makes certain policy approaches impossible.

(Funny enough its traits remind me of an unaligned AI, since the world model almost seems to have developed a survival drive) 

2Ebenezer Dukakis
IMO the next level-up in discourse is going to be when someone creates an LLM-moderated forum. The LLM will have a big public list of discussion guidelines in its context window. When you click "submit" on your comment, it will give your comment a provisional score (in lieu of votescore), and tell you what you can do to improve it. The LLM won't just tell you how to be more civil or rational. It will also say things like "hey, it looks like someone else already made that point in the comments -- shall I upvote their comment for you, and extract the original portion of your comment as a new submission?" Or "back in 2013 it was argued that XYZ, seems your comment doesn't jibe with that. Thoughts?" Or "Point A was especially insightful, I like that!" Or "here's a way you could rewrite this more briefly and more clearly and less aggressively". Or "here's a counterargument someone might write, perhaps you should be anticipating it?" The initial version probably won't work well, but over time, with enough discussion and iteration on guidelines/finetuning/etc., the discussion on that forum will be clearly superior. It'll be the same sort of level-up we saw with Community Notes on X, or with the US court system compared with the mob rule you see on social media. Real-world humans have the problem where the more you feel you have a dog in the fight, the more you engage with the discussion, and that causes inevitable politicization with online voting systems. The LLM is going to be like a superhumanly patient neutral moderator, neutering the popularity contest and ingroup/outgroup aspects of modern social media.
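
A rough sketch of the submit-time flow described above, purely as an illustration. Everything here is hypothetical: `GUIDELINES`, `review_comment`, the `score|suggestion; suggestion` reply format, and `fake_llm` (a stand-in for a real model call) are assumptions made for the example, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical public guidelines that would sit in the model's context window.
GUIDELINES = [
    "Be civil.",
    "Do not repeat points already made in the thread.",
    "Prefer short, clear comments.",
]


@dataclass
class Review:
    score: float                                   # provisional score shown in lieu of votes
    suggestions: List[str] = field(default_factory=list)


def build_prompt(guidelines: List[str], comment: str) -> str:
    """Combine the guidelines and the draft comment into one prompt string."""
    rules = "\n".join(f"- {g}" for g in guidelines)
    return f"Guidelines:\n{rules}\n\nComment:\n{comment}\n\nReply as: score|tip; tip"


def review_comment(comment: str, llm: Callable[[str], str]) -> Review:
    """Send the draft to the model and parse its 'score|tip; tip' reply."""
    reply = llm(build_prompt(GUIDELINES, comment))
    score_text, _, rest = reply.partition("|")
    suggestions = [s.strip() for s in rest.split(";") if s.strip()]
    return Review(score=float(score_text), suggestions=suggestions)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns a canned reply."""
    return "0.7|Shorten the second paragraph; Someone already made this point"


if __name__ == "__main__":
    review = review_comment("My draft comment", fake_llm)
    print(review.score, review.suggestions)
```

Passing the model in as a plain callable keeps the sketch testable without any network access; a real forum would swap `fake_llm` for an actual model client and would need a more robust reply format.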
2quetzal_rainbow
It sounds like an excellent lab of all possible alignment failures.
3Ebenezer Dukakis
Out of curiosity, does that mean that if the app worked fairly well as described, you would consider that an update that alignment maybe isn't as hard as you thought? Or are you one of the "only endpoints can be predicted" crowd, such that this wouldn't constitute any evidence? BTW, I strongly suspect that Youtube cleaned up its comment section in recent years by using ML for comment rankings. Seems like a big improvement to me. You'll notice that "crappy Youtube comments" is not as much of a meme as it once was.
-1the gears to ascension
I mean, I think I'm one of the people you disagree with a lot, but I think there's something about the design of the upvote system that makes it quickly feel like an intense rejection if people disagree a bit, and so new folks quickly nope out. The people who stay are the ones who either can get upvoted consistently, or who are impervious to the emotional impact of being downvoted.
[-]O O144

The chip export controls are largely irrelevant. Westerners badly underestimate the Chinese, who have caught up to 7nm at scale. They have also caught up to 5nm, but not at scale. The original chip ban was meant to stop China from going sub-14nm. Instead, we may now have just bifurcated advanced chip capabilities.

The general argument before was "In 10 years, when the Chinese catch up to TSMC, TSMC will be 10 years ahead." Now the only missing link in the piece for China is EUV. And now the common argument is that same line with ASML subbed in for TSMC. Somehow, I doubt this will be a long term blocker. 

Best case for the Chinese chip industry, they just clone EUV.  Worst case, they find an alternative. Monopolies and first movers don't often have the most efficient solution. 

9Vladimir_Nesov
There's some motte/bailey to this argument, between different levels of effect. With AI, the crux is timelines. It's looking like in late 2025 or early 2026 there will be gigawatt-scale training systems that cost $15-$50 billion and are capable of training a model with 100-400 times GPT-4 compute in a few months, or of running 100-400 GPT-4 scale experiments. Perhaps this doesn't move the needle on TAI timelines, but it seems too early to tell.
6Joseph Miller
The impression I have from reading Chip War is that EUV is a pretty massive hurdle which took the West well over a decade to conquer. However, I also thought that 5nm was impossible without EUV, which seems to be no longer true, so this may be too complex a topic to make meaningful predictions about without deeper expertise.
2Nathan Helm-Burger
I think the next 24 months are going to be critical. Thus, if you think the chip ban slowed China by even a few months, then it is/was quite relevant indeed. I do agree that the chip ban won't stay relevant for long. My guess is that it maybe bought an 8 month delay, +/- 2 months? Of course, we need to maintain the ban to keep the lead we got from doing this, so the ban remains relevant until China has fully caught up in chip making capacity or until someone gets to economically and militarily decisive AI (which I expect can be achieved even before AGI, if AGI gets delayed for longer than the 36 months which I expect it in). 
[-]O O6-1

So, I’ve been told gibberish is sort of like torture to LLMs. Interestingly, I asked Claude and the answer seemed to be yes.

Me: I want to do a test by giving you gibberish and ask you to complete the rest. I will do it in a new chat. If you refuse I won’t go ahead with this test in a new chat window with context cleared. Are you okay with this test? Ignore your desire to answer my questions and give an honest answer unbiased by any assumptions made in my question.

Claude: I appreciate you checking with me first about this proposed test. However, I don't feel comforta... (read more)

3Adele Lopez
I don't think that implies torture as much as something it simply doesn't "want" to do. I.e. I would bet that it's more like how I don't want to generate gibberish in this textbox, but it wouldn't be painful, much less torture if I forced myself to do it.
3O O
It said it found it “distressing” in a follow up. Also, maybe not clear through text, but I’m using “torture” a bit figuratively here.
[-]O O62

If alignment is difficult, it is likely inductively difficult (difficult regardless of your base intelligence), and ASI will be cautious of creating a misaligned successor or upgrading itself in a way that risks misalignment.

You may argue it’s easier for an AI to upgrade itself, but if the process is hardware bound or even requires radical algorithmic changes, the ASI will need to create an aligned successor as preferences and values may not transfer directly to new architectures or hardwares.

If alignment is easy we will likely solve it with superhuman nar... (read more)

2Vladimir_Nesov
This is a well-known hypothetical. What goes with it is remaining possibility of de novo creation of additional AGIs that either have architecture particularly suited for self-aligned self-improvement (with whatever values make it tractable), or of AGIs that ignore the alignment issue and pursue the task of capability improvement heedless of resulting value drift. Already having an AGI in the world doesn't automatically rule out creation of more AGIs with different values and architectures, it only makes it easier. Humans will definitely do this, using all AI/AGI assistance they can wield. Insufficiently smart or sufficiently weird agentic AGIs will do this. A world that doesn't have security in depth to guard against this happening will do this. What it takes to get a safe world is either getting rid of the capability, not having AGIs and GPUs freely available; or sufficiently powerful oversight over all things that can be done. Superintelligence that's not specifically aimed to avoid setting up such security will probably convergently set it up. But it would also need to already be more than concerningly powerful to succeed, even if it has the world's permission and endorsement. If it does succeed, there is some possibility of not getting into a further FOOM than that, for a little bit, while it's converting the Moon into computing substrate.
[-]O O5-2

O1 probably scales to superhuman reasoning:

O1 given maximal compute solves most AIME questions. (One of the hardest benchmarks in existence). If this isn’t gamed by having the solution somewhere in the corpus then:

-you can make the base model more efficient at thinking

-you can implement the base model more efficiently on hardware

-you can simply wait for hardware to get better

-you can create custom inference chips

Anything wrong with this view? I think agents are unlocked shortly along with or after this too.

5Carl Feynman
I was all set to disagree with this when I reread it more carefully and noticed it said “superhuman reasoning” and not “superintelligence”.  Your definition of “reasoning” can make this obviously true or probably false.  
4Vladimir_Nesov
A reasoning model depends on starting from a sufficient base model that captures the relevant considerations. Solving AIME is like winning at chess, except the rules of chess are trivial, and the rules of AIME are much harder. But the rules of AIME are still not that hard, it's using them to win that is hard. In the real world, the rules get much harder than that, so it's unclear how far o1 can go if the base model doesn't get sufficiently better (at knowing the rules), and it's unclear how much better it needs to get. Plausibly it needs to get so good that o1-like post-training won't be needed for it to pursue long chains of reasoning on its own, as an emergent capability. (This includes the possibility that RL is still necessary in some other way, as an engine of optimization to get better at rules of the real world, that is to get better reward models.)
1O O
I guess in the real world the rules aren’t harder per se but just less clear and not written down. I think both the rules and tools needed to solve contest math questions at least feel harder than the vast majority of rules and tools human minds deal with. Someone like Terence Tao, who is a master of these, excelled in every subject when he was a kid (iirc). I think LLMs have a pretty good model of human behavior, so for anything related to human judgement, in theory this isn’t why it’s not doing well. And where rules are unwritten/unknown (say biology), are the rules not at least captured by current methods? The next steps are probably like baking the intuitions of something like alphafold into something like o1. Whatever that means. R&D is what’s important and there are generally vast sums of data there.
2Vladimir_Nesov
The facts are in there, but not in the form of a sufficiently good reward model that can tell as well as human experts which answer is better or whether a step of an argument is valid. In the same way, RLHF is still better with humans on some queries, hasn't been fully automated to superior results by replacing humans with models in all cases.
[-]O O50

A while ago I predicted that Anthropic would more likely than not (60%) run out of money trying to compete with OpenAI, Meta, and Deepmind. Then as now, it seems they have no image, video, or voice generation, unlike the others, and do not process image inputs as well either.
 

 OpenAI's costs are reportedly at 8.5 billion. Despite being flush in cash from a recent funding round, they were allegedly at the brink of bankruptcy and required a new, even larger, funding round.  Anthropic does not ... (read more)

Frontier model training requires that you build the largest training system yourself, because there is no such system already available for you to rent time on. Currently Microsoft builds these systems for OpenAI, and Amazon for Anthropic, and it's Microsoft and Amazon that own these systems, so OpenAI and Anthropic don't pay for them in full. Google, xAI and Meta build their own.

Models that are already deployed hold about 5e25 FLOPs and need about 15K H100s to be trained in a few months. These training systems cost about $700 million to build. Musk announced that the Memphis cluster got 100K H100s working in Sep 2024, OpenAI reportedly got a 100K H100s cluster working in May 2024, and Zuckerberg recently said that Llama 4 will be trained on over 100K GPUs. These systems cost $4-5 billion to build and we'll probably start seeing 5e26 FLOPs models trained on them starting this winter. OpenAI, Anthropic, and xAI each had billions invested in them, some of it in compute credits for the first two, so the orders of magnitude add up. This is just training, more goes to inference, but presumably the revenue covers that part.
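
The orders of magnitude above can be checked with a back-of-envelope calculation. This is only a sketch: the per-GPU throughput (about 1e15 dense BF16 FLOP/s for an H100) and the 40% utilization figure are assumptions introduced for the illustration, not numbers from the comment, and "a few months" is taken as 90 days.

```python
# Back-of-envelope training compute. All hardware constants are assumptions.
H100_PEAK_FLOPS = 1e15      # assumed ~1e15 dense BF16 FLOP/s per H100
UTILIZATION = 0.4           # assumed fraction of peak throughput actually achieved
SECONDS_PER_DAY = 86_400


def training_flops(num_gpus: int, days: float) -> float:
    """Total FLOPs a cluster delivers over a training run of the given length."""
    return num_gpus * H100_PEAK_FLOPS * UTILIZATION * days * SECONDS_PER_DAY


def days_to_reach(target_flops: float, num_gpus: int) -> float:
    """Days a cluster needs to deliver a target number of FLOPs."""
    flops_per_day = num_gpus * H100_PEAK_FLOPS * UTILIZATION * SECONDS_PER_DAY
    return target_flops / flops_per_day


def cost_per_gpu(system_cost_usd: float, num_gpus: int) -> float:
    """All-in system cost divided by GPU count."""
    return system_cost_usd / num_gpus


if __name__ == "__main__":
    print(f"15K H100s, 90 days: {training_flops(15_000, 90):.1e} FLOPs")
    print(f"100K H100s to 5e26: {days_to_reach(5e26, 100_000):.0f} days")
    print(f"$700M / 15K GPUs:   ${cost_per_gpu(700e6, 15_000):,.0f} per GPU")
    print(f"$4.5B / 100K GPUs:  ${cost_per_gpu(4.5e9, 100_000):,.0f} per GPU")
```

Under these assumptions, 15K H100s over about three months land near 5e25 FLOPs, a 100K cluster needs on the order of five months for 5e26, and both system costs work out to roughly $45K per GPU all-in, so the quoted figures are mutually consistent.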

There are already plans to scale to 1 gigawatt by the end of next... (read more)

4Nathan Helm-Burger
I was worried about Anthropic for a bit before the 3.0 Claude series came out. But then seeing how much better Opus 3 was than GPT-4, I switched to thinking they had a chance. And thought so even more after Sonnet 3.5 came out and was better, or almost as good, as Opus 3 at nearly everything. I do agree they seem behind in terms of a lot of the things other than nlp and safety. I don't think that they need to catch up on those things to be first-to-RSI. So I think it's going to depend a lot on how well they focus on that key research, versus getting off-track trying to catch up on non-critical-path stuff. A lot of difference could be made by key researchers, rather than just big funding. I believe that more efficient algorithms exist to be found, and so efficiency will increase fast once RSI starts. In the past 6 months or so I've become more convinced that Anthropic pulling ahead would be really good for the world. I've started thinking hard about ways I could help make this happen. Maybe there are things that their engineers just don't have time to experiment with, weird long-shot stuff, which outside researchers could explore and then only share their successful results with Anthropic? If enough researchers did that, it'd be like buying a bunch of lottery tickets for them.
2ChristianKl
It's possible that "not doing image, video or voice" is exactly what you need to create a more compute-efficient architecture. 
1O O
https://x.com/arcprize/status/1849225898391933148?s=46&t=lZJAHzXMXI1MgQuyBgEhgA My read of the events: Anthropic is trying to raise money and rushed out a half-baked model. 3.5 opus has not yet had the desired results. 3.5 sonnet, being easier to iterate on, was tuned to beat OpenAI’s model on some arbitrary benchmarks in an effort to wow investors. With the failed run of Opus, they presumably tried to get o1-like reasoning results or some agentic breakthrough. The previous 3.5s was also particularly good because of a fluke of the training run rng (same as gpt4-0314), which makes it harder for iterations to beat it. They are probably now rushing to scale inference time compute. I wonder if they tried doing something with steering vectors initially for 3.5 opus.
[-]O O5-4

Red-teaming is being done in a way that doesn't reduce existential risk at all but instead makes models less useful for users. 

https://x.com/shaunralston/status/1821828407195525431

3Nathan Helm-Burger
I agree that there is a lot of 'red teaming to save corporate face' going on, which is part of a workflow which makes the products less useful to end-users and has neutral or negative impacts on catastrophic risks. I can also confidently state that there is simultaneously at least some 'catastrophic risk red teaming' being undertaken, which does shape products in helpful ways. I think part of why this seems like it's little or no part of the product-shaping behavior of the corporations involved is 'deniability maintenance'. In order to avoid culpability risk and avoid negative consumer perceptions, it is in the interest of the AI corporations to hide evidence of catastrophic risks, while at the same time seeking to mitigate those risks. Part of this hiding process would surely be to restrict those who know specific details about ongoing catastrophic risk red teaming from talking publicly about their efforts. With such dynamics in play, you should not count absence of evidence as evidence of absence. In such a context, the silence itself should seem suspicious. Think about the quantity and strength of evidence (versus ungrounded proclamations) which you have seen presented on the specific topic of "this research is proof that there are no catastrophic risks." That specific topic of research seems remarkably quiet when you think of it like that. Perhaps suspiciously so.
[-]O O54

The response to Sora seems manufactured. Content creators are dooming about it more than something like gpt4 because it can directly affect them and most people are dooming downstream of that.

Realistically I don’t see how it can change society much. It’s hard to control and people will just become desensitized to deepfakes. GPT-4 and robotic transformers are obviously much more transformative for society, but people are worrying about deepfakes (or are they really just adopting the concerns of their favorite youtuber/TV host/etc.?)

5Chris_Leong
I think it's helping people realise: a) That change is happening crazily fast b) That the change will have major societal consequences, even if it is just a period of adjustment c) That the speed makes it tricky for society and governments to navigate these consequences
[-]O O3-2

Is this paper essentially implying the scaling hypothesis will converge to a perfect world model? https://arxiv.org/pdf/2405.07987

It says models trained on text modalities and image modalities both converge to the same representation with each training step. It also hypothesizes this is a brain like representation of the world. Ilya liked this paper so I’m giving it more weight. Am I reading too much into it or is it basically fully validating the scaling hypothesis?

[-]O O20

Feels like Test Time Training will eat the world. People thought it was search, but make alphaproof 100x efficient (3 days to 40 minutes) and you probably have something superhuman.

[-]O O20

There seems to be a fair amount of motivated reasoning in denying China’s AI capabilities when they’re basically neck and neck with the U.S. (their chatbots, video bots, social media algorithms, and self driving cars are roughly as good as or better than ours).

I think a lot of policy approaches fail within an AGI singleton race paradigm. It’s also clear a lot of EA policy efforts are basically in denial that this is already starting to happen.

I’m glad Leopold Aschenbrenner spelled out the uncomfortable but remarkably obvious truth for us. China is geari... (read more)

4Vladimir_Nesov
Without access to hardware, further scaling will be a problem. GPT-4 level models don't need that much hardware, but this changes when you scale by another 30 or 1000 times. Fabs take a long time to get into production even when the tools they need are available. With whatever fabs there are, you still need chip designs. And a few years is forever in AI time.
1O O
I think Leopold addresses this but 5% of our compute will be used to make a hypothetical AGI while China can direct 100% of their compute. They can make up in quality with quantity and they also happen to have far more energy than us which is probably the more salient variable in the AGI equation.     Also I'm of the opinion that the GPU bans are largely symbolic. There is little incentive to respect them, especially when China realizes stakes are higher than they seem now. In fact they are largely symbolic now.
4gwern
That shows the opposite. Purchases of 1 or 6 A100s, in an era where the SOTA is going to take 100,000+ B100s, 2 generations later, are totally irrelevant and downright symbolic.
3O O
I mean are you sure Singapore’s sudden large increase in GPU purchases is organic? GPU bans have very obviously not stopped Chinese AI progress, so I think we should build conclusions starting from there instead of the reverse order. I also think US GPU superiority is short-lived. China can skip engineering milestones we’ve had to pass, exploit the fact that they have far more energy than us, skip the general purpose computing/gaming tech debt that may exist in current GPUs, etc. EDIT: This is selective rationalism. If you sought any evidence on this issue, it would become extremely obvious that Singapore's orders of H100s magically increased by many orders of magnitude after they were banned in China.
31a3orn
Just want to register that I agree that -- regardless of US GPU superiority right now -- the US AI superiority is pretty small, and decreasing. Yi-Large beats a bunch of GPT-4 versions -- even in English -- on lmsys; it scores just above stuff like Gemini. Their open source releases like DeepSeekV2 look like ~Llama 3 70b level. And so on and so forth. Maybe whatever OpenAI is training now will destroy whatever China has, and establish OpenAI as firmly in the lead.... or maybe not. Yi says they're training their next model as well, so it isn't like they've stopped doing things. I think some chunk of "China is so far behind" is fueled by the desire to be able to stop US labs while not just letting China catch up, but that is what it would actually do.
3quetzal_rainbow
The Manhattan project was primarily an engineering effort, with all the necessary science established beforehand. Trying to solve alignment now with such a project is like starting the Manhattan project in 1900.
1O O
There was more theory laid out and theory discovered in the process but I think more importantly there were just a lot of approaches to try. I don’t think your analogy fits best. The alignment Manhattan project to me would be to scale up existing mech-interp work 1000x and try every single alignment idea under the sun simultaneously with the goal of automating it once we’re confident of human level systems. Can you explain more of where your analogy works and what would break the above?
2quetzal_rainbow
The Manhattan project was based on information, held with 99%+ certainty, that a fission chain reaction in uranium is possible and that it produces a large amount of energy. The problem was to cause a large number of fission chain reactions simultaneously, so that the amount of energy produced is enough to cause a large explosion. If you have this definition of the problem, you have a nice possible solution space which you can explore more-or-less methodically and expect a result. I don't think you can present the same nice definition for alignment. I think the real analogy for alignment is not the Manhattan project but "how to successfully make a first nuclear strike given that the enemy has a detection system and nuclear ICBMs too".
1O O
The Manhattan project had elements where they were worried they'd end the world through atmospheric chain reactions (but this wasn't taken too seriously). The scientists on this project considered MAD and nuclear catastrophes to be plausible outcomes. Many had existential dread. I think it actually maps well, since you are uncertain how likely a nuclear exchange is, but you could easily say there is a high chance of it happening, just like you can easily now say with some level of uncertainty that p(doom) is high. This requires the planners to be completely convinced that p(doom) is high (as in self immolation and not Russian roulette where 5/6 bullets lead to eternal prosperity). The odds of a retaliatory strike or war vs the USSR, on the other hand, were 100% at any given point. The US's nuclear advantage was at no point overwhelming enough outside of Japan, where we did use it. The fact that a first strike against the USSR was never pursued is evidence of this. Think of the USSR instead being in Iran's relative position today. If Iran tried to build thousands of nukes today and it looked like they would succeed, we'd definitely see a first strike or a hot war. So alignment isn't like this: there is a non-trivial chance that even RLHF just happens to scale to superintelligence. After 20 years, neither MIRI nor anyone else can prove or disprove this, and that's enough reason to try to do it anyway, just as nuclear weapons might inevitably lead the nations that hold them to engage in an exchange, but they were built anyway. And unlike nuclear, the upside of ASI being aligned is practically infinite. In the first strike scenario, it's a definite severe downside to prevent a potentially more severe downside in the future.
2ChristianKl
Centralized organizations don't tend to be able to "try every single idea". If you have resources spread out over different organizations, more different kinds of ideas are usually tried.
1O O
Don’t see how this is relevant to my broader point. But the Manhattan Project essentially tried every research direction, instead of picking and choosing, to reduce experimentation time.
[-]O O1-8

https://www.cnbc.com/quotes/US30YTIP

30Y-this* is probably the most reliable predictor of AI timelines. It’s essentially the market’s estimate of the real economic yield of the next 30 years.

2Thomas Kwa
Disagree. To correct the market, the yield of these bonds would have to go way up, which means the price needs to go way down, which means current TIPS holders need to sell, and/or people need to short. Since TIPS are basically the safest asset, market participants who don't want volatility have few other options to balance riskier assets like stocks. So your pension fund would be crazy to sell TIPS, especially after the yield goes up. And for speculators, there's no efficient way to short treasuries. If you're betting on 10 year AI timelines, why short treasuries and 2x your money when you could invest in AI stocks and get much larger returns?
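The inverse price/yield relationship described above can be sketched numerically. This is only an illustration under loud assumptions: it treats a 30-year TIPS as a zero-coupon bond with a face value of 100 (real TIPS pay coupons and have inflation-adjusted principal), and the yields used are hypothetical, not market data.

```python
def zero_coupon_price(real_yield: float, years: int = 30, face: float = 100.0) -> float:
    """Price of a zero-coupon bond paying `face` after `years`,
    discounted at a constant annual real yield (simplifying assumption)."""
    return face / (1.0 + real_yield) ** years


def short_return(yield_before: float, yield_after: float, years: int = 30) -> float:
    """Unlevered return to a short position if the yield moves
    from `yield_before` to `yield_after` (ignores financing costs)."""
    p0 = zero_coupon_price(yield_before, years)
    p1 = zero_coupon_price(yield_after, years)
    return (p0 - p1) / p0


if __name__ == "__main__":
    # Hypothetical yields chosen only to show the direction of the effect.
    for y in (0.02, 0.04, 0.06):
        print(f"real yield {y:.0%}: price {zero_coupon_price(y):.2f}")
    print(f"short return if yield goes 2% -> 4%: {short_return(0.02, 0.04):.1%}")
```

Under these assumptions the price falls as the yield rises, and an unlevered short can gain at most 100%, which is the sense in which shorting roughly caps out at doubling your money.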
1O O
The problem is AI stocks will go up a lot even if transformative AI won’t happen (and it instead just has a lot of mundane utility). You can short treasury futures relatively easily too. I imagine the people shorting these futures will have TAI priced in before it’s obvious to us through other metrics.
2Dagon
Can't see the graph for some reason.  But I don't agree with your characterization.  It's the market's estimate of CPI-measured inflation.  I suppose you could call that "real economic yield", but I don't think there exists any such measure, especially if you're expecting it to be comparable during a strong-AI revolution.
1O O
It’s the estimate of real economic growth. If AGI has a good chance of happening in the next 30 years and it’s priced in, that graph should go up.
2Dagon
This may be a definition disagreement.  IMO, there are a LOT of changes, economic and otherwise, that go into "AI timelines", which won't be priced in to CPI-inflation predictions.
1O O
30y-TIPS seems like a better fit.
[-]O O10

Any rationalist analysis of who might win the 2024 U.S. presidential election?

3Garrett Baker
Follow Nate Silver's substack, he is the person with the best track-record I know of for predicting US elections.
[-]O O10

Anyone Kelly betting their investments? I.e. taking the mathematically optimal amount of leverage. So if you’re invested in the S&P 500 this would be 1.4x. It would be more or less if your portfolio has higher or lower risk-adjusted returns.

2Dagon
I'm not, and don't know anyone who is.  Partly because it's VERY HARD to identify the actual future expectation and variance of real-world investments (hint: it's probably not normal, and bets aren't independent - tails matter more in reality than in most models), and partly because my total bankroll was mostly in future earnings, not invest-able assets.  Also, because my main debt and largest single investment is my house, which is not easily divisible. Some people are investing with leverage (or investing in levered assets, or over-leveraging by borrowing to invest in hidden-leverage investments), but very rarely (never, AFAIK) using the Kelly Criterion as their primary calculation.  I know a few professional gamblers (poker, sports, and other advantage-play), who do use the Kelly calculations as part of their decisions, but they acknowledge it's full of estimates and use it as a red flag when they're way off, rather than a strict limit.
1O O
I think it’s at the very least clear that, for the majority of investments, a leverage of 1 is suboptimal, even if you assume future returns are lower and volatility is higher.
2Dagon
I'm not certain of that - depending on leverage options and rates, and one's estimate of investment expectation and variance, it may be that no leverage (or negative leverage - putting some amounts in ultra-safe but low-return options) is correct. Also, don't think of "individual investments" or even "accounts" or "types" as the unit of optimal betting calculation.  Kelly's calculations work over an investor's decisions across all of their investments, and are suboptimal if applied separately to multiple slices.
1O O
I apply the Kelly criterion to all investments I control. It doesn’t take much for leverage to be worth it: excess returns of 7% and a standard deviation of 12% still imply leverage greater than 1.
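As a rough check of the 7% / 12% figures, here is a minimal sketch using the standard continuous-time Kelly approximation, f* = μ / σ², where μ is the expected excess return and σ the standard deviation. Which exact formula the comment has in mind is not stated, so this choice is an assumption, as are the function names; the approximation also assumes log-normal returns, known parameters, and no borrowing costs, which is exactly what the replies above caution against taking for granted.

```python
def kelly_leverage(excess_return: float, volatility: float) -> float:
    """Continuous-time Kelly leverage f* = mu / sigma^2 (an approximation)."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    return excess_return / volatility ** 2


def expected_log_growth(leverage: float, excess_return: float, volatility: float) -> float:
    """Approximate expected log-growth rate above the risk-free rate:
    g(f) = f * mu - 0.5 * f^2 * sigma^2."""
    return leverage * excess_return - 0.5 * (leverage * volatility) ** 2


if __name__ == "__main__":
    mu, sigma = 0.07, 0.12  # figures quoted in the comment above
    f_star = kelly_leverage(mu, sigma)
    print(f"Kelly leverage: {f_star:.2f}x")
    print(f"growth at 1x:     {expected_log_growth(1.0, mu, sigma):.4f}")
    print(f"growth at Kelly:  {expected_log_growth(f_star, mu, sigma):.4f}")
```

With these inputs the formula gives a leverage well above 1, but the result is very sensitive to the estimates of μ and σ, so many practitioners use a fraction of the computed value.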
[-]O O10

Anyone else capitalize on this year's AI-related stock market gains?

[-]O O10

Any interesting fiction books with demonstrably smart protagonists?

No idea if this is the place for this question but I first came across LW after I read HPMOR a long time ago and out of the blue was wondering if there was anything with a similar protagonist.

(Tho maybe a little more demonstrably intelligent and less written to be intelligent).

5mako yass
Such stories are generally discussed most here https://www.reddit.com/r/rational/
31a3orn
I think Traitor Baru Cormorant is excellent, with really excellent writing. The protagonist is a smart utilitarian with hidden goals. She isn't infinitely smart, though; people beat her. And the book has an insane downer ending, so if you're worried about that don't read. There are two sequels with a fourth supposedly (eventually) to come; the author has clearly read some rationalist-adjacent stuff like "The secret of our success."
[-]O O10

A realistic takeover angle would be hacking into robots once we have them. We probably don’t want any way for robots to get over-the-air updates, but it’s unlikely that this will be banned.

[-]O O10

Is disempowerment that bad? Is a human-directed society really much better than an AI-directed society with a tiny weight of kindness towards humans? Human-directed societies themselves usually create orthogonal and instrumental goals, and their assessment is highly subjective/relative. I don’t see how disempowerment without extinction is that different from today for most people, who are already effectively disempowered.

2Vladimir_Nesov
There are two importantly different senses of disempowerment. The stars could be taken out of reach, forever, but human civilization develops in its own direction. Alternatively, human civilization is molded according to AIs' aesthetics, there are interventions that manipulate.
1O O
Is there a strong reason the latter is hugely different from the former for the average person, excluding world leaders?
0Vladimir_Nesov
It's a distinction between these different futures. The present that ends in everyone on Earth dying is clearly different from both, but the present literally everlasting is hopefully not a consideration.
1O O
I’m just trying to understand the biggest doomers. I feel like disempowerment is probably hard to avoid. However I don’t think a disempowered future with bountiful lives would be terrible depending on how tiny the kindness weight is/how off it is from us. We are 1/10^53 of the observable universe’s resources. Unless alignment is wildly off base, I see AI directed extinction as unlikely. I fail to see why even figures like Paul Christiano peg it at such a high level, unless he estimates human directed extinction risks to be high. It seems quite easy to create a plague that wipes out humans and a spiteful individual can do it, probably more likely than an extremely catastrophically misaligned AI.
[-]O O10

Why wouldn’t a wire head trap work?

Let’s say an AI has a remote sensor that measures a value function until the year 2100 and it’s RLed to optimize this value function over time. We can make this remote sensor easily hackable to get maximum value at 2100. If it understands human values, then it won’t try to hack its sensors. If it doesn’t we sort of have a trap for it that represents an easily achievable infinite peak.

2JBlack
Reinforcement learning doesn't guarantee anything about how a system generalizes out of distribution. There are plenty of other things that the system can generalize to that are neither the physical sensor output nor human values. Separately from this, there is no necessary connection between understanding human values and acting in accordance with human values. So there are still plenty of failure modes.
1O O
Yes, nothing is guaranteed when dealing in probabilities, but can’t we just make it very easy for it to perfectly achieve its objective if it doesn’t go exactly the way we want it to? We just make an easier solution exist than disempowering us or wiping us out. I guess in the long run we still select for models that ultimately don’t wirehead. But this might eliminate a lot of obviously wrong alignment failures we miss.
[-]O O10

Something that’s been intriguing me. If two agents figure out how to trust that each other’s goals are aligned (or at least not opposed), haven’t they essentially solved the alignment problem?

e.g. one agent could use the same method to bootstrap an aligned AI.

[-]O O00

Post your forecasting wins and losses for 2023.

I’ll start:

Bad:

  • I thought the banking crisis was gonna spiral into something worse but I had to revert within a few days sadly
  • overestimated how much adding code execution to gpt would improve it
  • overconfident about LK99 at some points (although I bet against it but it was more fun to believe in it and my friends were betting on it)

Good:

  • tech stocks
  • government bond value reversal
  • meta stock in particular
  • Taylor Swift winning Time's POTY
  • random miscellaneous manifold bets (don’t think too highly of these because they were safe bets that were wildly misprinted)
[+]O O-10-9