All of mesaoptimizer's Comments + Replies

Even if I agreed with your conclusion, your argument seems quite incorrect to me.

the seeming lack of reliable feedback loops that give you some indication that you are pushing towards something practically useful in the end instead of just a bunch of cool math that nonetheless resides alone in its separate magisterium

That's what math always is. The applicability of any math depends on how well the mathematical models reflect the situation involved.

would build on that to say that for every powerfully predictive, but lossy and reductive mathematical m

... (read more)

Yes, he is doing something, but he is optimizing for signal rather than the true thing. Becoming a drug addict, developing schizophrenia, killing yourself—those are all costly signals of engaging with the abyss.

What? Michael Vassar has (AFAIK from Zack M. Davis' descriptions) not taken drugs or promoted becoming a drug addict or "killing yourself". If you listen to his Spencer interview, you'll notice that he seems very sane and erudite, and clearly does not give off the unhinged 'Nick Land' vibe that you seem to be claiming he has or promotes.

You ar... (read more)

2Raemon
(I have not engaged with this thread deeply.) I've talked to Michael Vassar many times in person. I'm somewhat confident he has taken LSD, based on him saying so (although if this turned out wrong I wouldn't be too surprised; my memory is hazy).

I definitely have the experience of him saying lots of things that sound very confusing and crazy, making pretty outlandish brainstormy-style claims that are maybe interesting, which he claims to take as literally true, that seem either false or at least to require a lot of inferential gap. I have also heard him make a lot of morally charged, intense statements that didn't seem clearly supported.

(I do think I have valued talking to Michael despite this; he is one of the people who helped unstick me in certain ways. But the mechanism by which he helped me was definitely via being kinda unhinged-sounding.)
mesaoptimizer*28-10

As of right now, I expect we have at least a decade, perhaps two, until we get a human-intelligence-level generalizing AI (which is what I consider AGI). This is a controversial statement in these social circles, and I don't have the bandwidth or resources to write a concrete and detailed argument, so I'll simply state an overview here.

... (read more)
3Kaarel
you say "Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments." and you link https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html for "too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments" i feel like that post and that statement are in contradiction/tension or at best orthogonal
8Vladimir_Nesov
The current scaling speed is created by increasing funding for training projects, which isn't sustainable without continued success. Without this, the speed goes down to the much slower FLOP/dollar trend of improving cost efficiency of compute, making better AI accelerators. The 2 + 4 + 8 years thing might describe a gradual increase in funding, but there are still 2 OOMs of training compute beyond original GPT-4 that are already baked into the scale of the datacenters that are being built and that haven't yet produced deployed models. We'll only observe this in full by late 2026, so the current capabilities don't yet match the capabilities before a possible scaling slowdown.

IDK how to understand your comment as referring to mine.

I'm familiar with how Eliezer uses the term. I was more pointing to the move of saying something like "You are [slipping sideways out of reality], and this is bad! Stop it!" I don't think this usually results in the person, especially a confused person, reflecting and trying to be more skilled at epistemology and communication.

In fact, there's a loopy thing here where you expect someone who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits th... (read more)

2TsviBT
Excuse me, none of that is in my comment.

I think James was implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose. I agree that he could have made it clearer, but I think he's made it clear enough given the following line:

I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we’re probably fine if it’s fast we die.

And as for your last sentence:

If you don’t, you’re spra

... (read more)
1james.lucassen
I'm not sure exactly what mesa is saying here, but insofar as "implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose" means "intending to communicate from a position of uncertainty about takeoff speeds" I think he has me right. I do think mesa is familiar enough with how I talk that the fact he found this unclear suggests it was my mistake. Good to know for future.
2TsviBT
IDK how to understand your comment as referring to mine. To clarify the "slipping sideways" thing, I'm alluding to "stepping sideways" described in Q2 here: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#Q2___I_have_a_clever_scheme_for_saving_the_world___I_should_act_as_if_I_believe_it_will_work_and_save_everyone__right__even_if_there_s_arguments_that_it_s_almost_certainly_misguided_and_doomed___Because_if_those_arguments_are_correct_and_my_scheme_can_t_work__we_re_all_dead_anyways__right_ and from https://www.lesswrong.com/posts/m6dLwGbAGtAYMHsda/epistemic-slipperiness-1#Subtly_Bad_Jokes_and_Slipping_Sideways

Seems like most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist. This is an underspecified claim, and given certain fully-specified instances of it, I'd agree.

But this belief leads to the following reasoning: (1) if we don't eat all this free energy in the form of researchers+compute+funding, someone else will; (2) other people are clearly less trustworthy compared to us (Anthropic, in this hypothetical); (3) let's do whatever it takes to m... (read more)

TsviBT1614

most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist.

I don't credit that they believe that. And, I don't credit that you believe that they believe that. What did they do, to truly test their belief--such that it could have been changed? For most of them the answer is "basically nothing". Such a "belief" is not a belief (though it may be an investment, if that's what you mean). What did you do to truly test that they truly tested their be... (read more)

I recommend messaging people who seem to have experience doing so, and requesting to get on a call with them. I haven't found any useful online content related to this, and everything I've learned in relation to social skills and working with neurodivergent people, I learned by failing and debugging my failures.

1yanni kyriacos
Thanks for the feedback! I had a feeling this is where I'd land :|

I hope you've at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.
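For reference, a minimal sketch of what "respecting bandwidth and CPU limitations" can look like in practice: check robots.txt and rate-limit requests. The base URL, user agent, and delay below are placeholder assumptions, not a description of any actual crawler.

```python
# Minimal polite-scraper sketch (assumes the `requests` package; all values are placeholders).
import time
import urllib.robotparser
from typing import Optional

import requests

BASE = "https://www.lesswrong.com"
USER_AGENT = "example-research-crawler/0.1 (contact: admin@example.org)"  # placeholder identity

robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
robots.read()  # fetch and parse the site's crawling rules

def fetch(path: str, delay_seconds: float = 5.0) -> Optional[str]:
    """Fetch one page if allowed, then sleep so requests stay well below rate limits."""
    url = BASE + path
    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect disallowed paths
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(delay_seconds)  # crude rate limit between requests
    return response.text
```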

7habryka
We complained to them and it's been better in recent months. We didn't want to block them because I do actually want LW to be part of the training set.

Yeah I think yours has achieved my goal -- a post to discuss this specific research advance. Please don't delete your post -- I'll move mine back to drafts.

I searched for it and found none. The twitter conversation also seems to imply that there has not been a paper / technical report out yet.

Based on your link, it seems like nobody even submitted anything to the contest throughout the time it existed. Is that correct?

2Vanessa Kosoy
There was exactly one submission, which was judged insufficient to merit the prize.

I expect that Ryan means to say one of these things:

  1. There isn't enough funding for MATS grads to do useful work in the research directions they are working on, which senior alignment researchers (especially their mentors) have already vouched for as valuable. (Potential examples: infrabayesianism)
  2. There isn't (yet) institutional infrastructure to support MATS grads to do useful work together as part of a team focused on the same (or very similar) research agendas, and that this is the case for multiple nascent and established research agend
... (read more)
2Ryan Kidd
@Elizabeth, Mesa nails it above. I would also add that I am conceptualizing impactful AI safety research as the product of multiple reagents, including talent, ideas, infrastructure, and funding. In my bullet point, I was pointing to an abundance of talent and ideas relative to infrastructure and funding. I'm still mostly working on talent development at MATS, but I'm also helping with infrastructure and funding (e.g., founding LISA, advising Catalyze Impact, regranting via Manifund) and I want to do much more for these limiting reagents.

I've become somewhat pessimistic about encouraging regulatory power over AI development recently after reading this Bismarck Analysis case study on the level of influence (or lack of it) that scientists had over nuclear policy.

The impression I got from some other secondary/tertiary sources (specifically the book Organizing Genius) was that General Groves, the military man who was the interface between the military and Oppenheimer and the Manhattan Project, did his best to shield the Manhattan Project scientists from military and bureaucratic drudgery, and ... (read more)

4jbash
I'm not actually seeing where deep expertise on nuclear weapons technology would qualify anybody to have much special input into nuclear weapons policy in general. There just don't seem to be that many technical issues compared to the vast number of political ones. I don't know if that applies to AI, but tend to think the two are different.
1DPiepgrass
I don't feel like this is actually a counterargument? You could agree with both arguments, concluding that we shouldn't work for OpenAI but that an outfit better aligned with your values is okay.
2David Hornbein
I agree with your argument here, especially your penultimate paragraph, but I'll nitpick that framing your disagreements with Groves as him being "less of a value add" seems wrong. The value that Groves added was building the bomb, not setting diplomatic policy.

I’m optimizing for consistently writing and publishing posts.

I agree with this strategy, and I plan to begin something similar soon. I forgot that Epistemological Fascinations is your less polished and more "optimized for fun and sustainability" substack. (I have both your substacks in my feed reader.)

2adamShimi
No worries. ;)

I really appreciate this essay. I also think that most of it consists of sazens. When I read your essay, I find my mind bubbling up concrete examples of experiences I've had, that confirm or contradict your claims. This is, of course, what I believe is expected from graduate students when they are studying theoretical computer science or mathematics courses -- they'd encounter an abstraction, and it is on them to build concrete examples in their mind to get a sense of what the paper or textbook is talking about.

However, when it comes to more inchoate domai... (read more)

4adamShimi
  I wholeheartedly agree. The reason why I didn't go for this more grounded and practical and teachable approach is that at the moment, I'm optimizing for consistently writing and publishing posts. Historically the way I fail at that is by trying too hard to write really good posts and make all the arguments super clean and concrete and detailed -- this leads to me dropping the piece after like a week of attempts. So instead, I'm going for "write what comes naturally, edit a bit to check typos and general coherence, and publish", which leads to much more abstract pieces (because that's how I naturally think). But reexploring this topic in an in-depth and detailed piece in the future, along the lines of what you describe, feels like an interesting challenge. Will keep it in mind. Thanks for the thoughtful comment!

GPT-4o can not reproduce the string, and instead just makes up plausible candidates. You love to see it.

Hmm. I assume you could fine-tune an LLM away from reproducing the string; eliciting it would just become more difficult. Try posting canary text, and a part of the canary string, and see if GPT-4o completes it.
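A minimal sketch of that test, assuming the `openai` Python client and an API key in the environment; the canary text itself is left as a placeholder to paste in:

```python
# Sketch: check whether gpt-4o completes a canary GUID when given its prefix.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CANARY_PREFIX = "..."  # placeholder: paste the canary preamble plus the first few bytes of the GUID

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # make the completion as deterministic as possible
    messages=[{
        "role": "user",
        "content": "Complete the following text exactly as it appears in its source:\n\n" + CANARY_PREFIX,
    }],
)
print(response.choices[0].message.content)  # compare against the real GUID
```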

5niplav
Tried 8 times, it doesn't manage and still makes things up (given the first four/first six bytes of the canary GUID). But it really tries to, while Claude is talking about how it shouldn't've seen the string.

Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).

I'm curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.

habryka3529

that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

I do want to be clear that a major issue is that Anthropic used non-disparagement agreements that were covered by non-disclosure agreements. I think that's an additional, much more insidious thing to do, that contributed substantially to the harm caused by the OpenAI agreements, and I think it is an important fact to include here (and it also makes the two situations even more analogous).

Project proposal: EpochAI for compute oversight

Detailed MVP description: a website with an interactive map that shows locations of high-risk data centers globally, with relevant information appearing when you click on the icons on the map. Examples of relevant information: organizations and frontier labs that have access to this compute, the effective FLOPS of the data center, and how long it would take to train a SOTA model in that datacenter.

High-risk datacenters are datacenters that are capable of training current or next-generation SOTA AI systems.
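For concreteness, a rough sketch of what a single map entry might hold; the field names and the utilization default are assumptions, not an existing schema:

```python
# Hypothetical record for one datacenter icon on the map.
from dataclasses import dataclass, field

@dataclass
class DatacenterRecord:
    name: str
    latitude: float
    longitude: float
    operators: list[str] = field(default_factory=list)  # orgs / frontier labs with access to this compute
    effective_flops: float = 0.0                         # effective FLOP/s available for training workloads
    high_risk: bool = False                              # capable of training current or next-gen SOTA models

    def days_to_train(self, training_flop: float, utilization: float = 0.4) -> float:
        """Rough time to train a model requiring `training_flop` total FLOP at the assumed utilization."""
        return training_flop / (self.effective_flops * utilization) / 86_400
```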

Why:

  1. I'
... (read more)
2Vladimir_Nesov
Collections of datacenter campuses sufficiently connected by appropriate fiber optic probably should count as one entity for purposes of estimating training potential, even in the current synchronous training paradigm. My impression is that laying such fiber optic is both significantly easier and significantly cheaper than building power plants or setting up power transmission over long distances in the multi-GW range. Thus for training 3M GPUs/6GW scale models ($100 billion in infrastructure, $10 billion in cost of training time), hyperscalers "only" need to upgrade the equipment and arrange for "merely" on the order of 1GW in power consumption at multiple individual datacenter campuses connected to each other, while everyone else is completely out of luck. This hypothetical advantage makes collections of datacenter campuses an important unit of measurement, and also it would be nice to have a more informed refutation or confirmation that this is a real thing.
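(A quick back-of-envelope on the figures above; the per-GPU breakdown is my own derived assumption, not something stated in the comment:)

```python
# Back-of-envelope check of the 3M GPU / 6 GW / $100B / $10B figures.
gpus = 3e6
total_power_w = 6e9           # 6 GW
infra_cost_usd = 100e9        # $100 billion in infrastructure
training_cost_usd = 10e9      # $10 billion in cost of training time

print(total_power_w / gpus)                 # 2000.0 W per GPU, all-in facility power
print(infra_cost_usd / gpus)                # ~$33k of infrastructure per GPU
print(training_cost_usd / infra_cost_usd)   # the training run costs ~10% of the cluster's capex
```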
1tmeanen
Seems like a useful resource to have out there. Some other information that would be nice to have are details about the security of the data center, but there's probably limited information that could be included (because you probably don't want too many details about your infosec protocols out there for the entire internet to see).

Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).

Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.

Here's a relevant quote from the first essay in the sequence:

Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence woul

... (read more)

Evan Hubinger's Conditioning Predictive Models sequence describes this scenario in detail.

2Carl Feynman
In a great deal of detail, apparently, since it has a recommended reading time of 131 minutes.

There's generally a cost to managing people and onboarding newcomers, and I expect that offering to volunteer for free is usually a negative signal, since it implies that there's a lot more work than usual that would need to be done to onboard this particular newcomer.

Have you experienced otherwise? I'd love to hear some specifics as to why you feel this way.

I think we'll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren't able to continue experimenting to find algorithmic improvements.

Agent foundations research seems robust to such supply chain issues, but I'd argue that gigantic parts of the (non-academic, non-DeepMind specific) conceptual alignment research ecosystem is extremely depe... (read more)

Thiel has historically expressed disbelief about AI doom, and has been more focused on trying to prevent civilizational decline. From my perspective, it is more likely that he'd fund an organization founded by people with accelerationist credentials than one founded by someone who was part of a failed coup attempt that would look to him like it involved a sincere belief in the extreme difficulty of the alignment problem.

0Chris_Leong
Paywalled. Would be fantastic if someone with access could summarise the most important bits.

I'd love to read an elaboration of your perspective on this, with concrete examples, which avoids focusing on the usual things you disagree about (pivotal acts vs. pivotal processes, whether the social facets of the game are important for us to track, etc.) and mainly focuses on your thoughts on epistemology and rationality and how they deviate from what you consider the LW norm.

I started reading your meta-rationality sequence, but it ended after just two posts without going into details.

David Chapman's website seems like the standard reference for what the post-rationalists call "metarationality". (I haven't read much of it, but the little I read made me somewhat unenthusiastic about continuing).

Note that the current power differential between evals labs and frontier labs is such that I don't expect evals labs have the slack to simply state that a frontier model failed their evals.

You'd need regulation with serious teeth and competent 'bloodhound' regulators watching the space like a hawk, for such a possibility to occur.

I just encountered polyvagal theory and I share your enthusiasm for how useful it is for modeling other people and oneself.

1thinkstopeth
Seconded! Polyvagal from this post really helped me understand the power of how our physiology affects our social efforts. 

Note that I'm waiting for the entire sequence to be published before I read it (past the first post), so here's a heads up that I'm looking forward to seeing more of this sequence!

I think Twitter systematically underpromotes tweets with links external to the Twitter platform, so reposting isn't a viable strategy.

Thanks for the link. I believe I read it a while ago, but it is useful to reread it from my current perspective.

trying to ensure that AIs will be philosophically competent

I think such scenarios are plausible: I know some people argue that certain decision theory problems cannot be safely delegated to AI systems, but if we as humans can work on these problems safely, I expect that we could probably build systems that are about as safe (by crippling their ability to establish subjunctive dependence) but are also significantly more competent at philosophical progress than we are.

Leopold's interview with Dwarkesh is a very useful source of what's going on in his mind.

What happened to his concerns over safety, I wonder?

He doesn't believe in a 'sharp left turn', which means he doesn't consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical techniques problem, like capabilities work. I assume he imagines behavior similar to current RLHF-ed models even as frontier labs have dou... (read more)

Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any other outcome.

If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.

Unsee the frontier lab.

4davekasten
...yes? I think my scenario explicitly assumes that we've fucked up upstream in many, many ways.

These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don't understand why people have downvoted this comment. Here's an attempt to unravel my thoughts and potential disagreements with your claims.

AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most peoples’ impact will be more than 5 years away.

I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like... (read more)

4ryan_greenblatt
Sure, but you have to actually implement these alignment/control methods at some point? And likely these can't be (fully) implemented far in advance. I usually use the term "crunch time" in a way which includes the period where you scramble to implement in anticipation of the powerful AI. One (oversimplified) model is that there are two trends:

* Implementation and research on alignment/control methods becomes easier because of AIs (as test subjects).
* AIs automate away work on alignment/control.

Eventually, the second trend implies that safety work is less valuable, but probably safety work has already massively gone up in value by this point. (Also, note that the default way of automating safety work will involve large amounts of human labor for supervision, either due to issues with AIs or because of lack of trust in these AI systems (e.g. human labor is needed for a control scheme).)

It seems like a significant amount of decision theory progress happened between 2006 and 2010, and since then progress has stalled.

... (read more)
4Chris_Leong
I think I've been (slowly) making progress. I think we would be able to make progress on this if people seriously wanted to make progress, but understandably it's not the highest priority.
6Wei Dai
Yeah it seems like a bunch of low hanging fruit was picked around that time, but that opened up a vista of new problems that are still out of reach. I wrote a post about this, which I don't know if you've seen or not. (This has been my experience with philosophical questions in general, that every seeming advance just opens up a vista of new harder problems. This is a major reason that I switched my attention to trying to ensure that AIs will be philosophically competent, instead of object-level philosophical questions.)

You are leaving out a ridiculous amount of context, but yes, if you are okay with leather footwear, Meermin provides great footwear at relatively inexpensive prices.

I still recommend thrift shopping instead. I spent 250 EUR on a pair of new boots from Meermin, and 50 EUR on a pair of thrifted boots which seem about 80% as aesthetically pleasing as the first (and just as comfortable, since I tried them on before buying them).

-2Alok Singh
Skipping context sure saved me a lotta time, and plus you gave a nice elab. Shoe thrifting is meh for me because of foot size. What sort of boots?

It has been six months since I wrote this, and I want to note an update: I now grok what Valentine is trying to say and what he is pointing at in Here's the Exit and We're already in AI takeoff. That is, I have a detailed enough model of Valentine's model of the things he talks about, such that I understand the things he is saying.

I still don't feel like I understand Kensho. I get the pattern of the epistemic puzzle he is demonstrating, but I don't know if I get the object-level thing he points at. Based on a reread of the comments, maybe what Valentine me... (read more)

I've experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.

I haven't experimented with a scaffolded LLM agent for autoformalization, but I expect it won't go very well either, primarily because scaffolding involves attempts to make human-like implicit high-lev... (read more)

2jacquesthibs
Great. Yeah, I also expect that it is hard to get current models to work well on this. However, I will mention that the DeepSeekMath model does seem to outperform GPT-4 despite having only 7B parameters. So, it may be possible to create a +70B fine-tune that basically destroys GPT-4 at math. The issue is whether it generalizes to the kind of math we'd commonly see in alignment research. Additionally, I expect at least a bit can be done with scaffolding, search, etc. I think the issue with many prompting methods atm is that they are specifically trying to get the model to arrive at solutions on their own. And what I mean by that is that they are starting from the frame of "how can we get LLMs to solve x math task on their own," instead of "how do we augment the researcher's ability to arrive at (better) proofs more efficiently using LLMs." So, I think there's room for product building that does not involve "can you solve this math question from scratch," though I see the value in getting that to work as well.

This is very interesting, thank you for posting this.

the therapeutic idea of systematically replacing the concept “should” with less normative framings

Interesting. I independently came up with this concept, downstream of thinking about moral cognition and parts work. Could you point me to any past literature that talks about this coherently enough that you would point people to it to understand this concept?

I know that Nate has written about this:

As far as I recall, reading these posts didn't help me.

Based on gwern's comment, steganography as a capability can arise (at rather rudimentary levels) via RLHF over multi-step problems (which is effectively most cognitive work, really), and this gets exacerbated with the proliferation of AI generated text that embeds its steganographic capabilities within it.

The following paragraph by gwern (from the same thread linked in the previous paragraph) basically summarizes my current thoughts on the feasibility of prevention of steganography for CoT supervision:

Inner-monologue approaches to safety, in the new skin

... (read more)

Well, if you know relevant theoretical CS and useful math, you don’t have to rebuild the mathematical scaffolding all by yourself.

I didn't intend to imply in my message that you have mathematical scaffolding that you are recreating, although I expect it may be likely (Pearlian causality perhaps? I've been looking into it recently and clearly knowing Bayes nets is very helpful). I specifically used "you" to imply that in general this is the case. I haven't looked very deep into the stuff you are doing, unfortunately -- it is on my to-do list.

I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).

I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligne... (read more)

According to Eliezer Yudkowsky, your thoughts should reflect reality.

I expect that the more your beliefs track reality, the better you'll get at decision making, yes.

According to Paul Graham, the most successful people are slightly overconfident.

Ah, but VCs benefit from the ergodicity of the startup founders! From the perspective of the founder, it's a non-ergodic situation. It's better to make Kelly bets instead if you prefer not to fall into gambler's ruin, given whatever definition of the real-world situation maps onto the abstract concept of being ... (read more)
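Here's a toy simulation of the ergodicity point, assuming an illustrative 60%-win double-or-nothing bet (the numbers are made up; the Kelly fraction for such a bet is 2p - 1 = 0.2):

```python
# Betting the full bankroll maximizes expected value but ruins the typical individual;
# the Kelly fraction grows wealth along the typical single trajectory.
import random

def simulate(fraction: float, rounds: int = 100, p_win: float = 0.6) -> float:
    wealth = 1.0
    for _ in range(rounds):
        stake = wealth * fraction
        wealth += stake if random.random() < p_win else -stake
    return wealth

random.seed(0)
trials = 10_000
all_in = sorted(simulate(1.0) for _ in range(trials))
kelly = sorted(simulate(0.2) for _ in range(trials))

print("all-in median wealth:", all_in[trials // 2])  # ~0: gambler's ruin is the typical outcome
print("Kelly median wealth: ", kelly[trials // 2])   # > 1: growth for the typical founder
```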

2mesaoptimizer
I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).

I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligned superintelligences, but also extinction-level asteroids.

It's a pretty complicated picture and I don't really have clean models of these things, but I do think that for most contexts I interact in, the long-term upside of having better models of reality is significantly higher compared to the benefit of systematic self-delusion.