All of mesaoptimizer's Comments + Replies

Even if I agreed with your conclusion, your argument seems quite incorrect to me.

the seeming lack of reliable feedback loops that give you some indication that you are pushing towards something practically useful in the end instead of just a bunch of cool math that nonetheless resides alone in its separate magisterium

That's what math always is. The applicability of any math depends on how well the mathematical models reflect the situation involved.

would build on that to say that for every powerfully predictive, but lossy and reductive mathematical m

... (read more)

Yes, he is doing something, but he is optimizing for signal rather than the true thing. Becoming a drug addict, developing schizophrenia, killing yourself—those are all costly signals of engaging with the abyss.

What? Michael Vassar has (AFAIK from Zack M. Davis' descriptions) not taken drugs or promoted becoming a drug addict or "killing yourself". If you listen to his Spencer interview, you'll notice that he seems very sane and erudite, and clearly does not give off the unhinged 'Nick Land' vibe that you seem to be claiming he has or promotes.

You ar... (read more)

2Raemon
(I have not engaged with this thread deeply.) I've talked to Michael Vassar many times in person. I'm somewhat confident he has taken LSD, based on him saying so (although if this turned out wrong I wouldn't be too surprised; my memory is hazy).

I definitely have the experience of him saying lots of things that sound very confusing and crazy, making pretty outlandish brainstormy-style claims that are maybe interesting, which he claims to take as literally true, that seem either false or at least to require a lot of inferential gap. I have also heard him make a lot of morally charged, intense statements that didn't seem clearly supported.

(I do think I have valued talking to Michael despite this; he is one of the people who helped unstick me in certain ways. But the mechanism by which he helped me was definitely via being kinda unhinged-sounding.)
mesaoptimizer*28-10

As of right now, I expect we have at least a decade, perhaps two, until we get a human-intelligence-level generalizing AI (which is what I consider AGI). This is a controversial statement in these social circles, and I don't have the bandwidth or resources to write a concrete and detailed argument, so I'll simply state an overview here.

... (read more)
3Kaarel
you say "Human ingenuity is irrelevant. Lots of people believe they know the one last piece of the puzzle to get AGI, but I increasingly expect the missing pieces to be too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments." and you link https://tsvibt.blogspot.com/2024/04/koan-divining-alien-datastructures-from.html for "too alien for most researchers to stumble upon just by thinking about things without doing compute-intensive experiments" i feel like that post and that statement are in contradiction/tension or at best orthogonal
8Vladimir_Nesov
The current scaling speed is created by increasing funding for training projects, which isn't sustainable without continued success. Without this, the speed goes down to the much slower FLOP/dollar trend of improving cost efficiency of compute, making better AI accelerators. The 2 + 4 + 8 years thing might describe a gradual increase in funding, but there are still 2 OOMs of training compute beyond original GPT-4 that are already baked into the scale of the datacenters that are being built and that haven't yet produced deployed models. We'll only observe this in full by late 2026, so the current capabilities don't yet match the capabilities before a possible scaling slowdown.

IDK how to understand your comment as referring to mine.

I'm familiar with how Eliezer uses the term. I was more pointing to the move of saying something like "You are [slipping sideways out of reality], and this is bad! Stop it!" I don't think this usually results in the person, especially a confused person, reflecting and trying to be more skilled at epistemology and communication.

In fact, there's a loopy thing here where you expect someone who is 'slipping sideways out of reality' to caveat their communications with an explicit disclaimer that admits th... (read more)

2TsviBT
Excuse me, none of that is in my comment.

I think James was implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose. I agree that he could have made it clearer, but I think he's made it clear enough given the following line:

I suspect that even if we have a bunch of good agent foundations research getting done, the result is that we just blast ahead with methods that are many times easier because they lean on slow takeoff, and if takeoff is slow we’re probably fine if it’s fast we die.

And as for your last sentence:

If you don’t, you’re spra

... (read more)
1james.lucassen
I'm not sure exactly what mesa is saying here, but insofar as "implicitly tracking the fact that takeoff speeds are a feature of reality and not something people can choose" means "intending to communicate from a position of uncertainty about takeoff speeds" I think he has me right. I do think mesa is familiar enough with how I talk that the fact he found this unclear suggests it was my mistake. Good to know for future.
2TsviBT
IDK how to understand your comment as referring to mine. To clarify the "slipping sideways" thing, I'm alluding to "stepping sideways" described in Q2 here: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy#Q2___I_have_a_clever_scheme_for_saving_the_world___I_should_act_as_if_I_believe_it_will_work_and_save_everyone__right__even_if_there_s_arguments_that_it_s_almost_certainly_misguided_and_doomed___Because_if_those_arguments_are_correct_and_my_scheme_can_t_work__we_re_all_dead_anyways__right_ and from https://www.lesswrong.com/posts/m6dLwGbAGtAYMHsda/epistemic-slipperiness-1#Subtly_Bad_Jokes_and_Slipping_Sideways

Seems like most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist. This is an underspecified claim, and given certain fully-specified instances of it, I'd agree.

But this belief leads to the following reasoning: (1) if we don't eat all this free energy in the form of researchers+compute+funding, someone else will; (2) other people are clearly less trustworthy compared to us (Anthropic, in this hypothetical); (3) let's do whatever it takes to m... (read more)

TsviBT1614

most people believe (implicitly or explicitly) that empirical research is the only feasible path forward to building a somewhat aligned generally intelligent AI scientist.

I don't credit that they believe that. And, I don't credit that you believe that they believe that. What did they do, to truly test their belief--such that it could have been changed? For most of them the answer is "basically nothing". Such a "belief" is not a belief (though it may be an investment, if that's what you mean). What did you do to truly test that they truly tested their be... (read more)

I recommend messaging people who seem to have experience doing so, and requesting to get on a call with them. I haven't found any useful online content related to this, and everything I've learned in relation to social skills and working with neurodivergent people, I learned by failing and debugging my failures.

1yanni kyriacos
Thanks for the feedback! I had a feeling this is where I'd land :|

I hope you've at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.
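For reference, a minimal sketch of what "respecting bandwidth and CPU limitations" can look like in practice: check robots.txt and rate-limit requests. The base URL, user agent, and delay below are placeholder assumptions, not a description of any actual crawler.

```python
# Minimal polite-scraper sketch (assumes the `requests` package; all values are placeholders).
import time
import urllib.robotparser
from typing import Optional

import requests

BASE = "https://www.lesswrong.com"
USER_AGENT = "example-research-crawler/0.1 (contact: admin@example.org)"  # placeholder identity

robots = urllib.robotparser.RobotFileParser(BASE + "/robots.txt")
robots.read()  # fetch and parse the site's crawling rules

def fetch(path: str, delay_seconds: float = 5.0) -> Optional[str]:
    """Fetch one page if allowed, then sleep so requests stay well below rate limits."""
    url = BASE + path
    if not robots.can_fetch(USER_AGENT, url):
        return None  # respect disallowed paths
    response = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=30)
    time.sleep(delay_seconds)  # crude rate limit between requests
    return response.text
```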

7habryka
We complained to them and it's been better in recent months. We didn't want to block them because I do actually want LW to be part of the training set.

Yeah I think yours has achieved my goal -- a post to discuss this specific research advance. Please don't delete your post -- I'll move mine back to drafts.

I searched for it and found none. The twitter conversation also seems to imply that there has not been a paper / technical report out yet.

Based on your link, it seems like nobody even submitted anything to the contest throughout the time it existed. Is that correct?

2Vanessa Kosoy
There was exactly one submission, which was judged insufficient to merit the prize.

I expect that Ryan means to say one of these things:

  1. There isn't enough funding for MATS grads to do useful work in the research directions they are working on, which senior alignment researchers (especially their mentors) have already vouched for as valuable. (Potential examples: infrabayesianism)
  2. There isn't (yet) institutional infrastructure to support MATS grads to do useful work together as part of a team focused on the same (or very similar) research agendas, and that this is the case for multiple nascent and established research agend
... (read more)
2Ryan Kidd
@Elizabeth, Mesa nails it above. I would also add that I am conceptualizing impactful AI safety research as the product of multiple reagents, including talent, ideas, infrastructure, and funding. In my bullet point, I was pointing to an abundance of talent and ideas relative to infrastructure and funding. I'm still mostly working on talent development at MATS, but I'm also helping with infrastructure and funding (e.g., founding LISA, advising Catalyze Impact, regranting via Manifund) and I want to do much more for these limiting reagents.

I've become somewhat pessimistic about encouraging regulatory power over AI development recently after reading this Bismarck Analysis case study on the level of influence (or lack of it) that scientists had over nuclear policy.

The impression I got from some other secondary/tertiary sources (specifically the book Organizing Genius) was that General Groves, the military man who was the interface between the military and Oppenheimer and the Manhattan Project, did his best to shield the Manhattan Project scientists from military and bureaucratic drudgery, and ... (read more)

4jbash
I'm not actually seeing where deep expertise on nuclear weapons technology would qualify anybody to have much special input into nuclear weapons policy in general. There just don't seem to be that many technical issues compared to the vast number of political ones. I don't know if that applies to AI, but tend to think the two are different.
1DPiepgrass
I don't feel like this is actually a counterargument? You could agree with both arguments, concluding that we shouldn't work for OpenAI but that an outfit better aligned with your values is okay.
2David Hornbein
I agree with your argument here, especially your penultimate paragraph, but I'll nitpick that framing your disagreements with Groves as him being "less of a value add" seems wrong. The value that Groves added was building the bomb, not setting diplomatic policy.

I’m optimizing for consistently writing and publishing posts.

I agree with this strategy, and I plan to begin something similar soon. I forgot that Epistemological Fascinations is your less polished and more "optimized for fun and sustainability" substack. (I have both your substacks in my feed reader.)

2adamShimi
No worries. ;)

I really appreciate this essay. I also think that most of it consists of sazens. When I read your essay, I find my mind bubbling up concrete examples of experiences I've had, that confirm or contradict your claims. This is, of course, what I believe is expected from graduate students when they are studying theoretical computer science or mathematics courses -- they'd encounter an abstraction, and it is on them to build concrete examples in their mind to get a sense of what the paper or textbook is talking about.

However, when it comes to more inchoate domai... (read more)

4adamShimi
  I wholeheartedly agree. The reason why I didn't go for this more grounded and practical and teachable approach is that at the moment, I'm optimizing for consistently writing and publishing posts. Historically the way I fail at that is by trying too hard to write really good posts and make all the arguments super clean and concrete and detailed -- this leads to me dropping the piece after like a week of attempts. So instead, I'm going for "write what comes naturally, edit a bit to check typos and general coherence, and publish", which leads to much more abstract pieces (because that's how I naturally think). But reexploring this topic in an in-depth and detailed piece in the future, along the lines of what you describe, feels like an interesting challenge. Will keep it in mind. Thanks for the thoughtful comment!

GPT-4o can not reproduce the string, and instead just makes up plausible candidates. You love to see it.

Hmm. I assume you could fine-tune an LLM away from reproducing the string; eliciting it would just become more difficult. Try posting canary text, and a part of the canary string, and see if GPT-4o completes it.
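A minimal sketch of that test, assuming the `openai` Python client and an API key in the environment; the canary text itself is left as a placeholder to paste in:

```python
# Sketch: check whether gpt-4o completes a canary GUID when given its prefix.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CANARY_PREFIX = "..."  # placeholder: paste the canary preamble plus the first few bytes of the GUID

response = client.chat.completions.create(
    model="gpt-4o",
    temperature=0,  # make the completion as deterministic as possible
    messages=[{
        "role": "user",
        "content": "Complete the following text exactly as it appears in its source:\n\n" + CANARY_PREFIX,
    }],
)
print(response.choices[0].message.content)  # compare against the real GUID
```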

5niplav
Tried 8 times, it doesn't manage and still makes things up (given the first four/first six bytes of the canary GUID). But it really tries to, while Claude is talking about how it shouldn't've seen the string.

Anyone who has signed a non-disparagement agreement with Anthropic is free to state that fact (and we regret that some previous agreements were unclear on this point).

I'm curious as to why it took you (and therefore Anthropic) so long to make it common knowledge (or even public knowledge) that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

The right time to reveal this was when the OpenAI non-disparagement news broke, not after Habryka connects the dots and builds social momentum for scrutiny of Anthropic.

habryka3529

that Anthropic used non-disparagement contracts as a standard and was also planning to change its standard agreements.

I do want to be clear that a major issue is that Anthropic used non-disparagement agreements that were covered by non-disclosure agreements. I think that's an additional, much more insidious thing to do, that contributed substantially to the harm caused by the OpenAI agreements, and I think it is an important fact to include here (and it also makes the two situations even more analogous).

Project proposal: EpochAI for compute oversight

Detailed MVP description: a website with an interactive map that shows locations of high-risk data centers globally, with relevant information appearing when you click on the icons on the map. Examples of relevant information: organizations and frontier labs that have access to this compute, the effective FLOPS of the data center, and how long it would take to train a SOTA model in that datacenter.

High-risk datacenters are datacenters that are capable of training current or next-generation SOTA AI systems.
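For concreteness, a rough sketch of what a single map entry might hold; the field names and the utilization default are assumptions, not an existing schema:

```python
# Hypothetical record for one datacenter icon on the map.
from dataclasses import dataclass, field

@dataclass
class DatacenterRecord:
    name: str
    latitude: float
    longitude: float
    operators: list[str] = field(default_factory=list)  # orgs / frontier labs with access to this compute
    effective_flops: float = 0.0                         # effective FLOP/s available for training workloads
    high_risk: bool = False                              # capable of training current or next-gen SOTA models

    def days_to_train(self, training_flop: float, utilization: float = 0.4) -> float:
        """Rough time to train a model requiring `training_flop` total FLOP at the assumed utilization."""
        return training_flop / (self.effective_flops * utilization) / 86_400
```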

Why:

  1. I'
... (read more)
2Vladimir_Nesov
Collections of datacenter campuses sufficiently connected by appropriate fiber optic probably should count as one entity for purposes of estimating training potential, even in the current synchronous training paradigm. My impression is that laying such fiber optic is both significantly easier and significantly cheaper than building power plants or setting up power transmission over long distances in the multi-GW range. Thus for training 3M GPUs/6GW scale models ($100 billion in infrastructure, $10 billion in cost of training time), hyperscalers "only" need to upgrade the equipment and arrange for "merely" on the order of 1GW in power consumption at multiple individual datacenter campuses connected to each other, while everyone else is completely out of luck. This hypothetical advantage makes collections of datacenter campuses an important unit of measurement, and also it would be nice to have a more informed refutation or confirmation that this is a real thing.
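(A quick back-of-envelope on the figures above; the per-GPU breakdown is my own derived assumption, not something stated in the comment:)

```python
# Back-of-envelope check of the 3M GPU / 6 GW / $100B / $10B figures.
gpus = 3e6
total_power_w = 6e9           # 6 GW
infra_cost_usd = 100e9        # $100 billion in infrastructure
training_cost_usd = 10e9      # $10 billion in cost of training time

print(total_power_w / gpus)                 # 2000.0 W per GPU, all-in facility power
print(infra_cost_usd / gpus)                # ~$33k of infrastructure per GPU
print(training_cost_usd / infra_cost_usd)   # the training run costs ~10% of the cluster's capex
```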
1tmeanen
Seems like a useful resource to have out there. Some other information that would be nice to have are details about the security of the data center, but there's probably limited information that could be included (because you probably don't want too many details about your infosec protocols out there for the entire internet to see).

Neuro-sama is a limited scaffolded agent that livestreams on Twitch, optimized for viewer engagement (so it speaks via TTS, it can play video games, etc.).

Well, at least a subset of the sequence focuses on this. I read the first two essays and was pessimistic enough about the titular approach that I moved on.

Here's a relevant quote from the first essay in the sequence:

Furthermore, most of our focus will be on ensuring that your model is attempting to predict the right thing. That’s a very important thing almost regardless of your model’s actual capability level. As a simple example, in the same way that you probably shouldn’t trust a human who was doing their best to mimic what a malign superintelligence woul

... (read more)

Evan Hubinger's Conditioning Predictive Models sequence describes this scenario in detail.

2Carl Feynman
In a great deal of detail, apparently, since it has a recommended reading time of 131 minutes.

There's generally a cost to managing people and onboarding newcomers, and I expect that offering to volunteer for free is usually a negative signal, since it implies that there's a lot more work than usual that would need to be done to onboard this particular newcomer.

Have you experienced otherwise? I'd love to hear some specifics as to why you feel this way.

I think we'll have bigger problems than just solving the alignment problem, if we have a global thermonuclear war that is impactful enough to not only break the compute supply and improvement trends, but also destabilize the economy and geopolitical situation enough that frontier labs aren't able to continue experimenting to find algorithmic improvements.

Agent foundations research seems robust to such supply chain issues, but I'd argue that gigantic parts of the (non-academic, non-DeepMind specific) conceptual alignment research ecosystem is extremely depe... (read more)

Thiel has historically expressed disbelief about AI doom, and has been more focused on trying to prevent civilizational decline. From my perspective, it is more likely that he'd fund an organization founded by people with accelerationist credentials than one founded by someone who was part of a failed coup attempt that would look to him like it involved a sincere belief in the extreme difficulty of the alignment problem.

0Chris_Leong
Paywalled. Would be fantastic if someone with access could summarise the most important bits.

I'd love to read an elaboration of your perspective on this, with concrete examples, which avoids focusing on the usual things you disagree about (pivotal acts vs. pivotal processes, whether the social facets of the game are important for us to track, etc.) and mainly focuses on your thoughts on epistemology and rationality and how they deviate from what you consider the LW norm.

I started reading your meta-rationality sequence, but it ended after just two posts without going into details.

David Chapman's website seems like the standard reference for what the post-rationalists call "metarationality". (I haven't read much of it, but the little I read made me somewhat unenthusiastic about continuing).

Note that the current power differential between evals labs and frontier labs is such that I don't expect evals labs have the slack to simply state that a frontier model failed their evals.

You'd need regulation with serious teeth and competent 'bloodhound' regulators watching the space like a hawk, for such a possibility to occur.

I just encountered polyvagal theory and I share your enthusiasm for how useful it is for modeling other people and oneself.

1thinkstopeth
Seconded! Polyvagal from this post really helped me understand the power of how our physiology affects our social efforts. 

Note that I'm waiting for the entire sequence to be published before I read it (past the first post), so here's a heads up that I'm looking forward to seeing more of this sequence!

I think Twitter systematically underpromotes tweets with links external to the Twitter platform, so reposting isn't a viable strategy.

Thanks for the link. I believe I read it a while ago, but it is useful to reread it from my current perspective.

trying to ensure that AIs will be philosophically competent

I think such scenarios are plausible: I know some people argue that certain decision theory problems cannot be safely delegated to AI systems, but if we as humans can work on these problems safely, I expect that we could probably build systems that are about as safe (by crippling their ability to establish subjunctive dependence) but are also significantly more competent at philosophical progress than we are.

Leopold's interview with Dwarkesh is a very useful source of what's going on in his mind.

What happened to his concerns over safety, I wonder?

He doesn't believe in a 'sharp left turn', which means he doesn't consider general intelligence to be a discontinuous (latent) capability spike such that alignment becomes significantly more difficult after it occurs. To him, alignment is simply a somewhat harder empirical techniques problem, like capabilities work. I assume he imagines behavior similar to current RLHF-ed models even as frontier labs have dou... (read more)

Oh, by that I meant something like "yeah I really think it is not a good idea to focus on an AI arms race". See also Slack matters more than any other outcome.

If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don't understand why you'd want to play the AI arms race -- you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.

Unsee the frontier lab.

4davekasten
...yes? I think my scenario explicitly assumes that we've fucked up upstream in many, many ways.

These are pretty sane takes (conditional on my model of Thomas Kwa of course), and I don't understand why people have downvoted this comment. Here's an attempt to unravel my thoughts and potential disagreements with your claims.

AGI that poses serious existential risks seems at least 6 years away, and safety work seems much more valuable at crunch time, such that I think more than half of most peoples’ impact will be more than 5 years away.

I think safety work gets less and less valuable at crunch time actually. I think you have this Paul Christiano-like... (read more)

4ryan_greenblatt
Sure, but you have to actually implement these alignment/control methods at some point? And likely these can't be (fully) implemented far in advance. I usually use the term "crunch time" in a way which includes the period where you scramble to implement in anticipation of the powerful AI. One (oversimplified) model is that there are two trends:

* Implementation and research on alignment/control methods becomes easier because of AIs (as test subjects).
* AIs automate away work on alignment/control.

Eventually, the second trend implies that safety work is less valuable, but probably safety work has already massively gone up in value by this point. (Also, note that the default way of automating safety work will involve large amounts of human labor for supervision, either due to issues with AIs or because of lack of trust in these AI systems (e.g. human labor is needed for a control scheme).)

It seems like a significant amount of decision theory progress happened between 2006 and 2010, and since then progress has stalled.

... (read more)
4Chris_Leong
I think I've been (slowly) making progress. I think we would be able to make progress on this if people seriously wanted to make progress, but understandably it's not the highest priority.
6Wei Dai
Yeah it seems like a bunch of low hanging fruit was picked around that time, but that opened up a vista of new problems that are still out of reach. I wrote a post about this, which I don't know if you've seen or not. (This has been my experience with philosophical questions in general, that every seeming advance just opens up a vista of new harder problems. This is a major reason that I switched my attention to trying to ensure that AIs will be philosophically competent, instead of object-level philosophical questions.)

You are leaving out a ridiculous amount of context, but yes, if you are okay with leather footwear, Meermin provides great footwear at relatively inexpensive prices.

I still recommend thrift shopping instead. I spent 250 EUR on a pair of new boots from Meermin, and 50 EUR on a pair of thrifted boots which seem about 80% as aesthetically pleasing as the first (and just as comfortable, since I tried them on before buying them).

-2Alok Singh
Skipping context sure saved me a lotta time, and plus you gave a nice elab. Shoe thrifting is meh for me because of foot size. What sort of boots?

It has been six months since I wrote this, and I want to note an update: I now grok what Valentine is trying to say and what he is pointing at in Here's the Exit and We're already in AI takeoff. That is, I have a detailed enough model of Valentine's model of the things he talks about, such that I understand the things he is saying.

I still don't feel like I understand Kensho. I get the pattern of the epistemic puzzle he is demonstrating, but I don't know if I get the object-level thing he points at. Based on a reread of the comments, maybe what Valentine me... (read more)

I've experimented with Claude Opus for simple Ada autoformalization test cases (specifically quicksort), and it seems like the sort of issues that make LLM agents infeasible (hallucination-based drift, subtle drift caused by sticking to certain implicit assumptions you made before) are also the issues that make Opus hard to use for autoformalization attempts.

I haven't experimented with a scaffolded LLM agent for autoformalization, but I expect it won't go very well either, primarily because scaffolding involves attempts to make human-like implicit high-lev... (read more)

2jacquesthibs
Great. Yeah, I also expect that it is hard to get current models to work well on this. However, I will mention that the DeepSeekMath model does seem to outperform GPT-4 despite having only 7B parameters. So, it may be possible to create a +70B fine-tune that basically destroys GPT-4 at math. The issue is whether it generalizes to the kind of math we'd commonly see in alignment research. Additionally, I expect at least a bit can be done with scaffolding, search, etc. I think the issue with many prompting methods atm is that they are specifically trying to get the model to arrive at solutions on their own. And what I mean by that is that they are starting from the frame of "how can we get LLMs to solve x math task on their own," instead of "how do we augment the researcher's ability to arrive at (better) proofs more efficiently using LLMs." So, I think there's room for product building that does not involve "can you solve this math question from scratch," though I see the value in getting that to work as well.

This is very interesting, thank you for posting this.

the therapeutic idea of systematically replacing the concept “should” with less normative framings

Interesting. I independently came up with this concept, downstream of thinking about moral cognition and parts work. Could you point me to any past literature that talks about this coherently enough that you would point people to it to understand this concept?

I know that Nate has written about this:

As far as I recall, reading these posts didn't help me.

Based on gwern's comment, steganography as a capability can arise (at rather rudimentary levels) via RLHF over multi-step problems (which is effectively most cognitive work, really), and this gets exacerbated with the proliferation of AI generated text that embeds its steganographic capabilities within it.

The following paragraph by gwern (from the same thread linked in the previous paragraph) basically summarizes my current thoughts on the feasibility of prevention of steganography for CoT supervision:

Inner-monologue approaches to safety, in the new skin

... (read more)

Well, if you know relevant theoretical CS and useful math, you don’t have to rebuild the mathematical scaffolding all by yourself.

I didn't intend to imply in my message that you have mathematical scaffolding that you are recreating, although I expect it may be likely (Pearlian causality perhaps? I've been looking into it recently and clearly knowing Bayes nets is very helpful). I specifically used "you" to imply that in general this is the case. I haven't looked very deep into the stuff you are doing, unfortunately -- it is on my to-do list.

I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).

I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligne... (read more)

According to Eliezer Yudkowsky, your thoughts should reflect reality.

I expect that the more your beliefs track reality, the better you'll get at decision making, yes.

According to Paul Graham, the most successful people are slightly overconfident.

Ah, but VCs benefit from the ergodicity of the startup founders! From the perspective of the founder, it's a non-ergodic situation. It's better to make Kelly bets instead if you prefer not to fall into gambler's ruin, given whatever definition of the real-world situation maps onto the abstract concept of being ... (read more)
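Here's a toy simulation of the ergodicity point, assuming an illustrative 60%-win double-or-nothing bet (the numbers are made up; the Kelly fraction for such a bet is 2p - 1 = 0.2):

```python
# Betting the full bankroll maximizes expected value but ruins the typical individual;
# the Kelly fraction grows wealth along the typical single trajectory.
import random

def simulate(fraction: float, rounds: int = 100, p_win: float = 0.6) -> float:
    wealth = 1.0
    for _ in range(rounds):
        stake = wealth * fraction
        wealth += stake if random.random() < p_win else -stake
    return wealth

random.seed(0)
trials = 10_000
all_in = sorted(simulate(1.0) for _ in range(trials))
kelly = sorted(simulate(0.2) for _ in range(trials))

print("all-in median wealth:", all_in[trials // 2])  # ~0: gambler's ruin is the typical outcome
print("Kelly median wealth: ", kelly[trials // 2])   # > 1: growth for the typical founder
```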

2mesaoptimizer
I do think that systematic self-delusion seems useful in multi-agent environments (see the commitment races problem for an abstract argument, and Sarah Constantin's essay "Is Stupidity Strength?" for a more concrete argument).

I'm not certain that this is the optimal strategy we have for dealing with such environments, and note that systematic self-delusion also leaves you (and the other people using a similar strategy to coordinate) vulnerable to risks that do not take into account your self-delusion. This mainly includes existential risks such as misaligned superintelligences, but also extinction-level asteroids.

It's a pretty complicated picture and I don't really have clean models of these things, but I do think that for most contexts I interact in, the long-term upside of having better models of reality is significantly higher compared to the benefit of systematic self-delusion.