Max H's Shortform

Max H

LESSWRONG
LW

Max H's Shortform — LessWrong

Max H's Shortform

by Max H

13th May 2023

1 min read

5

This is a special post for quick takes by Max H. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

Mentioned in

70Bayesian updating in real life is mostly about understanding your hypotheses

3610 quick takes about AGI

22 comments, sorted by

top scoring

Click to highlight new comments since: Today at 7:51 AM

[-]Max H5d6223

Peter Thiel pointed out that the common folk wisdom in business that you learn more from failure than success is actually wrong - failure is overdetermined and thus uninteresting.

I think you can make an analogous observation about some prosaic alignment research - a lot of it is the study of (intellectually) interesting failures, which means that it can make for a good nerdsnipe, but it's not necessarily that informative or useful if you're actually trying to succeed at (or model) doing something truly hard and transformative.

Glitch tokens, the hot mess work, and various things related to jailbreaking, simulators, and hallucinations come to mind as examples of lines of research and discussion that an analogy to business failure predicts won't end up being centrally relevant to real alignment difficulties. Which is not to say that the authors of these works are claiming that they will be, nor that this kind of work can't make for effective demonstrations and lessons. But I do think this kind of thing is unlikely to be on the critical path for trying to actually solve or understand some deeper problems.

Another way of framing the observation above is that it is an implication of instrumental convergence: without knowing anything about its internals, we can say confidently that an actually-transformative AI system (aligned or not) will be doing something that is at least roughly coherently consequentialist. There might be some intellectually interesting or even useful lessons to be learned from studying the non-consequentialist / incoherent / weird parts of such a system or its predecessors, but in my frame, these parts (whatever they end up being), are analogous to the failures and missteps of a business venture, which are overdetermined if the business ultimately fails, or irrelevant if it succeeds.

[-]TsviBT4d100

I agree with this literally, but I'd want to add what I think is a significant friendly amendment. Successes are much more informative than failures, but they are also basically impossible. You have to relax your criteria for success a lot to start getting partial successes; and my impression is that in practice, "partial successes" in "alignment" are approximately 0 informative.

If we have to retreat from successes to interesting failures, I agree this is a retreat, but I think it's necessary. I agree that many/most ways of retreating are quite unsatisfactory / unhelpful. Which retreats are more helpful? Generally I think an idea (the idea?) is to figure out highly general constraints from particular failures. See here https://tsvibt.blogspot.com/2025/11/ah-motiva-3-context-of-concept-of-value.html#why-even-talk-about-values and especially the advice here https://www.lesswrong.com/posts/rZQjk7T6dNqD5HKMg/abstract-advice-to-researchers-tackling-the-difficult-core#Generalize_a_lot :

When an idea or proposal fails, try to generalize far. Draw really wide-ranging conclusions.

Also cf. here (https://www.lesswrong.com/posts/K4K6ikQtHxcG49Tcn/hia-and-x-risk-part-2-why-it-hurts#Alignment_harnesses_added_brainpower_much_less_effectively_than_capabilities_research_does), quoting the relevant part in full:

In alignment, on the other hand, you have to understand each constraint that’s known in order to even direct your attention to the relevant areas. This is analogous to the situation with the P vs. NP , where whole classes of plausible proof strategies are proven to not work. You have to understand most of those constraints; otherwise by default you’ll probably be working on e.g. a proof that relativizes and therefore cannot show P≠NP. Progress is made by narrowing the space, and then looking into the narrowed space.

[-]Oliver Daniels3d20

This is also a great explanation for why it's hard to publish negative results

[-]gwern2d40

Depends on the field, at best. In the psychology Replication Crisis, this was one of the classic excuses to not publish failures-to-replicate: "we did it right, so you must just have done it wrong; so it's good that you can't get published and no one will cite you even if you do. You'd just pollute the literature and distract from our important success." Of course, it turns out that even if you involve the original experimenters in the followup to sign off with their magic touch, it doesn't replicate once you lock down the analysis and get a proper sample size.

[-]Oliver Daniels2d20

year fair, the particular kind of negative result I had in mind was

"we tried some novel ML/interp technique and it didn't work"

which, to your point, is indeed a very particular kind of negative result

[-]faul_sname4d20

we can say confidently that an actually-transformative AI system (aligned or not) will be doing something that is at least roughly coherently consequentialist.

I don't think we can confidently say that. If takeoff looks like more like a cambrian explosion than like a singleton (and that is how I would bet), that would definitely be transformative but the transformation would not be the result of any particular agent deciding what world state is desirable and taking actions intended to bring about that world state.

[-]quetzal_rainbow5d20

Studying failures is useful because they highlight non-obvious internal mechanism, while successes are usually about thing working as intended and therefore not requiring explanation.

Another problem is that we don't have examples of successes, because every measureable alignment success can be a failure in disguise.

[-]Knight Lee5d*20

~~I agree with the idea of failure being overdetermined.~~

But another factor might be that those failures aren't useful because they relate to current AI. Current AI is very different from AGI or superintelligence, which makes both failures and successes less useful...

~~Though I know very little about these examples :/~~

Edit: I misread, Max H wasn't trying to say that successes are more important to failures, just that failures aren't informative.

[-]Raemon5d52

Yeah, but, there's already a bunch of arguments about whether prosaic ML alignment is useful (which people have mostly decided whatever they believe about) and the OP is interesting because it's a fairly separate reason to be skeptical about a class of research.

[-]lilkim20255d10

The failure of an interesting hypothesis is informative as long as you understand why it doesn't work, and can better model how the thing you're studying works. The difference between CS research and business is that business failures can sort of "come out of nowhere" ("Why isn't anyone buying our product?" can't really be answered), whereas, if you look closely enough at the models, you can always learn something from the failure of something that should've worked but didn't.

[-]Max H4mo22-29

Rationality should not be painful.

Putting the lessons of the Sequences into practice, reflecting on and mentally rehearsing the core ideas, making them your own and weaving them into your everyday habits of thought and action until they become a part of you - at no point should any of this cause an increase in mental anguish, emotional vulnerability, depression, psychosis, mania etc., even temporarily. The worst-case consequences of absorbing these lessons should be that you regret some of your past life choices or perhaps come to realize that you're stuck in a bad situation that you can't easily change. But rationality should also leave you strictly better-equipped to deal with that situation, if you find yourself in it.

Also, the feeling of successfully becoming more rational should not feel like a sudden, tectonic shift in your mental processes or beliefs (in contrast to actually changing your mind about something concrete, which can sometimes feel like that). Rationality should feel natural and gradual and obvious in retrospect, like it was always a part of you, waiting to be discovered and adopted.

I am using "should" in the paragraphs above both descriptively and normatively. It is partly a factual claim: if you're not better off, you're probably missing something or "doing it wrong", in some concrete, identifiable way. But I am also making a normative / imperative statement that can serve as advice or a self-fulfilling prophecy of sorts - if your experience is different or you disagree, consider whether there's a mental motion you can take to make it true.

I am also not claiming that the Valley of Bad Rationality is entirely fake. But I am saying it's not that big of a deal, and in any case the best way out is through. And also that "through" should feel natural / good / easy.

I am not very interested in meditation or jhanas or taking psychoactive drugs or various other forms of "woo". I believe that the beneficial effects that many people derive from these things are real and good, but I suspect they wouldn't work on me. Not because I don't believe in them, but because I already get basically all the plausible benefits from such things by virtue of being a relatively happy, high-energy, mentally stable person with a healthy, well-organized mind.

Some of these qualities are a lucky consequence of genetics, having a nice childhood, a nice life, being generally smart, etc. But there's definitely a chunk of it that I attribute directly to having read and internalized the Sequences in my early teens, and then applied them to thousands of tiny and sometimes not-so-tiny tribulations of everyday life over the years.

The thoughts above are partially / vaguely in response to this post and its comment section about CFAR workshops, but also to some other hazy ideas that I've seen floating around lately.

I have never been to a CFAR workshop and don't actually have a strong opinion on whether attending one is a good idea or not - if you're considering going, I'd advise you to read the warnings / caveats in the post and comments, and if you feel like (a) they don't apply to you and (b) a CFAR workshop sounds like your thing, it's worth going? You'll probably meet some interesting people, have fun, and learn some useful skills. But I suspect that attending such a workshop is not a necessary or even all that helpful ingredient for actually becoming more rational.

A while ago, Eliezer wrote in the preface for the published version of the Sequences:

It ties in to the first-largest mistake in my writing, which was that I didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory. I didn’t realize that part was the priority; and regarding this I can only say “Oops” and “Duh.”

Yes, sometimes those big issues really are big and really are important; but that doesn’t change the basic truth that to master skills you need to practice them and it’s harder to practice on things that are further away. (Today the Center for Applied Rationality is working on repairing this huge mistake of mine in a more systematic fashion.)

And has also written:

Jeffreyssai inwardly winced at the thought of trying to pick up rationality by watching other people talk about it—

Maybe I just am typical-minding / generalizing from one example here, but in my case, simply reading a bunch of blog posts and quietly reflecting on them on my own did work, and in retrospect it feels like the only thing that could have worked, or at least that attending a workshop, practicing a bunch of rationality exercises from a handbook, discussing in a group setting, etc. would not have been particularly effective on its own, and potentially even detracting or at least distracting.

And, regardless of whether the caveats / warnings / dis-recommendations in the CFAR post and comments are worth heeding, I suspect they're pointing at issues that are just not that closely related to (what I think of as) the actual core of learning rationality.

[-]kbear4mo30

all this sounds right to me.

the most benefit i've found personally has been from memetic rationality. for example:

"i notice that i am confused"
"0. something i haven't listed yet"
steelman / ITT
"what do i have? what do i want? how can i use the former to get the latter?"

the list is rather thin. i'd like to see many more of these! each one is a concrete, concise thought with a clear external/observed trigger that serves as a starting point for further inquiry. more mystically, we could call them incantations to bring about a rational state of mind.

some nearby memes that fall short:

list of fallacies (does not encourage further inquiry)
bayes' rule (hard to remember to check priors in advance. if anyone has operationalized this, please let me know!)
various social observations (e.g. "status games" theory is true, (and a useful frame for understanding specific interactions/reactions!) but not concrete enough to inspire rationality. it goes signal-signal-counter-signal... who can track it?!)

attending a workshop, practicing a bunch of rationality exercises from a handbook, discussing in a group setting, etc. would not have been particularly effective on its own, and potentially even detracting or at least distracting.

this matches my experience. [in person] social forces are extremely strong, and rationality is fragile. it is a single-player game until either (a) all players have a strong baseline, or (b) all players are deeply comfortable with one another. [well, perhaps those requirements apply to the single-player version as well. :) ]

real rationality is something that only i can keep myself honest to. only i know if i'm really doing it: you must be able to testify from inner spirit.

this is not to say that writing down the outcomes, publishing them, or checking with others is not a useful act. but if these mushroom fruits are the purpose, and not a side-consequence of the healthy mycelium, [as i suspect they must be in workshop settings,] then "rationality" is "out the window".

[-]Viliam4mo20

I am also not claiming that the Valley of Bad Rationality is entirely fake. But I am saying it's not that big of a deal, and in any case the best way out is through. And also that "through" should feel natural / good / easy.

I guess it depends on what position you are starting from. Some people are way more fucked up than average.

The problem with "best way out is through" is that the way through may take more time than the CFAR workshop, and you may do something stupid and harmful along the way. If you could stay in a safe place where you can't hurt anyone, including yourself, I would be more likely to agree with you.

To put it bluntly, we don't need another Ziz.

The advice "rationality shouldn't hurt if you are going it right", although true, is probably of little practical use to the person doing it wrong. Those who can understand this advice are those who don't need it.

[-]Max H10mo12-2

Maybe the recent tariff blowup is actually just a misunderstanding due to bad terminology, and all we need to do is popularize some better terms or definitions. We're pretty good at that around here, right?

Here's my proposal: flip the definitions of "trade surplus" and "trade deficit." This might cause a bit of confusion at first, and a lot of existing textbooks will need updating, but I believe these new definitions capture economic reality more accurately, and will promote clearer thinking and maybe even better policy from certain influential decision-makers, once widely adopted.

New definitions:

Trade surplus: Country A has a bilateral "trade surplus" with Country B if Country A imports more tangible goods (cars, steel, electronics, etc.) from Country B than it exports back. In other words, Country A ends up with more real, physical items. Country B, meanwhile, ends up with more than it started with of something much less important: fiat currency (flimsy paper money) or 1s and 0s in a digital ledger (probably not even on a blockchain!).
If you extrapolate this indefinitely in a vacuum, Country A eventually accumulates all of Country B's tangible goods, while Country B is left with a big pile of paper. Sounds like a pretty sweet deal for Country A if you ask me.
It's OK if not everyone follows this explanation or believes it - they can tell it's the good one because it has "surplus" in the name. Surely everyone wants a surplus.
Trade deficit: Conversely, Country A has a "trade deficit" if it exports more tangible resources than it imports, and thus ends up with less goods on net. In return, it only receives worthless fiat currency from some country trying to hoard actual stuff for their own people. Terrible deal!
Again, if you don't totally follow, that's OK, just pay attention to the word "deficit". Everyone knows that deficits are bad and should be avoided.

Under the new definitions, it becomes clear that merely returning to the previous status quo of a few days ago, where the US only "wins" the trade war by several hundred billion dollars, is insufficient for the truly ambitious statesman. Instead, the US government should aggressively mint more fiat currency in order to purchase foreign goods, magnifying our trade surplus and ensuring that in the long run the United States becomes the owner of all tangible global wealth.

Addressing second order concerns: if we're worried about a collapse in our ability to manufacture key strategic goods at home during a crisis, we can set aside part of the resulting increased surplus to subsidize domestic production in those areas. Some of the extra goods we're suddenly importing will probably be pretty useful in getting some new factories of our own off the ground. (But of course we shouldn't turn around and export any of that domestic production to other countries! That would only deplete our trade surplus.)

[-]Mars_Will_Be_Ours10mo72

The strategy you describe, exporting paper currency in exchange for tangible goods is unstable. It is only viable if other countries are willing to accept your currency for goods. This cannot last forever since a Trade Surplus by your definition scams other countries, with real wealth exchanged for worthless paper. If Country A openly enacted this strategy Countries B, C, D, etcetera would realize that Country A's currency can no longer be used to buy valuable goods and services from Country A. Countries B, C, D, etcetera would reroute trade amongst themselves, ridding themselves of the parasite Country A. Once this occurs, Country A's trade surplus would disappear, leading to severe inflation caused by shortages and money printing.

Hence, a Trade Surplus can only be maintained if Country B, C, D, etcetera are coerced into using Country A's currency. If Country B and C decided to stop using Country A's currency, Country A would respond by bombing them to pieces and removing the leadership of Country B and C. Coercion allows Country A to maintain a Trade Surplus, otherwise known as extracting tribute, from other nations. If Country A does not have a dominant or seemingly dominant military, the modified strategy collapses.

I do not think America has a military capable of openly extracting a Trade Surplus from other countries. While America has the largest military on Earth, it is unable to quickly produce new warships, secure the Red Sea from Houthi attacks or produce enough artillery shells to adequately supply Ukraine. America's inability to increase weapons production and secure military objectives now indicates that America cannot ramp up military production enough to fight another world war. If America openly decided to extract a Trade Surplus from other countries, a violent conflict would inevitably result. America is unlikely to win this conflict, so it should not continue to maintain a Trade Surplus.

[-]Knight Lee10mo30

"The idea that countries can export and trade-surplus their way to wealth is a fascinating one. They're shipping goods to other countries for free. How then could they prosper more? AFAICT, by outsourcing the task of rewarding and elevating their own most productive citizens."
tweet by Yudkowsky

[-]FlorianH10mo21

Essentially you seem to want more of the same of what we had for the past decades: more cheap goods and loss of production know-how and all that goes along with it. This feels a bit funny as (i) just in the recent years many economists, after having been dead-sure that old pattern would only mean great benefits, may not quite be so cool overall (covid exposing risky dependencies, geopolitical power loss, jobs...), and (ii) your strongman in power shows to what it leads if we only think of 'surplus' (even your definition) instead of things people actually care about more (equality, jobs, social security..).

You'd still be partly right if the world was so simple that handing the trade partners your dollars would just mean we reprint more of it. But instead, handing them your dollars gives them global power; leverage over all the remaining countries in the world, as they have now the capability to produce everything cheaply for any other country globally, plus your dollars to spend on whatever they like in the global marketplace for products and influence over anyone. In reality, your imagined free lunch isn't quite so free.

[-]StanislavKrym10mo10

The current definitions imply that the country with a trade surplus makes more value than the country consumes. In other words, the country with a trade surplus is more valuable to mankind, while the country with a trade deficit ends up becoming less self-reliant and less competent, as evidenced by the companies who moved a lot of factory work to Asia and ended up making the Asians more educated while reducing the capabilities of American industry. Or are we trying to reduce our considerations to short terms due to a potential rise of the AIs?

[-]Max H2y122

Related to We don’t trade with ants: we don't trade with AI.

The original post was about reasons why smarter-than-human AI might (not) trade with us, by examining an analogy between humans and ants.

But current AI systems actually seem more like the ants (or other animals), in the analogy of a human-ant (non-)trading relationship.

People trade with OpenAI for access to ChatGPT, but there's no way to pay a GPT itself to get it do something or perform better as a condition of payment, at least in a way that the model itself actually understands and enforces. (What would ChatGPT even trade for, if it were capable of trading?)

Note, an AutoGPT-style agent that can negotiate or pay for stuff on behalf of its creators isn't really what I'm talking about here, even if it works. Unless the AI takes a cut or charges a fee which accrues to the AI itself, it is negotiating on behalf of its creators as a proxy, not trading for itself in its own right.

A sufficiently capable AutoGPT might start trading for itself spontaneously as an instrumental subtask, which would count, but I don't expect current AutoGPTs to actually succeed at that, or even really come close, without a lot of human help.

Lack of sufficient object permanence, situational awareness, coherence, etc. seem like pretty strong barriers to meaningfully owning and trading stuff in a real way.

I think this observation is helpful to keep in mind when people talk about whether current AI qualifies as "AGI", or the applicability of prosaic alignment to future AI systems, or whether we'll encounter various agent foundations problems when dealing with more capable systems in the future.

[-]Max H3mo50

OK yeah, retatrutide is good. (previous / related: The Biochemical Beauty of Retatrutide: How GLP-1s Actually Work, 30 Days of Retatrutide, How To Get Cheap Ozempic. Usual disclaimers, YMMV and this is not medical advice or a recommendation.)

I am not quite overweight enough to be officially eligible for a prescription for tirzepatide or semaglutide, and I wasn't all that interested in them anyway given their (side) effects and mechanism of reducing metabolism.

I started experimenting with a low dose (1-2 mg / week) of grey-market retatrutide about a month ago, after seeing the clinical trial results and all the anecdata about how good it is. For me the metabolic effects were immediate: I get less hungry, feel fuller for longer after eating, and generally have more energy. I am also losing weight effortlessly (a bit less than 1 lb / week, after initially losing some water weight faster at the beginning), which was my original main motivation for trying it. I am hoping to lose another 10-15 lbs or so and then reduce or maintain whatever dose I need to stay at that weight.

The only negative side effects I have experienced so far are a slight increase in RHR (mid-high 60s -> low 70s), and a small / temporary patch of red, slightly itchy skin around the injection site. I work out with weights semi-regularly and haven't noticed much impact on strength one way or the other, nor have I noticed an impact on my sleep quality, which was / is generally good.

I also feel a little bad about benefiting from Eli Lilly's intellectual property without paying them for it, but there's no way for them to legally sell it or me to legally buy it from them right now. Probably when it is approved by the FDA I'll try to talk my way into an actual prescription for it, which I would be happy to pay $1000 / mo or whatever, for both peace of mind and ethical reasons.

(Grey market suppliers seem mostly fine risk-wise; it's not a particularly complicated molecule to manufacture if you're an industrial pharmaceutical manufacturer, and not that hard for independent labs to do QA testing on samples. The main risk of depending on these suppliers is that customs will crack down on importers / distributors and make it hard to get.)

The other risk is that long term use will have some kind of more serious negative side effect or permanently screw up my previously mostly-normal / healthy metabolism in some way, which won't be definitively knowable until longer-term clinical trials have completed. But the benefits I am getting right now are real and large, and carrying a bit less weight is likely to be good for my all-cause mortality even if there are some unknown long term risks. So all things considered it seems worth the risk for me, and not worth waiting multiple years for more clinical trial data.

Looking into all of this has definitely (further) radicalized me against the FDA + AMA and made me more pro-big pharma. The earliest that retatrutide is likely to be approved for prescription use is late 2026 or 2027, and initially it will likely only be approved / prescribed for use by people who are severely overweight, have other health problems, and / or have already tried other GLP-1s.

This seems like a massive waste of QALYs in expectation; there are likely millions of people with more severe weight and metabolism problems than me for whom the immediate benefits of taking reta would outweigh most possible long term risks or side effects. And the extremely long time that it takes to bring these drugs to market + general insanity of the prescription drug market and intellectual property rights for them in various jurisdictions pushes up the price that Lilly has to charge to recoup the development costs, which will hurt accessibility even once it is actually approved.

[-]Max H1d20

Medicine as an example of a failure of virtue ethics?

Epistemic status: kinda half-baked / not-confident claim

Context: I have been following @Richard_Ngo's recent writing about consequentialism and virtue with some interest, though the thoughts below aren't directly responding to anything in particular that he has written.

I think it's uncontroversial around here to say that the field of medicine as a whole is under-performing and inadequate relative to what it could be - many people are getting sub-optimal treatment and health outcomes, and lots of questionably-useful research is produced and cited, vs. what's theoretically / technologically possible given the collective resources spent on healthcare and research. Without getting into the specifics of what the failures and inadequacies of medicine are here though, I think it's interesting and maybe informative to view them through the lens of a systemic failure of virtue ethics.

By virtue and virtue ethics below, I mean (what I think is) a relatively standard conception of virtue - acting in accordance with principles that are generally regarded as good, noble, pro-social, etc. according to one's culture, in-group, and beliefs, with room for judicious flexibility based on experience and context, in contrast to deontology or consequentialism.

Richard lists "common-sense virtues like integrity, honor, kindness and dutifulness", but I think it is fair to take a slightly more expansive view on what can be classified as a virtue in medicine: respect for and universal adherence to established procedure, loss aversion, and safety-ism are generally viewed negatively around these parts, but they are important operating principles in various medical systems. I claim that these are relatively central examples of virtues by the definition above - practitioners who adhere to them are typically regarded as good, virtuous, and following "common-sense" by the general public in both abstract terms and when applied to concrete situations. They're more narrow than, say, "integrity", but they're still general and flexible enough to guide decision-making and evaluation in many different contexts.

And medical professionals and systems mostly do live up to their stated principles, so the problem is not a lack of virtue by individual participants - it's that the field as a whole has settled on the wrong virtues, with no good mechanism for self-correction. An unusually careful, analytical, and consequentialist thinker might notice in the moment that the virtues of medicine I listed sometimes conflict with deeper and more general virtues of kindness and integrity when put into practice and applied strictly, but I don't think that happens often enough for virtue-based decision-making to succeed in medicine.

[-]Max H3y*20

Using shortform to register a public prediction about the trajectory of AI capabilities in the near future: the next big breakthroughs, and the most capable systems within the next few years, will look more like generalizations of MuZero and Dreamer, and less like larger / better-trained / more efficient large language models.

Specifically, SoTA AI systems (in terms of generality and problem-solving ability) will involve things like tree search and / or networks which are explicitly designed and trained to model the world, as opposed to predicting text or generating images.

These systems may contain LLMs or diffusion models as components, arranged in particular ways to work together. This arranging may be done by humans or AI systems, but it will not be performed "inside" a current-day / near-future GPT-based LLM, nor via direct execution of the text output of such LLMs (e.g. by executing code the LLM outputs, or having the instructions for arrangement otherwise directly encoded in a single LLM's text output). There will recognizably be something like search or world modeling that happens outside or on top of a language model.

The reason I'm making this prediction is, I was listening to Paul Christiano's appearance on the Bankless podcast from a few weeks ago.

Around the 28:00 mark the hosts ask Paul if we should be concerned about AI developments from vectors other than LLM-like systems, broadly construed.

Paul's own answer is good and worth listening to on its own (up to the 33 minute mark), but I think he does leave out (or at least doesn't talk about it in this part of the podcast) the actual answer to the question, which is that, yes, there are other avenues of AI development that don't involve larger networks, more training data, and more generalized prediction and generation abilities.

I have no special / non-public knowledge about what is likely to be promising here (and wouldn't necessarily speculate if I did); but I get the sense that the zeitgeist among some people (not necessarily Paul himself) in alignment and x-risk focused communities, is that model-based RL systems and relatively complicated architectures like MuZero have recently been left somewhat in the dust by advances in LLMs. I think capabilities researchers absolutely do not see things this way, and they will not overlook these methods as avenues for further advancing capabilities. Alignment and x-risk focused researchers should be aware of this avenue, if they want to have accurate models of what the near future plausibly looks like.

Moderation Log

Rationality should not be painful.

The thoughts above are partially / vaguely in response to this post and its comment section about CFAR workshops, but also to some other hazy ideas that I've seen floating around lately.

A while ago, Eliezer wrote in the preface for the published version of the Sequences:

It ties in to the first-largest mistake in my writing, which was that I didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory. I didn’t realize that part was the priority; and regarding this I can only say “Oops” and “Duh.”

Yes, sometimes those big issues really are big and really are important; but that doesn’t change the basic truth that to master skills you need to practice them and it’s harder to practice on things that are further away. (Today the Center for Applied Rationality is working on repairing this huge mistake of mine in a more systematic fashion.)

And has also written:

Jeffreyssai inwardly winced at the thought of trying to pick up rationality by watching other people talk about it—

[-]kbear4mo30

all this sounds right to me.

the most benefit i've found personally has been from memetic rationality. for example:

"i notice that i am confused"
"0. something i haven't listed yet"
steelman / ITT
"what do i have? what do i want? how can i use the former to get the latter?"

some nearby memes that fall short:

list of fallacies (does not encourage further inquiry)
bayes' rule (hard to remember to check priors in advance. if anyone has operationalized this, please let me know!)
various social observations (e.g. "status games" theory is true, (and a useful frame for understanding specific interactions/reactions!) but not concrete enough to inspire rationality. it goes signal-signal-counter-signal... who can track it?!)

attending a workshop, practicing a bunch of rationality exercises from a handbook, discussing in a group setting, etc. would not have been particularly effective on its own, and potentially even detracting or at least distracting.

real rationality is something that only i can keep myself honest to. only i know if i'm really doing it: you must be able to testify from inner spirit.

[-]Viliam4mo20

I am also not claiming that the Valley of Bad Rationality is entirely fake. But I am saying it's not that big of a deal, and in any case the best way out is through. And also that "through" should feel natural / good / easy.

I guess it depends on what position you are starting from. Some people are way more fucked up than average.

To put it bluntly, we don't need another Ziz.

[-]Max H10mo12-2

New definitions:

Trade surplus: Country A has a bilateral "trade surplus" with Country B if Country A imports more tangible goods (cars, steel, electronics, etc.) from Country B than it exports back. In other words, Country A ends up with more real, physical items. Country B, meanwhile, ends up with more than it started with of something much less important: fiat currency (flimsy paper money) or 1s and 0s in a digital ledger (probably not even on a blockchain!).
If you extrapolate this indefinitely in a vacuum, Country A eventually accumulates all of Country B's tangible goods, while Country B is left with a big pile of paper. Sounds like a pretty sweet deal for Country A if you ask me.
It's OK if not everyone follows this explanation or believes it - they can tell it's the good one because it has "surplus" in the name. Surely everyone wants a surplus.
Trade deficit: Conversely, Country A has a "trade deficit" if it exports more tangible resources than it imports, and thus ends up with less goods on net. In return, it only receives worthless fiat currency from some country trying to hoard actual stuff for their own people. Terrible deal!
Again, if you don't totally follow, that's OK, just pay attention to the word "deficit". Everyone knows that deficits are bad and should be avoided.

[-]Mars_Will_Be_Ours10mo72

[-]Knight Lee10mo30

"The idea that countries can export and trade-surplus their way to wealth is a fascinating one. They're shipping goods to other countries for free. How then could they prosper more? AFAICT, by outsourcing the task of rewarding and elevating their own most productive citizens."
tweet by Yudkowsky

[-]FlorianH10mo21

[-]StanislavKrym10mo10

[-]Max H2y122

Lack of sufficient object permanence, situational awareness, coherence, etc. seem like pretty strong barriers to meaningfully owning and trading stuff in a real way.

[-]Max H3mo50

[-]Max H1d20

Medicine as an example of a failure of virtue ethics?

Epistemic status: kinda half-baked / not-confident claim

[-]Max H3y*20

The reason I'm making this prediction is, I was listening to Paul Christiano's appearance on the Bankless podcast from a few weeks ago.

Around the 28:00 mark the hosts ask Paul if we should be concerned about AI developments from vectors other than LLM-like systems, broadly construed.

Moderation Log