by Ruby
1 min read


This is a special post for quick takes by Ruby. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.
119 comments
[-]Ruby559

Just thinking through simple stuff for myself, very rough, posting in the spirit of quick takes
 

  1. At present, we are making progress on the Technical Alignment Problem[2] and like probably could solve it within 50 years.

  2. Humanity is on track to build ~lethal superpowerful AI in more like 5-15 years.
  3. Working on technical alignment (direct or meta) only matters if we can speed up overall progress by 10x (or some lesser factor if AI capabilities progress is delayed from its current trajectory). Improvements of 2x are not likely to get us to an adequate technical solution in time.
  4. Working on slowing things down is only helpful if it results in delays of decades.
    1. Shorter delays are good in so far as they give you time to buy further delays.
  5. There is technical research that is useful for persuading people to slow down (and maybe also solving alignment, maybe not). This includes anything that demonstrates scary capabilities or harmful proclivities, e.g. a bunch of mech interp stuff, all the evals stuff.
  6. AI is in fact super powerful and people who perceive there being value to be had aren’t entirely wrong[3]. This results in a very strong motivation to pursue AI and resist efforts to stop it

... (read more)

"Cyborgism or AI-assisted research that gets up 5x speedups but applies differentially to technical alignment research"

How do you make meaningful progress and ensure it does not speed up capabilities?

It seems unlikely that a technique exists that is exclusively useful for alignment research and can't be tweaked to help OpenMind develop better optimization algorithms etc.

I basically agree with this:

People who want to speed up AI will use falsehoods and bad logic to muddy the waters, and many people won’t be able to see through it 

In other words, there’s going to be an epistemic war and the other side is going to fight dirty, I think even a lot of clear evidence will have a hard time against people’s motivations/incentives and bad arguments.

But I'd be more pessimistic than that, in that I honestly think pretty much every side will fight quite dirty in order to gain power over AI, and we already have seen examples of straight up lies and bad faith.

From the anti-regulation side, I remember Martin Casado straight up lying about mechanistic interpretability rendering AI models completely understood and white box, and I'm very sure that mechanistic interpretability cannot do what Martin Casado claimed.

I also remember a16z lying a lot about SB1047.

From the pro-regulation side, I remember Zvi incorrectly claiming that Sakana AI's system exhibited instrumental convergence/recursive self-improvement, and as it turned out, the reality was far more mundane than that:

https://www.lesswrong.com/posts/ppafWk6YCeXYr4XpH/danger-ai-scientist-danger#AtXXgsws5DuP6Jxzx

Zvi the... (read more)

9Cole Wyeth
Though I am working on technical alignment (and perhaps because I know it is hard) I think the most promising route may be to increase human and institutional rationality and coordination ability. This may be more tractable than "expected" with modern theory and tools. Also, I don't think we are on track to solve technical alignment in 50 years without intelligence augmentation in some form, at least not to the point where we could get it right on a "first critical try" if such a thing occurs. I am not even sure there is a simple and rigorous technical solution that looks like something I actually want, though there is probably a decent engineering solution out there somewhere.
5davekasten
I think this can be true, but I don't think it needs to be true: I suspect that if the government is running the at-all-costs-top-national-priority Project, you will see some regulations stop being enforceable.  However, we also live in a world where you can easily find many instances of government officials complaining in their memoirs that laws and regulations prevented them from being able to go as fast or as freely as they'd want on top-priority national security issues.  (For example, DoD officials even after 9-11 famously complained that "the lawyers" restricted them too much on top-priority counterterrorism stuff.)  
2Nathan Helm-Burger
Yes, this is a good point. We need a more granular model than a binary 'all the same laws will apply to high priority national defense projects as apply to tech companies' versus 'no laws at all will apply'.
4Lao Mein
I have a few questions.
  1. Can you save the world in time without a slowdown in AI development if you had a billion dollars?
  2. Can you do it with a trillion dollars?
  3. If so, why aren't you trying to ask the US Congress for a trillion dollars?
  4. If it's about a lack of talent, do you think Terence Tao can make significant progress on AI alignment if he actually tried?
  5. Do you think he would be willing to work on AI alignment if you offered him a trillion dollars?
5Amalthea
Interestingly, Terence Tao has recently started thinking about AI, and his (publicly stated) opinions on it are ... very conservative? I find he mostly focuses on the capabilities that are already here and doesn't really extrapolate from it in any significant way.
1O O
Really? He seems pretty bullish. He thinks it will co-author math papers pretty soon. I think he just doesn’t think about, or at least doesn’t state, his thoughts on implications outside of math.
1Amalthea
He's clearly not completely discounting that there's progress, but overall it doesn't feel like he's "updating all the way": This is a recent post about the DeepMind math olympiad results: https://mathstodon.xyz/@tao/112850716240504978 "1. This is great work, shifting once again our expectations of which benchmark challenges are within reach of either #AI-assisted or fully autonomous methods"
4Ruby
Money helps. I could probably buy a lot of dignity points for a billion dollars. With a trillion, variance definitely goes up because you could try crazy stuff that could backfire (I mean, true for a billion too), but the EV of such a world is better.  I don't think there's anything that's as simple as writing a check though. US Congress gives money to specific things. I do not have a specific plan for a trillion dollars. I'd bet against Terence Tao being some kind of amazing breakthrough researcher who changes the playing field.
4Raemon
My answer (and I think Ruby's) to most of these questions is "no", for What Money Cannot Buy reasons, as well as "geniuses don't often actually generalize and are hard to motivate with money."
4jmh
I really like the observation in your Further Thoughts point. I do think that is a problem people need to look at, as I would guess many will view the government involvement from an acting-in-the-public-interest view rather than from either a self-interest view (as problematic as that might be when the players keep changing) or a special interest/public choice perspective. There's probably some great historical analysis already written about events in the past that might serve as indicators of the pros and cons here. Any historians in the group here?
4Ruby
Not an original observation but yeah, separate from whether it's desirable, I think we need to be planning for it.
[-]Ruby301

There is now a button to say "I didn't like this recommendation, show fewer like it"




Clicking it will:

  • update the recommendation engine that you strongly don't like this recommendation
  • store analytics data for the LessWrong team to know that you didn't like this post (we won't look at your display name, just a random id). This will hopefully let us understand trends in bad recommendations
  • hide the post item from posts lists like Enriched/Latest Posts/Recommended. It will not hide it from user profile pages, Sequences, etc
8Ruby
You can now also dislike a recommendation in the triple-dot post actions menu. This handles cases when the post title is too long to leave room for an icon there, and on small screens.  
[-]Ruby271

Seeking Beta Users for LessWrong-Integrated LLM Chat

Comment here if you'd like access. (Bonus points for describing ways you'd like to use it.)


A couple of months ago, a few of the LW team set out to see how LLMs might be useful in the context of LW.  It feels like they should be useful at some point before the end; maybe that point is now. My own attempts to get Claude to be helpful for writing tasks weren't particularly succeeding, but LLMs are pretty good at reading a lot of things quickly, and can also be good at explaining technical topics.

So I figured just making it easy to load a lot of relevant LessWrong context into an LLM might unlock several worthwhile use-cases. To that end, Robert and I have integrated a Claude chat window into LW, with the key feature that it will automatically pull in relevant LessWrong posts and comments to what you're asking about.

I'm currently seeking beta users. 

Since using the Claude API isn't free and we haven't figured out a payment model, we're not rolling it out broadly. But we are happy to turn it on for select users who want to try it out. 

Comment here if you'd like access. (Bonus points for describing ways you'd like to use it.)

5Ruby
@Chris_Leong @Jozdien @Seth Herd @the gears to ascension @ProgramCrafter  You've all been granted access to the LW-integrated LLM Chat prototype. Cheers!
3Ruby
Oh, you access it with the sparkle button in the bottom right:  
4Ruby
@Neel Nanda @Stephen Fowler @Saul Munn – you've been added. I'm hoping to get a PR deployed today that'll make a few improvements:
  • narrow the width so it doesn't overlap the post on smaller screens than before
  • load more posts into the context window by default
  • upweight embedding distance relative to karma in the embedding search for relevant context to load in
  • various additions to the system response to improve tone and style
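A minimal sketch of what a combined relevance score like that could look like (the log-karma transform and the weights below are purely illustrative assumptions, not the actual LessWrong implementation):

```python
import math

def context_score(embedding_distance: float, karma: int,
                  w_distance: float = 0.8, w_karma: float = 0.2) -> float:
    """Illustrative score for ranking candidate posts to load as chat context.

    Lower embedding distance is better, so it enters with a negative sign;
    karma is log-compressed so very high-karma posts don't dominate.
    The default weights are made up for this example.
    """
    return -w_distance * embedding_distance + w_karma * math.log1p(max(karma, 0))

# Hypothetical usage: rank candidates best-first before loading them into the prompt.
candidates = [
    {"title": "Post A", "distance": 0.31, "karma": 120},
    {"title": "Post B", "distance": 0.25, "karma": 8},
]
ranked = sorted(candidates, key=lambda c: context_score(c["distance"], c["karma"]), reverse=True)
```

"Upweighting embedding distance relative to karma" would then just mean increasing w_distance (or shrinking w_karma) so semantic closeness matters more than popularity.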
8Saul Munn
great! how do i access it on mobile LW?
2Ruby
Not available on mobile at this time, I'm afraid.
1Saul Munn
gotcha. what would be the best way to send you feedback? i could do:
  • comments here
  • send it directly to you via LW DM, email, [dm through some other means]
or something else if that's better
(while it's top-of-mind: the feedback that generated this question was that the chat interface pops up every single time i open a tab of LW, including every time i open a post in a new tab. this gets really annoying very quickly!)
2Ruby
Cheers! Comments here are good, so is LW DM, or Intercom.
5Jozdien
I'm interested. I once tried a much more rudimentary LW-LLM integration with a GPT-4 Discord bot and it never felt quite right, so I'd be very interested in seeing what a much better version looks like.
5Seth Herd
I'm interested. I'll provide feedback, positive or negative, like I have on other site features and proposed changes. I'd be happy to pay on almost any payment model, at least for a little while. I have a Cause subscription fwiw. I'd use it to speed up researching prior related work on LW for my posts. I spend a lot of time doing this currently.
5Chris_Leong
I'd like access. TBH, if it works great I won't provide any significant feedback, apart from "all good". But if it annoys me in any way I'll let you know. For what it's worth, I have provided quite a bit of feedback about the website in the past. I want to see if it helps me with my draft document on proposed alignment solutions: https://docs.google.com/document/d/1Mis0ZxuS-YIgwy4clC7hKrKEcm6Pn0yn709YUNVcpx8/edit#heading=h.u9eroo3v6v28
4Ruby
Sounds good! I'd recommend pasting in the actual contents together with a description of what you're after.
5the gears to ascension
Interested! I would pay at cost if that was available. I'll be asking about which posts are relevant to a question, misc philosophy questions and asking for Claude to challenge me, etc. Primarily interested if I can ask for brevity using a custom prompt, in the system prompt.
5Tetraspace
I'd like beta access. My main use case is that I intend to write up some thoughts on alignment (Manifold gives 40% that I'm proud of a write-up, I'd like that number up), and this would be helpful for literature review and finding relevant existing work. Especially so because a lot of the public agent foundations work is old and migrated from the old alignment forum, where it's low-profile compared to more recent posts.
7Ruby
Added!
4Neel Nanda
I'd be interested! I would also love to see the full answer to why people care about SAEs
2Ruby
Added! That's been one of my go-to questions for testing variations of the system, I'd suggest just trying it yourself.
4Stephen Fowler
I'd like access to it. 
4dxu
I'm interested! Also curious as to how this is implemented; are you using retrieval-augmented generation, and if so, with what embeddings?
4Ruby
You are added! Claude 3.5 Sonnet is the chat client, and yes, with RAG using OpenAI text-embedding-3-large for embeddings.
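For the curious, here is a bare-bones sketch of what that kind of pipeline looks like. The model names come from the comment above; the function names, the vector search step, and the prompt format are illustrative assumptions rather than the actual LessWrong code:

```python
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # assumes OPENAI_API_KEY is set in the environment
anthropic_client = Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def embed(text: str) -> list[float]:
    """Embed text with OpenAI text-embedding-3-large."""
    resp = openai_client.embeddings.create(model="text-embedding-3-large", input=text)
    return resp.data[0].embedding

def answer_with_lw_context(question: str, search_posts) -> str:
    """search_posts stands in for whatever vector search the site actually uses."""
    query_vec = embed(question)
    posts = search_posts(query_vec, top_k=5)  # hypothetical retrieval step
    context = "\n\n".join(f"## {p['title']}\n{p['body']}" for p in posts)
    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system="Answer the question using the LessWrong posts provided as context.",
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    )
    return msg.content[0].text
```

The notable design choice in this style of RAG is doing one semantic search up front and stuffing whole posts into the prompt, rather than letting the model request documents itself.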
4RobinGoins
Interested! Unsure how I'll use it; will need to play around with it to figure that out. But in general, I like asking questions while reading things to stay engaged and I'm very interested to see how it goes with an LLM that's loaded up with LW context.
3Ruby
Added!
3Saul Munn
i’d love access! my guess is that i’d use it like — elicit:research papers::[this feature]:LW posts
3quila
i'm interested in using it for literature search
4Ruby
I'll add you now, though I'm in the middle of some changes that should make it better for lit search.
3ProgramCrafter
I'm interested! Among other uses, I hope to use it for finding posts exploring similar topics under different names. By the way, I have an idea for what to use instead of a payment model: interacting with the user's local LLM, like one started within LM Studio. That'd require a checkbox/field to enter an API URL, some recommendations on which model to use, and working out how to reduce the amount of content fed into the model (as user-run LLMs tend to have smaller context windows than needed).
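For what it's worth, LM Studio's local server exposes an OpenAI-compatible API, so the client-side change would be small. A rough sketch; the port, placeholder API key, and model name below are assumptions to be replaced by whatever the user enters in the proposed settings field:

```python
from openai import OpenAI

# Point the standard OpenAI client at a locally running server instead of the cloud.
local_client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

resp = local_client.chat.completions.create(
    model="local-model",  # LM Studio serves whichever model the user has loaded
    messages=[{"role": "user", "content": "Which LessWrong posts discuss bucket errors?"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)
```

The smaller context windows of local models would still need handling, e.g. loading fewer or more aggressively truncated posts.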
2Garrett Baker
Oh I didn’t see this! I’d like access, in part because its pretty common I try to find a LessWrong post or comment, but the usual search methods don’t work. Also because it seems like a useful way to explore the archives.
2Ruby
Added!
1Dalcy
I'd also love to have access!
2Ruby
Added!
1Mateusz Bagiński
I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).
2Ruby
Your access should be activated within 5-10 minutes. Look for the button in the bottom right of the screen.
1jenn
I'm interested if you're still adding folks. I run local rationality meetups, this seems like a potentially interesting way to find readings/topics for meetups (e.g. "find me three readings with three different angles on applied rationality", "what could be some good readings to juxtapose with burdens by scott alexander", etc.)
3Ruby
Added! (Can take a few min to activate though.) My advice is, for each one of those, ask it in a new separate/fresh chat because it'll only do a single search per chat.
1dirk
I'm interested! I'd probably mostly be comparing it to unaugmented Claude for things like explaining ML topics and turning my post ideas into drafts (I don't expect it to be great at this latter but I'm curious whether having some relevant posts in the context window will elicit higher quality). I also think the low-friction integration might make it useful for clarifying math- or programming-heavy posts, though I'm not sure I'll want this often.
2Ruby
You now have access to the LW LLM Chat prototype! That's actually one of my favorite use-cases
1Sheikh Abdur Raheem Ali
I'd love to have early access. I will probably give feedback on bugs in the implementation before it is rolled out to more users, and am happy to use my own API keys.
3Ruby
You've been granted access to the LW LLM Chat prototype!  No need to provide an API key (we haven't even set that up; I was just explaining why we're having people manually request access rather than making it immediately available more broadly).
2[comment deleted]
[-]Ruby210

Selected Aphorisms from Francis Bacon's Novum Organum

I'm currently working to format Francis Bacon's Novum Organum as a LessWrong sequence. It's a moderate-sized project as I have to work through the entire work myself, and write an introduction which does Novum Organum justice and explains the novel move of taking an existing work and posting it on LessWrong (short answer: NovOrg is some serious hardcore rationality and contains central tenets of the LW foundational philosophy notwithstanding being published back in 1620, not to mention that Bacon and his works are credited with launching the modern Scientific Revolution).

While I'm still working on this, I want to go ahead and share some of my favorite aphorisms from it so far:

3. . . . The only way to command reality is to obey it . . .

9. Nearly all the things that go wrong in the sciences have a single cause and root, namely: while wrongly admiring and praising the powers of the human mind, we don’t look for true helps for it.

Bacon sees the unaided human mind as entirely inadequate for scientific progress. He sees the way forward for scientific progress as constructing tools/infrastructure/methodology t... (read more)

Please note that even things written in 1620 can be under copyright. Not the original thing, but the translation, if it is recent. Generally, every time a book is modified, the clock starts ticking anew... for the modified version. If you use a sufficiently old translation, or translate a sufficiently old text yourself, then it's okay (even if a newer translation exists, if you didn't use it).

6Raemon
Yup – Ruby/habryka specifically found a translation that we're allowed to post.
3jp
I'm a complete newcomer to information on Bacon and his time. How much of his influence was due to Novum Organum itself vs other things he did? If significantly the latter, what were those things? Feel free to tell me to Google that.
4habryka
At the very least "The New Atlantis", a fictional utopian novel he wrote, was quite influential, at least in that it's usually cited as one of the primary inspirations for the founding of the royal society: https://en.wikipedia.org/wiki/New_Atlantis#Influences
1RRBB
test
[-]Ruby184

The “Deferred and Temporary Stopping” Paradigm

Quickly written. Probably missed where people are already saying the same thing.

I actually feel like there’s a lot of policy and research effort aimed at slowing down the development of powerful AI–basically all the evals and responsible scaling policy stuff.

A story for why this is the AI safety paradigm we’ve ended up in is because it’s palatable. It’s palatable because it doesn’t actually require that you stop. Certainly, it doesn’t right now. To the extent companies (or governments) are on board, it’s because those companies are at best promising “I’ll stop later when it’s justified”. They’re probably betting that they’ll be able to keep arguing it’s not yet justified. At the least, it doesn’t require a change of course now and they’ll go along with it to placate you.

Even if people anticipate they will trigger evals and maybe have to delay or stop releases, I would bet they’re not imagining they have to delay or stop for all that long (if they’re even thinking it through that much). Just long enough to patch or fix the issue, then get back to training the next iteration. I'm curious how many people imagine that once certain evaluatio... (read more)

[-]Ruby140

Why I'm excited by the 2018 Review

I generally fear that perhaps some people see LessWrong as a place where people just read and discuss "interesting stuff", not much different from a Sub-Reddit on anime or something. You show up, see what's interesting that week, chat with your friends. LessWrong's content might be considered "more healthy" relative to most internet content and many people say they browse LessWrong to procrastinate but feel less guilty about it than other browsing, but the use-case still seems a bit about entertainment.

None of the above is really a bad thing, but in my mind, LessWrong is about much more than a place for people to hang out and find entertainment in sharing joint interests. In my mind, LessWrong is a place where the community makes collective progress on valuable problems. It is an ongoing discussion where we all try to improve our understanding of the world and ourselves. It's not just play or entertainment– it's about getting somewhere. It's as much like an academic journal where people publish and discuss important findings as it is like an interest-based sub-Reddit.

And all this makes me really... (read more)

[-]Ruby140

Communal Buckets

A bucket error is when someone erroneously lumps two propositions together, e.g. I made a spelling error automatically entails I can't be a great writer; they're in one bucket when really they're separate variables.

In the context of criticism, it's often mentioned that people need to learn to not make the bucket error of I was wrong or I was doing a bad thing -> I'm a bad person. That is, you being a good person is compatible with making mistakes, being wrong, and causing harm since even good people make mistakes. This seems like a right and true and a good thing to realize.

But I can see a way in which being wrong/making mistakes (and being called out for this) is upsetting even if you personally aren't making a bucket error. The issue is that you might fear that other people have the two variables collapsed into one. Even if you might realize that making a mistake doesn't inherently make you a bad person, you're afraid that other people are now going to think you are a bad person because they are making that bucket error.

The issue isn't your own buckets, it's that you have a model of the shared "communal buck... (read more)

4Dagon
"did a bad thing" -> "bad person" may not be a bucket error, it may be an actual inference (if "bad person" is defined as "person who does bad things"), or a useless category (if "bad person" has no actual meaning). This question seems to be "fear of attribution error". You know you have reasons for things you do, others assume you do things based on your nature.
4Ruby
Yeah, I think the overall fear would be something like "I made a mistake but now overall people will judge me as a bad person" where "bad person" is above some threshold of doing bad. Indeed, each bad act is an update towards the threshold, but the fear is that in the minds of others, a single act will be generalized and put you over. The "fear of attribution error" seems on the mark to me.
[-]Ruby132

As noted in an update on LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!"), yesterday we started an AB test on some users automatically being switched over to the Enriched [with recommendations] Latest Posts feed.

The first ~18 hours worth of data does seem like a real uptick in clickthrough-rate, though some of that could be novelty.

(examining members of the test (n=921) and control groups (n~=3000) for the last month, the test group seemed to have a slightly (~7%) lower clickthrough-rate baseline; I haven't investigated this)
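As a quick sanity check on differences like that, a two-proportion z-test is the standard tool; here's a sketch with invented click counts (not the actual experiment data):

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical numbers: unique clickers and group sizes for test vs control.
clicks = [230, 610]
users = [921, 3000]

z_stat, p_value = proportions_ztest(count=clicks, nobs=users)
print(f"test CTR = {clicks[0] / users[0]:.3f}, control CTR = {clicks[1] / users[1]:.3f}")
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

With real per-user click data you'd also want to account for the pre-existing ~7% baseline difference between the groups, e.g. by comparing each group's change from its own last-month baseline.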

However the specific posts that people are clicking on don't feel on the whole like the ones I was most hoping the recommendations algorithm would suggest (and get clicked on). It feels kinda like there's a selection towards clickbaity or must-read news (not completely, just not as much as I like). 

If I look over items recommended by Shoggoth that are older (50% are from last month, 50% older than that), they feel better but seem to get fewer clicks.
 



A to-do item is to look at voting behavior relative to clicking behavior. Having clicked on these items, do people upvote them as much as others? 

I'm also wanting to experiment with just a... (read more)

[-]Ruby130

It feels like the society I interact with dislikes expression of negative emotions, at least in the sense that expressing negative emotions is kind of a big deal - if someone expresses a negative feeling, it needs to be addressed (fixed, ideally). The discomfort with negative emotions and consequent response acts to a fair degree to suppress their expression. Why mention something you're a little bit sad about if people are going to make a big deal out of it and try to make you feel better, etc., etc.?


Related to the above (with an ambiguously directed causal arrow) is that we lack reliable ways to communicate about negative emotions with something like nuance or precision. If I imagine starting a conversation with a friend by saying "I feel happy", I expect to be given space to clarify the cause, nature, and extent of my happiness. Having clarified these, my friend will react proportionally. Yet when I imagine saying "I feel sad", I expect this to be perceived as "things are bad, you need sympathy, support, etc." and the whole stage of "clarify cause, nature, extent" is skipped, instead proceeding straight to a fairly large reaction.


And I wi... (read more)

2Viliam
The specific details are probably gender-specific. Men are supposed to be strong. If they express sadness, it's like a splash of low status and everyone is like "ugh, get away from me, loser, I hope it's not contagious". On the other hand, if they express anger, people get scared. So men gradually learn to suppress these emotions. (They also learn that words "I would really want you to show me your true feelings" are usually a bait-and-switch. The actual meaning of that phrase is that the man is supposed to perform some nice emotion, probably because his partner feels insecure about the relationship and wants to be reassured.)

Women have other problems, such as being told to smile when something irritates them... but this would be more reliably described by a woman.

But in general, I suppose people simply do not want to empathize with bad feelings; they just want them to go away. "Get rid of your bad feeling, so that I am not in a dilemma to either empathize with you and feel bad, or ignore you and feel like a bad person."

A good reaction would be something like: "I listen to your bad emotion, but I am not letting myself get consumed by it. It remains your emotion; I am merely an audience." Perhaps it would be good to have some phrase to express that we want this kind of reaction, because from the other side, providing this reaction unprompted can lead to accusations of insensitivity. "You clearly don't care!"

(By feeling bad when other people feel bad we signal that we care about them. It is a costly signal, because it makes us feel bad, too. But in turn, the cost is why we provide all kinds of useless help just to make it go away.)
[-]Ruby120

Just a thought: there's the common advice that fighting all out with the utmost desperation makes sense for very brief periods, a few weeks or months, but doing so for longer leads to burnout. So you get sayings like "it's a marathon, not a sprint." But I wonder if length of the "fight"/"war" isn't the only variable in sustainable effort. Other key ones might be the degree of ongoing feedback and certainty about the cause.

Though I expect a multiyear war which is an existential threat to your home and family to be extremely taxing, I imagine soldiers experiencing less burnout than people investing similar effort for a far-mode cause, let's say global warming which might be happening, but is slow and your contributions to preventing it unclear. (Actual soldiers may correct me on this, and I can believe war is very traumatizing, though I will still ask how much they believed in the war they were fighting.)

(Perhaps the relevant variables here are something like Hanson's Near vs Far mode thinking, where hard effort for far-mode thinking more readily leads to burnout than near-mode thinking even when sustained for long periods.)

Then... (read more)

[-]Ruby120

A random value walks into a bar. A statistician swivels around in her chair, one tall boot unlaced and an almost full Manhattan sitting a short distance from her right elbow.

"I've been expecting you," she says.

"Have you been waiting long?" respond the value.

"Only for a moment."

"Then you're very on point."

"I've met enough of your kind that there's little risk of me wasting time."

"I assure you I'm quite independent."

"Doesn't mean you're not drawn from the same mold."

"Well, what can I do for you?"

"I was hoping to gain your confidence..."

[-]Ruby110

Some Thoughts on Communal Discourse Norms

I started writing this in response to a thread about "safety", but it got long enough to warrant breaking out into its own thing.

I think it's important to people to not be attacked physically, mentally, or socially. I have a terminal preference over this, but also think it's instrumental towards truth-seeking activities too. In other words, I want people to actually be safe.

  • I think that when people feel unsafe and have defensive reactions, this makes their ability to think and converse much worse. It can push discussion from truth-seeking exchange to social war.
    • Here I think mr-hire has a point: if you don't address people's "needs" overtly, they'll start trying to get them covertly, e.g. trying to win arguments for the sake of protecting their reputation rather than trying to get to the truth. Doing things like writing hasty scathing replies rather than slow, carefully considered ones (*raises hand*), and worse, feeling righteous anger while doing so. Having thoughts like "the only reason my interlocutor could think X is because they are obtuse due to their biases" rather than "maybe th
... (read more)
[-]Ruby102

There's an age old tension between ~"contentment" and ~"striving" with no universally accepted compelling resolution, even if many people feel they have figured it out. Related:

In my own thinking, I've been trying to ground things out in a raw consequentialism that one's cognition (including emotions) is just supposed to take you towards more value (boring, but reality is allowed to be)[1].

I fear that a lot of what people do is ~"wireheading". The problem with wireheading is it's myopic. You feel good now (small amount of value) at the expense of greater value later. Historically, this has made me instinctively wary of various attempts to experience more contentment such as gratitude journaling. Do such things curb the pursuit of value in exchange for feeling better (less unpleasant discontent) in the moment?

Clarity might come from further reduction of what "value" is. The primary notion of value I operate with is preference satisfaction: the world is how you want it to be. But also a lot of value seems to flow through experience (and the preferred state of the world is one where certain experiences happen).

A model whereby gratitude journaling (or general "attend to what is good" mot... (read more)

7tailcalled
I think gratitude also has value in letting you recognize what is worth maintaining and what has historically shown itself to have lots of opportunities and therefore in the future may have opportunities too.
2Ruby
Since I'm rambling, I'll note another thought I've been mulling over: My notion of value is not the same as the value that my mind was optimized to pursue. Meaning that I ought to be wary that typical human thought patterns might not be serving me maximally. That's of course on top of the fact that evolution's design is flawed even by its own goals; humans rationalize left, right, and center, are awfully myopic, and we'll likely all die because of it.
1ABlue
I don't think wireheading is "myopic" when it overlaps with self-maintenance. Classic example would be painkillers; they do ~nothing but make you "feel good now" (or at least less bad), but sometimes feeling less bad is necessary to function properly and achieve long-term value. I think that gratitude journaling is also part of this overlap area. That said I don't know many peoples' experiences with it so maybe it's more prone to "abuse" than I expect.
4Ruby
Yeah, I think a question is whether I want to say "that kind of wireheading isn't myopic" vs "that isn't wireheading". Probably fine either way if you're consistent / taboo adequately.
[-]Ruby100

Hypothesis that becomes very salient from managing the LW FB page: "likes and hearts" are a measure of how much people already liked your message/conclusion*.

*And also like how well written/how alluring a title/how actually insightful/how easy to understand, etc. But it also seems that the most popular posts are those which are within the Overton window, have less inferential distance, and a likable message. That's not to say they can't have tremendous value, but it does make me think that the most popular posts are not going to be the same as the most valuable posts + optimizing for likes is not going to be same as optimizing for value.

**And maybe this seems very obvious to many already, but it just feels so much more concrete when I'm putting three posts out there a week (all of which I think are great) and seeing which get the strongest response.

***This effect may be strongest at the tails.

****I think this effect would affect Gordon's proposed NPS-rating too.

*****I have less of this feeling on LW proper, but definitely far from zero.

[-]Ruby100

Narrative Tension as a Cause of Depression

I only wanted to budget a couple of hours for writing today. Might develop further and polish at a later time.

Related to and an expansion of Identities are [Subconscious] Strategies

Epistemic status: This is non-experimental psychology, my own musings. Presented here is a model derived from thinking about human minds a lot over the years, knowing many people who’ve experienced depression, and my own depression-like states. Treat it as a hypothesis: see if it matches your own data and generates helpful suggestions.

Clarifying “narrative”

In the context of psychology, I use the term narrative to describe the simple models of the world that people hold to varying degrees of implicit vs explicit awareness. They are simple in the sense of being short, being built of concepts which are basic to humans (e.g. people, relationships, roles, but not physics and statistics), and containing unsophisticated blackbox-y causal relationships like “if X then Y, if not X then not Y.”

Two main narratives

I posit that people carry two primary kinds of narratives in their minds:

  • Who I am (the role they are playing), and
  • How my life will go (the progress of their life)

T... (read more)

5RobinGoins
This is aligned with my thoughts on the importance of narratives, especially personal narratives. The best therapists are experts at helping pull out your stories - they ask many, many questions and function as working memory, so you can better see the shapes of your stories and what levers exist to mold them differently. (We have a word for those who tell stories - storyteller - but do we have a word for experts at pulling stories out of others?)
4Jakob_J
A related concept in my view is that of agency, as in how much I feel I am in control of my own life. I am not sure what is the cause and what is the effect, but I have noticed that during periods of depression I feel very little agency and during more happy periods I feel a lot more agency over my life. Often, focusing on the things I can control in my life (exercise, nutrition, social activities) over things I can't (problems at work) allows me to recover from depression a lot faster.
1Pattern
This can also be a standard, what someone considers a bare minimum, whether it's x amount of good things a, b, and c, or x amount of growth in areas a, b and c.
[-]Ruby100

Over the years, I've experienced a couple of very dramatic yet rather sudden and relatively "easy" shifts around major pain points: strong aversions, strong fears, inner conflicts, or painful yet deeply ingrained beliefs. My post Identities are [Subconscious] Strategies contains examples. It's not surprising to me that these are possible, but my S1 says they're supposed to require a lot of effort: major existential crises, hours of introspection, self-discovery journeys, drug trips, or dozens of hours with a therapist.

Having recently undergone a really big one, I noted my surprise again. Surprise, of course, is a property of bad models. (Actually, the recent shift occurred precisely because of exactly this line of thought: I noticed I was surprised and dug in, leading to an important S1 shift. Your strength as a rationalist and all that.) Attempting to come up with a model which wasn't as surprised, this is what I've got:

The shift involved S1 models. The S1 models had been there a long time, maybe a very long time. When that happens, they begin to seem how the world just *is*. If emotions arise from those models, and those models are so entrenched t... (read more)

4Kaj_Sotala
Unlocking the Emotional Brain is basically about this.

The LessWrong admins are often evaluating whether users (particularly new users) are going to be productive members of the site vs just really bad and in need of strong action.

A question we're currently disagreeing on is which pieces of evidence it's okay to look at in forming judgments. Obviously anything posted publicly. But what about:

- Drafts (admins often have good reason to look at drafts, so they're there)
- Content the user deleted
- The referring site that sent someone to LessWrong

I'm curious how people feel about moderators looking at those.

Alt... (read more)

I want to clarify the draft thing:

In general LW admins do not look at drafts, except when a user has specifically asked for help debugging something. I indeed care a lot about people feeling like they can write drafts without an admin sneaking a peek.

The exceptions under discussion are things like "a new user's first post or comment looks very confused/crackpot-ish, to the point where we might consider banning the user from the site. The user has some other drafts. (I think a central case here is a new user shows up with a crackpot-y looking Theory of Everything. The first post that they've posted publicly looks sort of borderline crackpot-y and we're not sure what call to make. A thing we've done sometimes is do a quick skim of their other drafts to see if they're going in a direction that looks more reassuring or "yeah this person is kinda crazy and we don't want them around.")

I think the new auto-rate-limits somewhat relax the need for this (I feel a bit more confident that crackpots will get downvoted, and then automatically rate limited, instead of something the admins have to monitor and manage). I think I'd have defended the need to have this tool in the past, but it might b... (read more)

2Dagon
I apologize if I implied that the mods were routinely looking at private data without reason - I do, in fact, trust your intentions very deeply, and I'm sad when my skepticism about the ability to predict future value bleeds over into making your jobs harder. I wonder if the missing feature might be a status for "post approval required" - if someone triggers your "probably a crackpot" intuition, rather than the only options being "ban" or "normal access" have a "watchlist" option, where posts and comments have a 60-minute delay before becoming visible (in addition to rate limiting).  The only trustworthy evidence about future posts is the posts themselves - drafts or deleted things only show that they have NOT decided to post that. Note that I don't know how big a problem this is.  I think that's a great credit to the mods - you're removing the truly bad before I notice it, and leaving some not-great-but-not-crackpot, which I think is about right.  This makes it very hard for me to be confident in any opinions about whether you're putting too much work into prior-censorship or not.  
7Rafael Harth
I'm emotionally very opposed to looking at drafts of anyone, though this is not a rationally thought out position. I don't have the same reaction toward votes because I don't feel like you have an expectation of privacy there. There are forums where upvotes are just non-anonymous by default.
3Nathan Young
Ruby, why doesn't your shortform have agree/disagreevote?
2Raemon
It was made in the past and we hadn't gotten around to updating all shortforms to use the new voting system.
2Max H
Personal opinion: it's fine and good for the mods to look at all available evidence when making these calls, including votes and vote patterns. If someone is borderline, I'd rather they be judged based on all available info about them, and I think the more data the mods look at more closely, the more accurate and precise their judgments will be. I'm not particularly worried about a moderator being incorrectly "biased" from observing a low-quality draft or a suspect referral; I trust the mods to be capable of making roughly accurate Bayesian updates based on those observations. I also don't think there's a particularly strong expectation or implicit promise about privacy (w.r.t mods; of course I don't expect anyone's votes or drafts to be leaked to the public...) especially for new / borderline users.

Separately, I feel like the precise policies and issues here are not worth sweating too much, for the mods / LW team. I think y'all are doing a great job overall, and it's OK if the moderation policy towards new users is a bit ad hoc / case-by-case. In particular, I don't expect anything in the neighborhood of current moderation policies / rate-limiting / privacy violations currently implemented or being discussed to have any noticeable negative effects, on me personally or on most users.

(In particular, I disagree pretty strongly with the hypothesis in e.g. this comment; I don't expect rate limits or any other moderation rules / actions to have any impact whatsoever on my own posting / commenting behavior, and I don't give them any thought when posting or commenting myself. I suspect the same is true for most other users, who are either unaware of them or don't care / don't notice.)
2Dagon
How frequent are moderation actions?  Is this discussion about saving moderator effort (by banning someone before you have to remove the rate-limited quantity of their bad posts), or something else?

I really worry about "quality improvement by prior restraint" - both because low-value posts aren't that harmful, they get downvoted and ignored pretty easily, and because it can take YEARS of trial-and-error for someone to become a good participant in LW-style discussions, and I don't want to make it impossible for the true newbies (young people discovering this style for the first time) to try, fail, learn, try, fail, get frustrated, go away, come back, and be slightly-above-neutral for a bit before really hitting their stride.

Relatedly: I'm struck that it seems like half or more of posts get promoted to frontpage (if the /allPosts list is categorizing correctly, at least).  I can't see how many posts are deleted, of course, but I wonder if, rather than moderation, a bit more option in promotion/depromotion would help.  If we had another category (frontpage, personal, and random), and mods moved things both up and down pretty easily, it would make for lower-stakes decisionmaking, and you wouldn't have to ban anyone unless they're making lots of work for mods even after being warned (or are just pure spam, which doesn't seem to be the question triggering this discussion).
2DragonGod
I agree with Dagon here. Six years ago after discovering HPMOR and reading part (most?) of the Sequences, I was a bad participant in old LW and rationalist subreddits. I would probably have been quickly banned on current LW. It really just takes a while for people new to LW-like norms to adjust.
2Dagon
Can you formalize the threat model a bit more?  What is the harm you're trying to prevent with this predictive model of whether a user (new or not) will be "productive" or "really bad"?  I'm mostly interested in your cost estimates for false positive/negative and your error bars for the information you have available.  Also, how big is the gap between "productive" and "really bad".  MOST users are neither - they're mildly good to mildly bad, with more noise than signal to figure out the sign.

The Bayesian in me says "use all data you have", but the libertarian side says "only use data that the target would expect to be used", and even more "I don't believe you'll USE the less-direct data to reach correct conclusions".  For example, is it evidence of responsibility that someone deleted a bad comment, or evidence of risk that they wrote it in the first place?

I DO strongly object to differential treatment of new users.  Long-term users have more history to judge them on, but aren't inherently different, and certainly shouldn't have more expectation of privacy.  I do NOT strongly object to a clear warning that drafts, deleted comments, and DMs are not actually private, and will often be looked at by site admins.  I DO object to looking at them without the clear notice that LW is different than a naive expectation in this regard.

I should say explicitly: I have VERY different intuitions of what's OK to look at routinely for new users (or old) in a wide-net or general policy vs what's OK to look at if you have some reason (a complaint or public indication of suspicious behavior) to investigate an individual.  I'd be very conservative on the former, and pretty darn detailed on the latter.

I think you're fully insane (or more formally, have an incoherent privacy, threat, and prediction model) if you look at deleted/private/draft messages, and ignore voting patterns.


I want to register a weak but nonzero prediction that Anthropic’s interpretability publication of A Mathematical Framework for Transformer Circuits will turn out to lead to large capabilities gains, and that in hindsight publishing it will be regarded as a rather bad move.

Something like we’ll have capabilities-advancing papers citing it and using its framework to justify architecture improvements.

4the gears to ascension
Agreed, and I don't think this is bad, nor that they did anything but become the people to implement what the zeitgeist demanded. It was the obvious next step, if they hadn't done it, someone else who cared less about trying to use it to make systems actually do what humans want would have done it. So the question is, are they going to release their work for others to use, or just hoard it until someone less scrupulous releases their models? It's looking like they're trying to keep it "in the family" so only corporations can use it. Kinda concerning. If human understandability hadn't happened, the next step might have been entirely automated sparsification, and those don't necessarily produce anything humans can use to understand. Distillation into understandable models is an extremely powerful trajectory.
2riceissa
Not the same paper, but related: https://twitter.com/jamespayor/status/1634447672303304705

Edit: I thought this distinction must have been pointed out somewhere. I see it under Rule of Law vs Rule of Man

Law is Ultimate vs Judge is Ultimate

Just writing up a small idea for reference elsewhere. I think spaces can get governed differently on one pretty key dimension, and that's who/what is supposed to be driving the final decision.

Option 1: What gets enforced by courts, judges, police, etc. in countries is "the law" of various kinds, e.g. the Constitution. Lawyers and judges attempt to interpret the law and apply it in given circumstances, often with re... (read more)

PSA:

Is Slack your primary coordination tool with your coworkers?

If you're like me, you send a lot of messages asking people for information or to do things, and if your coworkers are resource-limited humans like mine, they won't always follow up on the timescale you need.

How do you ensure loops get closed without maintaining a giant list of unfinished things in your head?

I use Slack's remind-me feature extensively. Whenever I send a message that I want to follow up on if the targeted party doesn't get back to me within a certain time frame, I set a reminde... (read more)

Great quote from Francis Bacon (Novum Organum Book 2:8):

Don’t be afraid of large numbers or tiny fractions. In dealing with numbers it is as easy to write or think a thousand or a thousandth as to write or think one.

Converting this from a Facebook comment to LW Shortform.

A friend complains about recruiters who send repeated emails saying things like "just bumping this to the top of your inbox" when they have no right to be trying to prioritize their emails over everything else my friend might be receiving from friends, travel plans, etc. The truth is they're simply paid to spam.

Some discussion of repeated messaging behavior ensued. These are my thoughts:

I feel conflicted about repeatedly messaging people. All the following being factors in this conflict... (read more)

For my own reference.

Brief timeline of notable events for LW2:

  • 2017-09-20 LW2 Open Beta launched
  • (2017-10-13 There is No Fire Alarm published)
  • (2017-10-21 AlphaGo Zero Significance post published)
  • 2017-10-28 Inadequate Equilibria first post published
  • (2017-12-30 Goodhart Taxonomy published) <- maybe part of January spike?
  • 2018-03-23 Official LW2 launch and switching of www.lesswrong.com to point to the new site.

In parentheses events are possible draws which spiked traffic at those times.

Huh, well that's something.

I'm curious, who else got this? And if yes, anyone click the link? Why/why not?

4Richard_Kennaway
I got the poll and voted, but not the follow-up, only "You [sic] choice has been made. It cannot be unmade."
3Max H
I got that exact message, and did click the link, about 1h after the timestamp of the message in my inbox. Reasoning:

  • The initial poll doesn't actually mention that the results would be used to decide the topic of next year's Petrov Day. I think all the virtues are important, but if you want to have a day specifically focusing on one, it might make more sense to have the day focused on the least voted virtue (or just not the most-voted one), since it is more likely to be neglected.
  • I predict there was no outright majority (just a plurality) in the original poll. So most likely, the only thing the first clicker is deciding is going with the will of something like a 20% minority group instead of a 30% minority group.
  • I predict that if you ran a ranked-choice poll that was explicitly on which virtue to make the next Petrov Day about, the plurality winner of the original poll would not win.

All of these reasons are independent of my actual initial choice, and seem like the kind of thing that an actual majority of the initial poll respondents might agree with me about. And it actually seems preferable (or at least not harmful) if one of the other minorities gets selected instead, i.e. my actual preference ordering for what next year's Petrov Day should be about is (my own choice) > (one of the other two minority options) > (whatever the original plurality selection was). If lots of other people have a similar preference ordering, then it's better for most people if anyone clicks the link, and if you happen to be the first clicker, you get a bonus of having your own personal favorite choice selected.

(Another prediction, less confident than my first two: I was not the first clicker, but the first clicker was also someone who initially chose the "Avoiding actions..." virtue in the first poll.)
2Dagon
I got both mails (with a different virtue).  I clicked on it. I think this is a meta-petrov, where everyone has the choice to make their preference (likely all in the minority, or stated as such even if not) the winner, or to defer to others.  I predict that it will eventually be revealed that the outcome would be better if nobody clicked the second link.  I defected, because pressing buttons is fun.
2jam_brand
A small extra detail not mentioned: the end of the linked URL is "unilateralism=true".
2Matt Goldenberg
I got both messages, didn't click the second.
1frontier64
I got both messages. Only clicked on the first. I guess other admins besides you were working on this and didn't say anything to you?

When I have a problem, I have a bias towards predominantly Googling and reading. This is easy, comfortable, do it from laptop or phone. The thing I'm less inclined to do is ask other people – not because I think they won't have good answers, just because...talking to people.

I'm learning to correct for this. The thing about other people is 1) sometimes they know more, 2) they can expose your mistaken assumptions.

The triggering example for this note is an appointment I had today with a hand and arm specialist for the unconventional RSI I've been experiencing... (read more)

2ChristianKl
Without medical training one has a lot of unknown unknowns when researching issues oneself. Talking things through with a doctor can often help you become aware of relevant medical knowledge.
2Ruby
Yeah, it's easy to get discouraged though when initial doctors clearly know less than you and only know of the most common diagnoses which very much don't seem to apply. Hence my advice to keep looking for doctors who do know more.
2ChristianKl
My point was that even if you know more specific facts than the doctor you are talking to, he might still be able to tell you something useful. When it comes to asking people to get knowledge, the total amount of knowledge isn't the only thing that matters. It matters a great deal that they have different knowledge than you.
2Ruby
True, true.

Test

9Dagon
Success!  Or maybe Fail!  if you hoped it not to be visible.

Failed replications notwithstanding, I think there's something to Fixed vs Growth Mindset. In particular, Fixed Mindset leading to failure being demoralizing, since it is evidence you are a failure, rings true.