Best of LessWrong 2022

Holden shares his step-by-step process for forming opinions on a topic, developing and refining hypotheses, and ultimately arriving at a nuanced view - all while focusing on writing rather than just passively consuming information.

Pablo
Silver’s model and most other lines of evidence indicate that the US presidential race is as close to a tossup as it gets. But, as of this writing, you can buy Harris contracts on Polymarket for 38 cents. The explanation for this apparent mispricing seems to be that, over the past few days, a single pro-Trump trader has poured tens of millions of dollars into the platform. “Domer”, the author of the linked tweet and Polymarket’s most successful trader to date, claims that this effect has depressed Harris’s contract price by around five cents, though I am unable to independently confirm this claim.
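As a back-of-envelope check on the size of the implied edge (illustrative numbers only, ignoring fees, slippage, and resolution risk):

```python
# Expected value of a $1-payout Harris contract bought at $0.38,
# under the assumption that the race really is a ~50% tossup.
price = 0.38          # cost per contract
p_win = 0.50          # assumed true probability
ev = p_win * 1.00 - price
print(f"expected value per contract: ${ev:.2f}")         # $0.12
print(f"expected return on stake:    {ev / price:.0%}")  # ~32%
```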
Okay, I spent much more time with the Anthropic RSP revisions today. Overall, I think it has two big thematic shifts for me:

1. It's way more "professionally paranoid," but needs to be even more so on non-cyber risks. A good start, but needs more on being able to stop human intelligence (i.e., good old fashioned spies).
2. It really has an aggressively strong vibe of "we are actually using this policy, and We Have Many Line Edits As A Result." You may not think that RSPs are sufficient -- I'm not sure I do, necessarily -- but I am heartened slightly that they genuinely seem to take the RSP seriously, to the point of having mildly-frustrated-about-process-hiccup footnotes about it. (Free advice to Anthropic PR: interview a bunch of staff about this on camera, cut it together, and post it; it will be lovely and humanizing and great recruitment material, I bet.)
Dean Ball is, among other things, a prominent critic of SB-1047. I meanwhile publicly supported it. But we both talked and it turns out we have a lot of common ground, especially re: the importance of transparency in frontier AI development. So we coauthored this op-ed in TIME: 4 Ways to Advance Transparency in Frontier AI Development. (tweet thread summary here)
leogao
it's surprising just how much of cutting edge research (at least in ML) is dealing with really annoying and stupid bottlenecks. pesky details that seem like they shouldn't need attention. tools that in a good and just world would simply not break all the time. i used to assume this was merely because i was inexperienced, and that surely eventually you learn to fix all the stupid problems, and then afterwards you can just spend all your time doing actual real research without constantly needing to context switch to fix stupid things.

however, i've started to think that as long as you're pushing yourself to do novel, cutting edge research (as opposed to carving out a niche and churning out formulaic papers), you will always spend most of your time fixing random stupid things. as you get more experienced, you get bigger things done faster, but the amount of stupidity is conserved. as they say in running - it doesn't get easier, you just get faster. as a beginner, you might spend a large part of your research time trying to install CUDA or fighting with python threading. as an experienced researcher, you might spend that time instead diving deep into some complicated distributed training code to fix a deadlock or debugging where some numerical issue is causing a NaN halfway through training.

i think this is important to recognize because you're much more likely to resolve these issues if you approach them with the right mindset. when you think of something as a core part of your job, you're more likely to engage your problem solving skills fully to try and find a resolution. on the other hand, if something feels like a brief intrusion into your job, you're more likely to just hit it with a wrench until the problem goes away so you can actually focus on your job. in ML research the hit it with a wrench strategy is the classic "google the error message and then run whatever command comes up" loop. to be clear, this is not a bad strategy when deployed properly - this
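(As a concrete example of the "NaN halfway through training" case: a minimal guard like the sketch below, assuming a PyTorch-style training loop, is often the first step before digging into the actual numerical issue.)

```python
import torch

def assert_finite(model, loss, step):
    """Minimal NaN/Inf guard for a training loop (illustrative sketch only)."""
    # Fail fast if the loss itself has gone non-finite.
    if not torch.isfinite(loss):
        raise FloatingPointError(f"non-finite loss at step {step}: {loss.item()}")
    # Check gradients parameter by parameter so the report names the culprit.
    for name, p in model.named_parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            raise FloatingPointError(f"non-finite gradient in '{name}' at step {step}")
```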
leogao
in research, if you settle into a particular niche you can churn out papers much faster, because you can develop a very streamlined process for that particular kind of paper. you have the advantage of already-working baseline code, context on the field, and a knowledge of the easiest way to get enough results to have an acceptable paper.

while these efficiency benefits of staying in a certain niche are certainly real, I think a lot of people end up in this position because of academic incentives - if your career depends on publishing lots of papers, then a recipe to get lots of easy papers with low risk is great. it's also great for the careers of your students, because if you hand down your streamlined process, then they can get a phd faster and more reliably.

however, I claim that this also reduces scientific value, and especially the probability of a really big breakthrough. big scientific advances require people to do risky bets that might not work out, and often the work doesn't look quite like anything anyone has done before. as you get closer to the frontier of things that have ever been done, the road gets tougher and tougher. you end up spending more time building basic infrastructure. you explore lots of dead ends and spend lots of time pivoting to new directions that seem more promising. you genuinely don't know when you'll have the result that you'll build your paper on top of.

so for people who are not beholden as strongly to academic incentives, it might make sense to think carefully about the tradeoff between efficiency and exploration. (not sure I 100% endorse this, but it is a hypothesis worth considering)

Popular Comments

Recent Discussion

(Disclaimer: This is my personal opinion, not that of any movement or organization.)

This post aims to show that, over the next decade, it is quite likely that most democratic Western countries will become fascist dictatorships - this is not a tail risk, but the most likely overall outcome. Politics is not a typical LessWrong topic, and for good reason:

  1. it tends to impair clear thinking;
  2. most well-known political issues are not neglected;
  3. most political "debates" are simply people yelling at each other online; neither saying anything new, nor even really trying to persuade the opposition.

However, like the COVID pandemic, it seems like this particular trend will be so impactful and so disruptive to ordinary Western life that it will be important to be aware of it, factor it into plans,...

Just my opinion: the concerns are valid but exaggerated.

Perhaps exaggeration is justified in order to get people's attention, since most Americans are still treating this as a normal election.

5.1 Post summary / Table of contents

This is the 5th of a series of 8 blog posts, which I’m serializing weekly. (Or email or DM me if you want to read the whole thing right now.)

Dissociative Identity Disorder (DID) (previously known as “Multiple Personality Disorder”) involves a person having multiple “alters” (alternate identities), with different preferences and (in some cases) different names. A DID diagnosis also requires some nonzero amount of “inter-identity amnesia”, where an alter cannot recall events that occurred when a different alter was active. For example, DSM-V talks about patients “coming to” on a beach with no recollection of how they got there.

Anyway, just like trance in the previous post, DID was one of those things that I unthinkingly assumed was vaguely fictional for most of...

Charlie Steiner
I'm still confused about the amnesia. My memory seems pretty good at recalling an episodic memory given very partial information - e.g. some childhood memory based on verbal cues, or sound, or smell. I can recall the sight of my childhood treehouse despite currently also being exposed to visual stimuli nothing like that treehouse. On this intuition, it seems like DID amnesia should require more 'protections', changes to the memory recall process that impede recall more than normal.

Good question! I think there are two different steps here. (The following is a bit oversimplified.)

Step 1 is auto-associative recall in the hippocampus. Neurons all over the cortex directly or indirectly activate neurons in the hippocampus. And then if something is happening in any part of the cortex that partially matches some old memory, the whole old memory can autocomplete within the hippocampus.

Step 2 is: that core of an old memory has lots of (direct and indirect) associations all around the cortex / global workspace. Like, if it’s a visual memory, t... (read more)
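A toy way to see the Step 1 autocomplete-from-a-partial-cue idea is a classic Hopfield-style autoassociative network (an illustrative sketch, not a model of the actual hippocampal circuitry):

```python
import numpy as np

# Store a few binary patterns with Hebbian weights, then recover a full
# pattern from a cue in which half the bits have been scrambled.
rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 64))           # three stored "memories"
W = sum(np.outer(p, p) for p in patterns) / 64.0       # Hebbian outer-product weights
np.fill_diagonal(W, 0)

cue = patterns[0].copy()
cue[32:] = rng.choice([-1, 1], size=32)                # only half the cue is reliable

state = cue.astype(float)
for _ in range(10):                                    # iterate until the state settles
    state = np.sign(W @ state)
    state[state == 0] = 1.0

print("recovered the stored pattern:", bool(np.array_equal(state, patterns[0])))
```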

It's been a busy season at the Nucleic Acid Observatory, and we have a lot to share since our last update. As always, if anything here is particularly interesting or if you’re working on similar problems, please reach out!

Wastewater Sequencing

We performed an initial analysis of untargeted sequencing data from aggregated airplane lavatory waste and municipal treatment plant influent that we collected and processed during our Fall 2023 partnership with CDC’s Traveler-based Genomic Surveillance program and Ginkgo Biosecurity. We've now analyzed viral abundance and diversity in sequencing data across multiple sample types and wastewater-processing and sequencing protocols. Next steps include further investigating how protocol and sample-type affect specific viruses and bacteria, as well as understanding pathogen temporal dynamics seen in airport versus treatment-plant samples.
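For a flavor of what a relative-abundance comparison like this looks like, here's a minimal sketch with made-up read counts and placeholder column names (not the NAO's actual pipeline or schema):

```python
import pandas as pd

# Made-up read counts; "sample_type", "taxon", and "reads" are placeholder names.
df = pd.DataFrame({
    "sample_type": ["airplane", "airplane", "influent", "influent"],
    "taxon":       ["norovirus", "SARS-CoV-2", "norovirus", "SARS-CoV-2"],
    "reads":       [120, 30, 800, 4000],
})
totals = df.groupby("sample_type")["reads"].transform("sum")
df["relative_abundance"] = df["reads"] / totals
print(df.pivot(index="taxon", columns="sample_type", values="relative_abundance"))
```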

We have continued to work...

habryka

That is a very cute/nice logo.

Dario Amodei is thinking about the potential. The result is a mostly good essay called Machines of Loving Grace, outlining what can be done with ‘powerful AI’ if we had years of what was otherwise relative normality to exploit it in several key domains, and we avoided negative outcomes and solved the control and alignment problems. As he notes, a lot of pretty great things would then be super doable.

Anthropic also offers us improvements to its Responsible Scaling Policy (RSP, or what SB 1047 called an SSP). Still much left to do, but a clear step forward there.

Daniel Kokotajlo and Dean Ball have teamed up on an op-ed for Time on the need for greater regulatory transparency. It’s very good.

Also, it’s worth checking out the Truth Terminal...

Are the ‘AI companion’ apps, or robots, coming? I mean, yes, obviously?

The technology for bots who are "better" than humans in some way (constructive, pro-social, compassionate, intelligent, caring interactions while thinking 2 levels meta) has been around since 2022. But the target group wouldn't pay enough for GPT-4-level inference, so current human-like bots are significantly downscaled compared to what technology allows.

It’s monthly roundup time again, and it’s happily election-free.

Thinking About the Roman Empire’s Approval Rating

Propaganda works, ancient empires edition. This includes the Roman Republic being less popular than the Roman Empire and people approving of Sparta, whereas Persia and Carthage get left behind. They’re no FDA.

Polling USA: Net Favorable Opinion Of:

Ancient Athens: +44%

Roman Empire: +30%

Ancient Sparta: +23%

Roman Republic: +26%

Carthage: +13%

Holy Roman Empire: +7%

Persian Empire: +1%

Visigoths: -7%

Huns: -29%

YouGov / June 6, 2024 / n=2205

The Five Star Problem

What do we do about all 5-star ratings collapsing the way Peter describes here?

Peter Wildeford: TBH I am pretty annoyed that when I rate stuff the options are:

* “5 stars – everything was good enough I guess”

* “4 stars – there was a serious problem”

* “1-3 stars – I almost died”

I can’t

...
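As a toy simulation of that collapse (made-up numbers): two venues of genuinely different quality end up with nearly indistinguishable averages once 5 stars is the default.

```python
import numpy as np

# Made-up model: raters give 5 stars unless they hit a problem; problems
# produce a 1-3 star rating. Higher true quality means fewer problems.
rng = np.random.default_rng(1)

def average_rating(true_quality, n=10_000):
    problem = rng.random(n) > true_quality
    stars = np.where(problem, rng.integers(1, 4, size=n), 5)
    return stars.mean()

print("decent venue (90% problem-free):", round(average_rating(0.90), 2))  # ~4.7
print("great venue  (97% problem-free):", round(average_rating(0.97), 2))  # ~4.9
```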

Supply side: It approaches the minimum average total, not marginal, cost. Maybe if people accounted for it more finely (e.g., charging themselves "wages" and "rent"), cooking at home would be in the ballpark (assuming equal quality of inputs and outputs across venues), but that just illustrates how real costs can explain a lot of the differential without having to jump to regulation and barriers to entry (yes, those are nonzero too!).
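A back-of-envelope version of that accounting, with made-up numbers, shows how imputed wages and "rent" close much of the gap:

```python
# All figures are illustrative, not estimates of actual costs.
ingredients   = 6.00    # $ per home-cooked meal
labor_hours   = 0.75    # shopping, cooking, cleanup
implicit_wage = 20.00   # $/hour value of your time
kitchen_rent  = 1.50    # amortized appliances, utilities, space per meal

home_full_cost = ingredients + labor_hours * implicit_wage + kitchen_rent
print(f"home meal, fully costed: ${home_full_cost:.2f}")   # $22.50
print("typical restaurant meal: $25.00")
```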

Demand side: Complaints in the OP about the uninformativeness of ratings also highlight how far we are from perfect competition (also,... (read more)

Viliam
The problem with five stars seems to be illusion of transparency: "here is the system where you rate up to 5 stars, everyone understands what that means, right?", yep, but everyone understands it differently. (Also, I wouldn't know before trying it whether the minimum is 1 star or 0 stars.)

It might be interesting to see a map of what each country considers a proper rating for "normal experience". It seems to me that this is a specifically American thing, where things need to be enthusiastically called excellent unless they really suck, and where calling something "okay, I guess" is a polite way to suggest that someone should be fired, because everything is supposed to be exceptional all the time. While in Eastern Europe, "okay, I guess" would mean that even a grumpy customer couldn't find anything specific to complain about, so the service is truly excellent, at least relative to local standards.

In a different context, I have seen surveys with ratings from 1 to 10, where the summary given to management was that 10 meant "good", 7-9 meant "neutral", and 1-6 meant "bad". It made me worry how many people I might have accidentally gotten fired by giving feedback "highly above average, but not literally Einstein". Since then, whenever possible, I refuse to provide feedback in the form of stars.

The linked article says "In other words, the environment may have made deep work more difficult but we still retain the ability to concentrate in a distraction-free environment." This seems to me like a plausible explanation. We don't see the improvements in concentration in real life, because the environment keeps getting noisier much faster. But once we remove the environment, we can notice that we are actually quite good. In other words, people usually attribute the ability to concentrate to personal characteristics, but environment plays a greater role. Which leads to a question: why do employers insist on creating such noisy environments? I mean, my first job out of school, I h
oumuamua
glomarize is the word I believe you want to use.

I think and talk a lot about the risks of powerful AI. The company I’m the CEO of, Anthropic, does a lot of research on how to reduce these risks. Because of this, people sometimes draw the conclusion that I’m a pessimist or “doomer” who thinks AI will be mostly bad or dangerous. I don’t think that at all. In fact, one of my main reasons for focusing on risks is that they’re the only thing standing between us and what I see as a fundamentally positive future. I think that most people are underestimating just how radical the upside of AI could be, just as I think most people are underestimating how bad the risks could be.

In this essay I try to sketch out what that

...

My current belief is that this essay is optimized to be understandable by a much broader audience than any comparable public writing from Anthropic on extinction-level risk. 

For instance, did you know that the word 'extinction' doesn't appear anywhere on Anthropic's or Dario's websites? Nor do 'disempower' or 'disempowerment'. The words 'existential' and 'existentially' only come up three times: when describing the work of an external organization (ARC), in one label in a paper, and in one mention in the Constitutional AI paper. In its place they always talk a... (read more)


the following is motivated by:

I've been a long-time lurker on Less Wrong and I've noticed the recurring criticism that despite its focus on rationality, the community lacks structured training to develop practical rationality skills. Eliezer Yudkowsky talks about rationality as a martial art, because it's something that can be trained and refined through deliberate practice. But where is our dojo?

A model that comes to mind is a website like LeetCode, where programmers can solve coding challenges, share solutions, and see how others approach the same problems. LeetCode can sometimes encourage overfitting to specific problem types, so it's not a perfect analogy. The community-driven aspect would be interesting to me, as you can see how...

romeostevensit
I've thought about this for a long time and I think one of the big issues is lack of labelled training data in many domains. E.g. people made calibration toys and that helped a lot for that particular dimension. Ditto the tests on which studies replicated. In many cases we'd want more complex blinded data for people to practice on, and that requires, like in games, someone to set up all the non-fun backend for them.
Raemon
What is an example of a type of complex blinded data that you'd be imagining here?

like the calibration game but for a variety of decision problems, where the person has to assign probabilities to things at different stages based on what information is available. Afterwards they get an example Brier score based on the average of what people with good prediction track records set at each phase.
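A minimal sketch of how such a stage-by-stage Brier comparison could be scored (all numbers and the "reference" forecasts are hypothetical):

```python
import numpy as np

player    = np.array([0.50, 0.65, 0.80])   # player's probabilities at stages 1..3
reference = np.array([0.55, 0.75, 0.90])   # average of forecasters with good track records
outcome   = 1                              # the event happened

brier = lambda p: float(np.mean((p - outcome) ** 2))   # lower is better
print("player's Brier score: ", round(brier(player), 3))     # 0.138
print("reference Brier score:", round(brier(reference), 3))  # 0.092
```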

Raemon
Yeah, I'm basically using the lens of my cognitive bootcamp series to iron out the pedagogy here. I try to write up LW posts for all the key takeaways and exercises, although it takes a while.

How can we make many humans who are very good at solving difficult problems?

Summary (table of made-up numbers)

I made up the made-up numbers in this table of made-up numbers; therefore, the numbers in this table of made-up numbers are made-up numbers.

Call to action

If you have a shitload of money, there are some projects you can give money to that would make supergenius humans on demand happen faster. If you have a fuckton of money, there are projects whose creation you could fund that would greatly accelerate this technology.

If you're young and smart, or are already an expert in stem cell / reproductive biology, biotech, or anything related to brain-computer interfaces, there are some projects you could work on.

If neither, think hard, maybe I missed something.

You can...

Logan Zoellner
What data? Why not just train it on literally 0 data (MuZero style)? You think it's going to derive the existence of the physical world from the Peano Axioms?
Purplehermann
On human-computer interfaces: Working memory, knowledge reservoirs and raw calculation power seem like the easiest pieces, while fundamentally making people better at critical thinking, philosophy, or speeding up actual comprehension would be much more difficult. The difference being upgrading the core vs plug-ins. Curated reservoirs of practical and theoretical information, well indexed, would be very useful to super geniuses.

On human-human: You don't actually need to hook them up physically. Having multiple people working on different parts of a problem lets them all bounce ideas off each other.

Overall: The goal should be to create a number of these people, then let them plan out the next round if their intelligence doesn't do it. If humanity can make 100 7+SD humans hooked up with large amounts of computing power, curated knowledge + tons of raw data, and massive working memories, they'll be able to figure out any further steps much better than we can.
TsviBT
But both of these things are basically available currently, so apparently our current level isn't enough. LLMs + google (i.e. what Perplexity is trying to be) are already a pretty good index; what would a BCI add? I commented on a similar topic here: https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods?commentId=uZg9s2FfP7E7TMTcD

Alright, I have a question stemming from TurnTrout's post on Reward is not the optimization target, where he argues that the premises that are required to get to the conclusion of reward being the optimization target are so narrowly applicable as to not apply to future RL AIs as they gain more and more power:

https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target#When_is_reward_the_optimization_target_of_the_agent_

But @gwern argued with TurnTrout that reward is in fact the optimization target for a broad range of RL algorithms:

https://www.lesswrong.com/posts/ttmmKDTkzuum3fftG/#sdCdLw3ggRxYik385

https://www.lesswrong.com/posts/nmxzr2zsjNtjaHh7x/actually-othello-gpt-has-a-linear-emergent-world#Tdo7S62iaYwfBCFxL

So my question is: are there known results (ideally proofs, though I can accept empirical studies if necessary) that show when RL algorithms treat the reward function as an optimization target?

And how narrow is the space of RL algorithms that don't optimize for the reward function?

A good answer will link to results known in...
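To make the distinction concrete, here is a bare-bones REINFORCE sketch on a two-armed bandit: the reward appears only as a scalar weight on the policy-gradient update, and whether the learned policy thereby "optimizes for reward" in any stronger sense is exactly what's being asked.

```python
import numpy as np

# Two-armed bandit trained with a REINFORCE-style update. Note that the reward r
# only scales the gradient step; the policy itself never represents "reward".
rng = np.random.default_rng(0)
logits = np.zeros(2)                         # policy parameters
true_p = np.array([0.2, 0.8])                # P(reward=1) for each arm (made up)

for _ in range(2000):
    probs = np.exp(logits) / np.exp(logits).sum()
    a = rng.choice(2, p=probs)
    r = float(rng.random() < true_p[a])      # sampled binary reward
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                    # gradient of log pi(a) w.r.t. logits
    logits += 0.1 * r * grad_log_pi          # reward is just a weight here

print("learned action probabilities:", np.round(np.exp(logits) / np.exp(logits).sum(), 2))
```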

gwern

"I guess LLMs are model-free, so that's relevant"

FWIW, I strongly disagree with this claim. I believe they are model-based, with the usual datasets & training approaches, even before RLHF/RLAIF.

Seth Herd
Ah yes. I agree that the wireheading question deserves more thought. I'm not confident that my answer to wireheading applies to the types of AI we'll actually build - I haven't thought about it enough.

FWIW the two papers I cited are secondary research, so they branch directly into a massive amount of neuroscience research that indirectly bears on the question in mammalian brains. None of it that I can think of directly addresses the question of whether reward is the optimization target for humans. I'm not sure how you'd empirically test this.

I do think it's pretty clear that some types of smart, model-based RL agents would optimize for reward. Those are the ones that a) choose actions based on the highest estimated sum of future rewards (like humans seem to, very very approximately), and b) are smart enough to estimate future rewards fairly accurately.

LLMs with RLHF/RLAIF may be the relevant case. They are model-free by TurnTrout's definition, and I'm happy to accept his use of the terminology. But they do have a powerful critic component (at least in training - I'm not sure about deployment, but probably there too), so it seems possible that it might develop a highly general representation of "stuff that gives the system rewards". I'm not worried about that, because I think that will happen long after we've given them agentic goals, and long after they've developed a representation of "stuff humans reward me for doing" - which could be mis-specified enough to lead to doom if it was the only factor.