William_S (4d)
I worked at OpenAI for three years, from 2021-2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people working on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
I wish there were more discussion posts on LessWrong. Right now it feels like publishing a discussion post weakly, if not moderately, violates some sort of cultural norm (similarly, though to a lesser extent, for Shortform posts). Something low effort of the form "X is a topic I'd like to discuss. A, B and C are a few initial thoughts I have about it. What do you guys think?" It seems to me like something we should encourage, though.

Here's how I'm thinking about it. Such "discussion posts" currently happen informally in social circles. Maybe you'll text a friend. Maybe you'll bring it up at a meetup. Maybe you'll post about it in a private Slack group. But if it's appropriate in those contexts, why shouldn't it be appropriate on LessWrong? Why not benefit from having it be visible to more people? The more eyes you get on it, the better the chance someone has something helpful, insightful, or just generally useful to contribute.

The big downside I see is that it would screw up the post feed. When you go to lesswrong.com and see the list of posts, you don't want that list to have a bunch of low-quality discussion posts you're not interested in. You don't want to spend time and energy sifting through the noise to find the signal. But this is easily solved with filters. Authors could mark/categorize/tag their posts as low-effort discussion posts, and people who don't want to see such posts in their feed could filter them out.

Context: I was listening to the Bayesian Conspiracy podcast's episode on LessOnline. Hearing them talk about the sorts of discussions they envision happening there made me think about why that sort of thing doesn't happen more on LessWrong. Whatever you'd say to the group of people you're hanging out with at LessOnline, why not publish a quick discussion post about it on LessWrong?
habryka (4d)
Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company were now performing assassinations of U.S. citizens. Curious whether anyone has looked into this, or has thought much about the baseline risk of assassinations or other forms of violence from economic actors.
Dalcy (4d)
Thoughtdump on why I'm interested in computational mechanics:

* one concrete application to natural abstractions from here: tl;dr, belief structures generally seem to be fractal shaped. one major part of natural abstractions is trying to find the correspondence between structures in the environment and concepts used by the mind. so if we can do the inverse of what adam and paul did, i.e. 'discover' fractal structures from activations and figure out what stochastic process they might correspond to in the environment, that would be cool
* ... but i was initially interested in reading compmech stuff not with a particular alignment-relevant thread in mind but rather because it seemed broadly similar in direction to natural abstractions.
* re: how my focus would differ from my impression of current compmech work done in academia: academia seems faaaaaar less focused on actually trying out epsilon reconstruction on real-world noisy data. CSSR is an example of a reconstruction algorithm. apparently people did do compmech stuff on real-world data; i don't know how good it is, but far less effort has been invested there compared to theory work.
* would be interested in these reconstruction algorithms, eg what the bottlenecks to scaling them up are, etc. (a toy sketch of the history-merging idea behind such algorithms follows this list)
* tangent: epsilon transducers seem cool. if the reconstruction algorithm is good, a prototypical example i'm thinking of is something like: pick some input-output region within a model, and literally try to discover the HMM reconstructing it? of course it's gonna be unwieldily large. but, to shift the thread in the direction of bright-eyed theorizing ...
* the foundational Calculi of Emergence paper talked about the possibility of hierarchical epsilon machines, where you do epsilon machines on top of epsilon machines, and for simple examples where you can analytically do this you get wild things like more and more compact representations of stochastic processes (eg data stream -> tree -> markov model -> stack automaton -> ... ?)
* this ... sounds like natural abstractions in its wildest dreams? literally point at some raw datastream and automatically build hierarchical abstractions that get more compact as you go up
* haha but alas, (almost) no development afaik since the original paper. seems cool
* and also more tangentially, compmech seemed to have a lot to say about providing interesting semantics to various information measures aka True Names, so another angle i was interested in was to learn about them.
  * eg crutchfield talks a lot about developing a right notion of information flow - obvious usefulness in eg formalizing boundaries?
  * many other information measures from compmech with suggestive semantics: cryptic order? gauge information? synchronization order? check ruro1 and ruro2 for more.
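As a concrete illustration of what "reconstruction" means above, here is a minimal Python sketch. It is not CSSR proper (the tolerance, the history length, and the toy sequence are all made up for illustration); it only shows the core move such algorithms share: estimate the empirical distribution of the next symbol given each recent history, then merge histories whose predictive distributions look the same. The merged groups play the role of candidate causal states.

```python
from collections import Counter, defaultdict

def next_symbol_dists(seq, L):
    """Empirical P(next symbol | preceding L symbols) for every length-L history in seq."""
    counts = defaultdict(Counter)
    for i in range(L, len(seq)):
        history = tuple(seq[i - L:i])
        counts[history][seq[i]] += 1
    return {h: {s: n / sum(c.values()) for s, n in c.items()} for h, c in counts.items()}

def merge_histories(dists, tol=0.05):
    """Greedily group histories whose next-symbol distributions are within `tol`
    in total variation distance. Each group is a crude stand-in for a causal state;
    real reconstruction algorithms use proper hypothesis tests instead of a tolerance."""
    def tv(p, q):
        keys = set(p) | set(q)
        return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    states = []  # list of (representative distribution, member histories)
    for h, d in sorted(dists.items()):
        for rep, members in states:
            if tv(rep, d) < tol:
                members.append(h)
                break
        else:
            states.append((d, [h]))
    return states

# Toy usage on a short periodic binary stream (made up for illustration):
# histories that predict the same future get merged into one state.
seq = "0110" * 200
for rep, members in merge_histories(next_symbol_dists(seq, L=2)):
    print(rep, members)
```

Real reconstruction algorithms replace the crude tolerance test with statistical tests and grow the history length adaptively, which is roughly where the scaling bottlenecks mentioned above show up.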

Recent Discussion

Decaeneus (7h)
Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).

Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There's a cost to enforcing the rule, as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more painful than it's worth in that moment (meaning, fully discounting future consequences). If you fail to enforce the rule, you undermine your authority, which results in your toddler fighting future enforcement (of this and possibly all other rules!) much harder, as he realizes that the rule is in fact negotiable / flexible.

However, you have a third choice, which is to credibly pretend not to see that he's doing it. It's true that this will undermine your perceived competence as an authority somewhat. However, it does not undermine the perception that the rule would be fully enforced had you noticed the violation. You get to "skip" a particularly costly enforcement without taking steps back that compromise future enforcement much.

I bet this happens sometimes in classrooms (re: disruptive students), prisons (re: troublesome prisoners), and regulation (re: companies that operate in legally aggressive ways). Of course, this stops working and becomes a farce once the pretense is clearly visible. Once your toddler knows that sometimes you pretend not to see things to avoid a fight, the benefit totally goes away. So it must be used judiciously and artfully.
keltan (37m)

Teacher here, can confirm.

Main idea: When you have a question and are googling around for an answer, you're basically searching through a space of information, and seeking an answer to your question in this space. Sometimes you're able to find your answer quickly. Sometimes you aren't. But if you are able to ask someone for help, they'll often be able to just tell you the answer right away. This is helpful and is analogous to an O(1) lookup in programming.
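As a rough illustration of that analogy (this sketch and its toy data are an editorial addition, not from the post), searching through material yourself scales with how much material there is, while asking someone who already knows behaves like a constant-time lookup:

```python
import time

def find_answer_by_searching(question, documents):
    """O(n): scan every document until one mentions the question."""
    for doc in documents:
        if question in doc:
            return doc
    return None

def find_answer_by_asking(question, expert_index):
    """O(1): the 'expert' (here just a dict) maps the question straight to an answer."""
    return expert_index.get(question)

# Toy data, made up purely for illustration.
documents = [f"note {i}: nothing useful" for i in range(500_000)]
documents.append("how do I exit vim: press Esc, then type :q! and hit Enter")
expert_index = {"how do I exit vim": "press Esc, then type :q! and hit Enter"}

start = time.perf_counter()
find_answer_by_searching("how do I exit vim", documents)
print(f"searching took {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
find_answer_by_asking("how do I exit vim", expert_index)
print(f"asking took    {time.perf_counter() - start:.6f}s")
```

On the toy data, the scan has to touch up to every document (O(n)), while the dictionary lookup takes essentially the same time no matter how many questions the "expert" could answer (O(1)).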

In elaborating on this, I will start by discussing what Big-O notation is, and then I will talk about how it applies to asking for help.

  • If you're already familiar with Big-O notation, you'll probably want to skip to the "Asking For Help" section.
  • If you aren't too familiar with Big-O notation, I
...

This is a great use case for AI: expert knowledge tailored precisely to one’s needs

TLDR:

  1. Around Einstein-level, relatively small changes in intelligence can lead to large changes in what one is capable of accomplishing.
    1. E.g. Einstein was a bit better than the other best physicists at seeing deep connections and reasoning, but was able to accomplish much more in terms of impressive scientific output.
  2. There are architectures where small changes can have significant effects on intelligence.
    1. E.g. small changes in human-brain-hyperparameters: Einstein’s brain didn’t need to be trained on 3x the compute of normal physics professors for him to become much better at forming deep understanding, even without intelligence improving intelligence.

Einstein and the heavytail of human intelligence

1905 is often described as the "annus mirabilis" of Albert Einstein. He founded quantum physics by postulating the existence of (light) quanta, explained Brownian motion, introduced the special relativity theory and...

cubefox (3h)
That's an interesting argument. However, something similar to your hypothetical explanation in footnote 6 suggests the following hypothesis: most humans aren't optimized by evolution to be good at abstract physics reasoning, while they easily could have been, with evolutionarily small changes in hyperparameters. After all, Einstein wasn't too dissimilar in training/inference compute and architecture from the rest of us. This explanation seems somewhat plausible, since highly abstract reasoning ability perhaps wasn't very useful for most of human history.

(An argument in a similar direction is the existence of Savant syndrome, which implies that quite small differences in brain hyperparameters can lead to strongly increased narrow capabilities of some form, which likely weren't useful in the ancestral environment, which explains why humans generally don't have them. The Einstein case suggests a similar phenomenon may also exist for more general abstract reasoning.)

If this is right, humans would be analogous to very strong base LLMs with poor instruction tuning, where the instruction tuning (for example) only involved narrow instruction-execution pairs that are more or less directly related to finding food in the wilderness, survival and reproduction. Which would lead to bad performance at many tasks not closely related to fitness, e.g. on Math benchmarks. The point is that a lot of the "raw intelligence" of the base LLM couldn't be accessed just because the model wasn't tuned to be good at diverse abstract tasks, even though it easily could have been, without a big change in architecture or training/inference compute.

But then it seems unlikely that artificial ML models (like LLMs) are or will be unoptimized for highly abstract reasoning in the same way evolution apparently didn't "care" to make us all great at abstract physics and math style thinking, since AI models are indeed actively optimized in diverse abstract directions. Which would make it unlikely to get
Radford Neal (4h)
I think you are misjudging the mental attributes that are conducive to scientific breakthroughs.  My (not very well informed) understanding is that Einstein was not especially brilliant in terms of raw brainpower (better at math and such than the average person, of course, but not much better than the average physicist). His advantage was instead being able to envision theories that did not occur to other people. What might be described as high creativity rather than high intelligence. Other attributes conducive to breakthroughs are a willingness to work on high-risk, high-reward problems (much celebrated by granting agencies today, but not actually favoured), a willingness to pursue unfashionable research directions, skepticism of the correctness of established doctrine, and a certain arrogance of thinking they can make a breakthrough, combined with a humility allowing them to discard ideas of theirs that aren't working out.  So I think the fact that there are more high-IQ researchers today than ever before does not necessarily imply that there is little "low hanging fruit".
RussellThor (1h)
Not following - where could the 'low hanging fruit' possibly be hiding? In our world of 8 billion we have many people with the "other attributes conducive to breakthroughs" you list. The data strongly suggests we are in diminishing returns. What qualities could an AI of Einstein-level intelligence realistically have that would let it make such progress where no person has? It would seem you would need to appeal to other less well defined qualities such as 'creativity' and argue that for some reason the AI would have much more of that. But that seems similar to just arguing that it in fact has > Einstein intelligence.

I'm not attempting to speculate on what might be possible for an AI.  I'm saying that there may be much low-hanging fruit potentially accessible to humans, despite there now being many high-IQ researchers. Note that the other attributes I mention are more culturally-influenced than IQ, so it's possible that they are uncommon now despite there being 8 billion people.

A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnestic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.

… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnestic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnestic, it’s also apparently one...

ryan_greenblatt (4h)
See also discussion here.
RamblinDash (5h)
IDK, I think this comment warrants the level of karma. OP is proposing messing around with a drug class that kills thousands of people per year. Even only counting benzo overdoses that don't involve opioids, it kills ~1500 people per year. Source: https://nida.nih.gov/research-topics/trends-statistics/overdose-death-rates (you can download the data from that page to see precise numbers). It's not often that a forum comment could save a life!
Algon (3h)
Even though I think the comment was useful, it doesn't look to me like it was as useful as the typical 139-karma comment, since I expect LW readers to be fairly unlikely to start popping benzos after reading this post. IMO it should've gotten like 30-40 karma. Even 60 wouldn't have been too shocking to me. But 139? That's way more karma than anything else I've posted.

I don't think it warrants this much karma, and I now share @ryan_greenblatt's concerns about the ability to vote on Quick Takes and Popular Comments introducing algorithmic virality to LW. That sort of thing is typically corrosive to epistemic hygiene, as it changes the incentives of commenting more towards posting applause-lights. I don't think that's a good change for LW, as I think we've got too much group-think as it is.

Yeah, seems right to me. If this is a recurring thing we might deactivate voting on the popular comments interface or something like that.

A few days ago I came upstairs to:

Me: how did you get in there?

Nora: all by myself!

Either we needed to be done with the crib, which had a good chance of much less sleeping at naptime, or we needed a taller crib. This is also something we went through when Lily was little, and that time what worked was removing the bottom of the crib.

It's a basic crib, a lot like this one. The mattress sits on a metal frame, which attaches to a set of holes along the side of the crib. On its lowest setting, the mattress is still ~6" above the floor. Which means if we remove the frame and set the mattress on the floor, we gain ~6".

Without the mattress weighing it down, though, the crib...

mikbp (15h)
Why must she not be able to climb out of (or into) the crib for napping there?
jefftk (2h)

Climbing out of the crib is mildly dangerous, since it's farther down on the outside than the inside. So it's good practice to switch away from a crib (or adjust the crib to be taller) once they get close to being able to do that.

Even if they can do it safely, though, a crib they can get in and out of on their own defeats the purpose of a crib -- at that point you should just move to something optimized for being easy to get in and out of, like a bed.


Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let’s call the habit of not saying things you know to be false ‘shallow honesty’[1].

Often when people are shallowly honest, they still choose what true things they say in a kind of locally act-consequentialist way, to try to bring about some outcome. Maybe something they want for themselves (e.g. convincing their friends to see a particular movie), or something they truly believe is good (e.g. causing their friend to vote for the candidate they think will be better for the country).

Either way, if you...

This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset. 

STORY (skippable)

You have the excellent fortune to live under the governance of The People's Glorious Free Democratic Republic of Earth, giving you a Glorious life of Freedom and Democracy.

Sadly, your cherished values of Democracy and Freedom are under attack by...THE ALIEN MENACE!

The typical reaction of an Alien Menace to hearing about Freedom and Democracy.  (Generated using OpenArt SDXL).

Faced with the desperate need to defend Freedom and Democracy from The Alien Menace, The People's Glorious Free Democratic Republic of Earth has been forced to redirect most of its resources into the Glorious Free People's Democratic War...

NickSharp (2h)
I would like one more day if no one objects. No big deal though, I may or may not have anything by tomorrow anyway.  Thanks for posting!  Love these challenges!!
aphyer (2h)

I'm always happy to have more players: if you want more than one day that's not a big deal, I'm happy to delay until next week if you'd like.

Fooming Shoggoths Dance Concert

June 1st at LessOnline

After their debut album I Have Been A Good Bing, the Fooming Shoggoths are performing at the LessOnline festival. They'll be unveiling several previously unpublished tracks, such as "Nothing is Mere", feat. Richard Feynman.

Ticket prices rise by $100 on May 13th