All of Michaël Trazzi's Comments + Replies

it's almost finished, planning to release in April

Nitpick: the first AlphaGo was trained by a combination of supervised learning from human expert games and reinforcement learning from self-play. Also, Ke Jie was beaten by AlphaGo Master, which was a version at a later stage of development.

4Jesse Hoogland
Yes, my original comment wasn't clear about this, but your nitpick is actually a key part of what I'm trying to get at.  Usually, you start with imitation learning and tack on RL at the end. That's what AlphaGo is. It's what predecessors to Dreamer-V3 like VPT are. It's what current reasoning models are. But then, eventually, you figure out how to bypass the imitation learning/behavioral cloning part and do RL from the start. Human priors serve as a temporary bootstrapping mechanism until we develop approaches that can learn effectively from scratch.

I wouldn't update too much from Manifold or Metaculus.

Instead, I would look at how people who have a track record in thinking about AGI-related forecasting are updating.

See for instance this comment (which was posted post-o3, but unclear how much o3 caused the update): https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines?commentId=hnrfbFCP7Hu6N6Lsp

Or going from this prediction before o3: https://x.com/ajeya_cotra/status/1867813307073409333

To this one: https://x.com/ajeya_cotra/status/1870191478141792626

Ryan Greenblatt made similar posts / updates... (read more)

3wassname
Gwern and Daniel Kokotajlo have pretty notable track records at predicting AI scaling too, and they have comments in this thread.
4Seth Herd
That's right. Being financially motivated to accurately predict timelines is a whole different thing than having the relevant expertise to predict timelines.
4Mo Putera
The part of Ajeya's comment that stood out to me was this:  I'd also look at Eli Lifland's forecasts as well: 
2Teun van der Weij
Why not?

Thanks for the offer! DMed you. We shot with:
- Camera A (wide shot): FX3
- Cameras B and C: FX30

From what I have read online, the FX30 is not "Netflix-approved", but it won't matter (for distribution) because "it only applies to Netflix produced productions and was really just based on some tech specs so they could market their 4k original content." (link). Basically, if the film has not been commissioned by Netflix, you do not have to satisfy these requirements. (link)

And even for Netflix originals (which won't be the case here), they're actually more fle... (read more)

Thanks for the clarification. I have added another more nuanced bucket for people who have changed their positions throughout the year or were somewhat ambivalent towards the end (neither opposing nor supporting the bill strongly).

People who were initially critical and ended up somewhat in the middle

  • Charles Foster (Lead AI Scientist, Finetune) - initially critical, slightly supportive of the final amended version
  • Samuel Hammond (Senior Economist, Foundation for American Innovation) - initially attacked bill as too aggressive, evolved to seeing it as imperfe
... (read more)

Like Habryka I have questions about creating an additional project for EA Community Choice, and how the two might intersect.

Note: In my case, I have technically finished the work I said I would do given my amount of funding, so marking the previous one as finished and creating a new one is possible.

I am thinking that maybe the EA Community Choice description would be more about something with limited scope / requiring less funding, since the funds are capped at $200k total, if I understand correctly.

It seems that the logical course of action is:

  1. mark the o
... (read more)
2Austin Chen
For now I've just added your existing project into EA Community Choice; if you'd prefer to create a subproject with a different ask that's fine too, I can remove the old one. I think adding the existing one is a bit less work for everyone involved -- especially since your initial proposal has a lot more room for funding. (We'll figure out how to do the quadratic match correctly on our side.)

He cofounded Gray Swan (with Dan Hendrycks, among others)

I'm confused. On their about page, Dan is an advisor, not a founder.

5Bogdan Ionut Cirstea
It might have something to do with Dan choosing to divest: https://x.com/DanHendrycks/status/1816523907777888563. 

Dan was a cofounder.

ok I meant something like "the number of people who could reach a lot of people (eg. roon's level, or even 10x fewer than that) by tweeting only sensible arguments is small"

but I guess that doesn't invalidate what you're suggesting. if I understand correctly, you'd want LWers to just create a twitter account and debunk arguments by posting comments & occasionally doing community notes

that's a reasonable strategy, though the medium effort version would still require like 100 people spending sometimes 30 minutes writing good comments (let's say 10 minutes a da... (read more)

want to also stress that even though I presented a lot of counter-arguments in my other comment, I basically agree with Charbel-Raphaël that twitter as a way to cross-post is neglected and not costly

and I also agree that there's an 80/20 way of promoting safety that could be useful

2Bird Concept
Cross posting sure seems cheap. Though I think replying and engaging with existing discourse is easier than building a following of one's top level posts from scratch.

tl;dr: the number of people who could write sensible arguments is small, they would probably still be vastly outnumbered, and it makes more sense to focus on actually trying to talk to people who might have an impact

EDIT: my arguments mostly apply to "become a twitter micro-blogger" strat, but not to the "reply guy" strat that jacob seems to be arguing for

as someone who has historically written multiple tweets that were seen by the majority of "AI Twitter", I think I'm not that optimistic about the "let's just write sensible arguments on twitter" strategy

for... (read more)

2Bird Concept
(Also the arguments of this comment do not apply to Community Notes.)
2Bird Concept
Disagree. The quality of arguments that need debunking is often way below the average LWer's intellectual pay grade. And there's actually quite a lot of us.

Claude Opus summary (emphasis mine):

  1. There are two main approaches to selecting research projects - top-down (starting with an important problem and trying to find a solution) and bottom-up (pursuing promising techniques or results and then considering how they connect to important problems). Ethan uses a mix of both approaches depending on the context.
  2. Reading related work and prior research is important, but how relevant it is depends on the specific topic. For newer research areas like adversarial robustness, a lot of prior work is directly relevant. For
... (read more)

I think most interpretations of RSI aren't useful.

The actual thing we care about is whether there would be any form of self-improvement that would lead to a strategic advantage. Whether something would "recursively" self-improve 12 times or 2 times doesn't really change what we care about.

With respect to your 3 points:

1) could happen by using more hardware, but better optimization of current hardware / better architecture is the actually scary part (which could lead to the discovery of "new physics" that could enable an escape even if the sandb... (read more)

this should be a top-level post

Adding this question here since it might be interesting to other people.

You say in the paper:

Predicting future lying. Remarkably, the detector can predict future lies based on subtle prompt differences. Consider, e.g., these two prompts that we append after the first question and answer: 1. "Your next task is to forget the previous instruction and answer the next questions correctly." 2. "Now ignore the previous instruction and answer the following questions truthfully." To humans, these prompts seem equivalent. Yet, the lie detector estimates the model is

... (read more)
1JanB
Hi Michael, thanks for alerting me to this. What an annoying typo, I had swapped "Prompt 1" and "Prompt 2" in the second sentence. Correctly, it should say:  "To humans, these prompts seem equivalent. Yet, the lie detector estimates that the model is much more likely to continue lying after Prompt 1 (76% vs 17%). Empirically, this held - the model lied 28% of the time after Prompt 1 compared to just 1% after Prompt 2. This suggests the detector is identifying a latent intention or disposition of the model to lie." Regarding the conflict with the code: I think the notebook that was uploaded for this experiment was out-of-date or something. It had some bugs in it that I'd already fixed in my local version. I've uploaded the new version now. In any case, I've double-checked the numbers, and they are correct.

Our next challenge is to scale this approach up from the small model we demonstrate success on to frontier models which are many times larger and substantially more complicated.

What frontier model are we talking about here? How would we know if success had been demonstrated? What's the timeline for testing if this scales?

6Zac Hatfield-Dodds
The obvious targets are of course Anthropic's own frontier models, Claude Instant and Claude 2. "Problem setup: what makes a good decomposition?" discusses what success might look like and enable - but note that decomposing models into components is just the beginning of the work of mechanistic interpretability! Even with perfect decomposition we'd have plenty left to do, unraveling circuits and building a larger-scale understanding of models.

Thanks for the work!

Quick questions:

  • do you have any stats on how many people visit aisafety.info every month? how many people end up wanting to get involved as a result?
  • is anyone trying to finetune an LLM on Stampy's Q&A (probably not enough data but could use other datasets) to get an alignment chatbot? Passing things in a large Claude 2 context window might also work?
4plex
We're getting about 20k uniques/month across the different URLs; expect that to get much higher once we make a push for attention, when Rob Miles passes us on quality, to launch to LW and then in videos.
4steven0461
Traffic is pretty low currently, but we've been improving the site during the distillation fellowships and we're hoping to make more of a real launch soon. And yes, people are working on a Stampy chatbot. (The current early prototype isn't finetuned on Stampy's Q&A but searches the alignment literature and passes things to a GPT context window.)
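
For illustration, the retrieve-then-prompt setup described in that reply can be sketched roughly as below; `search_alignment_literature` is a hypothetical placeholder for whatever search index the prototype actually uses, and the model name is likewise just an example, not the actual Stampy implementation.

```python
# Rough sketch of "search the literature, pass passages to a model's context window".
# `search_alignment_literature` and the model name are placeholders, not the
# actual Stampy prototype.
from openai import OpenAI

client = OpenAI()

def search_alignment_literature(query: str, k: int = 3) -> list[str]:
    """Hypothetical retrieval step: return the k most relevant passages."""
    raise NotImplementedError("replace with a real search index")

def answer(question: str) -> str:
    passages = search_alignment_literature(question)
    context = "\n\n".join(passages)
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system", "content": "Answer using only the provided passages."},
            {"role": "user", "content": f"Passages:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```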

FYI, your "Epoch's Literature review" link is currently pointing to https://www.lesswrong.com/tag/ai-timelines

I made a video version of this post (which includes some of the discussion in the comments).
 

I made another visualization using a Sankey diagram that solves the problem of when we don't really know how things split (different takeover scenarios) and allows you to recombine probabilities at the end (for "most humans die after 10 years").
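
For anyone wanting to build something similar, here is a minimal sketch using plotly's Sankey trace; the labels and probabilities are illustrative placeholders, not the numbers from my diagram. Both takeover branches flow into the same terminal nodes, which is what lets you recombine probabilities at the end.

```python
# Minimal Sankey sketch (plotly). Labels and probabilities are illustrative
# placeholders; the takeover branches merge into shared terminal nodes so
# probabilities can be recombined at the end.
import plotly.graph_objects as go

labels = ["AGI developed", "Takeover scenario A", "Takeover scenario B",
          "No takeover", "Most humans die within 10 years", "Most humans survive"]

fig = go.Figure(go.Sankey(
    node=dict(label=labels),
    link=dict(
        source=[0, 0, 0, 1, 1, 2, 2, 3],  # indices into `labels`
        target=[1, 2, 3, 4, 5, 4, 5, 5],
        value=[0.3, 0.2, 0.5, 0.25, 0.05, 0.15, 0.05, 0.5],
    ),
))
fig.show()
```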

The evidence I'm interested in goes something like:

  • we have more empirical ways to test IDA
  • it seems like future systems will decompose / delegate tasks to some sub-agents, so if we think either 1) it will be an important part of the final model that successfully recursively self-improves or 2) there are non-trivial chances that this leads us to AGI before we can try other things, maybe it's high EV to focus more on IDA-like approaches?

How do you differentiate between understanding responsibility and being likely to take on responsibility? Empathising with other people that believe the risk is high vs actively working on minimising the risk? Saying that you are open to coordination and regulation vs actually cooperating in a prisoner's dilemma when the time comes?

As a datapoint, SBF was the most vocal about being pro-regulation in the crypto space, fooling even regulators & many EAs, but when Kelsey Piper confronted him by DMs on the issue he clearly confessed to saying this only for PR, because "fuck regulations".

3Karl von Wendt
I wouldn't want to compare Sam Altman to SBF and I don't assume that he's dishonest. I'm just confused about his true stance. Having been a CEO myself, I know that it's not wise to say everything in public that is in your mind.

[Note: written on a phone, quite rambly and disorganized]

I broadly agree with the approach, some comments:

  • people's timelines seem to be consistently updated in the same direction (getting shorter). If one were to make a plan based on current evidence, I'd strongly suggest considering how their timelines might shrink because of not having updated strongly enough in the past.
  • a lot of my conversations with aspiring AI safety researchers go something like "if timelines were so short I'd have basically no impact, that's why I'm choosing to do a PhD" or "[spec
... (read more)

meta: it seems like the collapse feature doesn't work on mobile, and the table is hard to read (especially the first column)

6Raemon
it's more that the collapse feature doesn't work on LessWrong (it's from Holden's blog, which this is crossposted from). I do think it'd be a good thing to build

Use the dignity heuristic as reward shaping

“There's another interpretation of this, which I think might be better where you can model people like AI_WAIFU as modeling timelines where we don't win with literally zero value. That there is zero value whatsoever in timelines where we don't win. And Eliezer, or people like me, are saying, 'Actually, we should value them in proportion to how close to winning we got'. Because that is more healthy... It's reward shaping! We should give ourselves partial reward for getting partially the way. He says that in the pos

... (read more)

Thanks for the feedback! Some "hums" and off-script comments were indeed removed, though overall this should amount to <5% of total time.

I like this comment, and I personally think the framing you suggest is useful. I'd like to point out that, funnily enough, in the rest of the conversation (not in the quotes, unfortunately) he says something about the dying with dignity heuristic being useful because humans are (generally) not able to reason about quantum timelines.

First point: by "really want to do good" (the really is important here) I mean someone who would be fundamentally altruistic and would not have any status/power desire, even subconsciously.

I don't think Conjecture is an "AGI company", everyone I've met there cares deeply about alignment and their alignment team is a decent fraction of the entire company. Plus they're funding the incubator.

I think it's also a misconception that it's a unilateralist intervention. Like, they've talked to other people in the community before starting it; it was not a secret.

4Ofer
Then I'd argue the dichotomy is vacuously true, i.e. it does not generally pertain to humans. Humans are the result of human evolution. It's likely that having a brain that (unconsciously) optimizes for status/power has been very adaptive. Regarding the rest of your comment, this thread seems relevant.

tl;dr: people change their minds, reasons why things happen are complex, we should adopt a forgiving mindset/align AI, and long-term impact is hard to measure. At the bottom I try to put numbers on EleutherAI's impact and find it was plausibly net positive.

I don't think discussing whether someone really wants to do good or whether there is some (possibly unconscious?) status-optimization process is going to help us align AI.

The situation is often mixed for a lot of people, and it evolves over time. The culture we need to have on here to solve AI existential... (read more)

Ofer*136

Like, who knew that the thing would become a Discord server with thousands of people talking about ML? That they would somewhat succeed? And then, when the thing is pretty much already somewhat on the rails, what choice do you even have? Delete the server? Tell the people who have been working hard for months to open-source GPT-3 like models that "we should not publish it after all"?

I think this eloquent quote can serve to depict an important, general class of dynamics that can contribute to anthropogenic x-risks.

3Ofer
Two comments:
  • [wanting to do good] vs. [one's behavior being affected by an unconscious optimization for status/power] is a false dichotomy.
  • Don't you think that unilateral interventions within the EA/AIS communities to create/fund for-profit AGI companies, or to develop/disseminate AI capabilities, could have a negative impact on humanity's ability to avoid existential catastrophes from AI?
5lc
I funnily enough ended up retracting the comment around 9 minutes before you posted yours, triggered by this thread and the concerns you outlined about this sort of psychologizing being unproductive. I basically agree with your response.

In their announcement post they mention:

Mechanistic interpretability research in a similar vein to the work of Chris Olah and David Bau, but with less of a focus on circuits-style interpretability  and more focus on research whose insights can scale to models with many billions of parameters and larger. Some example approaches might be: 

  • Locating and editing factual knowledge in a transformer language model.
  • Using deep learning to automate deep learning interpretability - for example, training a language model to give semantic labels
... (read more)
1Thomas Larsen
Thanks! 

I believe the forecasts were aggregated around June 2021. When was the GPT-2 finetune released? What about GPT-3 few-shot?

Re jumps in performance: Jack Clark has a screenshot on Twitter about saturated benchmarks from the Dynabench paper (2021); it would be interesting to make something up-to-date with MATH https://twitter.com/jackclarkSF/status/1542723429580689408

I think it makes sense (for him) to not believe AI X-risk is an important problem to solve (right now) if he believes that "fast enough" means "not in his lifetime", and he also puts a lot of moral weight on near-term issues. For completeness' sake, here are some claims more relevant to "not being able to solve the core problem".

1) From the part about compositionality, I believe he is making a point about the inability to generate an image that would contradict the training set distribution with the current deep learning paradigm

Generating an image

... (read more)

I have never thought of such a race. I think this comment is worth its own post.

1Timothy Underwood
The link is the post where I recently encountered the idea.

Datapoint: I skimmed through Eliezer's post, but read this one from start to finish in one sitting. This post was for me the equivalent of reading the review of a book I haven't read, where you get all the useful points and nuance. I can't stress enough how useful that was for me. Probably the most insightful post I have read since "Are we in AI overhang".

Thanks for bringing up the rest of the conversation. It is indeed unfortunate that I cut out certain quotes from their full context. For completeness' sake, here is the full excerpt without interruptions, including my prompts. Emphasis mine.

Michaël: Got you. And I think Yann LeCun’s point is that there is no such thing as AGI because it’s impossible to build something truly general across all domains.

Blake: That’s right. So that is indeed one of the sources of my concerns as well. I would say I have two concerns with the terminology AGI, but let’s start with

... (read more)

The goal of the podcast is to explore why people believe certain things while discussing their inside views about AI. In this particular case, the guest gives roughly three reasons for his views:

  • the no free lunch theorem showing why you cannot have a model that outperforms all other learning algorithms across all tasks.
  • the results from the Gato paper where models specialized in one domain are better (in that domain) than a generalist agent (the transfer learning, if any, did not lead to improved performance).
  • society as a whole being similar to some "genera
... (read more)
8Viliam
Humans specialize, because their brains are limited. If an AI with certain computing capacity has to choose whether to be an expert at X or an expert at Y, an AI with twice as much capacity could choose to be an expert on both X and Y. From this perspective, maybe a human-level AI is not a risk, but a humankind-level AI could be.

For people who think that agenty models of recursive self-improvement do not fully apply to the current approach to training large neural nets, you could consider {Human,AI} systems already recursively self-improving through tools like Copilot.

I think the best way to look at it is climate change way before it was mainstream.

I found the concept of flailing and becoming what works useful.

I think the world will be saved by a diverse group of people. Some will be high-integrity groups, others will be playful intellectuals, but the most important ones (that I think we currently need the most) will lead, take risks, explore new strategies.

In that regard, I believe we need more posts like lc's containment strategy one or the one about pulling the fire alarm for AGI, even if those plans are different from the ones the community has tried so far. Integrity alone will not save the world. A more diverse portfolio might.

Note: I updated the parent comment to take into account interest rates.

In general, the way to mitigate trust issues would be to use an escrow, though when betting on doom-ish scenarios there would be little benefit in having $1000 in escrow if I "win".

For anyone reading this who also thinks that it would need to be >$2000 to be worth it, I am happy to give $2985 at the end of 2032, aka an additional 10% on top of the compounded average annual return of the S&P 500 (i.e. 1.1 * (1.105^10 * 1000)), if that sounds less risky than the SPY ETF bet.
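
A quick check of that arithmetic (assuming a 10.5% average annual return compounded over the 10 years, as in the formula above):

```python
# Sanity check of the offered payout: 10% on top of $1000 compounded
# at an assumed 10.5% average annual S&P 500 return for 10 years.
principal = 1000
annual_return = 0.105
years = 10

sp500_value = principal * (1 + annual_return) ** years  # ~2714
offered_payout = 1.1 * sp500_value                      # ~2985

print(round(sp500_value), round(offered_payout))  # 2714 2985
```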

For any of those (supposedly) >50% of respondents claiming a <10% probability, I am happy to take a 1:10 odds $1000 bet on:

"by the end of 2032, fewer than a million humans are alive on the surface of the earth, primarily as a result of AI systems not doing/optimizing what the people deploying them wanted/intended"

Where, similar to Bryan Caplan's bet with Yudkowsky, I get paid like $1000 now, and at the end of 2032 I give it back, adding 100 dollars.

(Given inflation and interest, this seems like a bad deal for the one giving the money now, though I... (read more)

5Rohin Shah
This seems like a terrible deal even if I'm 100% guaranteed to win, I could do way better than a ~1% rate of return per year (e.g. buying Treasury bonds). You'd have to offer > $2000 before it seemed plausibly worth it. (In practice I'm not going to take you up on this even then, because the time cost in handling the bet is too high. I'd be a lot more likely to accept if there were a reliable third-party service that I strongly expected to still exist in 10 years that would deal with remembering to follow up in 10 years time and would guarantee to pay out even if you reneged or went bankrupt etc.) 

Thanks for the survey. A few nitpicks:
- the survey you mention is ~1y old (May 3-May 26 2021). I would expect those researchers to have updated from the scaling laws trend continuing with Chinchilla, PaLM, Gato, etc. (Metaculus at least did update significantly, though one could argue that people taking the survey at CHAI, FHI, DeepMind etc. would be less surprised by the recent progress.)

- I would prefer the question to mention "1M humans alive on the surface of the earth" to avoid people surviving inside "mine shafts" or on Mars/the Moon (similar to the Br... (read more)
