All of simeon_c's Comments + Replies

I'm not 100% sure about the second factor but the first is definitely a big factor. To my knowledge, no institution is denser in STEM talent than ENS, and elites there are extremely generalist compared to equivalent elites I've met in other countries (e.g. MIT in the US). The core of "Classes Préparatoires" is that it pushes even the world's best people to grind like hell for 2 years, including weekends, every evening, etc.

ENS is the result of: push all your elite to grind like crazy for 2 years on a range of STEM topics, and then select the top 20 to 50. 

simeon_c3-6

250 upvotes is also crazy high. Another sign of how disastrously bad the EA/LessWrong communities are at character judgment.

The same thing is happening right now before our eyes with Anthropic. And similar crowds are just as confidently asserting that this time they're really the good guys.

Ben Pace112

I am somewhat confused about this.

To be clear I am pro people from organizations I think are corrupt showing up to defend themselves, so I would upvote it if it had like 20 karma or less.

I would point out that the comments criticizing the organization’s behavior and character are getting similar vote levels (e.g. the top comment calls OpenAI reckless and unwise and has 185 karma and 119 agree-votes).

8habryka
I think people were happy to have the conversation happen. I did strong-downvote it, but I don't think upvotes are the correct measure here. If we had something like agree/disagree-votes on posts, that would have been the right measure, and my guess is it would have overall been skewed pretty strongly in the disagree-vote direction.
simeon_c2-1

I only skimmed, but I wanted to flag that I like Bengio's proposal of one coordinated coalition that develops several AGIs in a coordinated fashion (e.g. training runs at the same time on their own clusters), which decreases the main downside of having one single AGI project (power concentration).

0rosehadshar
Thanks, this seems cool and I hadn't seen it.

I still agree with a lot of that post and am still essentially operating on it. 

I also think it's interesting to read the comments, because at the time, the promise from those who thought my post was wrong was that Anthropic's RSP would get better and that this was only the beginning. With RSP V2 being worse and less specific than RSP V1, it's clear that this was overoptimistic.

Now, risk management in AI has also gone a lot more mainstream than it was a year ago, in large part thanks to the UK AISI, which started operating on it. People have also... (read more)

I'd be interested in also exploring model-spec-style aspirational documents too.

Happy to do a call on model-spec-style aspirational documents if that's at all relevant. I think this is important, and we could be interested in helping develop a template for it if Anthropic were interested in using it.

Thanks for writing this post. I think the question of how to rule out risk once capability thresholds are crossed has generally been underdiscussed, despite it probably being the hardest risk management question with Transformers. In a recent paper, we coin the term "assurance properties" for the research directions that are helpful for this particular problem.


Using a similar type of thinking applied to other existing safety techniques, it seems to me like interpretability is one of the only current LLM safety directions that can get you a big Bayes factor. 
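
To spell out the "Bayes factor" framing used here, a minimal sketch (the evidence E, the factor K, and the specific numbers are my illustrative assumptions, not figures from the paper): the question is how strongly the evidence a technique produces, e.g. an interpretability audit coming back clean, should move our credence that the model is unsafe.

```latex
% Posterior odds on "unsafe" after seeing evidence E from a safety technique:
\frac{P(\text{unsafe}\mid E)}{P(\text{safe}\mid E)}
  = \frac{P(E\mid \text{unsafe})}{P(E\mid \text{safe})}
    \cdot \frac{P(\text{unsafe})}{P(\text{safe})},
\qquad
K \equiv \frac{P(E\mid \text{safe})}{P(E\mid \text{unsafe})}.
% Illustrative numbers: with a prior P(unsafe) = 0.1 (odds 1:9),
%   K = 100  =>  posterior odds 1:900,  P(unsafe | E) ≈ 0.001
%   K = 2    =>  posterior odds 1:18,   P(unsafe | E) ≈ 0.05
```

On this reading, a "big Bayes factor" technique is one whose evidence would be very unlikely to appear if the model were actually unsafe; arguably that is exactly what purely behavioral evaluations struggle to provide once capability thresholds are crossed.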

The second o... (read more)

simeon_c6-4

This article fails to account for the fact that abiding by the suggested rules would mostly kill journalists' ability to share the most valuable information they bring to the public.

You don't get to reveal stuff about the world's most powerful organizations if you double-check the quotes with them.

I think journalism is one of the professions where the trade-offs between consequentialist and deontological ethics are toughest. It's just really hard to abide by very high privacy standards and still break highly important news.

As one illustrative example, your standard would have prevented Kelsey Piper from sharing her conversation with SBF. Is that a desirable outcome? Not sure.

8ChristianKl
I don't think the most important stuff that journalists reveal comes from people who misspeak in interviews. It rather comes from the journalist having strong relationships with sources that are willing to tell the journalist about real problems. Investigative journalism is often about finding someone within an organization who actually cares about exposing problems, and it's quite important to portray the position of the person who's exposing the problems as accurately as possible to effect real change. If the journalist makes mistakes in portraying those positions, it's a lot easier for a company to talk the problem away than when the problem is accurately described.

Personally I use a mix of heuristics based on how important the new idea is, how quick it is to execute, and how painful it will be to execute in the future once the excitement dies down.

The more ADHD you are, the stronger the "burst of inspired-by-a-new-idea energy" effect is, so that should count too.

Do people have takes on the most useful metrics/KPIs that could give a sense of how good the monitoring/anti-misuse measures on APIs are?

Some ideas: 
a) average time to close an account conducting misuse activities (my sense is that as long as this is >1 day, there's little chance of preventing state actors from using API-based models for a lot of misuse, i.e. everything that doesn't require major scale); a toy calculation of this metric is sketched after this list

b) the logs of the 5 accounts/interactions that have been ranked as highest severity (my sense is that incident reporting like OpenAI/Microsoft have done on c... (read more)
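
To make metric (a) concrete, here is a minimal sketch of how it could be computed from a provider's incident log. The record format, field names (`flagged_at`, `closed_at`), and timestamps are hypothetical assumptions for illustration, not any real provider's schema.

```python
from datetime import datetime, timedelta

# Hypothetical incident records: when misuse on an account was first flagged
# and when the account was actually closed. All values are made up.
incidents = [
    {"flagged_at": datetime(2024, 5, 1, 9, 0),  "closed_at": datetime(2024, 5, 1, 21, 0)},
    {"flagged_at": datetime(2024, 5, 2, 8, 0),  "closed_at": datetime(2024, 5, 4, 8, 0)},
    {"flagged_at": datetime(2024, 5, 3, 12, 0), "closed_at": datetime(2024, 5, 3, 14, 30)},
]

def mean_time_to_close(records) -> timedelta:
    """Metric (a): average delay between first misuse flag and account closure."""
    deltas = [r["closed_at"] - r["flagged_at"] for r in records]
    return sum(deltas, timedelta(0)) / len(deltas)

def share_closed_within(records, limit=timedelta(days=1)) -> float:
    """Fraction of flagged accounts closed within `limit` (the >1 day worry above)."""
    return sum((r["closed_at"] - r["flagged_at"]) <= limit for r in records) / len(records)

print(mean_time_to_close(incidents))   # 20:50:00 on these toy records
print(share_closed_within(incidents))  # 2/3 of the toy accounts closed within a day
```

A real version would also have to pin down what counts as the "flag" event (first automated detection vs. human confirmation), since that choice largely determines whether the >1 day threshold is meaningful.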

simeon_c810

This looks overwhelmingly the most likely to me, and I'm glad someone wrote this post. Thanks, Buck.

simeon_c1613

Thanks for answering, that's very useful. 

My concern is that, as far as I understand, a decent number of safety researchers think that policy is the most important area, but because, as you mentioned, they aren't policy experts and don't really know what's going on, they just assume that Anthropic's policy work is way better than those actually working in policy judge it to be. I've heard from a surprisingly high number of people among the orgs doing the best AI policy work that Anthropic's policy work is mostly anti-helpful.

Somehow though... (read more)

simeon_c1114

How aware were you (as an employee) & are you (now) of their policy work? In a world model where policy is the most important thing, it seems to me like it could significantly tarnish Anthropic's net impact.

Neel Nanda162

I don't quite understand the question. I've heard various bits of gossip, both as an employee and now. I wouldn't say I'm confident in my understanding of any of it. I was somewhat sad about Jack and Dario's public comments about thinking it's too early to regulate (if I understood them correctly), which I also found surprising as I thought they had fairly short timelines, but policy is not at all my area of expertise so I am not confident in this take.

I think it's totally plausible Anthropic has net negative impact, but the same is true for almost any sig... (read more)

simeon_c20

This is the best alignment plan I've heard in a while.

simeon_c140

You are a LessWrong reader, want to push humanity's wisdom and don't know how to do so? Here's a workflow: 

  1. Pick an important topic where the entire world is confused 
  2. Post plausible-sounding takes on it with a confident tone
  3. Wait for Gwern's comment on your post 
  4. Problem solved

See an application of the workflow here: https://www.lesswrong.com/posts/epgCXiv3Yy3qgcsys/you-can-t-predict-a-game-of-pinball?commentId=wjLFhiWWacByqyu6a

6Seth Herd
This workflow appears to work quite well. This suggests that we should collectively try to irritate Gwern into solving the alignment problem. This is a big problem, so we'll have to apply the workflow iteratively. First post a confident argument that it's impossible for specific reasons, then make similarly overconfident posts on increasingly specific arguments about how subsets of the problem are effectively impossible. The remainder will be the best available solution. That's a pretty funny example. Lots of us could've recognized the central flaw (I'm pretty alert to intimidation by physics/math and drawing much broader conclusions than the proof allows), but it would take Gwern to disprove it so eloquently, thoroughly, and with such thorough references. So, we should all be like Gwern. Merely devote our lives to the pursuit not just of knowledge, but well-organized knowledge that's therefore cumulative. Recognize that nobody will pay us to do this, so live cheaply to minimize time wasted making a living (and probably distortions in rationality from having goals outside of knowledge). I don't know Gwern's story in any more detail than that, which is my recollection of what he's said about himself. (The distortions of rationality is my addition; I really need to write up my research and thinking on motivated reasoning.)
simeon_c20

Playing catch-up is way easier than pushing the frontier of LLM research. One is about guessing which path others took, the other one is about carving a path among all the possible ideas that could work.

If China stopped having access to US LLM secrets and had to push the LLM frontier rather than playing catch-up, how much slower would it be at doing so?

My guess is at least 2x and probably more, but I'd be curious to get takes.

2Vladimir_Nesov
Since the scaling experiment is not yet done, it remains possible that long-horizon agency is just a matter of scale even with current architectures, no additional research necessary. In which case additional research helps save on compute and shape the AIs, but doesn't influence ability to reach the changeover point, when the LLMs take the baton and go on doing any further research on their own. Distributed training might be one key milestone that's not yet commoditized, making individual datacenters with outrageous local energy requirements unnecessary. And of course there's the issue of access to large quantities of hardware.
1O O
They are pushing the frontier (https://arxiv.org/abs/2406.07394), but it’s hard to say where they would be without Llama. I don’t think they’d be much further behind. They have GPT-4-class models as is and also don’t care about copyright restrictions when training models. (Arguably they have better image models as a result.)
simeon_c4352

Great initiative! Thanks for leading the charge on this.

Jack Clark: “Pre-deployment testing is a nice idea but very difficult to implement,” from https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/

2Zach Stein-Perlman
Possibly he didn’t just mean technically difficult. And possibly Politico took this out of context. But I agree this quote seems bad and clarification would be nice.

Thanks for the answer, it makes sense.

To be clear, I saw it thanks to Matt, who made this tweet, so credit goes to him: https://x.com/SpacedOutMatt/status/1794360084174410104?t=uBR_TnwIGpjd-y7LqeLTMw&s=19

Glad you're keeping your eye out for these things!

It's 8 hours away from the Bay, which all-in is not that different from a plane flight to NY from the Bay, so the location doesn't really help with being where all the smart and interesting people are.

Before we started the Lightcone Offices we did a bunch of interviews to see if all the folks in the bay-area x-risk scene would click a button to move to the Presidio District in SF (i.e. imagine Lightcone team packs all your stuff and moves it for you and also all these other people in the scene move too) and... (read more)

simeon_c2015

Thanks for sharing. It's both disturbing from a moral perspective and fascinating to read.

6Daniel Kokotajlo
Yep. Anyone have any idea why Golden Gate Claude starts skipping spaces sometimes?
simeon_c2117

Very important point that wasn't on my radar. Thanks a lot for sharing.

So first, the 85%-of-net-worth thing went quite viral several times and made Daniel Kokotajlo a bit of a heroic figure on Twitter.

Then Kelsey Piper's reporting pushed OpenAI to give back Daniel's vested units. I think it's likely that Kelsey used elements from this discussion as initial hints for her reporting, and plausible that the discussion sparked her reporting; I'd love to have her confirmation or denial on that.

7ChristianKl
When I first saw this post it had 0 karma and -8 disagree votes. It's unclear to me why. Kelsey Piper is a rationalist, so it's quite plausible that she did see the discussion and was partly motivated by it. Can anyone who disagrees with simeon's comment argue their position?
simeon_c5227

I'm not gonna lie, I'm pretty crazily happy that a random quick take I wrote in 10 minutes on a Friday morning, about how Daniel Kokotajlo should get social reward and partial reimbursement, sparked a discussion that seems to have caused positive effects wayyyy beyond expectations.

Quick takes are an awesome innovation; they allow posting even when one is still partially confused/uncertain about something. Given the confusing details of the situation in that case, this would probably not have happened otherwise.

1cubefox
How did you know Daniel Kokotajlo didn't sign the OpenAI NDA and probably lost money?
3lemonhope
(My track record of 0% accuracy on which messages will politically snowball is holding up very well. I'm glad that sometimes people like you say things the way you say them, rather than only people like me saying things how I say them.)
2Chris_Leong
What kind of effects are you thinking about?
8Stephen Fowler
Do you know if there have been any concrete implications (i.e. someone giving Daniel a substantial amount of money) from the discussion?
2MichaelDickens
I was just thinking not 10 minutes ago about how that one LW user who casually brought up Daniel K's equity (I didn't remember your username) had a massive impact and I'm really grateful for them. There's a plausible chain of events where simeon_c brings up the equity > it comes to more people's attention > OpenAI goes under scrutiny > OpenAI becomes more transparent > OpenAI can no longer maintain its de facto anti-safety policies > either OpenAI changes policy to become much more safety-conscious, or loses power relative to more safety-conscious companies > we don't all die from OpenAI's unsafe AI. So you may have saved the world.

Mhhh, that seems very bad for someone in an AISI in general. I'd guess Jade Leung might sadly be under the same obligations... 

That seems like a huge deal to me with disastrous consequences, thanks a lot for flagging.

Right. Thanks for providing the full context. "Voluntary commitments" refers to the WH commitments, which are much narrower than the PF, so I think my observation holds.

Agreed. Note that they don't say what Martin claims they say; they only say

We’ve evaluated GPT-4o according to our Preparedness Framework

I think it's reasonably likely that this means they broke all their non-evaluation PF commitments, while not being technically wrong.

Full quote:

We’ve evaluated GPT-4o according to our Preparedness Framework and in line with our voluntary commitments. Our evaluations of cybersecurity, CBRN, persuasion, and model autonomy show that GPT-4o does not score above Medium risk in any of these categories. This assessment involved running a suite of automated and human evaluations throughout the model training process. We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities.

GPT-4o has also und

... (read more)
simeon_c13792

Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?

I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.

habryka10257

@Daniel Kokotajlo If you indeed avoided signing an NDA, would you be able to share how much you passed up as a result of that? I might indeed want to create a precedent here and maybe try to fundraise for some substantial fraction of it.

I mean, the full option space obviously also includes "bargain with Russia and China to make credible commitments that they stop rearming (possibly in exchange for something)", and I think we should totally explore that path as well. I just don't have much hope in it at this stage, which is why I'm focusing on the other option, even if it is a fucked-up local Nash equilibrium.

I've been thinking a lot recently about taxonomizing AI-risk-related concepts to reduce the dimensionality of AI threat modelling while remaining quite comprehensive. It's in the context of developing categories to assess whether labs' plans cover various areas of risk.

There are two questions I'd like to get takes on. Any take on either of these two would be very valuable.

  1. In the misalignment threat model space, a number of safety teams tend to assume that the only type of goal misgeneralization that could lead to X-risks is deceptive misalignment. I'm not sure to u
... (read more)

Rephrasing based on an ask: "Western Democracies need to urgently put a hard stop to Russia and China's war (preparation) efforts" -> Western Democracies need to urgently take actions to stop the current shift towards a new world order where conflicts are a lot more likely due to Western democracies no longer being a hegemonic power able to crush authoritarian powers that grab land, etc. This shift is currently primarily driven by the fact that Russia & China are heavily rearming themselves whereas Western democracies are not.

@Elizabeth

2jmh
I'm unsure if the rephrasing is really helpful or perhaps actually counterproductive. I think the conflict and arming are in many ways the symptom, so focusing on that is not going to be a solution. Additionally, that language seems to play directly into the framing that both the Russian government and the Chinese government are using.
Answer by simeon_c30

I liked this extension (https://chrome.google.com/webstore/detail/whispering/oilbfihknpdbpfkcncojikmooipnlglo), which I use for long messages. I press a shortcut, it starts recording with Whisper, then I press it again and it puts the transcript in my clipboard.

In those, Ukraine committed to pass laws for Decentralisation of power, including through the adoption of the Ukrainian law "On temporary Order of Local Self-Governance in Particular Districts of Donetsk and Luhansk Oblasts". Instead of Decentralization, they passed laws forbidding those districts from teaching children in the languages that those districts want to teach them.

Ukraine's unwillingness to follow the agreements was a key reason why the invasion in 2022 happened and was very popular with the Russian population.

I didn't know that, that's useful,... (read more)

2quiet_NaN
I am sure that Putin had something like the Anschluss in mind when he started his invasion.  Luckily for the west, he was wrong about that.  From a Machiavellian perspective, the war in Ukraine is good for the West: for a modest investment in resources, we can bind a belligerent Russia while someone else does all the dying. From a humanitarian perspective, war is hell and we should hope for a peace where Putin gets whatever he has managed to grab while the rest of Ukraine joins NATO and will be protected by NATO nukes from further aggression.  I am also not sure that a conventional arms race is the answer to Russia. I am very doubtful that a war between a NATO member and Russia would stay a regional or conventional conflict.
6ChristianKl
The key aspect of Minsk was that it was not put into practice. The annexation of Austria by Germany was fully put into practice and accepted by other states. Ukraine didn't try. They didn't pass the laws that Minsk called for. They did pass laws to discriminate against the Russian-speaking population. They said that they wanted to retake Crimea sooner or later. Ukraine never accepted losing any territory to Russia. I don't see why we should ignore reasons. Georgia seems to be willing to produce reasons to be invaded. Maybe Georgia shouldn't pass such laws? If you are worried about being invaded under the pretext of removing civil rights, maybe don't remove civil rights? I don't think any of the EU countries that border Russia have a situation that's remotely similar in either reasons to invade or in Russia's ability to launch a promising invasion against them.

Indeed. One consideration is that the LW community used to be much less into policy-adjacent stuff and hence much less relevant in that domain. Now, with AI governance becoming an increasingly big deal, I think we could potentially use some of that presence to push for certain things in defense.

Pushing for things in the genre of what Noah describes in the first piece I shared seems feasible for some people in policy.

simeon_c*4732

Idk what the LW community can do but somehow, to the extent we think liberalism is valuable, the Western democracies need to urgently put a hard stop to Russia and China's war (preparation) efforts. I fear that rearmament is a key component of the only viable path at this stage.

I won't argue in detail here but will link to Noahpinion, who's been quite vocal on those topics. The TLDR is that China and Russia have been scaling their war industry preparation efforts for years, while Western democracies' industries keep declining and remain crazily dependent on the... (read more)

1simeon_c
Rephrasing based on an ask: "Western Democracies need to urgently put a hard stop to Russia and China's war (preparation) efforts" -> Western Democracies need to urgently take actions to stop the current shift towards a new world order where conflicts are a lot more likely due to Western democracies no longer being a hegemonic power able to crush authoritarian powers that grab land, etc. This shift is currently primarily driven by the fact that Russia & China are heavily rearming themselves whereas Western democracies are not. @Elizabeth
2jmh
On the AI aspect I suspect we could make a small case study out of Israel's use of their AI.
2jmh
I wonder if potential war is the greatest concern with regard to either loss of liberalism or sites like LW. There's an interesting news story on views about democratic electoral processes and public trust in them, as well as trust that the more democratic form of government will accomplish what it needs to. (Perhaps that's reading a lot into that particular summary, but it's the simplest way I could express it.) I've not read the report, so I'm not sure if the headline is actually accurate about elections -- certainly what is reported in the story doesn't quite support the "voters skeptical about fairness of elections" headline claim. The rest does seem to align with lots of news and events over the past 5 or 10 years.
1lesswronguser123
This may be likely; iirc, during wars countries tend to spend more on research, and they could potentially just race to AGI like what happened with the space race. Which could make hard takeoff even more likely.

Something which concerns me is that transformative AI will likely be a powerful destabilizing force, which will place countries currently behind in AI development (e.g. Russia and China) in a difficult position. Their governments are currently in the position of seeing that peacefully adhering to the status quo may lead to rapid disempowerment, and that the potential for coercive action to interfere with disempowerment is high. It is pretty clearly easier and cheaper to destroy chip fabs than create them, easier to kill tech employees with potent engineeri... (read more)

7ChristianKl
How do you draw that conclusion from the Minsk agreements? In those, Ukraine committed to pass laws for Decentralisation of power, including through the adoption of the Ukrainian law "On temporary Order of Local Self-Governance in Particular Districts of Donetsk and Luhansk Oblasts". Instead of Decentralization, they passed laws forbidding those districts from teaching children in the languages that those districts want to teach them. Ukraine's unwillingness to follow the agreements was a key reason why the invasion in 2022 happened and was very popular with the Russian population. Being in denial about that is not helpful if you want to help prevent wars from breaking out. Having maximalist foreign policy goals is not the way you get peace. The latest illegal land grab was done by Israel without any opposition by the US. If you are truly worried about land grabs being a problem, why not speak against that US position of being okay with some land grabs instead of just speaking for buying more weapons?
5matto
When I last looked a couple of months back, I found very little discussion of this topic in the rationalist communities. The most interesting post was probably this one from 2021: https://forum.effectivealtruism.org/posts/8cr7godn8qN9wjQYj/decreasing-populism-and-improving-democracy-evidence-based I suppose it's not a popular topic because it rubs up against politics. But I do think that liberal democracy is the operating system for running things like LW, EA, and other communities we all love. It's worth defending, though what that means exactly is vague to me.

If you wanna reread the debate, you can scroll through this thread (https://x.com/bshlgrs/status/1764701597727416448). 

There was a hot debate recently but regardless, the bottom line is just "RSPs should probably be interpreted literally and nothing else. If a literal statement is not strictly there, it should be assumed it's not a commitment."

I've not seen people interpreting those very literally, so I just wanted to emphasize that point.

Raemon106

I currently think Anthropic didn't "explicitly publicly commit" to not advance the rate of capabilities progress. But, I do think they made deceptive statements about it, and when I complain about Anthropic I am complaining about deception, not "failing to uphold literal commitments."

I'm not talking about the RSPs because the writing and conversations I'm talking about came before that. I agree that the RSP is more likely to be a good predictor of what they'll actually do.

I think most of the generator for this was more like "in person conversations", at le... (read more)

simeon_c*171

Given the recent argument over whether Anthropic really did commit to not push the frontier or just misled most people into thinking that was the case, it's worth rereading the RSPs in hair-splitting mode. I did, and noticed a few relevant findings:

Disclaimer: this is focused on negative stuff but does not deny the merits of RSPs etc etc.

  1. I couldn't find any sentence committing to not significantly increase extreme risks. OTOH I found statements that if taken literally could imply an implicit acknowledgment of the opposite: "our most si
... (read more)
4Raemon
This debate comes from before the RSP so I don’t actually think that’s cruxy. Will try to dig up an older post.

There are a number of properties of AI systems that make it easier to collect information about those systems in a safe way and hence demonstrate their safety: interpretability, formal verifiability, modularity, etc. Which adjective would you use to characterize those properties?

 

I'm thinking of "resilience" because, from the perspective of an AI developer, it helps a lot with understanding the risk profile, but do you have other suggestions?

Some alternatives: 

  1. auditability properties
  2. legibility properties
simeon_c12-5

Unsure how much we disagree, Zach and Oliver, so I'll try to quantify: I would guess that Claude 3 will pull forward the release date of OpenAI's next-gen models by a few months at least (I would guess 3 months), which has significant effects on timelines.

Tentatively, I'm thinking that this effect may be superlinear. My model is that each new release increases the speed of development (because of increased investment across the whole value chain, including compute, plus people realizing that it's not like other technologies, etc.), so a few months gained now cuts more than a few months off AGI timelines.

Oh thanks, I hadn't found it, gonna delete!

Yeah basically Davidad has not only a safety plan but a governance plan which actively aims at making this shift happen!

Thanks for writing that. I've been trying to taboo "goals" because it creates so much confusion, which this post tries to decrease. In line with this post, I think what matters is how difficult a task is to achieve, and what it takes to achieve it in terms of ability to overcome obstacles.

1evhub
The scaling and deployment commitments are two separate sets of commitments with their own specific trigger conditions, which is extremely clear if you read the RSP. The only way I can imagine having this sort of misunderstanding is if you read only my quotes and not the actual RSP document itself.

Because it's meaningless to talk about a "compromise" while dismissing one entire side of the people who disagree with you (but only one side!).

Like I could say "global compute thresholds are a robustly good compromise with everyone* who disagrees with me"

*Footnote: only those who're more pessimistic than me.

That may be right but then the claim is wrong. The true claim would be "RSPs seem like a robustly good compromise with people who are more optimistic than me".

And then the claim becomes not really relevant?

6Malo
IDK man, this seems like nitpicking to me ¯\_(ツ)_/¯. Though I do agree that, on my read, it’s technically more accurate. My sense here is that Holden is speaking from a place where he considers himself to be among the folks (like you and I) who put significant probability on AI posing a catastrophic/existential risk in the next few years, and “people who have different views from mine” is referring to folks who aren’t in that set. (Of course, I don’t actually know what Holden meant. This is just what seemed like the natural interpretation to me.)  Why?

Holden, thanks for this public post. 

  1. I would love it if you could write something along the lines of what you wrote in "If it were all up to me, the world would pause now - but it isn’t, and I’m more uncertain about whether a “partial pause” is good" at the top of the ARC post, which, as we discussed and as I wrote in my post, would make RSPs more likely to be positive in my opinion by making the policy/voluntary safety commitments distinction clearer.

Regarding 

Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have d

... (read more)
6HoldenKarnofsky
Thanks for the thoughts! #1: METR made some edits to the post in this direction (in particular see footnote 3). On #2, Malo’s read is what I intended. I think compromising with people who want "less caution" is most likely to result in progress (given the current state of things), so it seems appropriate to focus on that direction of disagreement when making pragmatic calls like this. On #3: I endorse the “That’s a V 1” view.  While industry-wide standards often take years to revise, I think individual company policies often (maybe usually) update more quickly and frequently.
Malo125

Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have different views from mine

2. It seems like it's empirically wrong given the strong pushback RSPs received, so at the very least you shouldn't call it "robustly" good, unless you mean a kind of modified version that would accommodate the most important parts of the pushback.

FWIW, my read here was that “people who have different views from mine” was in reference to these sets of people:

  • Some people think that the kinds of risks I’m worried about are far off, farfetched
... (read more)

Would your concerns be mostly addressed if ARC had published a suggestion for a much more comprehensive risk management framework, and explicitly said "these are the principles that we want labs' risk-management proposals to conform to within a few years, but we encourage less-thorough risk management proposals before then, so that we can get some commitments on the table ASAP, and so that labs can iterate in public. And such less-thorough risk management proposals should prioritize covering x, y, z."

Great question! A few points: 

  1. Yes, many of the thin
... (read more)
3M. Y. Zuo
Has anyone made such a credible, detailed, and comprehensive list? If not, what would it look like in your opinion?

Two questions related to it: 

  1. What happens in your plan if it takes five years to solve the safety evaluation/deception problem for LLMs (i.e. it's extremely hard)?
  2. Do you have an estimate of P({China; Russia; Iran; North Korea} steals an ASL-3 system with ASL-3 security measures)? Conditional on one of these countries having the system, what's your guess of p(catastrophe)?

Thanks Eli for the comment. 

One reason why I haven't provided much evidence is that I think it's substantially harder to give evidence for a "for all" claim (my side of the claim) than for a "there exists" claim (what I'm asking of Evan). Based on what I've seen, I claim that it doesn't happen that a framework in a niche area evolves this fast without accidents, even in domains with substantial updates, like aviation and nuclear.

I could potentially see it happening with large accidents, but I personally don't want to bet on that and I would want it to be transparent if tha... (read more)

simeon_c*102

Thanks for your comment. 
 

I feel like a lot of the issues in this post are that the published RSPs are not very detailed and most of the work to flesh them out is not done.

I strongly disagree with this. In my opinion, a lot of the issue is that RSPs have been designed from first principles without much consideration for everything the risk management field has done, and hence get things wrong without noticing.

It's not a matter of how detailed they are; they get the broad principles wrong. As I argued (the entire table is about this), I think... (read more)

5Lukas Finnveden
But most of the deficiencies you point out in the third column of that table are about missing and insufficient risk analysis. E.g.: * "RSPs doesn’t argue why systems passing evals are safe". * "the ISO standard asks the organization to define risk thresholds" * "ISO proposes a much more comprehensive procedure than RSPs" * "RSPs don’t seem to cover capabilities interaction as a major source of risk" * "imply significant chances to be stolen by Russia or China (...). What are the risks downstream of that?" If people took your proposal as a minimum bar for how thorough a risk management proposal would be, before publishing, it seems like that would interfere with labs being able to "post the work they are doing as they do it, so people can give feedback and input". This makes me wonder: Would your concerns be mostly addressed if ARC had published a suggestion for a much more comprehensive risk management framework, and explicitly said "these are the principles that we want labs' risk-management proposals to conform to within a few years, but we encourage less-thorough risk management proposals before then, so that we can get some commitments on the table ASAP, and so that labs can iterate in public. And such less-thorough risk management proposals should prioritize covering x, y, z."