All of Ariel_'s Comments + Replies

Sure! and yeah regarding edits - I have not gone through the full request for feedback yet, I expect to have a better sense late next week of which contributions are most needed and how to prioritize. I mainly wanted to comment first on obvious things that stood out to me from the post.

 There is also an Evals workshop in Brussels on Monday where we might learn more. I've know of some some non-EU based technical safety researchers who are attending, which is great to see. 

I'd suggest updating the language in the post to clarify things and not overstate :)

Regarding the 3rd draft - opinions varied between people I work with but we are generally happy. Loss of Control is included in the selected systemic risks, as well as CBRN. Appendix 1.2 also has useful things, though some valid concerns got raised there on compatibility with the AI Act language that still need tweaking (possobly merging parts of 1.2 into selected systemic risks). As far as interpretability - the code is meant to be outcome based, and the main reason evals ... (read more)

1Katalina Hernandez
I was reading "The Urgency of Interpretability" by Dario Amodei, and the following part made me think about our discussion.    Although I agree with you that (at this stage), regulatory requirement to disclose interpretability techniques used to test models before release would not be very useful for for outcome-based CoPs.  But I hope that there is a path forward for this approach in the near future.
1Katalina Hernandez
Thanks a lot for your follow up. I'd love to connect on LinkedIn if that's okay, I'm very  grateful for your feedback! I'd say: "I believe that more feedback from alignment and interpretability researchers is needed" instead. Thoughts?

FYI I wouldn't say at all that AI safety is under-represented in the EU (if anything, it would be easier to argue the opposite). Many safety orgs (including mine) supported the Codes of Practice, and almost all the Chairs and vice chairs are respected governance researchers. But probably still good for people to give feedback, just don't want to give the impression that this is neglected.

Also no public mention of intention to sign the code has been made as far as I know. Though apart from copyright section, most of it is in line with RSPs, which makes signing more reasonable. 

1Katalina Hernandez
Thank you, Ariel! I guess I've let my personal opinion shine through. I do not see many regulatory efforts in general on verbalizing alignment or interpretability necessities or translating them into actionable compliance requirements. The AI Act mentions alignment vaguely, for example.  And as far as I saw, the third draft of the Codes (Safety & Security) mentions alignment / misalignment with a "this may be relevant to include" tone rather than providing specifics as to how GenAI providers are expected to document individual misalignment risks and appropriate mitigation strategies. And interpretability / mech interp is not mentioned at all, not even in the context of model explainability or transparency.  This is why I hoped to see feedback from this community: to know whether I am over-estimating my concerns.

Good point. Thinking of robotics overall, it's much more of a bunch of small stuff than one big thing. Though it depends how far you "zoom out" I guess. Technically Linear Algebra itself, or the Jacobian, is an essential element of robotics. But could also zoom in on a different aspect and then say that "zero backlash gearboxes" (where Harmonic Drive is notable as it's much more compact and accurate than prev versions - but perhaps a still small effect in the big picture) are the main element. Or PID control, or high resolution encoders.

I'm not quite sure ... (read more)

Other examples of fields like this include: medicine, mechanical engineering, education, SAT solving, and computer chess.

To give a maybe helpful anecdote - I am a mechanical engineer (though I now work in AI governance), and in my experience that isnt true at least for R&D (e.g. a surgical robot) where you arent just iterating or working in a highly standardized field (aerospace, hvac, mass manufacturing etc). The "bottleneck" in that case is usually figuring out the requirements (e.g. which surgical tools to support? whats the motion range, design env... (read more)

2ryan_greenblatt
Importantly, this is an example of developing a specific application (surgical robot) rather than advancing the overall field (robots in general). It's unclear whether the analogy to an individual application or an overall field is more appropriate for AI safety.

I had a great time at AISC8. Perhaps I would still find my way into a full time AI Safety position without it, but i'd guess at least 1 year later and significantly less neglected opportunity. My AI Safety Camp project later became the AI Standards Lab. 
I know several others who benefitted quite a bit from it. 

1Remmelt
Good to know. I also quoted your more detailed remark on AI Standards Lab at the top of this post.

Do you think there's some initial evidence for that? E.g. Voyager or others from Deepmind. Self play gets thrown around a lot, not sure if concretely we've seen much yet for LLMs using it.

But yes agree, good point regarding strategy games being a domain that could be verifiable

I was fairly on board with control before, I think my main remaining concern is the trusted models not being good enough. But with more elaborate control protocols (Assuming political/AI labs actually make an effort to implement), catching an escape attempt seems more likely if the model's performance is very skewed to specific domains. Though yeah I agree that some of what you mentioned might not have changed, and could still be an issue

Good post, thanks for writing!

With o1, and now o3, It seems fairly plausible now that there will be a split between "verifiable" capabilities, and general capabilities. Sure, there will be some cross-pollination (transfer), but this might have some limits.

What then? Can a superhuman mathematical + Coding AI also just reason through political strategy, or will it struggle and make errors/fallback on somewhat generic ideas in training data? 

Can we get a "seed-AI style" consequentialist in some domains, while it fails to perform above human level in others? I'd like to believe reason... (read more)

6ryan_greenblatt
I'd say control seems easier now, but it's unclear if this makes the agenda more promising. You might have thought one issue with the agenda is that control is likely to be trivial and thus not worth working on (and that some other problem, e.g., doing alignment research with AI labor regardless of whether the AIs are scheming is a bigger issue).
3Nathan Helm-Burger
I don't think o3 represents much direct progress towards AGI, but I think it does represent indirect progress. Math and code specialist speeds up AI R&D, which then shortens time to AGI.
4Matt Goldenberg
We haven't yet seen what happens when they turn to the verifiable property of o3 to self-play on a variety of strategy games. I suspect that it will unlock a lot of general reasoning and strategy

Signal boost for the "username hiding" on homepage feature in settings - it seems cool, will see if it changes how I use LW.

I wonder also about a "hide karma by default". Though less sure if that will actually achieve the intended purpose, as karma can be a good filter when just skimming comments and not reading in detail. 

LW Feature request/idea for a feature - In posts that have lots of in-text links to other posts,  perhaps add an LLM 1-2 sentence (context informed) summary in the hover preview? 

I assume that for someone who has been around the forum for many years, various posts are familiar enough that name-dropping them in a link is sufficient to give context. But If I have to click a link and read 4+ other posts as I am going through one post, perhaps the LW UI can fairly easily build in that feature. 

(suggesting it as a features since it does seem like LW is a place that experiments with various features not too different from this - ofc, I can always ask for a LLM summary manually myself if I need to) 

2RobertM
At some point in the last couple months I was tinkering with a feature that'd try to show you a preview of the section of each linked post that'd be most contextually relevant given where it was linked from, but it was both technically fiddly and the LLM reliability is not that great.  But there might be something there.

I have the Boox Nova Air (7inch) for nearly 2 years now - a bit small for reading papers but great for books and blog posts. You can run google play apps, and even set up a google drive sync to automatically transfer pdfs/epubs onto it. At some point I might get the 10inch version (the Note Air). 

Another useful feature is taking notes inside pdfs, by highlighting and then handwriting the note into the Gboard handwrite-to-text keyboard. Not as smooth as on an iPad, but pretty good way to annotate a paper. 

This was very interesting, looking forward to the follow up!

In the "AIs messing with your evaluations" (and checking for whether the AI is capable of/likely to do so) bit, I'm curious if there is any published research on this.

8ryan_greenblatt
The closest existing thing I'm aware of is password locked models. Redwood Research (where I work) might end up working on this topic or similar sometime in the next year. It also wouldn't surprise me if Anthropic puts out a paper on exploration hacking somewhat soon.

Hmm, in that case maybe I misunderstood the post, my impression wasnt that he was saying AI literally isn't a science anymore, but more that engineering work is getting too far ahead of the science part - and that in practice most ML progress now is just ML Engineering, where understanding is only a means to an end (and so is not as deep as it would be if it was science first).

I would guess that engineering gets ahead of science pretty often, but maybe in ML it's more pronounced - hype/money investment, as well as perhaps the perceived relative low stakes ... (read more)

While theoretical physics is less "applied science" than chemistry, there's still a real difference between chemistry and chemical engineering.

For context, I am a Mechanical Engineer, and while I do occasionally check the system I am designing and try to understand/verify how well it is working, I am fundamentally not doing science. The main goal is solving a practical problem (i.e. as little theoretical understanding as is sufficient), where in science the understanding is the main goal, or at least closer to it.

5Shankar Sivarajan
Certainly, I understand this science vs. engineering, pure vs. applied, fundamental vs. emergent, theoretical vs. computational vs. observational/experimental classification is fuzzy: relevant xkcd, smbc. Hell, even the math vs. physics vs. chemistry vs. biology distinctions are fuzzy! What I am saying is that either your definition has to be so narrow as to exclude most of what is generally considered "science," (à la Rutherford, the ironically Chemistry Nobel Laureate) or you need to exclude AI via special pleading. Specifically, my claim is that AI research is closer to physics (the simulations/computation end) than chemistry is. Admittedly, this claim is based on vibes, but if pressed, I could probably point to how many people transition from one field to the other.
6Linch
The canonical source for this is What Engineers Know and How They Know It, though I confess to not actually reading the book myself.

So basically, post hoc, ergo propter hoc (post hoc fallacy)

If winning happened after rationality (in this case, any action you judge to be rational under any definition you prefer) it does not mean it happened because of it.

This was a great read! Personally I feel like it ended too quickly  - even without going into gruesome details, I felt like 1 more paragraph or so of concluding bits in the story was needed. But, overall I really enjoyed it. 

1Karl von Wendt
Thank you!

I'm trying to think of ideas here. As a recap of what I think the post says:

  • China is still very much in the race
  • There is very little AI safety activities there, possibly due to a lack of reception of EA ideas.

^let me know if I am understanding correctly.

Some ideas/thoughts:

  • it seems to me that many in AI Safety or in other specific "cause areas" are already dissociating from EA, though not much from LW.
  • I am not sure if we should expect mainstream adoption of AI Safety ideas (its not really mainstream in the west, nor is EA, or LW).
  • It seems like the
... (read more)
4Lao Mein
I'm saying that AI Safety really can't take off as a field in China without a cadre of well-paid people working on technical alignment. If people were working on, say, interpretability work here for a high wage ($50,000-$100,000 is considered a lot for a PHD in the field), it would gain prestige and people would take it seriously.  Otherwise it just sounds like LARP.  That's how you do field building in China. You don't go around making speeches, you hire people. My gut feeling is that hiring 10 expats in Beijing to do "field building" gets less field building done than hiring 10 college grads in Shanghai to do technical alignment work. 
3Lichdar
My experience is that the Chinese(I am one) will disassociate with the "strange part" of EA, such as mental uploading or minimization of suffering or even life extension: the basic conservative argument for species extension and life as human beings is what works. CN is fundamentally conservative in that sense. The complications are not many and largely revolve around: 1. How is this good for the Party. 2. How is this good for "keeping things in harmony/close to nature/Taoism" 3. Emotional appeals for safety and effort. The story of effort = worth is strong there, and devaluation of human effort leads to reactions of disgust.

Working at a startup made me realize how little we can actually "reason through" things to get to a point where all team members agree. Often there's too little time to test all assumptions, if it's even doable at all. Part of the role of the CEO is to "cut" these discussions when it's evident that spending more time on it is worse than proceeding despite uncertainty. If we had "the facts", we might find it easier to agree. But in an uncertain environment, many decisions come down to the intuition (hopefully based on reliable experience - such as founding ... (read more)