
Comments

zchuang

Can you share the difference between the times when it is aware (I know 33% is the floor) and the times when Claude is not aware/sandbagging?

zchuang

The Claude Plays Pokémon discourse makes me feel out of touch, because I don't understand how this is not an update people already had with chess-playing LLMs. If anything, I think the data contamination is worse for Pokémon because of GameFAQs and other guides online.

I think people are mistaking benchmarks for trailheads. Maths benchmarks are important because they theoretically speed up AI research and serve as a benchmark for useful intelligence. Claude playing Pokémon doesn't tell you much, because it isn't a measure of intelligence or of generalisation.

zchuang

Sorry, but why did you connect apathy and lying-flat [sic] with fast-follower culture? Lying flat, or tang ping, is about youth apathy and nihilism about the job market, and existential angst about life in the 21st century. It's not about corporate or industrial culture from the top end, or about specific technological political-economy strategies.

zchuang

I don't know if this is helpful, but as someone who was quite good at competitive Pokémon in their teenage years and still keeps up with nuzlocke-type runs for fun, I would note that Pokémon is designed as a low-context-intensity RPG, especially in the early generations, where the linearity is pushed so that kids can complete it.

If your point about agency holds true, I think the more important pinch points will be Lavender Town and Sabrina, because those require backtracking through the storyline to get things.

I think mid-to-late-game GSC would also be important to try, because there are huge level gaps and transitions in the storyline that would make it hard to progress.

Sorry, but aren't we in a fast-takeoff world by the point of WBE (whole brain emulation)? What's the disjunctive world with WBE but no recursive self-improvement?

He posted a request on Twitter to talk to people who feel strongly here.

Yeah, re-reading, I realise I was unclear. Given your claim that "by the time we get to 2000 in that, such AGIs will be automating huge portions of AI R&D", I'm asking the following:

  1. Is the 2000 mark predicated on automation of things we can't envision now (finding the secret sauce to singularity), is it predicated on pushing existing things (like AI R&D finding better compute), or is it a combination of both?
  2. What's the empirical, on-the-ground, representative modal action you're seeing at 2025 in your vignette? (E.g., the Diplomacy AI was super important for my grokking what short timelines meant.) I guess I'm asking what you see as the divergence between you and Richard at 2025 that's represented by the difference between 25 and 100.

Hopefully that made the questions clearer.

Sorry for a slightly dumb question, but in your part of the table you set 2000 as the year before singularity, and your explanation is that 2000-second tasks jump to singularity. Is your model of fast take-off then contingent on "more special sauce for intelligence" being somewhat redundant as a crux, because recursive self-improvement is just that much more effective? I'm having trouble envisioning a 2000-second task + more scaling and tuning --> singularity.

An additional question: what is your model of falsification for, say, a 25-second task vs. a 100-second task in 2025? Reading your old vignettes, it seems like you really nailed the Diplomacy AI part.
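For concreteness, here is a minimal sketch of why the 25-vs-100 difference at 2025 matters under a simple exponential-growth assumption. The doubling times below are illustrative assumptions, not numbers from the post:

```python
import math

def years_to_reach(target_s, start_s, doubling_time_years):
    """Years until the task horizon grows from start_s to target_s,
    assuming the horizon doubles every doubling_time_years."""
    doublings = math.log2(target_s / start_s)
    return doublings * doubling_time_years

# Starting horizons of 25s vs. 100s at 2025, targeting the 2000s mark,
# under two assumed doubling times (in years).
for start in (25, 100):
    for dt in (0.5, 1.0):
        print(f"start={start}s, doubling={dt}y -> "
              f"+{years_to_reach(2000, start, dt):.1f} years")
```

Starting from 25 seconds takes about two extra doublings to reach 2000 seconds compared with starting from 100 seconds, so the same doubling-time assumption yields noticeably different arrival years.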

Also, slightly pedantic, but there's a typo in 2029 on Richard's guess.