All of Quinn's Comments + Replies

Quinn20

There's an analogy between the Zurich r/changemyview curse of evals and the METR/Epoch curse of evals. You do this dubiously ethical (according to more US-pilled IRBs, or to more paranoid/pure AI safety advocates) measuring/elicitation project because you think the world deserves to know. But you had to run dubiously ethical experiments on unconsenting redditors / help labs improve capabilities in order to get there, and the catch is, you only come out net positive if the world chooses to act on the information.

Quinn20

I don't know what legible/transferable evidence would be. I've audited a lot of courses at a lot of different universities. Anecdote, sorry.

Quinn3-1

One thing I like about this is making the actual difficulty deltas between colleges more felt/legible/concrete (by anyone who takes the exams). What I might do in your system at my IQ level (which is pretty high outside of EA but pretty mediocre inside EA) is knock out a degree at an easy university to get warmed up then study for years for a degree at a hard school[1].

In real life, I can download or audit courses from whatever university I want, but I don't know what the grading curve is, so when 5/6 exercises are too hard I don't know if that's because I... (read more)

1Ariel
For [1], could you point at some evidence, if you have any on hand? My impression from TAing STEM at an Ivy League school is that the homework load and the standards for its grading (as with the exams) are very light compared to what I remember from my previous experience at a foreign state university. It wasn't at all what I expected, and it shaped (among other signals of implied preference by the university) my view that the main services the university offers its current and former students are networking opportunities and a signal of prestige.
Quinn190

I get pretty intense visceral outrage at overreaches in immigration enforcement; it just seems the height of depravity. I've looked for a lot of different routes to mental coolness over the last decade (since Trump started his speeches); they mostly amount to staying busy and distracted. It just seems like a really cost-ineffective kind of activism to get involved in. Bankrolling lawyers for random people isn't really in my action space, and if it were, I'd have opportunity costs to consider.

3Cole Wyeth
Unfortunately, it seems that my action space doesn’t include options that matter in this current battle. Personally, my reaction to this kind of insanity is to keep climbing my local status/influence/wealth/knowledge gradient, in the hopes that my actions are relevant in the future. But perhaps it’s a reason to prioritize gaining power - this reminds me of https://www.lesswrong.com/posts/ottALpgA9uv4wgkkK/what-are-you-getting-paid-in  
Quinn20

seems like there's more prior literature than I thought https://en.wikipedia.org/wiki/Role-based_access_control

Quinn20

are SOTA configuration languages sufficient for AI proliferation?

My main aim is to work on "hardening the box", i.e. eliminating software bugs so containment schemes don't fail for preventable reasons. But in the famous 4o system card example, the one that looks a little like docker exfiltration, the situation arose from user error; my wild guess is a mistake in compose.yaml or in the shell script invoking docker run.

On a Linux machine, here's an example Nix file:

users.users =
    let
      authorized-key-files = [
        "${keyspath}/id_server_ed25519.pub"
        "${keyspat
... (read more)
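For illustration, here's a fuller sketch of the kind of file I mean (hypothetical usernames and key paths, using standard NixOS users.users options; a sketch, not the rest of my actual config):

{ config, ... }:
let
  keyspath = "/etc/keys"; # hypothetical directory holding public keys
  authorized-key-files = [
    "${keyspath}/id_server_ed25519.pub"
    "${keyspath}/id_laptop_ed25519.pub"
  ];
in
{
  users.users.sandbox-agent = {
    isNormalUser = true;
    # no wheel/sudo membership: the agent's user gets no easy privilege-escalation path
    extraGroups = [ ];
    openssh.authorizedKeys.keyFiles = authorized-key-files;
  };
}

The point is that the access-control surface lives in one declarative file you can review, rather than scattered across shell scripts.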
2Quinn
seems like there's more prior literature than I thought https://en.wikipedia.org/wiki/Role-based_access_control
Quinn20

Examples of promising risk-targeted applications

This section reeks of the guaranteed safe AI agendas; a lot of agreement from me. For example, using formal methods to harden any box we try to put the AI in is a kind of defensive acceleration that doesn't work (too expensive) until certain pre-ASI stages of development. I'm working on formal verification agents along these lines right now.

Quinn20

@Tyra Burgess and I wrote down a royalty-aware payout function yesterday:

For a type $\phi$, let $L(\phi)$ be the "left closure under implication", or the admissible antecedents: the set of all antecedents $A$ in the public ledger such that $A \vdash \phi$. $p(A)$ is the price that a proposition $A$ was listed for (admitting summing over duplicates). Suppose players $1, \dots, n$ have previously proven $A_1, \dots, A_n$, and $L(\phi)$ is none other than the set of all $A_i$ from $1$ to $n$.

We would like to fix an $\varepsilon$ (could be fairly big) and say that the royalty-aware payout given eps... (read more)
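For a flavor of the shape this takes (an illustrative split, not necessarily the exact rule we wrote down): pay the prover of $\phi$ a $(1-\varepsilon)$ share and divide the remaining $\varepsilon$ share among the antecedent-owners in proportion to their listed prices,

$$\text{payout}(i) \;=\; \varepsilon \cdot p(\phi) \cdot \frac{p(A_i)}{\sum_{j=1}^{n} p(A_j)}, \qquad \text{payout}(\text{prover of } \phi) \;=\; (1-\varepsilon) \cdot p(\phi).$$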

2gwern
Sounds somewhat like a bucket brigade market economy.
Quinn50

I want a name for the following principle:

the world-spec gap hurts you more than the spec-component gap

I wrote it out much like this a couple years ago and Zac recently said the same thing.

I'd love to be able to just say "the <one to three syllables> principle", yaknow?

1Archimedes
How about the "World-Map [Spec] Gap" with [Spec] optional?
Quinn20

I'm working on making sure we get high quality critical systems software out of early AGI. Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories about all the software modules adjacent to AGI having vulnerabilities rapidly patched at crunchtime.

Quinn7135

<standup_comedian> What's the deal with evals </standup_comedian>

epistemic status: tell me I'm wrong.

Funders seem particularly enchanted with evals, which seem to be defined as "benchmarks, but probably for scaffolded systems, and with scoring that is harder than the scoring for most of what we call benchmarks".

I can conjure a theory of change. It's like, 1. if measurement is bad then we're working with vibes, so we'd like to make measurement good. 2. if measurement is good then we can demonstrate to audiences (especially policymakers) that warning shots are... (read more)

5ozziegooen
(potentially relevant meme)
7Lech Mazur
This might blur the distinction between some evals. While it's true that most evals are just about capabilities, some could be positive for improving LLM safety. I've created 8 (soon to be 9) LLM evals (I'm not funded by anyone; it's mostly out of my own curiosity, not for capability or safety or paper-publishing reasons). Using them as examples:

Improving models to score well on some of them is likely detrimental to AI safety:
  https://github.com/lechmazur/step_game - to score better, LLMs must learn to deceive others and hold hidden intentions
  https://github.com/lechmazur/deception/ - the disinformation effectiveness part of the benchmark

Some are likely somewhat negative because scoring better would enhance capabilities:
  https://github.com/lechmazur/nyt-connections/
  https://github.com/lechmazur/generalization

Others focus on capabilities that are probably not dangerous:
  https://github.com/lechmazur/writing - creative writing
  https://github.com/lechmazur/divergent - divergent thinking in writing

However, improving LLMs to score high on certain evals could be beneficial:
  https://github.com/lechmazur/goods - teaching LLMs not to overvalue selfishness
  https://github.com/lechmazur/deception/?tab=readme-ov-file#-disinformation-resistance-leaderboard - the disinformation resistance part of the benchmark
  https://github.com/lechmazur/confabulations/ - reducing the tendency of LLMs to fabricate information (hallucinate)

I think it's possible to do better than these by intentionally designing evals aimed at creating defensive AIs. It might be better to keep them private and independent. Given the rapid growth of AI capabilities, the lack of apparent concern for an international treaty (as seen in the recent Paris AI summit), and the competitive race dynamics among companies and nations, specifically developing an AI to protect us from threats from other AIs or AIs + humans might be the best we can hope for.

If I'm allowed to psychoanalyze funders rather than discussing anything at the object level, I'd speculate that funders like evals because:

  1. If you funded the creation of an eval, you can point to a concrete thing you did. Compare to funding theoretical technical research, which has a high chance of producing no tangible outputs; or funding policy work, which has a high chance of not resulting in any policy change. (Streetlight Effect.)
  2. AI companies like evals, and funders seem to like doing things AI companies like, for various reasons including (a) the t
... (read more)
Quinn30

The more I learn about measurement, the less seriously I take it

I'm impressed with models that accomplish tasks in zero or one shot with minimal prompting skill. I'm not sure what galaxy brained scaffolds and galaxy brained prompts demonstrate. There's so much optimization in the measurement space.

I shipped a benchmark recently, but it's secretly a synthetic data play: regardless of how hard people try to score on it, we get synthetic data out of it, which leads to finetune jobs, which lead to domain-specific models that can do such tasks, hopefully with minimal prompting effort and no scaffolding.

Quinn4023

$PERSON at $LAB once showed me an internal document saying that there are bad benchmarks - dangerous capability benchmarks - that are used negatively, so unlike positive benchmarks where the model isn't shipped to prod if it performs under a certain amount, these benchmarks could block a model from going to prod that performs over a certain amount. I asked, "you create this benchmark like it's a bad thing, and it's a bad thing at your shop, but how do you know it won't be used in a sign-flipped way at another shop?" and he said "well we just call it EvilBe... (read more)

This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publicly publish a benchmark which gave a laser-focused "how to be super dangerous" score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.

Quinn20

I'm surprised to hear you say that, since you write

Upfront, I want to clarify: I don’t believe or wish to claim that GSAI is a full or general panacea to AI risk.

I kinda think anything which is not a panacea is swiss cheese, that those are the only two options.

It's a matter of what sort of portfolio can lay down slices of swiss cheese at what rate and with what uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can agree mostly with Zac's talk and still work on GSAI (I don't think he talks about my ... (read more)

2Mateusz Bagiński
I understood Nora as saying that GSAI in itself is not a swiss cheese approach. This is different from saying that [the overall portfolio of AI derisking approaches, one of which is GSAI] is not a swiss cheese approach.
Quinn40

I gave a lightning talk with my particular characterization, and included "swiss cheese", i.e. that GSAI sources some layers of swiss cheese without trying to be one magic bullet. But if people agree with this, then "guaranteed-safe AI" is really a misnomer, cuz "guarantee" doesn't evoke swiss cheese at all.

6Nora_Ammann
What's the case for it being a swiss cheese approach? That doesn't match how I think of it. 
Quinn60

For anecdata: I'd be really jazzed about 3 or 4; 5 might be a little crazy, but I'm somewhat open to that or more.

Ladies

Quinn50

Yeah, last week was grim for a lot of people, with R1's implications for proliferation and the Stargate fanfare after the inauguration. I had a palpable sensation of it pivoting from midgame to endgame, but I doubt that sensation is reliable or calibrated.

2Ben Pace
My feelings here aren't at all related to any news or current events. I could've written this any time in the last year or two.
Quinn30

Tmux allows you to set up multiple panes in your terminal that keep running in the background. Therefore, if you disconnect from a remote machine, scripts running in tmux will not be killed. We tend to run experiments across many tmux panes (especially overnight).

Does no one use the suffix "& disown", which sends a command to a background process that doesn't depend on the ssh session, or the prefix "nohup", which does the same thing? You have to make sure any logging that goes to stdout goes to a log file instead (and in this respect tmux or screen are better).

Your r... (read more)

1shawnghu
I didn't learn about disown or nohup until recently, because there was no impetus to, because I'd been using tmux. (My workflow also otherwise depended on tmux; when developing locally I liked its method of managing terminal tabs/splits.)
Quinn152

Feels like a MATS-like program in India is a big opportunity. When I went to EAG in Singapore a while ago there were so many people underserved by the existing community building and mentorship organizations cuz of visa issues.

3Chris_Leong
Impact Academy was doing this, before they pivoted towards the Global AI Safety Fellowship. It's unclear whether any further fellowships should be in India or a country that is particularly generous with its visas.
Quinn53

The story I roughly understand is that this was within Epoch's mandate in the first place because they wanted to forecast on benchmarks but didn't think existing benchmarks were compelling or good enough, so they had to take matters into their own hands. Is that roughly consensus, or true? Why is FrontierMath a safety project? I haven't seen adequate discussion of this.

37vik
They say it was an advanced math benchmark to test the limits of AI, not a safety project. But a number of people who contributed would have been safety-aligned and would not have wanted to contribute if they had known OpenAI would have exclusive access.
Quinn337

Can't relate. Don't particularly care for her content (tho audibly laughed at a couple examples that you hated), but I have no aversion to it. I do have aversion to the way you appealed to datedness as if that matters. I generally can't relate to people who find cringiness in the way you describe significantly problematic, really.

People like authenticity, humility, and irony now, both in the content and in its presentation.

I could literally care less, omg. But I'm unusually averse to irony: authenticity is great, humility is great most of the time, why is irony even in the mix?

Tho I'm weakly with you that engagement farming leaves a bad taste in my mouth.

Quinn20

Update: new funding call from ARIA calls out the Safeguarded/Gatekeeper stack in a video game directly

Creating (largely) self-contained prototypes/minimal-viable-products of a Safeguarded AI workflow, similar to this example but pushing for incrementally more advanced environments (e.g. Atari games).

Quinn20

I tried a little myself too. Hope I'm not misremembering.

Quinn82

Very anecdotally, I've talked to some extremely smart people who I would guess are very good at making progress on hard problems, but just didn't think too hard about what solutions help.

A few of the dopest people I know, who I'd love to have on the team, fall roughly into the category of "engaged a little with LessWrong, grok the core pset better than most 'highly involved' people, but are working on something irrelevant and not even trying cuz they think it seems too hard". They have some thoughtful p(doom), but assume they're powerless.

Quinn64

Richard Ngo tweeted recently that it was a mistake to design the AGI Safety Fundamentals curriculum to be broadly accessible, and that if he could do it over again there'd be punishing problem sets that alienate most people.

8_will_
Any chance you have a link to this tweet? (I just tried control+f'ing through @Richard's tweets over the past 5 months, but couldn't find it.)
Quinn60

The upvotes and agree-votes on this comment updated my perception of the rough consensus about MATS and streetlighting. I previously would have expected fewer people to evaluate MATS that way.

Quinn73

As someone who, isolated and unfunded, went on months-long excursions into the hard version of the pset multiple times and burned out each time, I felt extremely validated when you verbally told me a fragment of this post around a fire pit at Iliad. The incentives section of this post is very grim, but very true. I know naive patches to the funding ecosystem would also be bad (easy for grifters, etc.), but I feel very much like I and we were failed by funders. I could've been stronger etc., I could've been in Berkeley during my attempts instead of Philly, b... (read more)

Quinn20

(I'm guessing) "super mario" might refer to a simulation of the Safeguarded AI / Gatekeeper stack in a video game. It looks like they're skipping video games and going straight to cyberphysical systems (1, 2).

2Quinn
Update: new funding call from ARIA calls out the Safeguarded/Gatekeeper stack in a video game directly
Quinn20

Ok. I'll wear a solid black shirt instead of my bright blue shirt.

Quinn20

talk to friends as a half measure

When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy needed to keep such a journal is kind of a hurdle (which is why products like https://fatebook.io are so good!).

I find that a nice midpoint between the full and correct internal track record practices (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self delusion) is talking to friends, because I think my memory of... (read more)

Quinn40

I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.

epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.

Quinn20

A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, also don't be paralyzed: budget some leeway to be less than maximally considerate in this way when you really need to.

Quinn20

what's the best essay on asking for advice?

One going over etiquette and the social contract; perhaps, if it's software-specific, it talks about minimal reproducers, and whatever else the author thinks is involved.

2Quinn
A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, also don't be paralyzed: budget some leeway to be less than maximally considerate in this way when you really need to.
Quinn3011

Rumors are that 2025 Lighthaven is jam-packed. If this is the case, and you need money, rudimentary economics suggests only the obvious: raise prices. I know many clients are mission-aligned, and there's a reasonable ideological reason to run the joint at or below cost, but I think it's aligned with that spirit if profits from the campus fund the website.

I also want to say in print what I said in person a year ago: you can ask me to do chores on campus to save money, it'd be within my hufflepuff budget. There are good reasons to not go totally "by and for ... (read more)

Answer by Quinn*40

ThingOfThings said that The Story of Louis Pasteur is a very EA movie, but I think it also counts for rationality. Huge fan.

Quinn20

Guaranteed Safe AI paper club meets again this Thursday.

Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA

blurb about the paper in last month's newsletter:

... If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program

... (read more)
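Roughly, and in my notation rather than the paper's: if $f$ is a postexpectation scoring how much harm a final state embodies, the weakest preexpectation $\mathrm{wp}(C, f)$ gives the expected harm of running program $C$ from an initial state. Comparing two candidate programs as interventions then looks like checking

$$\mathrm{wp}(C_1, f)(\sigma) \;\le\; \mathrm{wp}(C_2, f)(\sigma) \quad \text{for all initial states } \sigma,$$

i.e. intervention $C_1$ is no more harmful than $C_2$ in expectation.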
Quinn50

My dude, top-level post: this does not read like a shortform.

Quinn20

Yoshua Bengio is giving a talk online tomorrow https://lu.ma/4ylbvs75

Quinn20

by virtue of their technical chops, also care about their career capital.

I didn't understand this. "Their technical chops impose opportunity cost, as they're able to build very safe, successful careers if they toe the line" would make sense, or "they care about career capital independent of their technical chops" would make sense. But here, the relation between technical chops and caring about career capital doesn't come through clearly.

Quinn80

Did anyone draw up an estimate of how much the proportion of code written by LLMs will increase? Or even what the proportion is today?

Quinn40

I was thinking the same thing this morning! My main thought was, "this is a trap. Ain't no way I'm pressing a big red button, especially not so near to Petrov Day."

Quinn20

GSAI paper club is tomorrow (gcal ticket), summary (by me) and discussion of this paper

Quinn21

Alas, belief is easier than disbelief; we believe instinctively, but disbelief requires a conscious effort.

Yes, but this is one thing that I have felt being mutated as I read the Sequences and continued to hang out with you lot (starting roughly 8 years ago, with some off and on).

Quinn10

By all means. Happy for that

Quinn31

Note in August 2024 GSAI newsletter

See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreement, and my disagreements are more a matter of what deserves emphasis than of fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swiss-cheesey worldview, and sounded too much like "why not just capture alignment properties in 'specs' and prove the software 'correct'?" (i.e. the vibe I was responding to in my very pithy post). However, I think the main reason I'm not using Dickson's post as a reason to j... (read more)

Quinn*270

august 2024 guaranteed safe ai newsletter

In case I forgot last month, here's a link to July.


A wager you say

One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock? 

Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined au... (read more)

2habryka
Oh, I liked this one. Mind if I copy it into your shortform (or at least like the first few paragraphs so people can get a taste?)
Quinn21

I do think it's important to reach appropriately high safety assurances before developing or deploying future AI systems which would be capable of causing a catastrophe. However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever actually works in practice.

For what it's worth, I do see GSAI as largely a swiss-cheesey worldview. Though I can see how you might read some of the authors involved to be implying otherwise! I should knock out a post on this.

Quinn-10

not just get it to sycophantically agree with you

I struggle with this, and need to attend a prompting bootcamp.

Quinn20

I'm getting back into composing and arranging. Send me rat poems to set to music!
