All of Quinn's Comments + Replies

Quinn20

There's an analogy between the Zurich r/changemyview curse of evals and the METR/Epoch curse of evals. You do this dubiously ethical (according to more US-pilled IRBs, or to more paranoid/pure AI safety advocates) measuring/elicitation project because you think the world deserves to know. But you had to run dubiously ethical experiments on unconsenting redditors / help labs improve capabilities in order to get there, and the catch is, you only come out net positive if the world chooses to act on the information.

Quinn20

I don't know what legible/transferable evidence would be. I've audited a lot of courses at a lot of different universities. Anecdote, sorry.

Quinn3-1

One thing I like about this is making the actual difficulty deltas between colleges more felt/legible/concrete (by anyone who takes the exams). What I might do in your system at my IQ level (which is pretty high outside of EA but pretty mediocre inside EA) is knock out a degree at an easy university to get warmed up then study for years for a degree at a hard school[1].

In real life, I can download or audit courses from whatever university I want, but I don't know what the grading curve is, so when 5/6 exercises are too hard I don't know if that's because I... (read more)

1Ariel
For [1], could you point at some evidence, if you have any on hand? My impression from TAing STEM at an Ivy League school is that the homework load and the standards for its grading (as with the exams) are very light compared to what I remember from my previous experience at a foreign state university. It wasn't at all what I expected, and it shaped (among other signals of implied preference by the university) my view that the main services the university offers its current and former students are networking opportunities and a signal of prestige.
Quinn190

I get pretty intense visceral outrage at overreaches in immigration enforcement; it just seems the height of depravity. I've looked for a lot of different routes to mental coolness over the last decade (since Trump started his speeches); they mostly amount to staying busy and distracted. It just seems like a really cost-ineffective kind of activism to get involved in. Bankrolling lawyers for random people isn't really in my action space, and if it were, I'd have opportunity costs to consider.

3Cole Wyeth
Unfortunately, it seems that my action space doesn’t include options that matter in this current battle. Personally, my reaction to this kind of insanity is to keep climbing my local status/influence/wealth/knowledge gradient, in the hopes that my actions are relevant in the future. But perhaps it’s a reason to prioritize gaining power - this reminds me of https://www.lesswrong.com/posts/ottALpgA9uv4wgkkK/what-are-you-getting-paid-in  
Quinn20

seems like there's more prior literature than I thought https://en.wikipedia.org/wiki/Role-based_access_control

Quinn20

are SOTA configuration languages sufficient for AI proliferation?

My main aim is to work on "hardening the box", i.e. eliminating software bugs so containment schemes don't fail for preventable reasons. But in the famous 4o system card example, the one that looks a little like docker exfiltration, the situation arose from user error; my wild guess is a mistake in compose.yaml or in the shell script invoking docker run.

On a Linux machine, here's an example Nix file:

users.users =
    let
      authorized-key-files = [
        "${keyspath}/id_server_ed25519.pub"
        "${keyspat
... (read more)
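For illustration, here's a fuller sketch of the kind of file I mean (hypothetical usernames and key paths, using standard NixOS users.users options; a sketch, not the rest of my actual config):

{ config, ... }:
let
  keyspath = "/etc/keys"; # hypothetical directory holding public keys
  authorized-key-files = [
    "${keyspath}/id_server_ed25519.pub"
    "${keyspath}/id_laptop_ed25519.pub"
  ];
in
{
  users.users.sandbox-agent = {
    isNormalUser = true;
    # no wheel/sudo membership: the agent's user gets no easy privilege-escalation path
    extraGroups = [ ];
    openssh.authorizedKeys.keyFiles = authorized-key-files;
  };
}

The point is that the access-control surface lives in one declarative file you can review, rather than scattered across shell scripts.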
2Quinn
seems like there's more prior literature than I thought https://en.wikipedia.org/wiki/Role-based_access_control
Quinn20

Examples of promising risk-targeted applications

This section reeks of the guaranteed safe AI agendas; a lot of agreement from me. For example, using formal methods to harden any box we try to put the AI in is a kind of defensive acceleration that doesn't work (too expensive) until certain pre-ASI stages of development. I'm working on formal verification agents along these lines right now.

Quinn20

@Tyra Burgess and I wrote down a royalty-aware payout function yesterday:

For a type $\phi$, let $L(\phi)$ be the "left closure under implication", or the admissible antecedents: the set of all antecedents $A$ in the public ledger such that $A \vdash \phi$. $p(A)$ is the price that a proposition $A$ was listed for (admitting summing over duplicates). Suppose players $1, \dots, n$ have previously proven $A_1, \dots, A_n$, and $L(\phi)$ is none other than the set of all $A_i$ from $1$ to $n$.

We would like to fix an $\varepsilon$ (could be fairly big) and say that the royalty-aware payout given eps... (read more)
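For a flavor of the shape this takes (an illustrative split, not necessarily the exact rule we wrote down): pay the prover of $\phi$ a $(1-\varepsilon)$ share and divide the remaining $\varepsilon$ share among the antecedent-owners in proportion to their listed prices,

$$\text{payout}(i) \;=\; \varepsilon \cdot p(\phi) \cdot \frac{p(A_i)}{\sum_{j=1}^{n} p(A_j)}, \qquad \text{payout}(\text{prover of } \phi) \;=\; (1-\varepsilon) \cdot p(\phi).$$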

2gwern
Sounds somewhat like a bucket brigade market economy.
Quinn50

I want a name for the following principle:

the world-spec gap hurts you more than the spec-component gap

I wrote it out much like this a couple years ago and Zac recently said the same thing.

I'd love to be able to just say "the <one to three syllables> principle", yaknow?

1Archimedes
How about the "World-Map [Spec] Gap" with [Spec] optional?
Quinn20

I'm working on making sure we get high quality critical systems software out of early AGI. Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories about all the software modules adjacent to AGI having vulnerabilities rapidly patched at crunchtime.

Quinn7135

<standup_comedian> What's the deal with evals </standup_comedian>

epistemic status: tell me I'm wrong.

Funders seem particularly enchanted with evals, which seem to be defined as "benchmarks, but probably for scaffolded systems, and with scoring that is harder than the scoring for most of what we call benchmarks".

I can conjure a theory of change. It's like, 1. if measurement is bad then we're working with vibes, so we'd like to make measurement good. 2. if measurement is good then we can demonstrate to audiences (especially policymakers) that warning shots are... (read more)

5ozziegooen
(potentially relevant meme)
7Lech Mazur
This might blur the distinction between some evals. While it's true that most evals are just about capabilities, some could be positive for improving LLM safety. I've created 8 (soon to be 9) LLM evals (I'm not funded by anyone; it's mostly out of my own curiosity, not for capability or safety or paper-publishing reasons). Using them as examples:

Improving models to score well on some of them is likely detrimental to AI safety:
  https://github.com/lechmazur/step_game - to score better, LLMs must learn to deceive others and hold hidden intentions
  https://github.com/lechmazur/deception/ - the disinformation effectiveness part of the benchmark

Some are likely somewhat negative because scoring better would enhance capabilities:
  https://github.com/lechmazur/nyt-connections/
  https://github.com/lechmazur/generalization

Others focus on capabilities that are probably not dangerous:
  https://github.com/lechmazur/writing - creative writing
  https://github.com/lechmazur/divergent - divergent thinking in writing

However, improving LLMs to score high on certain evals could be beneficial:
  https://github.com/lechmazur/goods - teaching LLMs not to overvalue selfishness
  https://github.com/lechmazur/deception/?tab=readme-ov-file#-disinformation-resistance-leaderboard - the disinformation resistance part of the benchmark
  https://github.com/lechmazur/confabulations/ - reducing the tendency of LLMs to fabricate information (hallucinate)

I think it's possible to do better than these by intentionally designing evals aimed at creating defensive AIs. It might be better to keep them private and independent. Given the rapid growth of AI capabilities, the lack of apparent concern for an international treaty (as seen in the recent Paris AI summit), and the competitive race dynamics among companies and nations, specifically developing an AI to protect us from threats from other AIs or AIs + humans might be the best we can hope for.

If I'm allowed to psychoanalyze funders rather than discussing anything at the object level, I'd speculate that funders like evals because:

  1. If you funded the creation of an eval, you can point to a concrete thing you did. Compare to funding theoretical technical research, which has a high chance of producing no tangible outputs; or funding policy work, which has a high chance of not resulting in any policy change. (Streetlight Effect.)
  2. AI companies like evals, and funders seem to like doing things AI companies like, for various reasons including (a) the t
... (read more)
Quinn30

The more I learn about measurement, the less seriously I take it

I'm impressed with models that accomplish tasks in zero or one shot with minimal prompting skill. I'm not sure what galaxy brained scaffolds and galaxy brained prompts demonstrate. There's so much optimization in the measurement space.

I shipped a benchmark recently, but it's secretly a synthetic data play: regardless of how hard people try to score on it, we get synthetic data out of it, which leads to finetune jobs, which lead to domain-specific models that can do such tasks, hopefully with minimal prompting effort and no scaffolding.

Quinn4023

$PERSON at $LAB once showed me an internal document saying that there are bad benchmarks - dangerous capability benchmarks - that are used negatively, so unlike positive benchmarks where the model isn't shipped to prod if it performs under a certain amount, these benchmarks could block a model from going to prod that performs over a certain amount. I asked, "you create this benchmark like it's a bad thing, and it's a bad thing at your shop, but how do you know it won't be used in a sign-flipped way at another shop?" and he said "well we just call it EvilBe... (read more)

This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publicly publish a benchmark which gave a laser-focused "how to be super dangerous" score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.

Quinn20

I'm surprised to hear you say that, since you write

Upfront, I want to clarify: I don’t believe or wish to claim that GSAI is a full or general panacea to AI risk.

I kinda think anything which is not a panacea is swiss cheese, that those are the only two options.

It's a matter of what sort of portfolio can lay down slices of swiss cheese at what rate and with what uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can agree mostly with Zac's talk and still work on GSAI (I don't think he talks about my ... (read more)

2Mateusz Bagiński
I understood Nora as saying that GSAI in itself is not a swiss cheese approach. This is different from saying that [the overall portfolio of AI derisking approaches, one of which is GSAI] is not a swiss cheese approach.
Quinn40

I gave a lightning talk with my particular characterization, and included "swiss cheese", i.e. that GSAI sources some layers of swiss cheese without trying to be one magic bullet. But if people agree with this, then "guaranteed-safe AI" is really a misnomer, cuz "guarantee" doesn't evoke swiss cheese at all.

6Nora_Ammann
What's the case for it being a swiss cheese approach? That doesn't match how I think of it. 
Quinn60

For anecdata: I'd be really jazzed about 3 or 4; 5 might be a little crazy, but I'm somewhat open to that or more.

Ladies

Quinn50

Yeah, last week was grim for a lot of people, with R1's implications for proliferation and the Stargate fanfare after the inauguration. I had a palpable sensation of it pivoting from midgame to endgame, but I doubt that sensation is reliable or calibrated.

2Ben Pace
My feelings here aren't at all related to any news or current events. I could've written this any time in the last year or two.
Quinn30

Tmux allows you to set up multiple panes in your terminal that keep running in the background. Therefore, if you disconnect from a remote machine, scripts running in tmux will not be killed. We tend to run experiments across many tmux panes (especially overnight).

Does no one use the suffix "& disown", which sends a command to a background process that doesn't depend on the ssh session, or the prefix "nohup", which does the same thing? You have to make sure any logging that goes to stdout goes to a log file instead (and in this respect tmux or screen are better).

Your r... (read more)

1shawnghu
I didn't learn about disown or nohup until recently, because there was no impetus to, because I'd been using tmux. (My workflow also otherwise depended on tmux; when developing locally I liked its method of managing terminal tabs/splits.)
Quinn152

Feels like a MATS-like program in India is a big opportunity. When I went to EAG in Singapore a while ago there were so many people underserved by the existing community building and mentorship organizations cuz of visa issues.

3Chris_Leong
Impact Academy was doing this, before they pivoted towards the Global AI Safety Fellowship. It's unclear whether any further fellowships should be in India or a country that is particularly generous with its visas.
Quinn53

The story I roughly understand is that this was within Epoch's mandate in the first place because they wanted to forecast on benchmarks but didn't think existing benchmarks were compelling or good enough, so they had to take matters into their own hands. Is that roughly consensus, or true? Why is FrontierMath a safety project? I haven't seen adequate discussion of this.

37vik
They say it was an advanced math benchmark to test the limits of AI, not a safety project. But a number of people who contributed would have been safety-aligned and would not have wanted to contribute if they had known OpenAI would have exclusive access.
Quinn337

Can't relate. Don't particularly care for her content (tho audibly laughed at a couple examples that you hated), but I have no aversion to it. I do have aversion to the way you appealed to datedness as if that matters. I generally can't relate to people who find cringiness in the way you describe significantly problematic, really.

People like authenticity, humility, and irony now, both in the content and in its presentation.

I could literally care less, omg. But I'm unusually averse to irony: authenticity is great, humility is great most of the time, why is irony even in the mix?

Tho I'm weakly with you that engagement farming leaves a bad taste in my mouth.

Quinn20

Update: new funding call from ARIA calls out the Safeguarded/Gatekeeper stack in a video game directly

Creating (largely) self-contained prototypes/minimal-viable-products of a Safeguarded AI workflow, similar to this example but pushing for incrementally more advanced environments (e.g. Atari games).

Quinn20

I tried a little myself too. Hope I'm not misremembering.

Quinn82

Very anecdotally, I've talked to some extremely smart people who I would guess are very good at making progress on hard problems, but just didn't think too hard about what solutions help.

A few of the dopest people I know, who I'd love to have on the team, fall roughly into the category of "engaged a little with LessWrong, grok the core pset better than most 'highly involved' people, but are working on something irrelevant and not even trying cuz they think it seems too hard". They have some thoughtful p(doom), but assume they're powerless.

Quinn64

Richard Ngo tweeted recently that it was a mistake to design the AGI Safety Fundamentals curriculum to be broadly accessible, and that if he could do it over again there'd be punishing problem sets that alienate most people.

8_will_
Any chance you have a link to this tweet? (I just tried control+f'ing through @Richard's tweets over the past 5 months, but couldn't find it.)
Quinn60

The upvotes and agree-votes on this comment updated my perception of the rough consensus about MATS and streetlighting. I previously would have expected fewer people to evaluate MATS that way.

Quinn73

As someone who, isolated and unfunded, went on months-long excursions into the hard version of the pset multiple times and burned out each time, I felt extremely validated when you verbally told me a fragment of this post around a fire pit at Iliad. The incentives section of this post is very grim, but very true. I know naive patches to the funding ecosystem would also be bad (easy for grifters, etc.), but I feel very much like I and we were failed by funders. I could've been stronger etc., I could've been in Berkeley during my attempts instead of Philly, b... (read more)

Quinn20

(I'm guessing) "super mario" might refer to a simulation of the Safeguarded AI / Gatekeeper stack in a video game. It looks like they're skipping video games and going straight to cyberphysical systems (1, 2).

2Quinn
Update: new funding call from ARIA calls out the Safeguarded/Gatekeeper stack in a video game directly
Quinn20

Ok. I'll wear a solid black shirt instead of my bright blue shirt.

Quinn20

talk to friends as a half measure

When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy needed to keep such a journal is kind of a hurdle (which is why products like https://fatebook.io are so good!).

I find that a nice midpoint between the full and correct internal track record practices (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self delusion) is talking to friends, because I think my memory of... (read more)

Quinn40

I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.

epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.

Quinn20

A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, also don't be paralyzed: budget some leeway to be less than maximally considerate in this way when you really need to.

Quinn20

what's the best essay on asking for advice?

One going over etiquette and the social contract; perhaps, if it's software-specific, it talks about minimal reproducers, and whatever else the author thinks is involved.

2Quinn
A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease this burden. However, also don't be paralyzed: budget some leeway to be less than maximally considerate in this way when you really need to.
Quinn3011

Rumors are that 2025 Lighthaven is jam-packed. If this is the case, and you need money, rudimentary economics suggests only the obvious: raise prices. I know many clients are mission-aligned, and there's a reasonable ideological reason to run the joint at or below cost, but I think it's aligned with that spirit if profits from the campus fund the website.

I also want to say in print what I said in person a year ago: you can ask me to do chores on campus to save money, it'd be within my hufflepuff budget. There are good reasons to not go totally "by and for ... (read more)

Answer by Quinn*40

ThingOfThings said that The Story of Louis Pasteur is a very EA movie, but I think it also counts for rationality. Huge fan.

Quinn20

Guaranteed Safe AI paper club meets again this Thursday.

Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA

blurb about the paper in last month's newsletter:

... If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program

... (read more)
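Roughly, and in my notation rather than the paper's: if $f$ is a postexpectation scoring how much harm a final state embodies, the weakest preexpectation $\mathrm{wp}(C, f)$ gives the expected harm of running program $C$ from an initial state. Comparing two candidate programs as interventions then looks like checking

$$\mathrm{wp}(C_1, f)(\sigma) \;\le\; \mathrm{wp}(C_2, f)(\sigma) \quad \text{for all initial states } \sigma,$$

i.e. intervention $C_1$ is no more harmful than $C_2$ in expectation.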
Quinn50

My dude, top-level post: this does not read like a shortform.

Quinn20

Yoshua Bengio is giving a talk online tomorrow https://lu.ma/4ylbvs75

Quinn20

by virtue of their technical chops, also care about their career capital.

I didn't understand this. "Their technical chops impose opportunity cost, as they're able to build very safe, successful careers if they toe the line" would make sense, or "they care about career capital independent of their technical chops" would make sense. But here, the relation between technical chops and caring about career capital doesn't come through clearly.

Quinn80

Did anyone draw up an estimate of how much the proportion of code written by LLMs will increase? Or even what the proportion is today?

Quinn40

I was thinking the same thing this morning! My main thought was, "this is a trap. Ain't no way I'm pressing a big red button, especially not so near to Petrov Day."

Quinn20

GSAI paper club is tomorrow (gcal ticket), summary (by me) and discussion of this paper

Quinn21

Alas, belief is easier than disbelief; we believe instinctively, but disbelief requires a conscious effort.

Yes, but this is one thing that I have felt being mutated as I read the Sequences and continued to hang out with you lot (starting roughly 8 years ago, with some off and on).

Quinn10

By all means. Happy for that

Quinn31

Note in August 2024 GSAI newsletter

See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreement, and my disagreements are more a matter of what deserves emphasis than of fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swiss-cheesey worldview, and sounded too much like "why not just capture alignment properties in 'specs' and prove the software 'correct'?" (i.e. the vibe I was responding to in my very pithy post). However, I think the main reason I'm not using Dickson's post as a reason to j... (read more)

Quinn*270

august 2024 guaranteed safe ai newsletter

In case I forgot last month, here's a link to July.


A wager you say

One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock? 

Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined au... (read more)

2habryka
Oh, I liked this one. Mind if I copy it into your shortform (or at least like the first few paragraphs so people can get a taste?)
Quinn21

I do think it's important to reach appropriately high safety assurances before developing or deploying future AI systems which would be capable of causing a catastrophe. However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever actually works in practice.

For what it's worth, I do see GSAI as largely a swiss-cheesey worldview. Though I can see how you might read some of the authors involved to be implying otherwise! I should knock out a post on this.

Quinn-10

not just get it to sycophantically agree with you

I struggle with this, and need to attend a prompting bootcamp.

Quinn20

I'm getting back into composing and arranging. Send me rat poems to set to music!
