I want a name for the following principle:
the world-spec gap hurts you more than the spec-component gap
I wrote it out much like this a couple years ago and Zac recently said the same thing.
I'd love to be able to just say "the <one to three syllables> principle", yaknow?
I'm working on making sure we get high-quality critical-systems software out of early AGI. Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories where all the software modules adjacent to AGI have their vulnerabilities rapidly patched at crunch time.
<standup_comedian>
What's the deal with evals </standup_comedian>
epistemic status: tell me I'm wrong.
Funders seem particularly enchanted with evals, which seem to be defined as "benchmarks, but probably for scaffolded systems, and with scoring that is harder than scoring most of what we call benchmarks".
I can conjure a theory of change. It's like, 1. if measurement is bad then we're working with vibes, so we'd like to make measurement good. 2. if measurement is good then we can demonstrate to audiences (especially policymakers) that warning shots are...
If I'm allowed to psychoanalyze funders rather than discussing anything at the object level, I'd speculate that funders like evals because:
I'm impressed with models that accomplish tasks in zero or one shot with minimal prompting skill. I'm not sure what galaxy-brained scaffolds and galaxy-brained prompts demonstrate; there's so much optimization in the measurement space.
I shipped a benchmark recently, but it's secretly a synthetic-data play: regardless of how hard people try to score on it, we get synthetic data out of it, which leads to finetune jobs, which leads to domain-specific models that can do such tasks, hopefully with minimal prompting effort and no scaffolding.
$PERSON at $LAB once showed me an internal document saying that there are bad benchmarks - dangerous-capability benchmarks - that are used negatively: unlike positive benchmarks, where a model isn't shipped to prod if it scores under a certain threshold, these benchmarks could block a model from going to prod if it scores over a certain threshold. I asked, "you created this benchmark treating it as a bad thing, and it's a bad thing at your shop, but how do you know it won't be used in a sign-flipped way at another shop?" and he said "well we just call it EvilBe...
This is exactly why the bio team for WMDP decided to deliberately include distractors involving relatively less harmful stuff. We didn't want to publicly publish a benchmark which gave a laser-focused "how to be super dangerous" score. We aimed for a fuzzier decision boundary. This brought criticism from experts at the labs who said that the benchmark included too much harmless stuff. I still think the trade-off was worthwhile.
I'm surprised to hear you say that, since you write
Upfront, I want to clarify: I don’t believe or wish to claim that GSAI is a full or general panacea to AI risk.
I kinda think anything which is not a panacea is swiss cheese, that those are the only two options.
It's a matter of what sort of portfolio can lay down slices of swiss cheese at what rate and with what uncorrelation. And I think in this way GSAI is antifragile to next year's language models, which is why I can agree mostly with Zac's talk and still work on GSAI (I don't think he talks about my ...
I gave a lightning talk with my particular characterization, and included "swiss cheese", i.e. that GSAI sources some layers of swiss cheese without trying to be one magic bullet. But if people agree with this, then "guaranteed-safe AI" is really a misnomer, cuz "guarantee" doesn't evoke swiss cheese at all.
For anecdata: I'd be really jazzed about 3 or 4; 5 might be a little crazy, but I'm somewhat open to that or more.
Yeah, last week was grim for a lot of people, with R1's implications for proliferation and the Stargate fanfare after the inauguration. I had a palpable sensation of it pivoting from midgame to endgame, but I would doubt that sensation is reliable or calibrated.
Tmux allows you to set up multiple panes in your terminal that keep running in the background. Therefore, if you disconnect from a remote machine, scripts running in tmux will not be killed. We tend to run experiments across many tmux panes (especially overnight).
Does no one use the `& disown` suffix, which sends a command to a background process that doesn't depend on the ssh process, or the `nohup` prefix, which does the same thing? You have to make sure any logging that goes to stdout goes to a log file instead (and in this way tmux or screen are better).
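For concreteness, a minimal sketch of both patterns (the script name and log path are placeholders):

```bash
# background the job, then drop it from the shell's job table so an ssh hangup won't kill it
python long_experiment.py > run.log 2>&1 &
disown

# or: nohup makes the process ignore SIGHUP directly
nohup python long_experiment.py > run.log 2>&1 &
```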
Your r...
The story I roughly understand is that this was within Epoch's mandate in the first place because they wanted to forecast on benchmarks but didn't think existing benchmarks were compelling or good enough, so they had to take matters into their own hands. Is that roughly consensus, or true? Why is FrontierMath a safety project? I haven't seen adequate discussion of this.
Can't relate. I don't particularly care for her content (tho I audibly laughed at a couple of the examples you hated), but I have no aversion to it. I do have an aversion to the way you appealed to datedness as if that matters. I generally can't relate to people who find cringiness, in the way you describe it, significantly problematic.
People like authenticity, humility, and irony now, both in the content and in its presentation.
I could literally care less, omg. But I'm unusually averse to irony: authenticity is great, humility is great most of the time, but why is irony even in the mix?
Tho I'm weakly with you that engagement farming leaves a bad taste in my mouth.
Update: new funding call from ARIA calls out the Safeguarded/Gatekeeper stack in a video game directly
Creating (largely) self-contained prototypes/minimal-viable-products of a Safeguarded AI workflow, similar to this example but pushing for incrementally more advanced environments (e.g. Atari games).
I tried a little myself too. Hope I'm not misremembering.
Very anecdotally, I've talked to some extremely smart people who I would guess are very good at making progress on hard problems, but who just didn't think too hard about which solutions would help.
A few of the dopest people I know, who I'd love to have on the team, fall roughly into the category of "engaged a little with LessWrong, grok the core pset better than most 'highly involved' people, but are working on something irrelevant and not even trying cuz they think it seems too hard". They have some thoughtful p(doom), but assume they're powerless.
Richard Ngo tweeted recently that it was a mistake to design the AGI Safety Fundamentals curriculum to be broadly accessible, and that if he could do it over again there'd be punishing problem sets that alienate most people.
The upvotes and agree-votes on this comment updated my perception of the rough consensus about MATS and streetlighting. I previously would have expected fewer people to evaluate MATS that way.
As someone who, isolated and unfunded, went on months-long excursions into the hard version of the pset multiple times and burned out each time, I felt extremely validated when you verbally told me a fragment of this post around a fire pit at ILIAD. The incentives section of this post is very grim, but very true. I know naive patches to the funding ecosystem would also be bad (easy for grifters, etc.), but I feel very much like I and we were failed by funders. I could've been stronger etc., I could've been in Berkeley during my attempts instead of Philly, b...
Ok. I'll wear a solid black shirt instead of my bright blue shirt.
When it comes to your internal track record, it is often said that finding what you wrote at time t-k beats trying to remember what you thought at t-k. However, the activation energy to keep such a journal is kind of a hurdle (which is why products like https://fatebook.io are so good!).
I find that a nice midpoint between the full, correct internal-track-record practice (rigorous journaling) and completely winging it (leaving yourself open to mistakes and self-delusion) is talking to friends, because I think my memory of...
I was at an ARIA meeting with a bunch of category theorists working on safeguarded AI and many of them didn't know what the work had to do with AI.
epistemic status: short version of post because I never got around to doing the proper effort post I wanted to make.
A sketch I'm thinking of: asking people to consume information (a question, in this case) is asking them to do you a favor, so you should do your best to ease that burden. However, don't be paralyzed by this; budget some leeway to be less than maximally considerate in this way when you really need to.
Rumors are that 2025 Lighthaven is jam-packed. If that's the case, and you need money, rudimentary economics suggests only the obvious: raise prices. I know many clients are mission-aligned, and there's a reasonable ideological case for running the joint at or below cost, but I think it's consistent with that spirit for profits from the campus to fund the website.
I also want to say in print what I said in person a year ago: you can ask me to do chores on campus to save money, it'd be within my hufflepuff budget. There are good reasons to not go totally "by and for ...
ThingOfThings said that The Story of Louis Pasteur is a very EA movie, but I think it also counts for rationality. Huge fan.
Event for the paper club: https://calendar.app.google/2a11YNXUFwzHbT3TA
blurb about the paper in last month's newsletter:
...... If you’re wondering why you just read all that, here’s the juice: often in GSAI position papers there’ll be some reference to expectations that capture “harm” or “safety”. Preexpectations and postexpectations with respect to particular pairs of programs could be a great way to cash this out, cuz we could look at programs as interventions and simulate RCTs (labeling one program
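Roughly, the gadget I have in mind is the standard weakest-preexpectation setup (sketching in my own notation): for a probabilistic program $C$ and a post-expectation $f$ mapping final states to nonnegative reals, the preexpectation at an initial state $\sigma$ is the expected value of $f$ over $C$'s output distribution,

$$\mathrm{wp}[\![C]\!](f)(\sigma) \;=\; \mathbb{E}_{\sigma' \sim [\![C]\!](\sigma)}\big[f(\sigma')\big],$$

so "programs as interventions" amounts to comparing $\mathrm{wp}[\![C_1]\!](f)$ against $\mathrm{wp}[\![C_2]\!](f)$ like treatment against control, with $f$ playing the role of the measured outcome (e.g. harm).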
my dude, top-level post: this does not read like a shortform
by virtue of their technical chops, also care about their career capital.
I didn't understand this. "Their technical chops impose opportunity cost, since they're able to build very safe, successful careers if they toe the line" would make sense; "they care about career capital independent of their technical chops" would also make sense. But here, the relation between technical chops and caring about career capital doesn't come through clearly.
I was thinking the same thing this morning! My main thought was, "this is a trap. Ain't no way I'm pressing a big red button, especially not so near to Petrov Day."
Alas, belief is easier than disbelief; we believe instinctively, but disbelief requires a conscious effort.
Yes, but this is one thing that I have felt mutate as I read the Sequences and continued to hang out with you lot (starting roughly 8 years ago, with some off and on).
By all means. Happy for that
See Limitations on Formal Verification for AI Safety over on LessWrong. I have a lot of agreements, and my disagreements are more a matter of what deserves emphasis than of fundamentals. Overall, I think the Tegmark/Omohundro paper failed to convey a swiss-cheesey worldview, and sounded too much like "why not just capture alignment properties in 'specs' and prove the software 'correct'?" (i.e. the vibe I was responding to in my very pithy post). However, I think the main reason I'm not using Dickson's post as a reason to j...
In case I forgot last month, here's a link to July.
One proof of concept for the GSAI stack would be a well-understood mechanical engineering domain automated to the next level and certified to boot. How about locks? Needs a model of basic physics, terms in some logic for all the parts and how they compose, and some test harnesses that simulate an adversary. Can you design and manufacture a provably unpickable lock?
Zac Hatfield-Dodds (of hypothesis/pytest and Anthropic, was offered and declined au...
I do think it's important to reach appropriately high safety assurances before developing or deploying future AI systems which would be capable of causing a catastrophe. However, I believe that the path there is to extend and complement current techniques, including empirical and experimental approaches alongside formal verification - whatever actually works in practice.
for what it's worth, I do see GSAI as largely a swiss cheesey worldview. Though I can see how you might read some of the authors involved to be implying otherwise! I should knock out a post on this.
not just get it to sycophantically agree with you
I struggle with this, and need to attend a prompting bootcamp.
Sure, I agree; that's why I said "something adjacent to", because it had enough overlap in properties. I think my comment stands completely with a different word choice; I'm just not sure what word choice would do a better job.
I eventually decided that human chauvinism approximately works most of the time because good successor criteria are very brittle. I'd prefer to avoid lock-in to my or anyone's values at t=2024, but such a lock-in might be "good enough" if I'm threatened with what I think are the counterfactual alternatives. If I did not think good successor criteria were very brittle, I'd accept something adjacent to E/Acc that focuses on designing minds which prosper more effectively than human minds. (the current comment will not address defining prosperity at different ...
Thinking about a top-level post on FOMO and research taste
He had become so caught up in building sentences that he had almost forgotten the barbaric days when thinking was like a splash of color landing on a page.
A $B$-valued quantifier is any function $(A \to B) \to B$, so when $B$ is $\mathrm{Bool}$, quantifiers are the functions that take predicates as input and return Bool as output (same for Prop). The standard $\max : (I \to \mathbb{R}) \to \mathbb{R}$ and $\min : (I \to \mathbb{R}) \to \mathbb{R}$ functions on arrays count as real-valued quantifiers for some index set $I$.
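A quick Lean 4 rendering of the definition, just to make the typing concrete (the names are mine):

```lean
-- A B-valued quantifier over A consumes a predicate A → B and returns a single B.
def Quantifier (A B : Type) : Type := (A → B) → B

-- Prop-valued examples: ∀ and ∃ are quantifiers in this sense.
def forallQ (A : Type) : Quantifier A Prop := fun p => ∀ x, p x
def existsQ (A : Type) : Quantifier A Prop := fun p => ∃ x, p x

-- Bool-valued example: "every element of this list satisfies p".
def allOf (xs : List Nat) : Quantifier Nat Bool := fun p => xs.all p
```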
I thought I had seen $\forall$ as the max of the Prop-valued quantifiers, and exists as the min somewhere, which has a nice mindfeel since forall has this "big" feeling (if you determined $\forall\, P$ for some $P : A \to \mathrm{Prop}$ (of which $\forall x,\ P\,x$ is just syntax sugar, since the variable name is irrelevant) by exhaustive ...
I'm excited for language model interpretability to teach us about the difference between compilers and simulations of compilers. In the sense that ChatGPT and I can both predict what a compiler of a suitably popular programming language will do on some input, what's going on there? Surely we're not reimplementing the compiler on our substrate, even in the limit of perfect prediction? This will be an opportunity for a programming-language theorist in another year or two of interp progress.
Firefox is giving me a currently broken redirect on https://atlascomputing.org/safeguarded-ai-summary.pdf
@Tyra Burgess and I wrote down a royalty-aware payout function yesterday:
For a type B, let L(B) be the "left closure under implication", or the admissible antecedents: i.e., the set of all antecedents A in the public ledger such that A→B. p:Type→Money is the price that a proposition was listed for (admitting summing over duplicates). Suppose players 1,...,k have previously proven B1,...,Bk, and L(A) is none other than the set of all Bi from 1 to k.
We would like to fix an ϵ (could be fairly big, like 15) and say that the royalty-aware payout given ϵ...
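Restating the setup in symbols (notation mine):

$$L(B) = \{\, A \in \mathrm{Ledger} \mid A \to B \,\}, \qquad p : \mathrm{Type} \to \mathrm{Money}, \qquad L(A) = \{B_1, \dots, B_k\}.$$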