Meta: I have been burrowed away in other research but came across these notes and thought I would publish them rather than let them languish. If there are other efforts in this direction, I would be glad to be pointed that way so I can abandon this idea and support someone else's instead.
'I get surrounded by small ugh fields that grow into larger, overlapping ugh fields until my navigation becomes constrained and eventually impossible' was how I described one such experience.
A Sketched Proposal for Interrogatory, Low-Friction Model Cards
My auditor brain was getting annoyed by the current state of model cards. If we proactively adopt better norms around them, this seems like low effort for a moderately good payoff? I'm unsure about this, hence the rough draft below.
Problem
Model cards are uneven: selective disclosure, vague provenance, flattering metrics, generic limitations. Regulation (the EU AI Act) and risk frameworks (NIST AI RMF) are pushing toward evidence-backed documentation, but most “cards” are still self-reported. If we want to close evals gaps and make safety claims more falsifiable, model card norms are an obvious lever.
Design Goal
A modern, interrogatory model card that:
- Minimizes authoring friction
- ...
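Purely as an illustration of the core idea (the field names and structure below are my own invention, not any existing standard), a minimal Python sketch: every claim on the card carries a pointer to checkable evidence, and the card can enumerate its own unbacked claims for a reviewer to interrogate.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    """A single assertion from the card, paired with its evidence.

    `evidence_uri` might point at an eval run, a dataset manifest,
    or a signed attestation -- anything a reviewer can actually check.
    """
    statement: str
    evidence_uri: str | None = None  # None = explicitly unverified

    def is_backed(self) -> bool:
        return self.evidence_uri is not None

@dataclass
class InterrogatoryModelCard:
    """Hypothetical card stub: structured claims instead of free-form prose."""
    model_name: str
    provenance: list[Claim] = field(default_factory=list)
    metrics: list[Claim] = field(default_factory=list)
    limitations: list[Claim] = field(default_factory=list)

    def unverified(self) -> list[Claim]:
        """Surface every claim with no evidence attached --
        the gaps a reviewer should interrogate first."""
        all_claims = self.provenance + self.metrics + self.limitations
        return [c for c in all_claims if not c.is_backed()]

# Usage: the card makes its own gaps visible.
card = InterrogatoryModelCard(
    model_name="example-model",
    provenance=[Claim("Trained on corpus X", "https://example.org/manifest")],
    metrics=[Claim("92% on benchmark Y")],  # no evidence attached
)
print([c.statement for c in card.unverified()])  # -> ['92% on benchmark Y']
```

The point of the sketch is the asymmetry: a flattering metric with no evidence URI is visible as such, rather than blending into prose, which keeps authoring friction low while making gaps interrogable.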
You'd run into cognitive overhead limits. Manually reviewing other people's conversations can only really happen at 1:1 to 2:1 speeds. Summaries are much more efficient.
Plus, people behave very differently in radically observed environments. Panopticons were designed as part of a punishment system for a reason.
My project DRI starter kit
DRI = Directly Responsible Individual. Often a / the Project Manager, but not always!
By 'graceful', do you mean morally graceful, technically graceful, or both / other?
Thanks for the write-up, I recall a conversation introducing me to all these ideas in Berkeley last year and it's going to be very handy having a resource to point people at (and so I don't misremember details about things like the Yamanaka factors!).
Am I reading the current plan correctly such that the path is something like:
Get funding -> continue R&D through primate trials -> create an entity in a science-friendly, non-US state for human trials -> first rounds of Superbabies? That scenario seems like it would require a fair amount of medical tourism, which I imagine is not off the table for people with the resources and the mindset to participate in this.
I'm not sure that this mental line of defence would necessarily hold: we humans are easily manipulated by things we know to be extremely simple agents that are definitely trying to manipulate us all the time: babies, puppies, kittens, etc.
This still holds true a significant amount of the time even if we pre-warn ourselves against the pending manipulation: there is a recurrent meme of, e.g., dads ostensibly not wanting a pet, only to relent when presented with one.
This implies your timelines for any large impact from AI would span multiple future generations, is that correct?
This kind of post is awesome and too uncommon.
Helping people through operational bottlenecks or invisible stage-gates - like tailoring your application to suit your org's reputation / scale - is good metis and worth spreading.