
there's an analogy between the Zurich r/changemyview curse of evals and the METR/Epoch curse of evals. You do this dubiously ethical measuring/elicitation project (dubious according to more US-pilled IRBs, or to more paranoid/pure AI safety advocates) because you might think the world deserves to know. But to get there you had to run dubiously ethical experiments on unconsenting redditors / help labs improve capabilities. The catch is, you only come out net positive if the world chooses to act on this information.

I don't know what legible/transferable evidence would look like. I've audited a lot of courses at a lot of different universities. Anecdote, sorry.

One thing I like about this is making the actual difficulty deltas between colleges more felt/legible/concrete (to anyone who takes the exams). What I might do in your system at my IQ level (which is pretty high outside of EA but pretty mediocre inside EA) is knock out a degree at an easy university to get warmed up, then study for years for a degree at a hard school[1].

In real life, I can download or audit courses from whatever university I want, but I don't know what the grading curve is, so when 5/6 exercises are too hard I don't know if that's because I'm dumb or because solving 1/6 is B+ level performance. This is a way the current system underserves a credential-indifferent autodidact: it's really hard to know how difficult a course is supposed to be when you're isolated from the local conditions that make up the grading curve!

Another thing I like about your system is tutoring markets separated from assessment companies. Why do we bundle gatekeeping/assessment with preparation? Unbundling might help maintain objective standards and get rid of problems that look like "the professor feels too much affection for the student to fail them".

This is all completely separate from why your proposal is a hard social problem / a complete nonstarter, which is that I don't think the system is "broken" right now. There's an idea you might pick up if you read the smarter leftists, which is that credentialism, especially at elite levels, preserves privilege and status as a first-class use case. This is not completely false today, not least because the further you go back in time in western universities, the truer it is.


  1. my prior, 15 years ago, looked like "stanford has a boating scholarship, so obviously selectivity is a wealth/status thing and not reflective of scholarship or rigor", so the fact that I now believe "more selective colleges have harder coursework" means I've seen a lot of evidence. It pains me, believe me, but reality doesn't care :) ↩︎

I get pretty intense visceral outrage at overreaches in immigration enforcement; it just seems the height of depravity. I've looked for a lot of different routes to mental coolness over the last decade (since Trump started his speeches); they mostly amount to staying busy and distracted. It just seems like a really cost-ineffective kind of activism to get involved in. Bankrolling lawyers for random people isn't really in my action space, and if it was I'd have opportunity cost to consider.

seems like there's more prior literature than I thought: https://en.wikipedia.org/wiki/Role-based_access_control

are SOTA configuration languages sufficient for AI proliferation?

My main aim is to work on "hardening the box", i.e. eliminating software bugs so containment schemes don't fail for preventable reasons. But in the famous 4o system card example, the one that looks a little like docker exfiltration, the situation arose from user error; my wild guess is in compose.yaml or the shell script invoking docker run.
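
To illustrate the class of user error I mean, here's a sketch in NixOS terms rather than compose.yaml; the exposed-daemon setting is my illustrative assumption, not what actually happened in the system card incident:

virtualisation.docker.daemon.settings = {
  # DON'T: serving the Docker daemon API unauthenticated on all interfaces.
  # One line like this (it maps into /etc/docker/daemon.json) lets anything
  # that can reach the port start privileged containers, turning "the box"
  # into a suggestion.
  hosts = [ "tcp://0.0.0.0:2375" "unix:///var/run/docker.sock" ];
};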

On a Linux machine, here's an example Nix file:

users.users =
    let
      authorized-key-files = [
        "${keyspath}/id_server_ed25519.pub"
        "${keyspath}/id_qd_ed25519.pub"
      ];
    in
    {
      # the agent account: minimal groups, no interactive niceties
      unpermissioneduser = {
        # NixOS requires exactly one of isNormalUser/isSystemUser to be true,
        # and a system user must name a primary group
        isSystemUser = true;
        group = "nogroup";  # or a dedicated group via users.groups
        extraGroups = [ "docker" ];  # may run containers
        description = "AgentID=claude-0x0000";
      };
      # the human delegator: full interactive account
      coreuser = {
        isNormalUser = true;
        extraGroups = [
          "wheel"  # sudo
          "networkmanager"
          "docker"
          "video"
        ];
        home = "/home/coreuser";
        description = "Core User (delegator of unpermissioneduser)";
        shell = pkgs.fish;
        openssh.authorizedKeys.keyFiles = authorized-key-files;
      };
      root = {
        openssh.authorizedKeys.keyFiles = authorized-key-files;
        shell = pkgs.fish;
      };
    };

You can see that unpermissioneduser has fewer permissions than coreuser. So you can imagine I just say that unpermissioneduser is an agent and coreuser is the human delegator.

Nix is simply a fully declarative way to do standard linux permissioning (a feature not in the snippet, sketched below, is allocating chmod/chown information for particular users to particular parts of the filesystem). There are no conceptual leaps from the status quo.
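
A minimal sketch of that missing feature, assuming the stock NixOS systemd.tmpfiles option; the workspace path is hypothetical:

systemd.tmpfiles.rules = [
  # type  path  mode  user  group  age
  # declaratively asserts ownership and mode over a directory at boot
  "d /var/lib/agent-workspace 0750 coreuser docker -"
];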

agents and delegation

is linux all that great for when you want to keep track of who's a delegatee and who's a delegator? do we need a more graph-flavored version of linux userspace/permissions? I'm talking about once we're reasoning about proliferating agents and their permissions on various machines. Linux groups do not support inheritance, but a user can be a member of many groups. So you could in principle MVP a graph-based permissions DSL (perhaps in Nix) on top of the existing Linux user/group ontology (80% confident; sketch below), but it could be hairier than making a new ontology. idk.
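
A hypothetical sketch of that MVP, assuming a NixOS module context; the delegations attrset and the claude-* names are illustrative, not an existing library:

{ lib, ... }:
let
  # delegator -> delegatees; each entry is one edge of the graph
  delegations = {
    coreuser = [ "claude-0x0000" "claude-0x0001" ];
    "claude-0x0000" = [ "claude-0x0002" ];  # agents can re-delegate
  };
  allAgents = lib.unique (lib.concatLists (lib.attrValues delegations));
in
{
  # one group per delegator encodes its outgoing edges
  users.groups = lib.mapAttrs (_: _: { }) delegations // { agents = { }; };
  users.users = lib.genAttrs allAgents (agent: {
    isSystemUser = true;
    group = "agents";
    # an agent's extraGroups name the delegators pointing at it
    extraGroups = lib.attrNames
      (lib.filterAttrs (_: ds: lib.elem agent ds) delegations);
  });
}

Reachability queries (who ultimately answers to whom) would still live outside the config, which is the part that might favor a new ontology.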

Examples of promising risk-targeted applications

This section reeks of the guaranteed safe AI agendas; a lot of agreement here. For example, using formal methods to harden any box we try to put the AI in is a kind of defensive acceleration that doesn't work (too expensive) until certain pre-ASI stages of development. I'm working on formal verification agents along these lines right now.

@Tyra Burgess and I wrote down a royalty-aware payout function yesterday:

For a type $T$, let $\mathrm{ant}(T)$ be the "left closure under implication", or the admissible antecedents: the set of all the antecedents $A$ in the public ledger such that $A \to T$. $\mathrm{price}(A)$ is the price that a proposition $A$ was listed for (admitting summing over duplicates). Suppose players $p_1, \dots, p_n$ have previously proven $A_1 \to T, \dots, A_n \to T$, and $\mathrm{ant}(T)$ is none other than the set of all $A_i$ from $1$ to $n$.

We would like to fix an $\epsilon \in (0, 1)$ (it could be fairly big) and say that the royalty-aware payout given epsilon of $\mathrm{price}(T)$, upon an introduction of $T$ to the database, is such that, where $n = |\mathrm{ant}(T)|$, the amount $\frac{\epsilon \cdot \mathrm{price}(T)}{n}$ is paid out to each player $p_i$.

This seems vaguely like it has some desirable properties, like the decay of a royalty with length in implications separating it from the currently outpaying type. You might even be able to reconcile it with cartesian-closedness / currying, where $A \to (B \to C)$ behaves equivalently to $(A \times B) \to C$ under the payout function.

I think to be more theoretically classy, royalties would arise from recursive structure, but it may work well enough without recursion. It'd be fun to advance all the way to coherence and incentive-compatible proofs, but I certainly don't see myself doing that.
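
A toy worked instance of the payout above (numbers illustrative; that the remaining $(1-\epsilon)$ share goes to whoever introduced $T$ is my assumption, not part of the definition):

$$\mathrm{ant}(T) = \{A_1, A_2\}, \quad \mathrm{price}(T) = 100, \quad \epsilon = \tfrac{1}{2},$$

$$\text{so each of } p_1, p_2 \text{ receives } \frac{\epsilon \cdot \mathrm{price}(T)}{n} = \frac{0.5 \times 100}{2} = 25.$$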

I want a name for the following principle:

the world-spec gap hurts you more than the spec-component gap

I wrote it out much like this a couple years ago and Zac recently said the same thing.

I'd love to be able to just say "the <one to three syllables> principle", yaknow?

I'm working on making sure we get high quality critical systems software out of early AGI. Hardened infrastructure buys us a lot in the slightly crazy story of "self-exfiltrated model attacks the power grid", but buys us even more in less crazy stories about all the software modules adjacent to AGI having vulnerabilities rapidly patched at crunch time.
