J Bostock


Since I'm actually in that picture (I am the one with the hammer) I feel an urge to respond to this post. The following is not the entire endorsed and edited worldview/theory of change of Pause AI; these are my own views, and they may not be as well thought-out as they could be.

Why do you think "activists have an aura of evil about them"? In the UK, where I'm based, we usually see a large march/protest/demonstration every week. Most of the time, the people who agree with the activists are vaguely positive, the people who disagree are vaguely negative, and both stick to discussing the goals. If you could convince me that people generally thought we were evil upon hearing about us, just because we were activists (in my experience most people either see us as naïve, or have specific concerns relating to technical issues, or weirdly think we're in the pocket of Elon Musk -- which we aren't), then I would seriously update my views on our effectiveness.

One of my views is that there are lots of people adjacent to power, or adjacent to influence, who are pretty AI-risk-pilled, but who can't come out and say so without burning social capital. I think we are probably net positive in this regard, because every article about us makes the issue more salient in the public eye.

Adjacently, one common retort I've heard politicians give to lobbyists is "if this is important, where are the protests?" And while this might not be the true rejection, I still think it's worth actually doing the protests in the meantime.

Regarding aesthetics specifically, yes we do attempt to borrow the aesthetics of movements like XR. This is to make it more obvious what we're doing and create more compelling scenes and images.

(Edited because I posted half of the comment by mistake)

This, more than the original paper, or the recent Anthropic paper, is the most convincingly-worrying example of AI scheming/deception I've seen. This will be my new go-to example in most discussions. This comes from first considering a model property which is both deeply and shallowly worrying, then robustly eliciting it, and finally ruling out alternative hypotheses.


I think it's very unlikely that a mirror bacterium would be a threat: I'd put a <1% chance on a mirror-clone being a meaningfully more serious pathogenic threat to humans than the base bacterium. The adaptive immune system just isn't chirally dependent. Antibodies are selected as needed from a huge library, and you can raise antibodies to loads of unnatural things (PEG, chlorinated benzenes, etc.). They trigger attack mechanisms like the membrane attack complex (MAC), which attacks membranes in a similarly chirality-independent way.

In fact, mirror amino acids are already somewhat common in nature! Bacterial peptidoglycans (which form part of the bacterium's casing) often use a mix of amino acids in order to resist certain enzymes, but those bacteria can still be killed. Plants sometimes produce mirrored amino acids to use as signalling molecules or precursors. There are many organisms which can process and use mirrored amino acids in some way.

The most likely scenario by far is that a mirrored bacterium would be outcompeted by other bacteria and killed by achiral defenses, since it would have a much harder time replicating than a non-mirrored equivalent.

I'm glad they're thinking about this but I don't think it's scary at all.

I think the risk of infection to humans would be very low. The human body can generate antibodies to pretty much anything (including PEG and chlorinated benzenes, which never appear in nature) by selecting protein sequences from a huge library of cells. Antibody binding would activate the complement system, which targets membranes and kills bacteria in a chirality-independent way.

The risk to invertebrates and plants might be more significant; I'm not sure about the specifics of the plant immune system.


So Sonnet 3.6 can almost certainly speed up some quite obscure areas of biotech research. Over the past hour I've got it to:

  1. Estimate a rate, correct itself (although I did have to clock that its result was likely off by some OOMs, which turned out to be 7-8), request the right info, and then get a more reasonable answer.
  2. Come up with a better approach to a particular thing than I was able to, which I suspect has a meaningfully higher chance of working than what I was going to come up with.

Perhaps more importantly, it required almost no mental effort on my part to do this. Barely more than scrolling twitter or watching youtube videos. Actually solving the problems would have had to wait until tomorrow.

I will update in 3 months as to whether Sonnet's idea actually worked.

(in case anyone was wondering, it's not anything relating to protein design lol: Sonnet came up with a high-level strategy for approaching the problem)

In practice, sadly, developing a true ELM is currently too expensive for us to pursue (but if you want to fund us to do that, lmk). So instead, in our internal research, we focus on finetuning over pretraining. Our goal is to be able to teach a model a set of facts/constraints/instructions, predict how it will generalize from them, and ensure it doesn't learn unwanted facts (such as learning human psychology from programmer comments, or general hallucinations).


This has reminded me to revisit some work I was doing a couple of months ago on unsupervised unlearning. I could almost get Gemma-2-2B to forget who Michael Jordan was without needing to know any facts about him (other than that "Michael Jordan" was the target name)

Shrimp have ultra tiny brains, with less than 0.1% of human neurons.

Humans have ~1e11 neurons; what's the source for the shrimp neuron count? The closest figures I can find are lobsters at ~1e5 neurons and crabs at ~1e6 (both from Google's AI overview), which would put the human:shrimp ratio at a factor of much more than 1,000 (i.e., far below 0.1%).
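As a quick sanity check on these magnitudes (using the rough figures quoted above, which are order-of-magnitude estimates rather than authoritative counts):

```python
# Rough neuron-count figures quoted above (orders of magnitude only).
human_neurons = 1e11
lobster_neurons = 1e5  # closest crustacean figure I could find
crab_neurons = 1e6

# What "0.1% of human neurons" would imply for a shrimp brain:
implied_shrimp = human_neurons / 1000  # 1e8 neurons

# Actual human:crustacean ratios, if shrimp are comparable to lobsters/crabs:
lobster_ratio = human_neurons / lobster_neurons  # 1e6, i.e. 0.0001%, not 0.1%
crab_ratio = human_neurons / crab_neurons        # 1e5, i.e. 0.001%
print(implied_shrimp, lobster_ratio, crab_ratio)
```

So even on the most generous crustacean figure, the 0.1% claim overstates shrimp neuron counts by a factor of 100 or more.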

I volunteer to play Minecraft with the LLM agents. I think this might be one eval where the human evaluators are easy to come by.

Ok: I'll operationalize this as the ratio of first choices for the first group (Stop/PauseAI) to first choices for the third and fourth groups (mech interp, agent foundations), comparing the periods 12th-13th vs 15th-16th. I'll discount the final day, since the final-day spike is probably confounding.
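That operationalization can be sketched as follows; the group labels, dates, and rows are all made up for illustration and are not the real AISC application data or schema:

```python
from datetime import date

# Hypothetical application rows: (first-choice group, submission date).
applications = [
    ("stop_pause_ai",     date(2025, 1, 12)),
    ("mech_interp",       date(2025, 1, 13)),
    ("stop_pause_ai",     date(2025, 1, 15)),
    ("agent_foundations", date(2025, 1, 16)),
]

def choice_ratio(apps, start, end):
    """Ratio of Stop/PauseAI first choices to mech interp + agent
    foundations first choices among applications in [start, end]."""
    window = [group for group, day in apps if start <= day <= end]
    pause = sum(g == "stop_pause_ai" for g in window)
    technical = sum(g in ("mech_interp", "agent_foundations") for g in window)
    return pause / technical if technical else float("inf")

early = choice_ratio(applications, date(2025, 1, 12), date(2025, 1, 13))
late = choice_ratio(applications, date(2025, 1, 15), date(2025, 1, 16))
print(early, late)
```

The prediction is then just a comparison of `early` against `late` over the two windows, with the final day excluded.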

It might be the case that AISC applications were extra late-skewed because the MATS rejection letters went out on the 14th (guess how I know), so I think a lot of people got those and then rushed to finish their AISC applications (guess why I think this) before the 17th. This would predict that the ratio of technical to less-technical applications would increase in the final few days.
