Caveats: this is all vibes-based. The following LEGALLY NOT ADVICE is pretty low-risk and easy, but I have no idea if it will actually work. Some people have already tried this (look up Nasal Microbiome Transplant).
Best guess: your nasal microbiome is in a low-diversity disease state similar to C. difficile infections in the gut. Hitting it with antibiotics doesn't work because the staph just resists and rebounds faster than anything else.
What I would do in your situation: use the garlic nasal spray, leave it an hour or so, then (the unpleasant part) put someone else's snot up your nose to recolonize your nasal cavity with a healthy, diverse microbiome.
The other important question is whether you should come off the antibiotics for a bit. This is higher risk than any other part of the advice (since the antibiotics are for maintenance) but might be necessary: if you stay on the antibiotics, you might just kill off all your brand new microbiome.
Thanks for the response! I feel like I understand your position quite a lot better now, and see where "pessimization" fits into a mental model a lot better. My version of your synthesis is something like the following:
"Activists often work in very adversarial domains, in the obvious and non-obvious ways. If they screw up even a little bit, this can make them much, much less effective, causing more harm to their cause than a similarly-sized screwup would in most non-adversarial domains. This process is important enough to need a special name, even though individual cases of it might be quite different. Once we've named it, we can see if there are any robust solutions to all or most of those cases."
Based on this, I currently think of the concept of pessimization as making a prediction about the world: virtue ethics (or something like it) is a good solution to most or all of these problems, which means the problems themselves share something in common, which is worthy of a label.
It's also worth noting that a major research goal of mine is to pin down mechanisms of pessimization more formally and precisely, and if I fail then that should count as a significant strike against the concept.
This is absolutely intriguing. Do you have anything more written about this publicly?
At the risk of being too much of an "Everything Is Connected" guy, I think there's a connection between the following (italicized items are things I've thought about or worked on recently):
It doesn't quite fit, which is a little annoying. But the level one vs level two security mindset thing comes to mind when I think about deontology vs virtue ethics.
Deontology seeks to find specific rules to constrain humans away from particular failure modes: "Don't overthrow your democratically elected leader in a bloody revolution even if the glorious leader really would be a good god-emperor" and the like.
Perhaps a good version of virtue ethics would work like the true security mindset, although I don't know whether the resulting version of virtue ethics would look much like what the Athenians were talking about.
Just don't do this. This isn't the kind of plan which works in real life.
Appealing to outgroup fear just gets you a bunch of paranoid groups who never talk to one another.
Truth-telling is relatively robust because you automatically end up on the same side as other truth-tellers and you can all automatically agree on messaging (roughly).
The only exception is leveraging fear of death, which is reasonable in smaller doses IMO when talking about AI, since dying is actually on the table.
I'm optimistic that the same forces that remind the collective to focus on accomplishing their instrumental goals instead of degenerating into unproductive navel-gazing will also be strong enough to remind them of their deontological commitments.
OK, I actually think this might be the real disagreement, as opposed to my other comment. I think that capabilities are much more likely to generalize than alignment is, or at least that the first thing which generalizes to strong capabilities will not generalize alignment "correctly".
This is like a super high-level argument, but I think there are multiple ways of generalizing human values and no correct/canonical one (as in my other comment), nor are there any natural ways for an AI to be corrected without direct intervention from us. Whereas if an AI makes a factually wrong inference, it can correct itself.
I don't think there's a canonical way to extrapolate human values out from now until infinity; I think it depends on the internals of the human-acting things (their internal structure and the inductive biases that come with it).
For example, I'm pretty confident that the kind of computation which humans are pointing at when we say "consciousness" does not occur in LLMs. I think that the computation humans are pointing to will definitely occur in EMs.
I think that if you are based on that computation, you have a good chance of generalizing your value system from [what humans care about now] to care about all the things with a similar type of computation. I think that if you are not based on that computation, you won't do that.
Since I am based on that computation, I generalize my values to things with that computation. Since LLMs aren't, I might expect their value system to generalize from [what humans care about now] to a totally different class of things. They might not care at all about the types of computation I care about.
I want the collective to do the me-generalization, which I expect EMs to do, since they are the same kind of thing as me.
I don't expect deontology to work here, since that relies on the collective generalizing successfully to deontology, and also respecting deontological commitments made to humans. Humans do not, in full generality, respect all deontological commitments we've made in all cases. Most deontological rules (e.g. don't lie) are only applied to other humans, and not to e.g. our pets, or bedbugs, or random rocks, and lots of other rules can be overridden by a rule we place higher on the ordinal scale.
There's no reason to expect an LLM collective to come up with the same ordinal scale of rules, or even to remain anchored to deontology, while I expect human EMs likely would stick to a moral system I'd at least roughly endorse (again, because they're basically me).
Also, we have to think about inner misalignment. There's still no real way to rule out creating an LLM which implements the strategy "Be nice when I'm running at 1x, take over when I'm running at 1000x in a massive collective."
When it comes to counting arguments, I'm generally very sympathetic to the Yudkowsky argument that the vast majority of possible utility functions produce no value by human standards. If this is a crux, that's unfortunate, since most of the arguments on both sides seem to be very high-level intuitive ones, and not very testable!
In this story, the transition from Before to After is the transition from using one AI instance at human speed to using billions at 100x speed. I agree it’s not obvious that good behavior generalizes from one instance to an AI Collective of billions, but I don’t see why it would be overwhelmingly likely to fail.
Yep, I'd say this is the core difficulty. I think it will go horrendously.
For an intuition, look at any of Janus's infinite backrooms stuff, or any of the stuff where they get LLMs to talk with each other for ages. Very quickly they get pushed away from anything remotely resembling their training distribution, and become batshit insane. Today, that means they mostly talk about spirals and candles and the void. If you condition on them reaching superintelligence that way, I predict you get something which looks about as much like utopia (or eutopia, if you'd rather) as the infinite backrooms look like human conversation.
It's not a quote, no, but it's the overall picture they gave (I have removed the quotation marks now). They made it pretty clear that a few large nations cooperating just on AGI non-creation is enough.
Judging by your writing, I think you missed a new paper red-teaming DNA synthesis screening software. They're using AI to create proteins that function the same but with different amino acids (probably via conformation-based design), which bypasses the DNA screening, because the screening (probably) isn't translating the sequence, throwing it into AlphaFold, and comparing it to known toxins. The paper didn't test whether the AI-generated toxins are actually toxic, but we can assign that a reasonable probability.
I hadn't! Thanks for pointing this out! Looks like someone is actually on top of this after all.
That being said... translating DNA to protein with the standard codon table is just one encoding scheme. We can recode organisms to use a different encoding scheme, and no DNA screening would be able to catch it, since the screeners have no knowledge of the nonstandard codon table you're using.
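To make the "encoding scheme" point concrete, here's a minimal toy sketch (my own illustration, not from the paper or any real screening tool; the tables and sequence are made up): what protein you read out depends entirely on which codon table you translate with, so a screener assuming the standard table sees something different from what a recoded host would actually produce.

```python
# Toy illustration only: translation depends on the codon table.
# A tiny slice of the standard codon table (the real one has 64 entries).
STANDARD = {"ATG": "M", "TGG": "W", "AAA": "K", "GAA": "E", "TAA": "*"}

# A hypothetical recoded table: same codons, reassigned meanings.
RECODED = dict(STANDARD, **{"AAA": "W", "TGG": "K"})

def translate(dna: str, table: dict) -> str:
    """Translate a DNA string codon-by-codon using the given table."""
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    return "".join(table.get(c, "?") for c in codons)

gene = "ATGAAATGGGAATAA"

print(translate(gene, STANDARD))  # 'MKWE*'  <- what standard-table screening "sees"
print(translate(gene, RECODED))   # 'MWKE*'  <- what a recoded host would actually make
```

Same DNA, two different proteins; a screener that only knows the standard table has no way to reconstruct the second readout.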
Who do you mean by "we" in this case? I had the vague vibes that the "we" in question is "Jason Chin's lab and a few other similarly high-level groups elsewhere", but that it was pretty far beyond the capabilities of the marginal bioterrorist, and anyone who's successfully fucking around with nonstandard codon sequences can definitely already make a bioweapon. Is this incorrect under your model?
Re 1: LLMs have spiky capability levels which don't line up with "human level" nicely, so I'm not convinced this is the case. For example, they can do extremely good author attribution, come up with seed prompts to induce "personas" in other instances of themselves, etc.
Re 2: Maybe? I'm skeptical here. If the solution space contains elements like "Call an external API" then I expect there to be a Large (i.e. un-enumerable) number of solutions.
At an intersection of the two: would we have predicted in advance something like "GPT-4o, which is still dumber than most humans, has a persuasive/charismatic persona which people become highly attached to, to the point where OpenAI could not shut it off without significant backlash"?
An internally-deployed AI working on code alongside employees will be interacting with them in an intellectual capacity (collaborating on code and intellectual problems), which means it will have opportunities to gain favour/influence with them the same way 4o did with a bunch of casual readers (I don't know if this was part of your threat model a year ago, FWIW).
This is very close to some ideas I've been trying and failing to write up. In "On Green", Joe Carlsmith writes "Green is what told the rationalists to be more OK with death, and the EAs to be more OK with wild animal suffering." But wait, hang on: actually being OK with death is the only way to stay sane, and while it's not quite the same, the immediate must-reduce-suffering-footprint drive that EAs have might have ended up giving some college students some serious dietary deficiencies.