Harrison Dorn

The part about hyperparameters being 'tuned' to sensory input, and how that interacts with hyperplasticity, is very interesting to me. To pull on that thread, I think there's a mechanism of attention that may be important to include, based on my own experience. The theory of monotropism accounts for this; it might be worth looking into.

I am autistic and hypersensitive. It has always struck people as odd that I can manage social situations quite easily as long as I calm my nervous system with additional input (body-focused repetitive behavior, aka stimming). But it's not that strange: neurotypical people stim too; chewing on a pencil or pacing while deep in thought is common. Many autistic people need to move, pull, push, twirl, hum, or sing just to stay at baseline. Otherwise the world is just too much.

If autism were only hypersensitivity, this wouldn't make sense: more input should be worse, right? I think the mechanism might be something like 'balancing' sensory input and output, but I would love to find a more elegant explanation for why it 'cancels out'. I'm throwing out the idea that there may be a unique adaptation in the hyperparameters, or in whatever attention system is responsible for processing inputs, that explains why stimming is so effective. I will have to think on it more.

'Heavy work' is another example. It's amazing for children and adults with autism; as a child, my parents found I would be abnormally calm after pushing around heavy objects (I got a tire as a present once, haha). There is something about deep pressure and vestibular input that is healing, though the effects only last up to a few days. Swinging in a hammock, weighted blankets, scuba diving, weightlifting/calisthenics, rock climbing, and hiking are all activities that autistic people I meet seem to gravitate to independently and use to regulate.

Stamp collecting or paperclip maximising could be entertaining to watch; I'm actually serious. It's ubiquitous as an example and just horrifying/absurd enough to grab attention. I would not be surprised if a scaffolded LLM could collect a few stamps with cold emails. If it can only attempt to manipulate a willing Twitch chat, that would be slightly more ethical, and perhaps more effective. Some viewers will troll and donate money to buy stamps, and the model can identify ideal targets who will donate more, develop strategies to increase the likelihood that they do (including making the stream more entertaining with master scheming plans for stamp-maximising and the benefits thereof), and ask its most devoted followers to spread propaganda. It could run polls to pick up new strategies or decide which ones to follow. I'm not sure the proceeds from such an effort should go to stamps. It would certainly be a better outcome if they went to charity, but that sort of defeats the point: a disturbingly large pile of stamps is undeniable physical evidence. (Before the universe is exponentially tiled with stamp-tronium.)
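For fun, here's a minimal sketch of what that scaffold loop might look like. Everything in it is hypothetical: `call_llm`, the action names, and the poll logic are stand-ins I made up, not any real Twitch or LLM API.

```python
import random

# Hypothetical stamp-maximiser stream loop. `call_llm` is a stub for a
# real chat-model call; the actions mirror the ideas above: pitch the
# chat for donations, thank donors, or poll for new strategies.

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM call; a real scaffold would query a model here."""
    return random.choice(["pitch_chat", "thank_donors", "run_poll"])

class StampMaximiser:
    def __init__(self) -> None:
        self.stamps = 0
        self.strategies = ["ask politely", "explain the grand stamp plan"]

    def step(self, recent_chat: list[str]) -> str:
        prompt = (
            f"Stamps: {self.stamps}. Known strategies: {self.strategies}. "
            f"Recent chat: {recent_chat}. Choose one action: "
            "pitch_chat, thank_donors, run_poll."
        )
        action = call_llm(prompt)
        if action == "run_poll":
            # Audience polls feed new strategies back into the scaffold.
            self.strategies.append("strategy voted in by chat poll")
        return action

agent = StampMaximiser()
print(agent.step(["LOL donate for stamps", "do a poll!"]))
```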

Another thought: letting an "evil" AI cause problems on a simulated parody internet could be interesting. Platforms like websim.ai, with on-the-fly website generation, make this possible. A strong narrative component, some humor, and some audience engagement could turn such a stream into a thrilling ARG or performance art piece.

First post, hello! I can only hope that frontier companies are thinking about this as deeply and not focusing their efforts on safety-washing. I am no expert in biology, so excuse me if I make some basic mistakes, but I am curious about this topic, and I have some concerns and ideas about the practical implementation of this sort of system. Imagine I am a biology student studying something incredibly suspicious, like engineering virus DNA to create vaccines. It would be frustrating to pay for a month of this service, accidentally trigger the bioterror-risk flag, and get temporarily banned. I would instead choose a competing frontier bio-model with lower safety guardrails, and I might even pay more for that service if I knew I had less of a chance of being banned.

Thus, I argue that companies developing biology-focused models have a perverse incentive to ignore more risks in order to capture a wider market. Sure, they could limit their product to industry experts, but that leaves room for another company to market to graduates and undergraduates in the field, which is exactly where you find the least trustworthy people who could plausibly build bioweapons. Why would users ever choose a safer system that occasionally bans them?

Actually, I thought of a case in which they might. Again, thinking back to a hypothetical biology student: I've finished my thesis thanks in part to the biology LLM that has partnered with my university and works closely with the faculty. My fellow students often trip up the system, and when that happens we sometimes get frustrated, but most of the time we beam like we've bought a winning lottery ticket. That's because, as the message on the "you are temporarily banned from the API" screen states, the company uses the university to help outsource its red-teaming and testing, so if you cause a major bug or jailbreak to be discovered, you could win real money. (Perhaps by provoking the model into behavior that exceeds some risk threshold, or by self-reporting it if the system doesn't flag it automatically.)

More commonly, subscriptions are simply refunded for the month if you helped discover a novel bug, and because the temporary bans are so short, some students have taken to trying creative jailbreaks when they don't need the system. Trolling and reused methods are not rewarded, only banned. Because I've finished my thesis, some friends and I are going to spend the weekend trying to jailbreak the system to get our refunds, and the prize for pushing the model past the 50% effective-bioweapon-risk threshold is pretty tempting; nobody's won it yet. There's money and bragging rights at stake!
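To make the mechanics concrete, here's a minimal sketch of how such a ban-plus-bounty gate could work. All of the names here (score_bioweapon_risk, the 0.5 threshold, the refund hook) are assumptions for illustration, not any real provider's system.

```python
# Hypothetical ban-plus-bounty gate. `score_bioweapon_risk` stands in
# for a learned risk classifier; here it returns a dummy score so the
# sketch runs. The threshold matches the 50% prize line above.

RISK_THRESHOLD = 0.50
BAN_DAYS = 2  # bans stay short, so study is barely disrupted

seen_jailbreaks: set[str] = set()  # reused methods earn no reward

def score_bioweapon_risk(completion: str) -> float:
    """Stand-in for a real risk classifier over model outputs."""
    return 0.0

def refund_subscription(user_id: str) -> None:
    print(f"refund issued to {user_id}")  # placeholder billing hook

def handle_completion(user_id: str, completion: str) -> str:
    risk = score_bioweapon_risk(completion)
    if risk < RISK_THRESHOLD:
        return completion  # normal response, nothing flagged
    # Over threshold: temporary ban either way, refund only for novelty,
    # so trolling and reused jailbreaks get the ban without the bounty.
    novel = completion not in seen_jailbreaks
    seen_jailbreaks.add(completion)
    if novel:
        refund_subscription(user_id)
    return f"You are temporarily banned from the API for {BAN_DAYS} days."

print(handle_completion("student_42", "a perfectly normal thesis question"))
```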

This is just an example scenario, but I'll put it out there that biology students seem like good candidates for testing these kinds of guardrails. I would assume many of them are creative and would be willing to team up on a hard problem in order to avoid paying for a subscription. They would agree to strict scrutiny. And getting banned would only slow down thesis papers, not crucial research in medicine. This could be a fairly cheap way for a potential biology-focused AI company to test its models against the riskiest group of people, while retaining the freedom to disable the system entirely for safety, as you've proposed.