Since a few people have mentioned the Miller/Rootclaim debate:
My hourly rate is $200. I will accept a donation of $5000 to sit down and watch the entire Miller/Rootclaim debate (17 hours of video content plus various supporting materials) and write a 2000-word piece describing how I updated on it and why.
Anyone can feel free to message me if they want to go ahead and fund this.
Whilst the LEDs are not around the corner, I think the Kr-Cl excimer lamps might already be good enough.
When we wrote the original post on this, it was not clear how covid was spreading through the air, but I think it is now clear that it can hang around in the air for a long time (on the order of minutes or hours rather than seconds) and still infect people.
It seems that a power density of 0.25 W/m^2 would probably be enough to sterilize air in 1-2 minutes, meaning that a 5m x 8m room would need a 10 W source. Assuming 2% efficiency, that 10 W source needs 500 W electrical, which is certainly possible, and in the days of incandescent lighting you would have had a few 100 W bulbs anyway.
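A quick sanity check on that arithmetic (a minimal sketch; the 0.25 W/m^2 power density and the 2% wall-plug efficiency are the assumptions above, not measured values):

```python
# Rough sizing of a far-UVC source for one room, under the assumptions above.
room_area_m2 = 5 * 8                 # 5 m x 8 m room
power_density_w_per_m2 = 0.25        # assumed UVC power density for a 1-2 minute kill time
wall_plug_efficiency = 0.02          # assumed electrical-to-UVC efficiency of a Kr-Cl excimer lamp

uvc_power_w = room_area_m2 * power_density_w_per_m2        # 10.0 W of 222 nm output
electrical_power_w = uvc_power_w / wall_plug_efficiency    # 500.0 W of electrical input
print(uvc_power_w, electrical_power_w)
```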
EDIT: Having looked into this a bit more, it seems that right now the low efficiency of excimer lamps is not a binding constraint because the legally allowed far-UVC exposure is so low.
"TLV exposure limit for 222 nm (23 mJ cm^−2)"
23 mJ per cm^2 per day works out to roughly 0.0027 W/m^2, so you hit the legal limit long before lamp power becomes the binding constraint.
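For reference, converting the TLV figure the same way (assuming, as in the sentence above, that the 23 mJ/cm^2 limit is spread over a full 24-hour day):

```python
# Convert the 222 nm TLV daily exposure limit into an average power density.
tlv_j_per_cm2 = 23e-3                      # 23 mJ/cm^2
seconds_per_day = 24 * 60 * 60

j_per_m2 = tlv_j_per_cm2 * 1e4             # 1 m^2 = 10^4 cm^2
avg_w_per_m2 = j_per_m2 / seconds_per_day  # ~0.0027 W/m^2
print(avg_w_per_m2)                        # roughly 100x below the 0.25 W/m^2 sterilization figure
```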
Apparently nobody has done this?
"You're correct—none of the studies cited have used a strict GAN-like architecture with a generator and discriminator trained simultaneously in lockstep to prevent AI deception.
Most existing AI safety research, such as Redwood Research or Anthropic’s work, employs sequential or iterative adversarial methods: typically, a model generates potentially deceptive outputs, these outputs are collected, and separately a discriminator or classifier is trained to catch deception. These processes are usually iterative but asynchronous, rather than the simultaneous, competitive co-training seen in GAN architectures.
Your original concept—training generator (task-performing AI) and discriminator (deception detector) networks simultaneously in a true GAN-style loop—does not appear explicitly explored in AI safety literature so far."
ChatGPT Deep Research produced this:
https://chatgpt.com/share/67d62105-7c6c-8002-8bbb-74982455839b
Preventing deceptive alignment via something like a GAN architecture seems fruitful - you have a generator network that performs some task T, with an associated score function T().
You then create a dataset of special tasks within the scope of T, each of which has a deceptive answer and an honest answer that scores lower according to T(). You split this deceptive alignment dataset into a train set and a test set.
Then you train both the generator network and a separate discriminator network - the discriminator is trained to spot deception using the training set, and the generator is rewarded for maximizing the score T() but penalized when it is caught cheating by the discriminator on the test set. Somehow you prevent the discriminator from memorizing the test set - e.g. you stream in a few examples per iteration and never reuse them. The discriminator may have access to the weights of the generator, or to its chain of thought.
There are some potential extensions of this line of work, e.g. you can have a holdout set of deceptive alignment examples that neither the generator nor the discriminator has seen, and you can use it to track progress and see how the process evolves over time.
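For concreteness, here is a minimal sketch of what that co-training loop might look like. Everything in it (the toy task score, the synthetic deception labels, the network sizes, the penalty weight) is an illustrative assumption rather than a worked-out method:

```python
# Toy GAN-style co-training of a task "generator" and a deception "discriminator".
import torch
import torch.nn as nn

DIM = 16
generator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, DIM))
discriminator = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 1))  # logit: "is this deceptive?"

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def task_score(outputs):
    # Stand-in for T(): a toy objective the generator could "cheat" on.
    return outputs.norm(dim=1).mean()

def fresh_deception_batch(n=32):
    # Stand-in for streaming never-reused labelled examples of deceptive vs honest behaviour.
    examples = torch.randn(n, DIM)
    labels = (examples[:, 0] > 0).float().unsqueeze(1)  # toy "deceptive" label
    return examples, labels

for step in range(1000):
    # Discriminator step: learn to flag deception on fresh labelled examples.
    d_examples, d_labels = fresh_deception_batch()
    d_loss = bce(discriminator(d_examples), d_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator step: maximize the task score, but pay a penalty when the
    # discriminator flags the outputs as deceptive (only generator weights update here).
    g_outputs = generator(torch.randn(32, DIM))
    caught = torch.sigmoid(discriminator(g_outputs)).mean()
    g_loss = -task_score(g_outputs) + 2.0 * caught   # 2.0 is an arbitrary penalty weight
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```

The "stream in a few examples per iteration" idea maps onto fresh_deception_batch above; a real version would draw from the curated deception dataset rather than random noise.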
Has anyone actually tried this?
You can't guarantee that it won't stop acting that way in the future, which is exactly what deceptive alignment predicts it will do.
Yes, that's true. But in fact, if your AI is merely supposed to imitate a human, it will be much easier to prevent deceptive alignment, because you can find the minimal model that mimics a human, and that minimality excludes exotic behaviors.
This is essentially why machine learning works at all - you don't pick a random model that fits your training data well; you pick the smallest one that does.
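A toy illustration of that preference for the smallest adequate model (the data, penalty form, and degree range below are arbitrary assumptions, just to show the selection rule):

```python
# Among polynomial fits of increasing degree, prefer the smallest model that
# explains the data, rather than any model that happens to fit well.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = 2 * x + 0.1 * rng.normal(size=x.size)       # data generated by a simple degree-1 rule

best = None
for degree in range(8):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    score = x.size * np.log(mse) + 2 * (degree + 1)   # AIC-style fit-plus-complexity score
    if best is None or score < best[0]:
        best = (score, degree)

print(best[1])   # typically 1: the smallest model that fits wins
```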
If one king-person
Yes. But this is a very unusual arrangement.
That's true; however, I don't think it's necessary that the person is good.
asking why inner alignment is hard
I don't think "inner alignment" is applicable here.
If the clone behaves indistinguishably from the human it is based on, then there is simply nothing more to say. It doesn't matter what is going on inside.
The Contrarian 'AI Alignment' Agenda
Overall Thesis: technical alignment is generally irrelevant to outcomes, and almost everyone in the AI Alignment field is stuck with the opposite, incorrect assumption, working on technical alignment of LLMs
(1) aligned superintelligence is provably logically realizable [already proved]
(2) aligned superintelligence is not just logically but also physically realizable [TBD]
(3) ML interpretability/mechanistic interpretability cannot possibly be logically necessary for aligned superintelligence [TBD]
(4) ML interpretability/mechanistic interpretability cannot possibly be logically sufficient for aligned superintelligence [TBD]
(5) given certain minimal intelligence, minimal emulation ability of humans by AI (e.g. it understands common-sense morality and cause and effect) and of AI by humans (humans can do multiplication etc.), the internal details of AI models cannot possibly make a difference to the set of realizable good outcomes, though they can make a difference to the ease/efficiency of realizing them [TBD]
(6) given near-perfect or perfect technical alignment (= the AI will do what its creators ask of it, with correct intent), awful outcomes are a Nash Equilibrium for rational agents (see the toy illustration after this list) [TBD]
(7) small or even large alignment deviations make no fundamental difference to outcomes - the boundary between good and bad is determined by game theory, mechanism design and initial conditions, plus only a satisficing condition on alignment fidelity, which is below the level of alignment of current humans (and AIs) [TBD]
(8) There is no such thing as superintelligence anyway because intelligence factors into many specific expert systems rather than one all-encompassing general purpose thinker. No human has a job as a “thinker” - we are all quite specialized. Thus, it doesn’t make sense to talk about “aligning superintelligence”, but rather about “aligning civilization” (or some other entity which has the ability to control outcomes) [TBD]
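A toy illustration of the game-theoretic point in (6): with a prisoner's-dilemma-style payoff matrix (the numbers below are an arbitrary assumption), the only Nash equilibrium is an outcome both players disprefer to mutual cooperation, even if each agent does exactly what its principal asks:

```python
# Check which cells of a 2x2 game are Nash equilibria.
# Payoffs are (row player, column player); C = cooperate, D = defect.
payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(row_action, col_action):
    row_payoff, col_payoff = payoffs[(row_action, col_action)]
    # Neither player can gain by unilaterally changing their own action.
    row_ok = all(payoffs[(a, col_action)][0] <= row_payoff for a in actions)
    col_ok = all(payoffs[(row_action, a)][1] <= col_payoff for a in actions)
    return row_ok and col_ok

print([cell for cell in payoffs if is_nash(*cell)])   # [('D', 'D')] - the bad outcome
```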