
Things that I seem to notice about the plan:

  1. Adjusting weights is a plan for basic AIs, which can't seek to e.g. make themselves internally consistent, eventually landing wherever the attractors take them.
  2. Say you manage to give your AI enough quirks that it goes off to cry in a corner. Now you need to dial back the nerfing to get more intelligence out of it, leading to brinkmanship dynamics.
  3. In the middle, you have a bunch of AIs, each trained to maximize various aspects of incorrigibility, and you hope that they are incapable of cooperating, or that no single AI will act destructively (while being trained for incorrigibility).
Answer by Maybe_a

Maybe in-vivo genetic editing of the brain is possible. Adenoviruses, a normal delivery mechanism for gene therapy, can pass the blood-brain barrier, so this seems plausible to an amateur.

(It's not obvious that this works in adult organisms; maybe the relevant genes only activate while the fetus grows or during childhood.)

Odds games against the engine are played with contempt equal to the material difference.

Sorry you didn't know that beforehand.
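
For illustration, a minimal sketch of what that setting amounts to (my own toy example; the piece values and the centipawn convention for contempt are assumptions, not something stated above):

```python
# Toy sketch: derive a contempt setting from the material given as odds,
# assuming an engine that expresses contempt in centipawns.
PIECE_VALUES_CP = {"p": 100, "n": 300, "b": 300, "r": 500, "q": 900}

def material_difference_cp(removed_pieces):
    """Centipawn value of the pieces removed to give odds, e.g. ["n"] for knight odds."""
    return sum(PIECE_VALUES_CP[p.lower()] for p in removed_pieces)

# Knight odds -> the engine plays with contempt of roughly 300 centipawns.
print(material_difference_cp(["n"]))  # 300
```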

Obviously fine. I posted here to get something better than my single point estimate of what's up with this thing.

The post expands on the ML field's intuition that reinforcement learning doesn't always work, and that getting it to work is a fiddly process.

In the final chapter, a DeepMind paper arguing that 'one weird trick' will work is demolished.

The problem under consideration is very important for some possible futures of humanity.

However, the author's eudaimonic wishlist is self-admittedly geared for fiction production, and doesn't seem to be very enforceable.

It's a fine overview of modern language models. The idea of scaling all the skills at the same time, unlike in human developmental psychology, is highlighted. Since publication, ~500B-parameter PaLM models seem to have shown jumps on around 25% of the BIG-bench tasks.

The inadequacy of measuring average LLM performance is also discussed: a proportion of outputs is good, and the rest are outright failures from a human point of view. Scale seems to help with the rate of success.
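
A minimal sketch of that distinction (my own toy numbers; the 0.9 "good" threshold is an assumption):

```python
import numpy as np

# Toy scores in [0, 1]: some answers are good, the rest are outright failures.
scores = np.array([0.95, 0.97, 0.02, 0.01, 0.93, 0.03, 0.00, 0.96])

print("mean score  :", scores.mean())           # ~0.48 -- looks uniformly mediocre
print("success rate:", (scores >= 0.9).mean())  # 0.5  -- half the answers are actually good
```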

In the 7th footnote, the figure should be 5e9, not 5e6 (this doesn't seem to affect the reasoning qualitatively).

The argument against CEV seems cool, thanks for formulating it. I guess we are leaving some utility on the table with any particular approach.

The part about referring to a model to adjudicate itself seems really off, though. I have a hard time imagining a thing that performs better at the meta level than at the object level. Do you have a concrete example?

Answer by Maybe_a

Thanks for giving it a think.

Turning the AI off is not a solved problem, see e.g. https://www.lesswrong.com/posts/wxbMsGgdHEgZ65Zyi/stop-button-towards-a-causal-solution

Finite utility doesn't help, as long as you need to use probabilities: a 95% chance of 1 unit of utility is worse than a 99% chance, which is worse than a 99.9% chance, and so on. And if you apply the same bounding trick to the probabilities, you get a quantilizer, which doesn't work either: https://www.lesswrong.com/posts/ZjDh3BmbDrWJRckEb/quantilizer-optimizer-with-a-bounded-amount-of-output-1
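
To make the ordering concrete, here's a minimal sketch (my own toy illustration, not from the linked posts; the action set, scoring function, and base distribution are made up):

```python
import random

# Bounded utility doesn't remove the optimization pressure: with a single unit
# of utility at stake, expected utility still strictly increases with the
# probability of success, so the agent still prefers ever more extreme plans.
for p in (0.95, 0.99, 0.999):
    print(f"P(success) = {p}: expected utility = {p * 1.0}")

# Bounding the optimization over probabilities instead leads to a quantilizer:
# sample an action from the top q-fraction of a base distribution rather than
# taking the argmax. This caps optimization pressure but, per the linked post,
# still fails.
def quantilize(actions, score, q=0.1):
    ranked = sorted(actions, key=score, reverse=True)
    top = ranked[: max(1, int(len(ranked) * q))]
    return random.choice(top)

print(quantilize(range(100), score=lambda a: a, q=0.1))  # one of the top 10 actions
```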
