LESSWRONG
osmarks

Posts

3Death with Awesomeness

8mo

Comments

What happens if you present 500 people with an argument that AI is risky?

osmarks3mo30

There was some work I read about here years ago (https://www.lesswrong.com/posts/Zvu6ZP47dMLHXMiG3/optimized-propaganda-with-bayesian-networks-comment-on) on causal graph models of beliefs. Perhaps you could try something like that.

Death with Awesomeness

osmarks5mo10

I think we also need to teach AI researchers UI and graphic design. Most of the field's software prints boring things to the console, or at most has a slow and annoying web dashboard with a few graphs. The machine which kills us all should instead have a cool sci-fi interface with nice tabulation, colors, rectangles, ominous targeting reticles, and cryptic text in the corners.

Refusal in LLMs is mediated by a single direction

osmarks7mo10

I think the correct solution to models powerful enough to materially help with, say, bioweapon design, is to not train them, or failing that to destroy them as soon as you find they can do that, not to release them publicly with some mitigations and hope nobody works out a clever jailbreak.

cyberpunk raccoons

osmarks2y10

As you say, you probably don't need it, but for output I'm pretty sure electromyography technology is fairly mature.

Is "Recursive Self-Improvement" Relevant in the Deep Learning Paradigm?

osmarks2y42

A misaligned model might not want to do that, though, since it would be difficult for it to ensure that the output of the new training process is aligned to its goals.