LESSWRONG

The Non-Economist

https://thenoneconomist.substack.com/

Former securities industry researcher. Now interested in the use of AI in critical infrastructure and in safety/security protocols around AI usage. I'm more interested in understanding and predicting behavior than in regulating behavior (AI alignment), but there's overlap.

Posts


Wikitag Contributions

Comments

No wikitag contributions to display.
Open Thread Spring 2024
The Non-Economist · 1y · 10

Has anyone ever made an aggregator of open source LLMs and image generators with specific security vulnerabilities?

I.e., if it doesn’t have a filter for prompt injection, or doesn’t have a built-in filter for data poisoning, etc…

Looking for something written to help a solution builder using one of these models understand what they’d need to consider w.r.t. deployment.
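As a rough sketch of what one entry in such an aggregator might record (all names and fields here are hypothetical, just to illustrate the idea):

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    """One record in a hypothetical open-model security aggregator."""
    name: str                           # e.g. "example-open-llm" (hypothetical)
    modality: str                       # "text" or "image"
    has_prompt_injection_filter: bool   # built-in input sanitization?
    has_data_poisoning_filter: bool     # training-data vetting in place?
    notes: list = field(default_factory=list)

def deployment_gaps(entry: ModelEntry) -> list:
    """List the mitigations a solution builder would still need to add."""
    gaps = []
    if not entry.has_prompt_injection_filter:
        gaps.append("add a prompt-injection / input-sanitization layer")
    if not entry.has_data_poisoning_filter:
        gaps.append("vet fine-tuning data for poisoning")
    return gaps

entry = ModelEntry("example-open-llm", "text", False, True)
print(deployment_gaps(entry))
```

A real aggregator would presumably source these flags from model cards or audits; the point is just that a per-model checklist of missing filters maps directly onto a deployment to-do list.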

Against Almost Every Theory of Impact of Interpretability
The Non-Economist · 2y · 10

Generally lots of value-add discussion, but there are some gaps I want to fill regarding potentially biased points of view.

  • Starting with Value-Adds:

1) It's great to point out how interpretability currently doesn't solve real-life problems, and the types of problems it won't solve.

2) Covering views warning against the dangers of interpretability.

3) Interpretability is, most of the time, unnecessary...

  • Filling in the gaps

1) There's a clear difference between pre-deployment and post-deployment interpretability. Post-deployment interpretability is dangerous. Pre-deployment interpretability (aka explainability) can be a powerful tool when training a complex model, or when trying to deploy a system in a complex organizational environment where there's a lot of scrutiny of the model.

No posts to display.