LESSWRONG

The Non-Economist

https://thenoneconomist.substack.com/

Former securities industry researcher. Now interested in the use of AI in critical infrastructure and in safety/security protocols around AI usage. I'm more interested in understanding and predicting behavior than in regulating behavior (AI alignment), but there's overlap.

Posts


Wikitag Contributions

Comments

No wikitag contributions to display.
Open Thread Spring 2024
The Non-Economist · 1y · 10

Has anyone ever made an aggregator of open source LLMs and image generators with specific security vulnerabilities?

I.e., if it doesn’t have a filter for prompt injection, or doesn’t have a built-in filter for data poisoning, etc…

Looking for something written to help a solution builder using one of these models understand what they’d need to consider w.r.t. deployment.
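As a rough sketch of what one entry in such an aggregator might record (all names and fields here are hypothetical, just to illustrate the idea):

```python
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    """One record in a hypothetical open-model security aggregator."""
    name: str                           # e.g. "example-open-llm" (hypothetical)
    modality: str                       # "text" or "image"
    has_prompt_injection_filter: bool   # built-in input sanitization?
    has_data_poisoning_filter: bool     # training-data vetting in place?
    notes: list = field(default_factory=list)

def deployment_gaps(entry: ModelEntry) -> list:
    """List the mitigations a solution builder would still need to add."""
    gaps = []
    if not entry.has_prompt_injection_filter:
        gaps.append("add a prompt-injection / input-sanitization layer")
    if not entry.has_data_poisoning_filter:
        gaps.append("vet fine-tuning data for poisoning")
    return gaps

entry = ModelEntry("example-open-llm", "text", False, True)
print(deployment_gaps(entry))
```

A real aggregator would presumably source these flags from model cards or audits; the point is just that a per-model checklist of missing filters maps directly onto a deployment to-do list.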

Against Almost Every Theory of Impact of Interpretability
The Non-Economist · 2y · 10

Generally lots of value-add discussion, but there are some gaps I want to fill regarding potentially biased points of view.

  • Starting with Value-Adds:

1) It's great to point out how interpretability currently doesn't solve real-life problems, and the types of problems it won't solve.

2) Covering views warning against the dangers of interpretability.

3) Interpretability is, most of the time, unnecessary...

  • Filling in the gaps

1) There's a clear difference between pre-deployment and post-deployment interpretability. Post-deployment interpretability is dangerous. Pre-deployment interpretability (aka explainability) can be a powerful tool when training a complex model, or when trying to deploy a system in a complex organizational environment where there's a lot of scrutiny of the model.

No posts to display.