LESSWRONG
lc

Comments, sorted by newest

Solving adversarial attacks in computer vision as a baby version of general AI alignment
lc · 10d

This is a cool paper. Quoting for visibility:

I spent the last few months trying to tackle the problem of adversarial attacks in computer vision from the ground up. The results of this effort are written up in our new paper Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness (explainer on X/Twitter). Taking inspiration from biology, we reached state-of-the-art or above state-of-the-art robustness at 100x – 1000x less compute, got human-understandable interpretability for free, turned classifiers into generators, and designed transferable adversarial attacks on closed-source (v)LLMs such as GPT-4 or Claude 3. I strongly believe that there is a compelling case for devoting serious attention to solving the problem of adversarial robustness in computer vision, and I try to draw an analogy to the alignment of general AI systems here.
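
To gesture at the core idea in code: here is a minimal sketch of multi-scale ensembling, assuming a PyTorch image classifier. It only illustrates the intuition that an attack must fool the model at every resolution at once; the function name, the `model` argument, and the scale choices are all hypothetical, and the paper's actual method is considerably more involved.

```python
# Minimal sketch of multi-scale ensembling (illustrative; not the paper's
# actual architecture). Idea: classify several rescaled copies of the input
# and average the logits, so a perturbation must survive every resolution.
import torch
import torch.nn.functional as F

def multi_scale_logits(model: torch.nn.Module,
                       x: torch.Tensor,
                       scales=(1.0, 0.75, 0.5)) -> torch.Tensor:
    """Average a classifier's logits over several input resolutions.

    x: a batch of images with shape (N, C, H, W).
    """
    per_scale = []
    for s in scales:
        xs = x if s == 1.0 else F.interpolate(
            x, scale_factor=s, mode="bilinear", align_corners=False)
        # Resize back up so the model sees its expected input size.
        xs = F.interpolate(xs, size=x.shape[-2:], mode="bilinear",
                           align_corners=False)
        per_scale.append(model(xs))
    return torch.stack(per_scale, dim=0).mean(dim=0)
```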

An Opinionated Guide to Privacy Despite Authoritarianism
lc · 12d

The best privacy/security guide I am aware of is Michael Bazzell's book. Michael Bazzell is a former computer crimes investigator, and his methods are red-teamed at least in the sense that he works with, e.g., people with extremely determined stalkers. Some things that book goes over that this doesn't:

  • How to buy your home/car/P.O. box with LLCs and keep them out of your name, and how to get a SIM card not tied to you personally.
  • The who/what/where of how your personal information (incl. address, phone number, etc.) gets collected in the first place and ends up in public databases (which of course the government also leverages). What you can do about data already there, to the extent that you can do anything.
  • (For people in really advanced situations) How to disinform in a way that actually works and ends up poisoning records, for example by taking out an electricity bill at a building you don't own.

Obviously the government has capabilities that private individuals don't, so maybe the threat model here is different. For the most part, though, I would say that people's biggest privacy/security risk is that there are countless public databases with all of their personal information, and anybody with a credit card can pull up their address. Stopping the inflow to those should be priority #1, and the solution isn't even really that digital; it's mostly arcane legal procedures.

AISLE discovered three new OpenSSL vulnerabilities
lc · 16d

Two of the bugs AISLE highlighted are memory corruption primitives. In certain situations they could be used to crash a program running OpenSSL (like a web server), which is a denial-of-service risk. Because of modern compiler safety techniques, they can't on their own be used to access data or run code, but they're still concerning because it sometimes turns out to be possible to chain primitives like these into more dangerous exploits.

The third bug is a "timing side-channel" in a particular opt-in certificate algorithm that OpenSSL provides, when used on ARM architectures. It's a pretty niche circumstance, but it does look legitimate to me. The only way to know whether it's exploitable would be to try to build some kind of proof of concept (PoC).
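
To make the bug class concrete (a generic illustration, not AISLE's finding or OpenSSL's code): a comparison that bails out at the first differing byte takes measurably longer the more prefix bytes a guess gets right, and over many timed attempts that can leak the secret. The standard fix is a comparison whose runtime doesn't depend on where the inputs differ:

```python
# Generic illustration of a timing side channel; not AISLE's finding and
# not OpenSSL's code.
import hmac

def check_mac_leaky(expected: bytes, provided: bytes) -> bool:
    # bytes `==` can stop at the first mismatching byte, so the time taken
    # leaks how long a correct prefix the attacker has guessed.
    return expected == provided

def check_mac_constant_time(expected: bytes, provided: bytes) -> bool:
    # compare_digest's runtime is independent of where the inputs differ.
    return hmac.compare_digest(expected, provided)
```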

OpenSSL is a very hardened target, and lots of security researchers look at it. Any security-relevant bugs found in OpenSSL are pretty impressive.

AISLE discovered three new OpenSSL vulnerabilities
lc · 16d

I don't know if OpenSSL actually goes through the process of minting CVEs for a lot of the security problems they patch, so this may sound more impressive than it is. My company has reported several similar memory corruption primitives to OpenSSL in the last month, found by our scanner, and I'm not sure we ever got any CVEs for them.

Because AI security startups are trying to attract media attention, they have a habit of crediting findings to an AI when those findings actually involved a lot of human effort - especially when their tools are not publicly available. You should be healthily skeptical of anything startups report on their own behalf. For a practitioner's perspective on the state of security scanning, a blog post from last month provides a good independent overview: https://joshua.hu/llm-engineer-review-sast-security-ai-tools-pentesters[1]

  1. ^

    Full disclosure: we've since hired this guy, but we only reached out to him after he posted this blog.

1a3orn's Shortform
lc · 19d

As a rationalist, I also strongly dislike subtweeting.

Cheap Labour Everywhere
lc · 26d

One small thing I noticed while living in India is that the escalators would stop moving when people got off them, just to save a little power.

Shortform
lc · 1mo

As soon as you convincingly argue that there is an underestimation, it goes away

It's not a belief. It's an entire cognitive profile that affects how they relate to and interact with other people, and the wrong beliefs are adaptive. For nice people, treating other people you know as nice-until-proven-evil opens up a much wider spectrum of cooperative interactions. For evil people, genuinely believing the people around you are just as self-interested gives you a bit more cover to be self-interested too.

Shortform
lc · 1mo

Bad people underestimate how nice some people are and nice people underestimate how bad some people are.

Daniel Kokotajlo's Shortform
lc · 1mo

You left out the best part:

“Nishin,” you say. “Nobody is accepting your romantic overtures because of Twitter. Nobody is granting you power. Nobody is offering you mon(ey).”

abramdemski's Shortform
lc · 1mo

Even if this rumor isn't true, it is strikingly plausible and worrying.

Posts:
  • Beware LLMs' pathological guardrailing (2mo)
  • Female sexual attractiveness seems more egalitarian than people acknowledge (3mo)
  • Is the political right becoming actively, explicitly antisemitic? [Question] (4mo)
  • Recent AI model progress feels mostly like bullshit (8mo)
  • Virtue signaling, and the "humans-are-wonderful" bias, as a trust exercise (9mo)
  • My simple AGI investment & insurance strategy (2y)
  • Aligned AI is dual use technology (2y)
  • You can just spontaneously call people you haven't met in years (2y)
  • Does bulemia work? [Question] (2y)
  • Should people build productizations of open source AI models? [Question] (2y)