tchauvin

AI safety and cybersecurity

https://tchauvin.com

Comments

In general, the hacking capabilities of state actors, and the likely involvement of national security agencies as we get closer to AGI, feel like significant blind spots in LessWrong discourse.

(The Hacker and the State by Ben Buchanan is a great book on the former.)

If you are very good at cyber and extremely smart, you can hide vulnerabilities in 10k-line programs in a way that less smart specialists will have trouble discovering even after days of examination - code generation/analysis is not really defense favored

I think the first part of the sentence is true, but "not defense favored" isn't a clear conclusion to me. Backdoors work well in closed-source code, but are really hard to hide in widely used open-source code: just look at the amount of effort that went into the recent xz / liblzma backdoor, and at the fact that we don't know of any other backdoor in widely used OSS.

The main effect of a market being underground is not making transactions harder (people find ways to exchange money for vulnerabilities by building trust), but making it much harder to figure out what the market price is and reducing the effectiveness of the overall market

Note this doesn't apply to all types of underground markets: the ones that regularly get shut down (like darknet drug markets) do have a big issue with trust.

Being the target of an autocratic government is an awful experience, and you have to be extremely careful if you put anything they dislike on a computer. And because of the zero-day market, you can't assume your government will suck at hacking you just because it's a small country

This is correct. As a matter of personal policy, I assume that everything I write down somewhere will get leaked at some point (with a few exceptions, like, hopefully, disappearing Signal messages).

The link in "This is a linkpost for" is not the correct one.

Here are the same two GIFs, but with a consistent speed (30 ms/frame) and an infinite loop, in case anyone else wants them, e.g. for presentations:

[GIFs: CoinRun training (in-distribution) and CoinRun test (out-of-distribution)]
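
In case it's useful: here's one way to do this kind of re-encoding with gifsicle (my tool choice; the filenames are hypothetical and the original post doesn't say which tool was used). gifsicle measures frame delay in hundredths of a second, so 30 ms/frame is --delay 3:

# Re-encode at 30 ms/frame with infinite looping (hypothetical filenames)
$ gifsicle --delay=3 --loopcount=forever coinrun_train.gif -o coinrun_train_fixed.gif
$ gifsicle --delay=3 --loopcount=forever coinrun_test.gif -o coinrun_test_fixed.gif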

Answer by tchauvin

I think you can guess a reasonable answer even as a complete outsider (like me), considering the purpose of these funds: to maximize the expected good done by allocating their money. A few considerations:

  1. Is it productive to pay very good researchers the bare minimum they need to survive? No:
    • it makes the path of independent research unattractive to most;
    • it produces some adverse selection in future applicants, i.e. you'll get more applications from people who couldn't really be hired in industry, which isn't always bad but plausibly correlates with not being an excellent researcher;
    • you don't want your researchers to waste time and energy on things that could be straightforwardly solved with a bit more money.
  2. On the other hand, paying too much has a few issues:
    • money spent on a researcher hits diminishing returns past a certain point;
    • there is also some adverse selection: if people know they can make the same amount here as in industry, and your fund's approval process is lighter than industry interviews (due to limited resources and a hits-based-giving approach), you might get an influx of candidates who aren't altruistically motivated and were rejected from industry for perhaps good reasons;
    • allocating money frivolously damages the movement's reputation.

In terms of existing data, you can look at past public payouts, for example the December 2021 LTFF report. There you'll find a 6-month salary of $50,000 (so roughly $100k/year) for a well-known independent researcher, which is above the bare minimum needed to survive, probably allowing for some level of travel and luxury, while staying well below industry standards.

Answer by tchauvin

Google can decide to verify existing accounts. In its efforts to avoid platforming spam, cybercrime, and astroturfing, Google will likely leave existing "reputable" accounts alone, but ask for e.g. a phone number or other KYC for existing accounts that were basically never used, as an easy fix to the issue you're describing here.

Another mitigation strategy against the "LLM learns steganography from other LLMs' output" problem is to add some noise to the training data (applying a number of meaning-preserving transformations) prior to training, to break any embedded steganography scheme. I feel like this should be relatively easy and effective?

Note that the loss would obviously be worse on the outputs of the previous LLM.

Which makes me think that comparing losses could be a way to detect that some text contains steganography, even if we can't pinpoint exactly how it's encoded (only that our transformations break it).

EDIT: in case of disagreement, I can try being more specific about my claim that this should be relatively easy and effective, using concrete examples. Basically I believe that after scrambling, there can only remain a small number of bits of steganographic information per token, even for a steganographic encoding that originally used the null space of natural language optimally.
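
As a minimal sketch of the cheapest version of this (entirely my own illustration, not something from the thread): even just normalizing formatting before training destroys stego channels that live in unicode variants, repeated blanks, or trailing whitespace. Real meaning-preserving transformations (paraphrasing, reordering) would need an LLM or NLP tooling, but the shell version conveys the idea (corpus.txt is a hypothetical training file; //TRANSLIT assumes GNU iconv):

# Toy normalization pass: transliterate non-ASCII characters to ASCII,
# squeeze runs of blanks to a single space, and drop trailing spaces --
# three common low-level stego channels.
$ normalize() { iconv -f utf-8 -t ascii//TRANSLIT | tr -s '[:blank:]' ' ' | sed 's/ *$//'; }
$ normalize < corpus.txt > corpus_normalized.txt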

EDIT

Thanks for the replies, and sorry for the inaccuracies. I initially reported 4,331 blog posts and 890k words; the real results are that Robin wrote 3,302 blog posts (thanks to DominikPeters for pointing this out and for finding better URLs) and 1.5M words.

(4,331 blog posts corresponds to all authors on overcomingbias. 890k words doesn't represent anything meaningful, because the posts were truncated when accessed from the monthly-archive URLs.)

# Get the real number of words from Robin
$ n_current_pages=331  # pages under /author/robin-hanson, ~10 posts each
$ echo https://www.overcomingbias.com/author/robin-hanson > /tmp/page_urls
$ for i in $(seq 2 $n_current_pages); do echo https://www.overcomingbias.com/author/robin-hanson/page/$i >> /tmp/page_urls; done
# Extract each page's post bodies with pup, strip the markup, count words
$ getwords() { curl -s "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/page_urls > /tmp/words_by_page
$ awk '{sum += $1} END {print sum}' /tmp/words_by_page
1481344

ORIGINAL

Scoping: 4,331 blog posts and 890k words (for overcomingbias only).

# Number of blog posts (all authors)
$ curl -s https://www.overcomingbias.com/archives | pup '#monthly-archives' | rg '\(\d+\)' | tr -d ' ()' | awk '{sum += $1} END {print sum}'
4331

# Rough number of words (bash)
# NB: the monthly archive pages truncate posts, hence the undercount (see EDIT above)
$ curl -s https://www.overcomingbias.com/archives | pup '#monthly-archives a attr{href}' > /tmp/urls_monthly_archives
$ getwords() { curl -s "$1" | pup '#content' | html2text --ignore-links | wc -w; }
$ export -f getwords
$ parallel getwords < /tmp/urls_monthly_archives > /tmp/words_per_month
$ awk '{sum += $1} END {print sum}' /tmp/words_per_month
891666

Interesting... Can you say more about what your self-control training looked like? When in the day, how long, how hard, what tasks, etc.? Was the most productive period of your life during or after this training? Why did you stop?

To continue the strength-training comparison: we're usually trying to maximize deployed strength over our lifetime. Perhaps we're already deploying as much strength as we can every day on useful tasks, so adding strength training on pointless tasks would take strength away from the useful ones?

In my opinion, the 4 killer features of vim beyond moving around and basic editing are:

  • macros
  • the s command used with regular expressions
  • the g command (see e.g. Power of g)
  • the ability to run text through any Unix utility (Vim and the shell)

If you know your Unix utilities (I often use awk inside of vim; it's also great outside of it), your regular expressions, and these features, you can sculpt any text as you wish: you can take the output of pretty much anything, in a messed-up format, and transform it into the content and format you want (see the sketch below). This is supposed to be inspiring, but I'm not sure how good a job I'm doing.
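
To make that concrete, here's a toy example (the filename and cleanup steps are made up). It drives vim non-interactively from the shell, but the :s, :g, and :%! commands are exactly what you'd type in an interactive session:

# On a hypothetical messy file: strip trailing whitespace (:s with a regex),
# delete blank lines (:g), then pipe the whole buffer through sort -u (:%!)
$ vim -es -c '%s/\s\+$//e' -c 'g/^$/d' -c '%!sort -u' -c 'wq' messy.txt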

Also, if anyone's interested, here are my current vim Anki cards. I use Anki for keyboard shortcuts that are supposed to become muscle memory; AMA.
