LESSWRONG
LW

All of tchauvin's Comments + Replies

Predicting AI Releases Through Side Channels

Nice attempt. This reminds of the Pizza Meter and Gay Bar Index related to Pentagon crisis situations. I found it hard to find reliable information on this when I looked (I can't even find a good link to share), but the mechanism seems plausible.

Dan Braun's Shortform

tchauvin9mo42

In general, the hacking capabilities of state actors and the likely involvement of national security when we get closer to AGI feel like significant blind spots of Lesswrong discourse.

(The Hacker and The State by Ben Buchanan is a great book to learn about the former)

4Bogdan Ionut Cirstea9mo

But states seem quite likely to fall under 6e, no?

Fabien's Shortform

tchauvin1y30

If you are very good at cyber and extremely smart, you can hide vulnerabilities in 10k-lines programs in a way that less smart specialists will have trouble discovering even after days of examination - code generation/analysis is not really defense favored

I think the first part of the sentence is true, but "not defense favored" isn't a clear conclusion to me. I think that backdoors work well in closed-source code, but are really hard in open-source widely used code − just look at the amount of effort that went into the recent xz / liblzma backdoor, and ... (read more)

1quetzal_rainbow1y

The reason why xz backdoor was discovered is increased latency, which is textbook side channel. If attacker had more points in security mindset skill tree, it wouldn't happen.

Navigating AI Risks (NAIR) #1: Slowing Down AI

tchauvin2y32

The link of "this is a linkpost for" is not the correct one

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

tchauvin2y20

Here are the same two GIFs but with a consistent speed (30ms/frame) and an infinite loop, in case anyone else is interested for e.g. presentations:

CoinRun training (in-distribution) CoinRun test (out-of-distribution)

Are funds (such as the Long-Term Future Fund) willing to give extra money to AI safety researchers to balance for the opportunity cost of taking an "industry" job?

Answer by tchauvinMar 16, 202351

I think you can guess a reasonable answer even as a complete outsider (like me), considering the purpose of these funds, which is to maximize the amount of expected good they cause by the allocation of their money. A few things that must come into consideration:

is it productive to pay very good researchers the bare minimum they need to survive? No:
- it consequently make the path of independent research unattractive to most;
- it produces some level of adverse selection in future applicants, i.e. you'll move toward getting more applications from people who

Answer by tchauvinMar 11, 202330

Google can decide to verify existing accounts. I think it's likely that in its efforts to avoid platforming spam, cybercrime and astroturfing, Google may decide to leave existing "reputable" accounts alone, but will ask for e.g. a phone number or other KYC for existing accounts that were basically never used, as an easy fix to the issue you're describing here.

By Default, GPTs Think In Plain Sight

tchauvin2y101

Another mitigation strategy against the "LLM learns steganography from other LLM's output" problem is to add some noise to the training data (performing a number of transformations that preserve meaning) prior to training, to break the embedded steganography schemes. I feel like this should be relatively easy and effective?

Note that the loss would obviously be worse on the outputs of the previous LLM.

Which makes me think that comparing losses could be a way to detect that some text contains steganography, even if we can't pinpoint how exactly (but our tran... (read more)

[$10k bounty] Read and compile Robin Hanson’s best posts

tchauvin4y*181

EDIT

Thanks for the replies and sorry for the inaccuracies. I initially reported 4,331 blog posts and 890k words; the real results are that Robin wrote 3,302 blog posts (thanks DominikPeters for pointing this out, and for finding these better urls) and 1.5M words.

(4,331 blog posts corresponds to all authors on overcomingbias. 890k words doesn't represent anything, because the posts were truncated when accessed from the monthly archive urls.)

# Get the real number of words from Robin
$ n_current_pages=331
$ echo https://www.overcomingbias.com/author/robin-ha

... (read more)

DominikPeters4y190

The first author archives page that throws a 404 is https://www.overcomingbias.com/author/robin-hanson/page/332, but https://www.overcomingbias.com/author/robin-hanson/page/331 exists. Each page contains 10 posts, except the last one (page 331) which contains two posts. So there are 3302 posts by Hanson.

Yoav Ravid4y110

Doesn't that include posts by other people too? Like Eliezer, for example?

How You Can Gain Self Control Without "Self-Control"

tchauvin4y10

Interesting... Can you tell more about how your self-control training looked like? Like when in the day, how long, how hard, what tasks, etc? Was the most productive period in your life during or after this training? Why did you stop?

To carry on with the strength training comparison, we're usually trying to achieve a maximum deployed strength over our lifetime. Perhaps we're already deploying as much strength as we can every day for useful tasks, so that adding strength training on pointless tasks would remove strength from the other tasks?

2kerspoon4y

It started with practice handstands for 10mins without any real plan other than that, It then built into a similar set of small things. Short duration but required focus. Brushing teeth with my other hand, small bits of CoZE exercises - silly things really. But it have both of us the real feeling that we could get better at anything. It was when I was travelling and I kept up a version for a month or so. Stopped because I was working on more valuable goals when I got home.

Vim

tchauvin4y30

In my opinion, the 4 killer features of vim beyond moving around and basic editing are:

macros
the s command used with regular expressions
the g command (see e.g. Power of g)
the ability to run text through any Unix utility (Vim and the shell)

If you know your Unix utilities (I often use awk inside of vim; it's also great outside of it), your regular expressions, and these features, you can sculpt any text as you wish. What I mean is that you can take the output of pretty much anything, in a messed up format, and transform it into the content and format ... (read more)

Predictions for future dispositions toward Twitter

tchauvin4y70

"Twitter" has a high variance. For some people (probably the vast majority of them), the comparison to smoking is certainly relevant; for a few others, Twitter is very beneficial. Here are a few variables that I think have a huge impact on the overall value a user derives from Twitter:

Who are you? What's your mental state like?
What do you want from Twitter (read thoughtful discussions? participate in them? be aware of relevant news and opportunities? meet like-minded people? influence people? increase your follower count / gain status? escape from bored

... (read more)

2Dagon4y

I currently only use Twitter to the extent that tweets are forwarded or posted somewhere I happen to see them. And I'm fully with timot.cool on this - "twitter" isn't a single thing - it's a combination of how you use it and what bubbles you're in/following. For me (and I suspect many others), it's even less of a thing - it's an adjunct to other communication channels. My prediction is counter to mike_hawke's (maybe). I have no clue if it'll be called "twitter", or if it'll share the same naming or posting mechanisms. But I'd wager that attention-oriented interactive media will continue to be an important part of human lives until humanity changes in fundamental ways. Call it 2 generations (50 years) minimum.

Moloch's Toolbox (2/2)

tchauvin4y10

Meta-problems in general [...] are issues outside the Overton window.

Does anyone have a theory about why this is the case? Thinking out loud:

voting systems: I guess any mention from a politician would be immediately dismissed as agenda-based. And well, probably any mention from anyone. Making changes to political systems also has a Chesterton's fence component: we know how our system behaves, we don't know how this new system would behave, and we know from history that we should be quite happy to have political systems that kind of work and haven't a

... (read more)