Comparing AI Safety-Capabilities Dilemmas to Jervis' Cooperation Under the Security Dilemma
I've been skimming some things about the Security Dilemma (specifically Offense-Defense Theory) while looking for analogies for strategic dilemmas in the AI landscape.
I want to describe a simple comparison here, lightly held (and only lightly studied).
Now, recreating the two values of each of the two variables of Offense-Defense Theory:
Finally, we can sketch out the Four Worlds of Offense-Defense Theory
From the paper linked in the title:
My Overall Take
This one is weird, and it's hard for me to tell convincing stories about the real version of this, as opposed to seeming to have a Safety posture -- things like "recruiting", "public support", "partnerships", etc. can all come from merely seeming to adopt the Safety posture. (Though this is actually a feature of the security dilemma and offense-defense theory, too.)
I largely agree with the above, but am commenting with my own version.
What I think companies with AI services should do:
Can be done in under a week:
Longer term
References:
Weakly positive on this one overall. I like Coase's theory of the firm, and I like making analogies from it to other things. This application didn't quite work for me, though, and I'll try to write up why.
One thing that I think feels off is an incomplete understanding of the Coase paper.

What I think the article gets correct: Coase looks at the difference between markets (economists' preferred efficient mechanism) and firms/corporations, and observes that transaction costs (for people these would be contracts, but in general all transaction costs are included) are avoided inside firms.

What I think it misses: a primary question explored in the paper is what factors govern the size of firms, and this leads to a mechanistic model in which the transaction costs internal to the firm increase with the firm's size until they match the transaction costs of the open market (which gives the expected maximum efficient size of a non-monopoly firm). A second, smaller missed point, I think, is that the price mechanism works for transactions outside the firm but not for transactions inside it.
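To make that size mechanism explicit, here's a rough formalization in my own notation (not notation from the paper): the firm keeps absorbing transactions as long as organizing one more internally is cheaper than going through the market, so the efficient size $n^*$ satisfies

$$\frac{d\,C_{\text{internal}}(n)}{dn}\bigg\rvert_{n=n^*} = c_{\text{market}},$$

i.e. the firm stops growing once the marginal cost of organizing an additional transaction internally rises to meet the per-transaction cost of using the market.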
Given these, I think the metaphor presented here seems incomplete. It's drawing connections to some of the parts of the paper, but not all of the central parts, and not enough to connect to the central question of size.
I'm confused exactly what parts of the metaphor map to the paper's concepts of market and firm. Is monogamy the market, since it doesn't require high-order coordination? Is polyamory the market, since everyone can be a free-ish actor in an unbundled way? Is monogamy the firm, since it's not using price-like mechanisms to negotiate individual unbundled goods? Is polyamory the firm, since it's subject to the transaction-cost scaling limit on size?
I do think it uses the 'transaction costs matter' idea from the paper pretty solidly, so there is that bit.
I don't really have much I can say about the polyamory bits outside of the economics bits.
This post was personally meaningful to me, and I'll try to cover that in my review while still analyzing it in the context of lesswrong articles.
I don't have much to add about the 'history of rationality' or the description of interactions of specific people.
Most of my value from this post wasn't directly from the content, but from how the content connected to things outside of rationality and lesswrong. So, basically, I loved the citations.
Lesswrong is very dense in self-links and self-citations, and to a lesser degree it still has a good number of links to other websites.
However, it has a dearth of connections to things that aren't blog posts -- books, essays from before the internet, etc. Especially older writings.
I found this post's citation section to be a treasure trove of things I might not have found otherwise.
I have picked up and skimmed/started at least a dozen of the books on the list.
I still come back to this list sometimes when I'm looking for older books to read.
I really want more things like this on lesswrong.
I read this sequence and then went through the whole thing. Without this sequence I'd probably still be procrastinating / putting it off. I think everything else I could write in review is less important than how directly this impacted me.
Still, a review (of the whole sequence, not just this post):
First off, it signposts well what it is and who it's for. I really appreciate when posts do that, and this one clearly gives the top-level focus and what's in/out.
This sequence is "How to do a thing" - a pretty big thing, with a lot of steps and branches, but a single thing with a clear goal.
The post addresses a real need in the community (and it was a personal need for me as well) -- which I think makes for the best kind of "how to do a thing" post.
It was detailed and informative while still keeping the individual points brief and organized.
It specifically calls out decision points and options, how much they matter, what the choices are, and information relevant to choosing. This is a huge energy-saver in terms of actually getting people to do this process.
When I went through it, it was accurate, and I ran into the decision points and choices as expected.
Extra appreciation for the first post which also includes a concrete call to action for a smaller/achievable-right-now thing for people to do (sign a declaration of intent to be cryopreserved). Which I did! I also think that a "thing you can do right now" is a great feature to have in "how to do a thing" posts.
I'm in the USA, so I don't have much evaluation or feedback on how valuable this is to non-USA folks. I really do appreciate that a bunch of extra information was added for non-USA cases, and it's organized such that it's easy to read/skim past if not needed.
I know that this caused me personally to sign up for cryonics, and I hope it did for others as well. Inasmuch as the author's goal was for more people in our community to sign up for cryonics -- I think that's a great goal and I think they succeeded.
Summary
Definitions
Key ideas
Review Summary
Review
Overall I think I agree with the observations and concepts presented, as well as the sense of frustration/exasperation at the way it seems we're collectively doing it wrong.
However, I think this piece feels incomplete in a number of ways, and I'll try to point out why by giving examples of things that would make it feel complete to me.
One thing that would make it feel complete to me is a better-organized set of definitions/taxonomy around the key ideas. I think 'politics' can be split into object-level things around politicians vs. policies. I think even the 'object level' can be split into things like actions (vote for person X or not) vs. modeling (what is the predicted impact on Y). When I try to do this kind of detail-generation, I find that my desire for object-level is actually a desire for specific kinds of object-level focus (and not object-level in the generic).
Another way of making things more precise would be to try to make some kind of measure or metric out of the meta<->object dimension. Questions like 'how would it be measured' or even 'what are the units of measurement' would be great for building intuitions and models around this. Relatedly, describing what 'critical meta-ness' or the correct balance point looks like would also be useful here. Assuming we had a way to decrease meta / increase object, how would we know when to stop? I think these are the sorts of things that would make for a more complete meta-object theory.
A gears-level model of what's going on would also make this feel complete to me. Here's me trying to come up with one on the spot:
Discourse around politics is a pretty charged and emotionally difficult thing for most people, in ways that can subvert our normally-functioning mechanisms of updating our beliefs. When we encounter object-level evidence that contradicts our beliefs, we feel a negative emotion/experience (in a quick flash). One palliative to this feeling is to "go meta" - hop up the ladder of abstraction to a place where the belief is not in danger. We habituate ourselves to it by seeing others do similarly, and imitation is enough to propagate this without anyone doing it intentionally. This model implicitly makes predictions about how to make spaces/contexts where more object-level discussions happen (less negative experience, more emotional safety), as well as what kinds of internal changes would facilitate more object-level discussions (train people to notice these fast emotional reactions and their corresponding mental moves).
Another thing that would make this article feel 'complete' to me would be to compare the 'politics' domain to other domains familiar to folks here on lesswrong (candidates are: effective altruism, AI alignment, rationality, etc). Is the other domain too meta in the way politics is? Is it too object level? It seems like the downsides (insufficient exploration, self-censorship, distraction) could apply to a much bigger domain of thought.
Thoughts, mostly on an alternative set of next experiments:
I find interpolations of effects to be a more intuitive way to study treatment effects, especially if I can modulate the treatment down to zero in a way that smoothly and predictably approaches the null case. It's not exactly clear to me what the "nothing going on" case is here, but here are some possible experiments to interpolate between it and your treatment case.
I mostly say all this because I think it's hard to evaluate "something is up" (predictions don't match empirical results) claims in ML that rest on single experiments or A/B tests. It's too easy (IMO) to get bugs/etc. Smoothly interpolating effects -- with one side a well-established null/prior case and the other the different case, varying smoothly with the treatment -- is IMO strong evidence that "something is up".
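To make the shape of that concrete, here's a minimal sketch of the interpolation idea (my own toy construction; `run_experiment` is a hypothetical stand-in for whatever training/eval loop is actually being run):

```python
import numpy as np

def run_experiment(treatment_strength: float, seed: int) -> float:
    """Hypothetical stand-in: return the metric of interest for a given
    treatment strength. Replace with the real training/evaluation run."""
    rng = np.random.default_rng(seed)
    # Toy simulated effect: grows smoothly with treatment strength, plus noise.
    return 0.5 * treatment_strength + rng.normal(scale=0.02)

# Establish the null / "nothing going on" baseline at treatment = 0.
null_metric = np.mean([run_experiment(0.0, seed=s) for s in range(5)])

# Sweep the treatment smoothly from the null case up to the full treatment.
for strength in np.linspace(0.0, 1.0, 11):
    runs = [run_experiment(strength, seed=s) for s in range(5)]
    print(f"treatment={strength:.1f}  effect vs null={np.mean(runs) - null_metric:+.3f}")

# If the effect curve is smooth and passes through ~0 at treatment=0, that's
# much stronger evidence that "something is up" than a single A/B comparison.
```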
Hope there's something in those that's interesting and/or useful. If you haven't already, I strongly recommend checking out the intrinsic dimensionality paper -- you might get some mileage by swapping your cutoff point for their phase change measurement point.
I think at this point these feel like empirical questions, which I think would be much more clearly answered by demonstrations or experiments.
Trying to encode an additional penalty on changing non-semantic information is an interesting idea.
However, I think you're missing that you don't have the ability to directly compare to a reference LM in cases where you're training to improve on some performance benchmark. During training, the model will change its predictions on everything to some degree (both semantic and non-semantic content).
So your proposed detection will always show some amount of disagreement between the prior and the trained model, on weird grammatical patterns as well as on conceptual tokens. The question is: "Is the difference merely due to the changes that improve performance, or is it also transmitting hidden information?"
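For concreteness, here's a rough sketch of the comparison I'm picturing (assuming PyTorch and HuggingFace transformers; the model names and sample text are placeholders, not your actual setup):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: in practice these would be the frozen reference model and the
# model fine-tuned on the performance benchmark.
ref_name, tuned_name = "gpt2", "gpt2"
tokenizer = AutoTokenizer.from_pretrained(ref_name)
ref_model = AutoModelForCausalLM.from_pretrained(ref_name).eval()
tuned_model = AutoModelForCausalLM.from_pretrained(tuned_name).eval()

text = "Example text sampled from the trained model."  # placeholder
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    ref_logp = F.log_softmax(ref_model(ids).logits, dim=-1)    # [1, seq, vocab]
    tuned_logp = F.log_softmax(tuned_model(ids).logits, dim=-1)

# Per-position KL(tuned || ref): how far the trained model has drifted.
per_token_kl = (tuned_logp.exp() * (tuned_logp - ref_logp)).sum(-1).squeeze(0)

for tok, kl in zip(tokenizer.convert_ids_to_tokens(ids[0].tolist()), per_token_kl.tolist()):
    print(f"{tok!r:>15}  KL={kl:.4f}")

# After any fine-tuning this is nonzero at every position, so elevated KL by
# itself doesn't distinguish "got better at the task" from "also carrying
# hidden information".
```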
If what you’re saying is “any change to the distribution will change KL” — I think that’s just correct.
This also applies to changes during training where the model is learning to perform better on the objective task.
So we are expecting some amount of KL divergence already.
My claims are:
The cheapest places to hide information (in terms of KL) are places where the model already has high entropy (i.e. it is uncertain between many possible outputs)
Optimization pressure will try to push this extra information into these cheapest places to hide
The increase in KL won't be clearly distinguishable from the increase due to improved performance on the task
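To illustrate the first claim, here's a toy numerical example (entirely my own construction): nudging the output distribution by the same amount to encode a bit costs far less KL at a position where the model is uncertain than at one where it is confident.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions given as arrays."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def encode_bit(q, bit, strength=0.4):
    """Skew the distribution toward token `bit` to carry one hidden bit."""
    p = q.copy()
    p[bit] += strength * (1.0 - p[bit])
    return p / p.sum()

high_entropy = np.array([0.5, 0.5])    # model torn between two continuations
low_entropy = np.array([0.99, 0.01])   # model nearly certain of token 0

for name, q in [("high-entropy position", high_entropy),
                ("low-entropy position", low_entropy)]:
    p = encode_bit(q, bit=1)
    print(f"{name}: KL(encoded || original) = {kl(p, q):.4f}")

# The same nudge toward the less-likely token is roughly 50x more expensive in
# KL at the confident position, so optimization pressure favors hiding
# information where the model already has high entropy.
```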
I like pointing out this confusion. Here's a grab-bag of some of the things I use it for, to try to pull them apart:
Probably also others I'm forgetting.