LESSWRONG
LW

John_Maxwell — LessWrong

Replying toWhere I agree and disagree with Eliezer

Power makes you dumb, stay humble.
Tell everyone in the organization that safety is their responsibility, everyone's views are important.
Try to be accessible and not intimidating, admit that you make mistakes.
Schedule regular chats with underlings so they don't have to take initiative to flag potential problems. (If you think such chats aren't a good use of your time, another idea is to contract someone outside of the organization to do periodic informal safety chats. Chapter 9 is about how organizational outsiders are uniquely well-positioned to spot safety problems. Among other things, it seems workers are sometimes more willing to share concerns frankly with an outsider than they are with their

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Fair point. I also haven't done much posting since adding the bounty to my profile. Was thinking it might attract the attention of people reading the archives, but maybe there just aren't many archive readers.

Replying toHow do I use caffeine optimally?

John_MaxwellJun 23, 2022

How do I use caffeine optimally?

There is some observational evidence that coffee drinking increases lifespan. I think the proposed mechanism has to do with promoting autophagy. https://www.acpjournals.org/doi/10.7326/M21-2977 But it looks like decaf works too. (Decaf has a bit of caffeine.)

I think somewhere else I read that unfiltered coffee doesn't improve lifespan, so try to drink the filtered stuff?

In my experience caffeine dependence is not a big deal and might help my sleep cycle.

Replying toSecurity Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

John_Maxwell4y

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Eliezer is a good example of someone who built a lot of status on the back of "breaking" others' unworkable alignment strategies. I found the AI Box experiments especially enlightening in my early days.

Fair enough.

My personal feeling is that poking holes in alignment strategies is easier than coming up with good ones, but I'm also aware that thinking that breaking is easy is probably committing some quantity of typical mind fallacy.

Yeah personally building feels more natural to me.

I agree a leaderboard would be great. I think it'd be cool to have a leaderboard for proposals as well -- "this proposal has been unbroken for X days" seems like really valuable information... (read more)

Replying toSecurity Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

John_Maxwell4y

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

I wrote a comment on your post with feedback.

I don't have anything prepared for red teaming at the moment -- I appreciate the offer though! Can I take advantage of it in the future? (Anyone who wants to give me critical feedback on my drafts should send me a personal message!)

Replying toGetting from an unaligned AGI to an aligned AGI?

John_Maxwell4y

Getting from an unaligned AGI to an aligned AGI?

I skimmed the post, here is some feedback (context):

I'm probably not the best person to red team this since some of my own alignment ideas are along similar lines. I'm also a bit on the optimistic side about alignment more generally -- it might be better to talk to a pessimist.
This sounds a bit like the idea of a "low-bandwidth oracle".
I think the biggest difficulty is the one you explicitly acknowledged -- boxing is hard.
But there are also problems around ensuring that bandwidth is actually limited. If you have a human check to see that the AGI's output conforms to the spec, the AGI could put an infohazard in the

John_Maxwell4y*

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Thanks for the reply!

As some background on my thinking here, last I checked there are a lot of people on the periphery of the alignment community who have some proposal or another they're working on, and they've generally found it really difficult to get quality critical feedback. (This is based on an email I remember reading from a community organizer a year or two ago saying "there is a desperate need for critical feedback".)

I'd put myself in this category as well -- I used to write a lot of posts and especially comments here on LW summarizing how I'd go about solving some aspect or another of the alignment problem, hoping... (read 729 more words →)

Replying toSecurity Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

John_Maxwell4y*

Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment

Thanks for writing this! Do you have any thoughts on doing a red team/blue team alignment tournament as described here?

Replying toWhere I agree and disagree with Eliezer

John_Maxwell4y

Where I agree and disagree with Eliezer

Chapter 7 in this book had a few good thoughts on getting critical feedback from subordinates, specifically in the context of avoiding disasters. The book claims that merely encouraging subordinates to give critical feedback is often insufficient, and offers ideas for other things to do.

Replying toBook Review: Talent

John_Maxwell4y

Book Review: Talent

And just as I was writing this I came across another good example of the ‘you think you’re in competition with others like you but mostly you’re simply trying to be good enough’

I'm straight, so possibly unreliable, but I remember Michael Curzi as a very good-looking guy with a deep sexy voice. I believe him when he says other dudes are not competition for him 95% of the time. ;-)

Eliciting Latent Knowledge Via Hypothetical Sensors

John_Maxwell

This is a response to ARC's first technical report: Eliciting Latent Knowledge. But it should be fairly understandable even if you didn't read ARC's report, since I summarize relevant parts of their report as necessary. Here I propose some approaches to the problem ARC outlines which are very different from the approaches they explore.

Idea #1: Detecting failure using a hold-out sensor

The core challenge in ARC's report is to obtain accurate knowledge about the actual presence / absence of a diamond in a protective "SmartVault", as opposed to accurate knowledge about whether a human observer would think the diamond is present if the human observer had access to data from a... (read 1553 more words →)

Lately I've been examining the activities I do to relax and how they might be improved. If you haven't given much thought to this topic, Meaningful Rest is excellent background reading.

An interesting source of info for me has been lsusr's posts on cutting out junk media: 1, 2, 3. Although I find lsusr's posts inspiring, I'm not sure I want to pursue the same approach myself. lsusr says: "The harder a medium is to consume (or create, as applicable) the smarter it makes me." They responded to this by cutting all the easy-to-consume media out of their life.

But when I relax, I don't necessarily want to do something... (read more)

Related to the discussion of weighted voting allegedly facilitating groupthink earlier https://www.lesswrong.com/posts/kxhmiBJs6xBxjEjP7/weighted-voting-delenda-est

An interesting litmus test for groupthink might be: What has LW changed its collective mind about? By that I mean: the topic was discussed on LW, there was a particular position on the issue that was held by the majority of users, new evidence/arguments came in, and now there's a different position which is held by the majority of users. I'm a bit concerned that nothing comes to mind which meets these criteria? I'm not sure it has much to do with weighted voting because I can't think of anything from LW 1.0 either.

In this reaction to Critch's podcast, I wrote about some reasons to think that a singleton would be preferable to a multipolar scenario. Here's another rather exotic argument.

[The dark forest theory] is explained very well near the end of the science fiction novel, The Dark Forest by Liu Cixin.

...

When two [interstellar] civilizations meet, they will want to know if the other is going to be friendly or hostile. One side might act friendly, but the other side won't know if they are just faking it to put them at ease while armies are built in secret. This is called chains of suspicion. You don't know for sure what the other side's

... (read more)

A friend and I went on a long drive recently and listened to this podcast with Andrew Critch on ARCHES. On the way back from our drive we spent some time brainstorming solutions to the problems he outlines. Here are some notes on the podcast + some notes on our brainstorming.

In a possibly inaccurate nutshell, Critch argues that what we think of as the "alignment problem" is most likely going to get solved because there are strong economic incentives to solve it. However, Critch is skeptical of forming a singleton--he says people tend to resist that kind of concentration of power, and it will be hard for an AI... (read 1195 more words →)

Someone wanted to know about the outcome of my hair loss research so I thought I would quickly write up what I'm planning to try for the next year or so. No word on how well it works yet.

Most of the ideas are from this review: https://www.karger.com/Article/FullText/492035

I'm planning to replace oral caffeine consumption with caffeinated shampoo/conditioner: https://www.amazon.com/gp/product/B076FHJ3K5/ (In my experience caffeine is absorbed through the skin just fine)
Gonna start taking this as a source of procyanidin B2 and zinc: https://www.amazon.com/gp/product/B00K7GIMCU/
I understand that the largest single component of marine proteins is glycine, which also seems to be helpful for improving sleep and preventing diabetes per examine.com, so I'm going to start

John_Maxwell

This post was inspired by orthonormal's post Developmental Stages of GPTs and the discussion that followed, so only part of it is original.

First I'll aim to provide a crisper version of the argument for why GPT wants to mesa-optimize. Specifically, I'll explain a well-known optimization algorithm used in text generation, and argue that GPT can improve performance on its objective by learning to implement something like this algorithm internally.

Then I'll offer some ideas of mine about how we might change this.

Explanation of beam search

Our goal is to generate plausible text. We evaluate whether text is "plausible" by multiplying together all the individual word probabilities from our language model.

Greedy word selection... (read 2429 more words →)

Progress Studies: Hair Loss Forums

I still have about 95% of my hair. But I figure it's best to be proactive. So over the past few days I've been reading a lot about how to prevent hair loss.

My goal here is to get a broad overview (i.e. I don't want to put in the time necessary to understand what a 5-alpha-reductase inhibitor actually is, beyond just "an antiandrogenic drug that helps with hair loss"). I want to identify safe, inexpensive treatments that have both research and anecdotal support.

In the hair loss world, the "Big 3" refers to 3 well-known treatments for hair loss: finasteride, minoxidil, and ketoconazole. These treatments... (read 615 more words →)

John_Maxwell's Shortform

John_Maxwell

This is a special post for quick takes (aka "shortform"). Only the owner can create top-level comments.

Are HEPA filters likely to pull COVID-19 out of the air?

John_Maxwell

Could hospitals install HEPA filters in common spaces and run them 24/7 to decrease COVID-19 spread?

Comprehensive COVID-19 Disinfection Protocol for Packages and Envelopes

John_Maxwell

An easy way to limit your exposure to COVID-19 is to quit going out grocery shopping and buy groceries online instead. But there's a problem: This review found that coronaviruses can persist on inanimate surfaces for up to 9 days at room temperature (EDIT: Maybe as many as 17 days, actually. See this.) There's no way to know how many infected people handled your package before it got to you.

This is a protocol my roommate and I came up with for handling packages safely. Although it looks like a lot of steps, most individual steps aren't all that long. I chose to err on the side of including extra

... (read 1352 more words →)

Why don't singularitarians bet on the creation of AGI by buying stocks?

John_Maxwell

With the recent stock market sale, I've been looking over stocks to see which seem to be worth buying. (As background information, I'm buying stocks to have fun and bet my beliefs. If you believe in the efficient market hypothesis, buying stocks at random should perform roughly as well as the market as a whole, and the opportunity cost doesn't seem super high. Making a 40% return buying $ZM right before investors started paying attention to COVID-19 hasn't discouraged me.)

Standard investing advice is to stay diversified. I take the diversification suggestion further than most: A chunk of my net worth is in disaster preparation measures, and I'm also... (read 924 more words →)

When are immunostimulants/immunosuppressants likely to be helpful for COVID-19?

John_Maxwell

My aging parents live in Santa Clara County and have a relatively fatalistic attitude towards COVID-19 exposure. But they seem open to taking supplements. So I'm trying to figure out what to recommend.

This 2017 post from Sarah Constantin on the prevention of respiratory tract infections says, among other things:

Bacterial immunostimulants are safe and effective in preventing respiratory infections, cutting the risk of infection by 40-50%.

However, this article from National Geographic implies that COVID-19 may kill through immune hyper-reactivity:

...what actually happens to your body when it is infected by the coronavirus? The new strain is so genetically similar to SARS that it has inherited the title SARS-CoV-2. So combining early research

... (read 311 more words →)

The Goodhart Game

John_Maxwell

In this paper, we argue that adversarial example defense papers have, to date, mostly considered abstract, toy games that do not relate to any specific security concern. Furthermore, defense papers have not yet precisely described all the abilities and limitations of attackers that would be relevant in practical security.

From the abstract of Motivating the Rules of the Game for Adversarial Example Research by Gilmer et al (summary)

Adversarial examples have been great for getting more ML researchers to pay attention to alignment considerations. I personally have spent a fair of time thinking about adversarial examples, I think the topic is fascinating, and I've had a number of ideas for addressing them.... (read 1304 more words →)

Self-Fulfilling Prophecies Aren't Always About Self-Awareness

John_Maxwell

This is a belated follow-up to my Dualist Predict-O-Matic post, where I share some thoughts re: what could go wrong with the dualist Predict-O-Matic.

Belief in Superpredictors Could Lead to Self-Fulfilling Prophecies

In my previous post, I described a Predict-O-Matic which mostly models the world at a fuzzy resolution, and only "zooms in" to model some part of the world in greater resolution if it thinks knowing the details of that part of the world will improve its prediction. I considered two cases: the case where the Predict-O-Matic sees fit to model itself in high resolution, and the case where it doesn't, and just makes use of a fuzzier "outside view" model of... (read 1038 more words →)

What AI safety problems need solving for safe AI research assistants?

John_Maxwell

In his AI Safety “Success Stories” post, Wei Dai writes:

[This] comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive success story. Is this conclusion actually justified?

I share Wei Dai's intuition that the Research Assistant path is neglected, and I want to better understand the safety problems involved in this path.

Specifically, I'm envisioning AI research assistants, built without any kind of reinforcement learning, that help AI alignment researchers identify, understand, and solve AI alignment problems. Some concrete examples:

Possible with yesterday's technology: Document clustering that automatically organizes every blog post about AI alignment. Recommendation systems that find AI alignment posts

... (read more)

LESSWRONG
LW

LESSWRONG
LW

John_Maxwell

Thoughts on designing policies for oneself

System Administrator Appreciation Day - Thanks Trike!

Applying Behavioral Psychology on Myself

The Case for a Bigger Audience

John_Maxwell

Eliciting Latent Knowledge Via Hypothetical Sensors

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell's Shortform

Are HEPA filters likely to pull COVID-19 out of the air?

Comprehensive COVID-19 Disinfection Protocol for Packages and Envelopes

Why don't singularitarians bet on the creation of AGI by buying stocks?

When are immunostimulants/immunosuppressants likely to be helpful for COVID-19?

Predictions & Self-awareness

John_Maxwell

Thoughts on designing policies for oneself

System Administrator Appreciation Day - Thanks Trike!

Applying Behavioral Psychology on Myself

The Case for a Bigger Audience

John_Maxwell

Eliciting Latent Knowledge Via Hypothetical Sensors

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell's Shortform

Are HEPA filters likely to pull COVID-19 out of the air?

Comprehensive COVID-19 Disinfection Protocol for Packages and Envelopes

Why don't singularitarians bet on the creation of AGI by buying stocks?

When are immunostimulants/immunosuppressants likely to be helpful for COVID-19?

Predictions & Self-awareness

Idea #1: Detecting failure using a hold-out sensor

Explanation of beam search

Belief in Superpredictors Could Lead to Self-Fulfilling Prophecies