A "Core Views on AI Safety" post is now available at https://www.anthropic.com/index/core-views-on-ai-safety
(Linkpost for that is here: https://www.lesswrong.com/posts/xhKr5KtvdJRssMeJ3/anthropic-s-core-views-on-ai-safety.)
I’ve run Hamming circles within CFAR contexts a few times, and once outside of them. Tips from that outside run:
Timing can be tricky here! If you do four 20-minute turns with breaks, and you're doing this in an evening, then by the time you get to the last person, people might be tired.
Especially so if you started with the Hamming Questions worksheet exercise (linked as a prereq at the top of the post).
I think next time I would drop to 15 each, and keep the worksheet.
I appreciate the concept of "Numerical-Emotional Literacy". In fact, this is what I personally think/feel the "rationalist project" should be. To the extent that I am a "rationalist", what I mean by that is precisely this: knowing what I value, and pursuing numerical-emotional literacy around it, is important to me.
To make in-line adjustments, grab a copy of the spreadsheet (https://www.microcovid.org/spreadsheet) and do anything you like to it!
There is now a Wired article about this tool and the process of creating it: https://www.wired.com/story/group-house-covid-risk-points/
I think the reporter did a great job of capturing what an "SF group house" is like and how to live a kind of "high IQ / high EQ" rationalist-inspired life, so this might be a thing one could send to friends/family about "how we do things".
It's not just Dario, it's a larger subset of OpenAI splitting off: "He and a handful of OpenAI colleagues are planning a new project, which they tell us will probably focus less on product development and more on research. We support their move and we’re grateful for the time we’ve spent working together."
I heard someone wanted to know about usage statistics for the microcovid.org calculator. Here they are!
Sorry to leave you hanging for so long Richard! This is the reason why in the calculator we ask about "number of people typically near you at a given time" for the duration of the event. (You can also think of this as a proxy for "density of people packed into the room".) No reports like that that I'm aware of, alas!
Want to just give credit to all the non-rationalist coauthors of microcovid.org! (7 non-rationalists and 2 "half-rationalists"?)
I've learned a LOT about the incredible power of trusted collaborations between "hardcore epistemics" folks and much more pragmatic folks with other skillsets (writing, UX design, medical expertise with ordinary people as patients, etc). By our powers combined we were able to build something usable by non-rationalist-but-still-kinda-quantitative folks, and are on our way to something usable by "normal people" 😲.
We've been able to...
Also, don't forget to factor in "kicking off a chain of onwards infections" into your COVID avoidance price somehow. You can't stop at valuing "cost of COVID to *me*".
We don't really know how to do this properly yet, but see discussion here: https://forum.effectivealtruism.org/posts/MACKemu3CJw7hcJcN/microcovid-org-a-tool-to-estimate-covid-risk-from-common?commentId=v4mEAeehi4d6qXSHo#No5yn8nves7ncpmMt
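For concreteness, here's a minimal sketch of one way to account for the chain, assuming (purely for illustration) that each infection seeds an average of r_eff further infections and that the series converges; this is my own toy framing, not the calculator's model:

```python
# Toy sketch of pricing in onward infections; not microcovid.org's actual model.
def total_infection_cost(my_cost, r_eff, others_weight=1.0):
    """Expected cost of my own infection plus the chain it would seed.

    With effective reproduction number r_eff < 1, the expected number of
    downstream infections is r_eff + r_eff**2 + ... = r_eff / (1 - r_eff).
    `others_weight` is how much I value someone else's infection relative
    to my own (an ethical input, not an epidemiological one).
    """
    assert 0 <= r_eff < 1, "series only converges if each case infects <1 other on average"
    downstream = r_eff / (1 - r_eff)
    return my_cost + others_weight * downstream * my_cost

# Example: valuing my own case at $10,000 with an assumed r_eff of 0.5
# adds one expected downstream infection and doubles the total.
print(total_infection_cost(my_cost=10_000, r_eff=0.5))  # 20000.0
```

The hard part, as the linked thread discusses, is choosing r_eff and others_weight; that's exactly the part we don't know how to do properly yet.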
Sadly nothing useful. As mentioned here (https://www.microcovid.org/paper/2-riskiness#fn6) we think it's not higher than 10%, but we haven't found anything to bound it further.
"I've heard people make this claim before but without explaining why. [...] the key risk factors for a dining establishment are indoor vs. outdoor, and crowded vs. spaced. The type of liquor license the place has doesn't matter."
I think you're misunderstanding how the calculator works. All the saved scenarios do is fill in the parameters below. The only substantial difference between "restaurant" and "bar" is that we assume bars are places people speak loudly. That's all. If the bar you have in mind isn't like that, just change the parameters.
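To illustrate what "just change the parameters" means (with made-up multipliers; the real ones are in the calculator and whitepaper), a saved scenario is essentially just a bundle of parameter values like this:

```python
# Toy sketch only: the baseline and multipliers below are invented for
# illustration; see the microcovid.org whitepaper for the real numbers.
BASELINE_RISK_PER_PERSON_HOUR = 0.06          # hypothetical indoor, unmasked baseline
VOLUME_MULTIPLIER = {"silent": 0.2, "normal": 1.0, "loud": 5.0}  # assumed values

def scenario_risk(people_nearby, hours, volume):
    return people_nearby * hours * BASELINE_RISK_PER_PERSON_HOUR * VOLUME_MULTIPLIER[volume]

# "Restaurant" and "bar" presets would differ only in the volume parameter:
print(scenario_risk(people_nearby=5, hours=1.5, volume="normal"))  # restaurant-like
print(scenario_risk(people_nearby=5, hours=1.5, volume="loud"))    # bar-like, 5x higher here
```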
entry-level leadership
It has become really salient to me recently that good practice involves lots of prolific output in low-stakes throwaway contexts. Whereas a core piece of EA and rationalist mindsets is steering towards high-stakes things to work on, and treating your outputs as potentially very impactful and not to be thrown away. In my own mind “practice mindset” and “impact mindset” feel very directly in tension.
I have a feeling that something around this mindset difference is part of why world-saving orientation in a community might be correlated with inadequate opportunities for low-stakes leadership practice.
Here's another further-afield steelman, inspired by blameless postmortem culture.
When debriefing / investigating a bad outcome, it's better for participants to expect not to be labeled as "bad people" (implicitly or explicitly) as a result of coming forward with information about choices they made that contributed to the failure.
More social pressure against publicly admitting that one contributed to a bad outcome leads to systematic hiding/obfuscation of information about why people are making those choices (e.g. incentives). And we nee...
Another distinction I think is important, for the specific example of "scientific fraud vs. cow suffering" as a hypothetical:
Science is a terrible career for almost any goal other than actually contributing to the scientific endeavor.
I have a guess that "science, specifically" as a career-with-harmful-impacts in the hypothetical was not specifically important to Ray, but that it was very important to Ben. And that if the example career in Ray's "which harm is highest priority?" thought experiment had been "high-frequ...
You're right that I'd respond to different cases differently. Doing high frequency trading in a way that causes some harm - if you think you can do something very good with the money - seems basically sympathetic to me, in a sufficiently unjust society such as ours.
Any info good (including finance and trading) is on some level pretending to involve stewardship over our communal epistemics, but the simulacrum level of something like finance is pretty high in many respects.
One distinction I see getting elided here:
I think one's limited resources (time, money, etc) are a relevant question in one's behavior, but a "goodness budget" is not relevant at all.
For example: In a world where you could pay $50 to the electric company to convert all your electricity to renewables, or pay $50 more to switch from factory to pasture-raised beef, then if someone asks "hey, your household electrical bill is destroying the environment, why didn't you choose the green option", a relevant reply is "becaus...
The recent EA Meta Fund announcement linked to this post (https://www.centreforeffectivealtruism.org/blog/the-fidelity-model-of-spreading-ideas), which highlights another, parallel approach: in addition to picking expressions of ideas that fail gracefully, prefer transmission methods that preserve nuance.
If you have ovaries/uterus, a non-zero interest in having kids with your own gametes, and you're at least 25 or so: Get a fertility consultation.
They do an ultrasound and a blood test to estimate your ovarian reserve. Until you either try to conceive or get measurements like these, you don't know whether you have normal fertility for your age or whether your fertility is already declining.
This is important information to know, in order to make later informed decisions (such as when and whether to freeze your eggs, when to start looking for a...
Two observations:
Nod. Definitely open to better versions of the question that carve at more useful joints. (With a caveat that the question is more oriented towards "what are the easiest street lamps to look under" than "what is the best approximation")
So, I guess my return question is: do you have suggestions for subfields of "AI capabilities research" to focus on or exclude, such that the result more reliably points at "AGI", and for which you think public data is likely to exist? (Or some other way to carve up the AI research space.)
It does seem...
Important updates to your model:
Our collective total years of human experience come to ~119 times the age of the universe. (The universe is 13.8 billion years old, versus ~1.65 trillion total human experience-years so far.)
Also: with 7.44 billion people alive right now, we collectively experience the age of the universe every ~2 years (https://twitter.com/karpathy/status/850772106870640640?lang=en)
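The arithmetic behind both figures, for anyone who wants to check it:

```python
# Quick check of the two claims above, using the figures quoted.
UNIVERSE_AGE_YEARS = 13.8e9
TOTAL_EXPERIENCE_YEARS = 1.65e12
PEOPLE_ALIVE = 7.44e9

print(TOTAL_EXPERIENCE_YEARS / UNIVERSE_AGE_YEARS)  # ~119.6 universe-ages of experience so far
print(UNIVERSE_AGE_YEARS / PEOPLE_ALIVE)            # ~1.85 years per collective universe-age
```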
I hadn't read that link on the side-taking hypothesis of morality before, but I note that if you find that argument interesting, you would like Gillian Hadfield's book "Rules for a Flat World". She talks about law (not "what courts and congress do" but broadly "the enterprise of subjecting human conduct to rules") and emphasizes that law is similar to norms/morality, except in addition there is a canonical place that "the rules" get posted and also a canonical way to obtain a final arbitration about questio...
FWIW, this claim doesn't match my intuition, and googling around, I wasn't able to quickly find any papers or blog posts supporting it.
"Explaining and Harnessing Adversarial Examples" (Goodfellow et al. 2014) is the original demonstration that "Linear behavior in high-dimensional spaces is sufficient to cause adversarial examples".
I'll emphasize that high-dimensionality is a crucial piece of the puzzle, which I haven't seen you bring up yet. You may already be aware of this, but I'll emphasize it anyway: the usu...
As the dimension increases, a decision-boundary hyperplane that has 1% test error rapidly gets extremely close to the equator of the sphere
What does the center of the sphere represent in this case?
(I'm imagining the training and test sets consisting of points in a high-dimensional space, and the classifier as drawing a hyperplane to mostly separate them from each other. But I'm not sure what point in this space would correspond to the "center", or what sphere we'd be talking about.)
The central argument can be understood from the intuitions presented in "Counterintuitive Properties of High Dimensional Space", in the section titled "Concentration of Measure".
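For anyone who wants to see that concentration effect directly, here's a small Monte Carlo sketch (my own illustration of the intuition from that link): on the unit sphere, the fraction of uniformly sampled points lying within a thin band around any equator rapidly approaches 1 as the dimension grows.

```python
import numpy as np

# Monte Carlo illustration of concentration of measure: almost all of a
# high-dimensional unit sphere's surface lies within a thin band |x_1| < 0.05
# around the equator.
rng = np.random.default_rng(0)
band = 0.05
for d in (3, 30, 300, 3000):
    pts = rng.normal(size=(5_000, d))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # uniform points on the unit sphere
    print(d, np.mean(np.abs(pts[:, 0]) < band))          # fraction near the equator -> 1
```

That's the sense in which a low-error decision hyperplane ends up close to the equator: that's where essentially all of the sphere's mass sits.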
Thanks for this link, that is a handy reference!
When evaluating whether there is a broad base of support, I think it's important to distinguish "one large-scale funder" from "narrow overall base of support". Before the Arnold Foundation's funding, the Reproducibility Project had a broad base of committed participants contributing their personal resources and volunteering their time.
To add some details from personal experience: In late 2011 and early 2012, the Reproducibility Project was a great big underfunded labor of love. Brian Nosek had outlined a plan to replicate ~50 studies - ...
COI: I work at Anthropic
I confirmed internally (which felt personally important for me to do) that our partnership with Palantir is still subject to the same terms outlined in the June post "Expanding Access to Claude for Government":
...