All of chanamessinger's Comments + Replies

Leopold Aschenbrenner is starting a cross between a hedge fund and a think tank for AGI. I have read only the sections of Situational Awareness most relevant to this project, and I don't feel like I come close to understanding all the implications, so I could end up being quite wrong. Indeed, I’ve already updated towards a better and more nuanced understanding of Aschenbrenner's points, in ways that have made me less concerned than I was to begin with. But I want to say publicly that the hedge fund idea makes me nervous.

Before I give my re... (read more)

It sounds from this back-and-forth like we should assume that the Anthropic leadership who left OAI (so Dario and Daniela Amodei, Jack Clark, Sam McCandlish, others?) are still under NDA, because it was probably mutual. Does that sound right to others?

Oh! I think you're right, thanks!

I feel pretty sympathetic to the desire not to do things by text; I suspect you get much more practiced and checked over answers that way.

3antanaclasis
Another big thing is that you can’t get tone-of-voice information via text. The way that someone says something may convey more to you than what they said, especially for some types of journalism.

I suspect you get much more practiced and checked over answers that way.

In some contexts this would be seen as obviously a good thing.  Specifically, if the thing you're interested in is the ideas that your interviewee talks about, then you want them to be able to consider carefully and double-check their facts before sending them over.

The case where you don't want that would seem to be the case where your primary interest is in the mental state of your interviewee, or where you hope to get them to stumble into revealing things they would want to hide.

which privacy skills you are able to execute.


This link goes to a private google doc, just fyi.

1Mateusz Bagiński
Wouldn't a DM be a more proper way to point this out?
2Raemon
lol that is amazingly terrible. That doc was a memo at a private retreat that a) is not actually that private, but b) is mostly just a repackaging of this: https://www.lesswrong.com/posts/rz73eva3jv267Hy7B/can-you-keep-this-confidential-how-do-you-know

This is great!

I really like this about slack:

  • If you aren’t maintaining this, err on the side of cultivating this rather than doing high-risk / high-reward investments that might leave you emotionally or financially screwed.
    • (or, if you do those things, be aware I may not help you if it fails. I am much more excited about helping people that don’t go out of their way to create crises)


Seems like a good norm and piece of advice.

I'm confused how much I should care whether an impact assessment is commissioned by some organization. The main thing I generally look for is whether the assessment / investigation is independent. The argument is that because AISC is paying for it, that will influence the assessors? 

6habryka
My guess is it matters a lot, even if people aspire towards independence. I would update if someone has a long track record of clearly neutral-seeming reports for financial compensation, but I think in the absence of such a track record, my prior would be that people are very rarely capable of making strong negative public statements about people who are paying them.
6Linda Linsefors
This depends on how much you trust the actors involved. I know that Remmelt and I asked for an honest evaluation and did not try to influence the result. But you don't know this. Remmelt and I obviously believe in AISC, otherwise we would not keep running these programs. But since AISC has been chronically understaffed (like most non-profit initiatives), we have not had time to do a proper follow-up study. When we asked Arb to do this assessment, it was in large part to test our own beliefs. So far nothing surprising has come out of the investigation, which is reassuring. But if Arb found something bad, I would not want them to hide it. Here are some other evaluations of AISC (and other things) that were not commissioned by us. I think for both of them, they did not even talk to someone from AISC before posting, although for the second link this was only due to miscommunication.

  • Takeaways from a survey on AI alignment resources — EA Forum (effectivealtruism.org)
  • Thoughts on AI Safety Camp — LessWrong

I have not read most of what there is to read here, just jumping in on "illegal drugs" ---> ADHD meds. Chloe's comment spoke to weed as the illegal drug on her mind.

1Rebecca
Yeah, Ben should have said illicit not illegal, because they are illegal to bring across the border except if you have a valid prescription, even if the place you purchased them didn't require a prescription. But I wouldn't consider it an unambiguous falsehood; the following is mostly a sliding scale of frustrating ambiguity:

  1. ‘asked Alice to illegally bring Schedule II medication into the country’ [edit: entirely correct according to NL’s stating of the facts]
  2. ‘asked Alice to illegally bring Schedule II drugs into the country’ [some intermediate version, still completely factually correct but would be eliding the difference between meth and Adderall]
  3. ‘asked Alice to bring illegal drugs across the border’ [frustratingly bad choice of words that gives people a much worse impression than is accurate, from memory basically the thing that Ben said]

To clarify, this is specifically in the context "Kat requested that Alice bring a variety of illegal drugs across the border for her." Chloe didn't come into it.

"AI has immense potential, but also immense risks. AI might be misused by China, or get out of control. We should balance the needs for innovation and safety." I wouldn't call this lying (though I agree it can have misleading effects, see Issue 1).


Not sure where this slots in, but there's also a sense in which this contains a missing positive mood about how unbelievably good (aligned) AI could or will be, and how much we're losing by not having it earlier.

Interesting how many of these are "democracy / citizenry-involvement" oriented. Strongly agree with 18 (whistleblower protection) and 38 (simulate cyber attacks).

20 (good internal culture), 27 (technical AI people on boards) and 29 (three lines of defense) sound good to me, I'm excited about 31 if mandatory interpretability standards exist. 

42 (on sentience) seems pretty important but I don't know what it would mean.

4ryan_greenblatt
This is super late, but I recently posted: Improving the Welfare of AIs: A Nearcasted Proposal
2Zach Stein-Perlman
Assuming you mean the second 42 ("AGI labs take measures to limit potential harms that could arise from AI systems being sentient or deserving moral patienthood")-- I also don't know what labs should do, so I asked an expert yesterday and will reply here if they know of good proposals...

The top 6 of the ones in the paper (the ones I think got >90% somewhat or strongly agree, listed below) seem pretty similar to me - are there important reasons people might support one over another?

  • Pre-deployment risk assessments
  • Evaluations of dangerous capabilities
  • Third-party model audits
  • Red teaming
  • Pre-training risk assessments
  • Pausing training of dangerous models
2Zach Stein-Perlman
I think 19 ideas got >90% agreement. I agree the top ideas overlap. I think reasons one might support some over others depend on the details. 

Curious if you have any updates!

2jacquesthibs
Working on a new grant proposal right now. Should be sent this weekend. If you’d like to give feedback or have a look, please send me a DM! Otherwise, I can send the grant proposal to whoever wants to have a look once it is done (still debating about posting it on LW).

Outside of that, there has been a lot of progress on the Cyborgism discord (there is a VSCode plugin called Worldspider that connects to the various APIs, and there has been more progress on Loom). Most of my focus has gone towards looking at the big picture and keeping an eye on all the developments. Now, I have a better vision of what is needed to create an actually great alignment assistant and have talked to other alignment researchers about it to get feedback and brainstorm.

However, I’m spread way too thin and will request additional funding to get some engineer/builder to start building the ideas out so that I can focus on the bigger picture and my alignment work. If I can get my funding again (previous funding ended last week), then my main focus will be building out the system I have in mind for accelerating alignment work + continuing to work on the new agenda I put out with Quintin and others. There’s some other stuff I’d like to do, but those are lower priority or will depend on timing.

It’s been hard to get the funding application done because things are moving so fast and I’m trying not to build things that will be built by default. And I’ve been talking to some people about the possibility of building an org so that this work could go a lot faster.

ChatGPT gives some interesting analysis when asked, though I think not amazingly accurate. (The sentence I gave it, from here, is a weird example, though.)

Does it say anything about AI risk that is about the real risks? (Have not clicked the links, the text above did not indicate to me one way or another).

5MaxRa
The report mentioned "harm to the global financial system [and to global supply chains]" somewhere as examples, which I found noteworthy for being very large-scale harms and therefore plausibly requiring the kind of AI systems that the AI x-risk community is most worried about.
2Evan R. Murphy
I'm not sure if the core NIST standards go into catastrophic misalignment risk, but Barrett et al.'s supplemental guidance on the NIST standards does. I was a reviewer on that work, and I think they have more coming (see link in my first comment on this post for their first part).

This is great, and speaks to my experience as well. I have my own frames that map onto some of this but don't hit some of the things you've hit and vice versa. Thanks for writing!

Is this something Stampy would want to help with?


https://www.lesswrong.com/posts/WXvt8bxYnwBYpy9oT/the-main-sources-of-ai-risk

2plex
It's definitely something Stampy would want to link to, and if those authors wanted to maintain their list on Stampy rather than LessWrong that would be welcome, though I could imagine them wanting to retain editing restrictions. Edit: Added a link to Stampy.

I think that incentivizes self-deception about probabilities. Also, probabilities below 10^-10 are pretty unusual, so I'd expect that constraint to cause very little to happen.

Thanks! 

When you say "They do, however, have the potential to form simulacra that are themselves optimizers, such as GPT modelling humans (with pretty low fidelity right now) when making predictions"

do you mean things like "write like Ernest Hemingway"?

2Jozdien
Yep.  I think it happens on a much lower scale in the background too - like if you prompt GPT with something like the occurrence of an earthquake, it might write about what reporters have to say about it, simulating various aspects of the world that may include agents without our conscious direction.

Is it true that current image systems like Stable Diffusion are non-optimizers? How should that change our reasoning about how likely it is that systems become optimizers? How much of a crux is "optimizeriness" for people?

7Jozdien
My take is centred more on current language models, which are also non-optimizers, so I'm afraid this won't be super relevant if you're already familiar with the rest of this and were asking specifically about the context of image systems.

Language models are simulators of worlds sampled from the prior representing our world (insofar as the totality of human text is a good representation of our world), and don't have many of the properties we would associate with "optimizeriness". They do, however, have the potential to form simulacra that are themselves optimizers, such as GPT modelling humans (with pretty low fidelity right now) when making predictions. One danger from this kind of system that isn't itself an optimizer is the possibility of instantiating deceptive simulacra that are powerful enough to act in ways that are dangerous to us (I'm biased here, but I think this section from one of my earlier posts does a not-terrible job of explaining this).

There's also the possibility of these systems becoming optimizers, as you mentioned. This could happen either during training (where the model at some point during training becomes agentic and starts to deceptively act like a non-optimizer simulator would - I describe this scenario in another section from the same post), or could happen later, as people try to use RL on it for downstream tasks. I think what happens here mechanistically at the end could be one of a number of things - the model itself completely becoming an optimizer, an agentic head on top of the generative model that's less powerful than the previous scenario at least to begin with, a really powerful simulacrum that "takes over" the computational power of the simulation, etc.

I'm pretty uncertain about the numbers I would assign to either outcome, but the latter seems pretty likely (although I think the former might still be a problem), especially with the application of powerful RL for tasks that benefit a lot from consequentialist reasoning. …

Why do people keep saying we should maximize log(odds) instead of odds? Isn't each 1% of survival equally valuable?

1JakubK
Paul's comment here is relevant, but I'm also confused.
3sen
I don't know why other people say it, but I can explain why it's nice to say it.

  • log P(x) behaves nicely in comparison to P(x) when it comes to placing iterated bets. When you maximize P(x), you're susceptible to high-risk, high-reward scenarios, even when they lead to failure with probability arbitrarily close to 1. The same is not true when maximizing log P(x). I'm cheating here since this only really makes sense when big-P refers to "principal" (i.e., the thing growing or shrinking with each bet) rather than "probability".
  • p(x) doesn't vary linearly with the controls we typically have, so calculus intuition tends to break down when used to optimize p(x). Log p(x) does usually vary linearly with the controls we typically have, so we can apply more calculus intuition to optimizing it. I think this happens because of the way we naturally think of "dimensions of" and "factors contributing to" a probability and the resulting quirks of typical maximum entropy distributions.
  • Log p(x) grows monotonically with p(x) whenever x is possible, so the result is the same whether you argmax log p(x) or p(x).
  • p(x) is usually intractable to calculate, but there's a slick trick to approximate it using the evidence lower bound (ELBO), which requires dealing with log p(x) rather than p(x) directly. Saying log p(x) calls that trick to mind more easily than saying just p(x).
  • All the cool papers do it.
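[Editor's note: to make the first bullet concrete, here is a minimal sketch in Python, with made-up numbers not taken from the comment above, of why maximizing expected log wealth rather than expected wealth matters once bets are iterated. The all-in policy maximizes expected wealth on each round but almost surely goes bust; the Kelly-style policy, which maximizes expected log wealth, compounds. As the comment notes, this reading only applies if P is a principal that grows or shrinks with each bet, not a probability.]

```python
import random

# Minimal sketch with hypothetical numbers: repeated even-odds bets that win
# with probability 0.6. "All-in" stakes everything each round (maximizes
# expected wealth per round); "Kelly" stakes the fraction that maximizes
# expected log wealth, which for an even-odds bet is 2p - 1.

def median_final_wealth(stake_fraction, p_win=0.6, rounds=100, trials=10_000):
    """Median wealth after `rounds` bets, staking `stake_fraction` of wealth each time."""
    finals = []
    for _ in range(trials):
        wealth = 1.0
        for _ in range(rounds):
            bet = wealth * stake_fraction
            wealth += bet if random.random() < p_win else -bet
        finals.append(wealth)
    finals.sort()
    return finals[len(finals) // 2]

kelly_fraction = 2 * 0.6 - 1  # 0.2 for a 60% even-odds bet

print("all-in (max E[wealth]):    ", median_final_wealth(1.0))             # ~0: a single loss is ruin
print("Kelly  (max E[log wealth]):", median_final_wealth(kelly_fraction))  # grows over the 100 rounds
```

(Exact Kelly output varies run to run, but it stays well above the starting wealth of 1, while the all-in median is essentially always 0.)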

In addition to Daniel's point, I think an important piece is probabilistic thinking - the AGI will execute not based on what will happen but on what it expects to happen. What probability is acceptable? If none, it should do nothing.

1reallyeli
I don't think this is an important obstacle — you could use something like "and act such that your P(your actions over the next year lead to a massive disaster) < 10^-10." I think Daniel's point is the heart of the issue.

Nice! Added these to the wiki on calibration: https://www.lesswrong.com/tag/calibration

Oh, whoops. I took from this later tweet in the thread that they were talking.

After years of tinkering and incremental progress, AIs can now play Diplomacy as well as human experts.[6]


Maybe this happened in 2022: https://twitter.com/polynoamial/status/1580185706735218689

2Daniel Kokotajlo
That's no-press Diplomacy: Diplomacy without the talking. Doesn't count IMO.

Here's the git! https://github.com/SonOfLilit/calibrate?fbclid=IwAR2vBZ8IWfMgHTPla0CbohCUIqmrMUl-XEcYIWhKUrJ4ZRfH2Eg7Z7Zf1J4

I will talk to the developer about it being open source - I think that was both of our ideals.

Do you know how to do this kind of thing? I'd be happy to pay you for your time.

1Stephen Bennett
I haven't worked on any browser extensions before (not sure what language they're written in), but I do know javascript well enough. We can probably work something out!

This seems interesting to me but I can't yet latch onto it. Can you give examples of secrets being one or the other?

Are you distinguishing between "secrets where the existence of the secret is a big part of the secret" and "secrets where it's not"?

2Drake Morrison
I think that's the gist of it. I categorize them as Secret and Private, where Secret information is something I deny knowing (and therefore fails to pass the onion test), and Private information is something that people can know exists, even if I won't tell them what it is (thereby passing the onion test). Also, see this which I found relevant.

Why would they be jokes?

Don't know what you mean in the latter sentence.

1Martin Vlach
Thanks for the links, as they clarified a lot for me. The names of the tactics/techniques sounded strange to me, and after unsuccessful googling for their meanings I started to believe it was a play with your readers; sorry if this suspicion of mine seemed rude. The second part was curiosity to explore some potential cases of "What could we bet on?".

Conversational moves in EA / Rationality that I like for epistemics

  • “So you are saying that”
  • “But I’d change my mind if”
  • “But I’m open to push back here”
  • “I’m curious for your take here”
  • “My model says”
  • “My current understanding is…”
  • “...I think this because…”
  • “...but I’m uncertain about…”
  • “What could we bet on?”
  • “Can you lay out your model for me?”
  • “This is a butterfly idea”
  • “Let’s do a babble”
  • “I want to gesture at something / I think this gestures at something true”
-1Martin Vlach
Can I bet the last 3 points are a joke? Anyway, do we have a method to find checkpoints or milestones for betting on progress against a certain problem (e.g. AI development safety, global warming)?

This is why LessWrong needs the full suite of emoji reacts.

I meant signposting to indicate things like saying "here's a place where I have more to say, but not in this context" during, for instance, a conversation, so that I'm truthfully saying there's more to the story.

Yeah, I think "intentionally causing others to update in the wrong direction" and "leaving them with their priors" end up pretty similar (if you don't make strong distinctions between action and omission, which I think this test at least partially rests on) if you have a good model of their priors (which I think is potentially the hardest part here).

Kind is one of the four adjectives in your description of Iron Hufflepuff.

5Duncan Sabien (Deactivated)
Ah, gotcha. (Also "lol"/"whoops.") "There is something in here of Iron Hufflepuff" is not meant to equal "All of Iron Hufflepuff is in here." I agree the above does not represent kindness much. Tenacious is the bit that's coming through most strongly, and also if I were rewriting the lists today I would include "principled" or "consistent" or "conscientious" as a strong piece of Hufflepuff, and that's very much on display here.

Hm, Keltham has a lot of good qualities here, but kind doesn't seem among them.

2Duncan Sabien (Deactivated)
... seems like a non-sequitur; can you connect the dots for me?

Sounds scary, but thank you for the model of what's actually going on!

Oh woah! Thanks for linking.

True! 65 Watts! That would really be something.

Unfortunately I'm not seeing anything close to that on the Amazon UK site :/

Might be bad search skills, though.

1Brendan Long
It seems like there are fewer of them, but searches like "60w led corn bulb e26" find a few results: https://www.amazon.co.uk/s?k=60w+led+corn+bulb+e26&i=lighting

Your link's lightbulbs have a bayonet style, not the E27 threading :) Thanks for the other link! Amazon says currently unavailable.

ETA: Found some, will add to post

Tried to buy those, didn't have any luck finding ones that fit nicely into my sockets! (An embarrassing mistake I didn't describe in detail is buying corn bulbs that turned out to be...mini?) If you have an Amazon UK link for ones with E27 threading, that would be awesome.

ETA: Having looked, it seems not all corn bulbs are brighter than the ones I have, though I have now found 2000 lumen ones. I don't know if corn bulbs are still better if they have lower lumens. I would guess not?

ETA 2: The link above does have E27 if you click through the multiple listings in the same link, wasn't obvious to me at first, thanks!

1Brendan Long
I think you can find 6000 lumen E26/27 corn bulbs relatively easily: https://www.amazon.com/dp/B07T3GQB5J/ref=cm_sw_r_apan_glt_i_1X8ZFHR91SCX5R63RQWW?psc=1 This brand also has 10,000 lumen versions if you're willing to use an adapter, but it's probably easier to just use two of the E26/27 version. One downside compared to your current setup is that these are 60-100W, so I'd be a little worried about covering them with paper lanterns.
1Derek M. Jones
Click on the green text, or use Amazon UK's search box; Google ads displays a 4000 lumen bulb.

I saw people discussing the forecasting success of this on Twitter, and some were saying that the intelligence agencies actually called this right. Does anyone know an easy link to what those agencies were saying?

Context: https://twitter.com/ClayGraubard/status/1496699988801433602?s=20&t=mQ8sAzMRppI8Pr44O38M3w

https://twitter.com/ClayGraubard/status/1496866236973658112?s=20&t=mQ8sAzMRppI8Pr44O38M3w

3lc
The thing about intelligence agencies is that they are really good at insider trading.
-8[anonymous]

I definitely find it helpful to be surrounded by people who will do this for me and help me cultivate a habit of it over time. The case for it being very impactful is if people do a one-time thing, like apply for something or put themselves in the running for something that they otherwise wouldn't have that makes a big difference. The ones that are about accountability (Can I remind you about that in a week?) also are sort of a conscientiousness loan, which can be cheap since it can be easier to check in on other people than to do it for yourself. 

It is definitely important to have a sense of who you're talking to and what they need (law of equal and opposite advice). For what it's worth, 5-10 and 13 are aimed to be disproportionately helpful for people who have trouble doing things (depending on the reason).
