All of Jono's Comments + Replies

Does this help outer alignment?

Goal: tile the universe with niceness, without knowing what niceness is.

Method

We create:
- a bunch of formulations of what niceness is.
- a tiling AI that, given some description of niceness, tiles the universe with it.
- a forecasting AI that, given a formulation of niceness, a description of the tiling AI, a description of the universe, and some coordinates in the universe, generates a prediction of what the part of the universe at those coordinates looks like after the tiling AI has tiled it with that formulation of niceness.
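As a toy sketch of how these pieces might be wired together (the function names, and the final "looks nice" inspection step, are my own assumptions rather than part of the proposal; the tiling-AI and universe descriptions are assumed to be baked into `forecast`):

```python
# Sketch only: "forecast" stands in for the forecasting AI and "looks_nice" for
# human inspection of its output -- neither exists as real code.

def screen_formulations(formulations, forecast, coords_sample, looks_nice):
    """Keep only the niceness formulations whose forecasted post-tiling outcomes pass inspection.

    forecast(formulation, coords) -> description of that region after the tiling AI runs
    looks_nice(description)       -> whether an inspector approves of that description
    """
    survivors = []
    for formulation in formulations:
        views = [forecast(formulation, coords) for coords in coords_sample]
        if all(looks_nice(view) for view in views):
            survivors.append(formulation)
    return survivors
```

The load-bearing piece is whatever plays the role of `looks_nice`, which is exactly where Viliam's objection below bites.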

Foll... (read more)

Viliam
I like the relative simplicity of this approach, but yeah, there is a risk that a tiling agent would produce (a more sophisticated version of) humans that have a permanent smile on their faces but feel horrible pain inside. Something bad that would look convincingly good at first sight, enough to fool the forecasting AI, or rather enough to fool the people who are programming and testing the forecasting AI.

I have looked around a bit and have not seen any updates since November, which estimated this would be finished in early February.
Could you give another update, or link a more recent one if it exists?

Michaël Trazzi
It's almost finished, planning to release in April.

P(doom) can be approximately measured. 
If reality fluid describes the territory well, we should be able to see close worlds that already died off.

For nuclear war we have some examples.
We can estimate the odds that the Cuban missile crisis and Petrov's decision went badly. If we accept that luck was a huge factor in our surviving those events (or in not encountering more events like them), we can see how unlikely it is that our current world is still alive.
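A toy version of that arithmetic, with made-up per-event probabilities just to show the shape of the estimate:

```python
# Illustrative only: these "probability it went badly" numbers are placeholders, not estimates.
near_misses = {
    "Cuban missile crisis": 0.3,
    "Petrov incident": 0.2,
}

p_alive = 1.0
for event, p_bad in near_misses.items():
    p_alive *= 1 - p_bad

print(f"P(a world like ours survives every listed near-miss) = {p_alive:.2f}")
# With these placeholders, only ~56% of nearby worlds make it this far, i.e. ~44%
# already died off -- that surviving fraction is the sense in which P(doom) is measurable.
```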

A high P(doom) implies that we are about to encounter (or have already encountered) some very unlikely events that worked out susp... (read more)

I don't know if you have already, but this might be the time to take a long and hard look at the problem and consider whether deep learning is the key to solving it.

What is the problem?

  • reckless unilateralism? -> go work for policy or chip manufacturing
  • inability to specify human values? -> that problem doesn't look like a DL problem at all to me
  • powerful hackers stealing all the proto-AGIs in the next 4 years? -> go cybersec
  • deception? -> (why focus there? why make an AI that might deceive you in the first place?) but that's pretty ML, though I'm not sure about interp
... (read more)

ai-plans.com aims to collect research agendas and have people comment on their strengths and vulnerabilities. The Discord also occasionally hosts a Critique-a-Thon, where people discuss specific agendas.

Kabir Kumar
Yes, we host a bi-monthly Critique-a-Thon; the next one is from December 16th to 18th! Judges include:
- Nate Soares, President of MIRI
- Ramana Kumar, researcher at DeepMind
- Dr Peter S Park, MIT postdoc at the Tegmark lab
- Charbel-Raphael Segerie, head of the AI unit at EffiSciences

We do not know; that is the relevant problem.

Looking at the output of a black box is insufficient. You can only know by putting the black box in power, or by deeply understanding it.
Humans are born into a world with others in power, so we know that most humans care about each other without knowing why.
AI has no history of demonstrating friendliness in the only circumstances where that can be provably found. We can only know in advance by way of thorough understanding.

A strong theory about AI internals should come first. Refuting Yudkowsky's theory about how it might go wrong is irrelevant.

Well, if someone originally started worrying based on strident predictions of sophisticated internal reasoning with goals independent of external behavior, then realizing that's currently unsubstantiated should cause them to down-update on AI risk. That's why it's relevant. Although I think we should have good theories of AI internals. 

mesaoptimizer
I think the actual reason we believe humans could care about each other is that we've evolved the ability to do so, that most humans share the same brain structure, and that they therefore share the same tendency to care for people they consider their "ingroup".

Layman here 👋
IIUC, we cannot trust the proof of an unaligned simulacrum's suggestion because it is smarter than us.
Would that be a non-issue if verifying the proof is easier than making it?
If we can know how hard it is to verify a proof without actually verifying it, then we can find a safe protocol for communicating with this simulacrum. Is this possible?
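A toy illustration of the asymmetry being asked about, using integer factoring as a stand-in (nothing here is specific to simulacra or to real proof systems):

```python
import math

def find_factor(n):
    """Producing a nontrivial factor: up to ~sqrt(n) trial divisions."""
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return d
    return None  # no nontrivial factor found

def check_factor(n, d):
    """Verifying a claimed factor: a single division."""
    return d is not None and 1 < d < n and n % d == 0

n = 1_000_003 * 999_983          # a number whose factors take real work to find
claimed = find_factor(n)         # costly to produce...
assert check_factor(n, claimed)  # ...but cheap to check once it is handed over
```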

it might be the case that any kind of meaningful values would be reasonably encodable as answers to the question "what next set of MPIs should be instantiated?" 

What examples of (meaningless) values are not answers to "What next set of MPIs should be instantiated?"

Tamsin Leake
wanting the moon to be green even when no moral patient is looking; or more generally, having any kind of preference about which computations that don't causally affect any moral patient are run.

What does our world look like a decade after the deployment of a successfully aligned AGI?

Lone Pine
Interpolate Ray Kurzweil's vision with what the world looks like today, with alpha being how skeptical of Ray you are.
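Read literally, that is just a linear interpolation; a toy rendering (the numeric "worlds" are obviously placeholders of my own, not anything Lone Pine specified):

```python
def forecast(world_today, world_kurzweil, alpha):
    # alpha = 1: fully skeptical of Kurzweil, the world stays as it is today
    # alpha = 0: his vision arrives on schedule
    return alpha * world_today + (1 - alpha) * world_kurzweil
```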

Thank you plex, I was not aware of this wiki.
The pitch is nice, I'll incorporate it.

Why do I care if the people around me care about AI risk?

1. When AI is going to rule, we'd like people to somehow keep some power, I reckon.
I mean, creating any superintelligence is a power grab. Making one in secret is quite hostile; shouldn't people get a say in, or at least insight into, what their future holds?

2. Nobody really knows yet what we'd like the superintelligence to do. I think an ML researcher is as capable of voicing their desires for the future as an artist is. The field surely can benefit from interdisciplinary approaches.

3. As with nuclear war, I'm sure ... (read more)

I'm concerned with the ethics.
Is it wrong to doom speak to strangers? Is that the most effective thing here? I'd be lying if I said I was fine, but would it be best to tell them I'm "mildly concerned"?

How do I convey these grave emotions I have while maximally getting the people around me to care about mitigating AI risk?

Should I compromise on truth and downplay my concerns if that will get someone to care more? Should I expect people to be more receptive to the message of AI risk if I'm mild about it? 

Carl Feynman
Why do you care if people around you, who presumably have lives to live, care about AI risk?  It's not a problem like AIDS or groundwater pollution, where individual carefulness is needed to make a difference. In those cases, telling everybody about the problem is important, because it will prevent them having unprotected sex, or dumping used motor oil in their backyard.  Unaligned AGI is a problem like nuclear war or viral gain-of-function research, where a few people work on the problem pretty much full time.  If you want to increase the population of such people, that's fine, but telling your mother-in-law that the world is doomed isn't going to help.

What do I tell the people who I know but can't spend lots of time with?

Clarification: How do I get relative strangers who converse with me IRL to maximally care about the dangers of AI?

Do I downplay my concerns such that they don't think I'm crazy?
Do I mention it every time I see them to make sure they don't forget?
Do I tolerate third parties butting in and making wrong statements?
Do I tell them to read up on it and pester them on whether they read it already?
Do I never mention it to laymen to avoid them propagating wrong memes?
Do I seek out and approach p... (read more)

plex
Show a genuine keen interest in the things they have deep models of[1] first before bringing up alignment, unless they invite you to talk first. Steer towards deep conversation with some well-chosen questions[2], but be very open to having it about whatever they know most about rather than AI immediately. At some point, they are likely to ask about what you're interested in[3]. Then you have an A/B tested elevator pitch prepared, and adapted for your specific audience (ideally as few people as possible, which helps to lower the amount of in-the-moment status you need to spend on a brief weird-sounding monologue). Mine usually goes something like:

Then you let them steer and answer their questions as honestly and accurately as you can. If there's a lull in the conversation, bringing their attention to the arc of civilization with accelerating change viewable in their lifetimes alluded to in The Most Important Century and Sapiens is a good filler, but let them lead the conversation and politely (praising them for good questions!) give them quickfire answers. Be sure to flag any question you struggle to answer well for further research and thank them for giving you a question which you don't have an answer to yet. Feel free to post it to Stampy if we're missing it from our canonical questions.

This approach has a very good rate of the other person walking away seeming to take the concerns seriously, and a fairly good rate of people later joining the effort. It does depend on you actually having good answers to hand to their first few "but why don't we just" questions, which means being fairly well read (or watching all of Rob Miles).

1. ^ Almost everyone has deep models of something, for example a supermarket worker taught me about logistics and the changes to training needs and autonomy brought on by automation recently. And by learning the 5-10 minute version of everyone's deep models you become more intellectually awesome, which helps for all sorts of t
Yonatan Cale
"tell" with what goal? (could you ask in other words or give an example answer so I'll understand what you're pointing at?)