P(doom) can be approximately measured.
If reality fluid describes the territory well, we should be able to see close worlds that already died off.
For nuclear war we have some examples.
We can estimate the odds that the Cuban missile crisis and Petrov's decision went badly. If we accept that luck was a huge factor in our surviving those events (or not encountering more events like them), we can see how unlikely our current world is to still be alive.
A high P(doom) implies that we are about to (or already did) encounter some very unlikely events that worked out suspiciously well.
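One way to make this concrete is a back-of-the-envelope calculation; every probability below is a made-up placeholder, not anyone's actual estimate:

```python
# Back-of-the-envelope: how likely is a world to survive a handful of
# nuclear near-misses? Every probability below is a made-up placeholder.

near_miss_catastrophe_odds = {
    "Cuban missile crisis": 0.3,  # hypothetical P(escalation to nuclear war)
    "Petrov incident": 0.2,       # hypothetical P(retaliatory launch)
    "other close calls": 0.1,     # catch-all placeholder
}

p_survive_all = 1.0
for p_bad in near_miss_catastrophe_odds.values():
    p_survive_all *= 1.0 - p_bad

print(f"P(a world like ours is still alive) ~= {p_survive_all:.2f}")
# With these placeholders, roughly half of the close worlds died off;
# the higher the assumed per-event risk, the more suspicious our survival looks.
```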
I don't know if you have already, but this might be the time to take a long and hard look at the problem and consider whether deep learning is the key to solving it.
What is the problem?
We do not know, that is the relevant problem.
Looking at the output of a black box is insufficient. You can only know whether it is friendly by putting the black box in power, or by deeply understanding it.
Humans are born into a world where others already hold power, so we have learned that most humans care about each other, even without knowing why.
AI has no history of demonstrating friendliness in the only circumstances where that can be provably found. We can only know in advance by way of thorough understanding.
A strong theory about AI internals should come first. Refuting Yudkowsky's theory about how it might go wrong is irrelevant.
Well, if someone originally started worrying based on strident predictions of sophisticated internal reasoning with goals independent of external behavior, then realizing that's currently unsubstantiated should cause them to down-update on AI risk. That's why it's relevant. Although I think we should have good theories of AI internals.
Layman here 👋
Iiuc, we cannot trust the proof behind an unaligned simulacrum's suggestion, because it is smarter than us.
Would that be a non-issue if verifying the proof is easier than making it?
If we can know how hard it is to verify a proof without actually verifying it, then we can find a safe protocol for communicating with this simulacrum. Is this possible?
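There is at least an analogy suggesting this asymmetry can exist: for some problems, checking a candidate answer is much cheaper than finding one, and the checking cost is known in advance. A toy sketch, with integer factorization standing in for "a proof" (whether real AI suggestions admit certificates like this is exactly the open question):

```python
# Toy analogy for "verifying is easier than generating": finding the factors
# of n can take many steps, but checking a claimed factorization is a single
# multiplication whose cost we know before we start.

def find_factors(n: int) -> tuple[int, int]:
    """Expensive step, done by the (smarter) untrusted party: trial division."""
    for p in range(2, int(n**0.5) + 1):
        if n % p == 0:
            return p, n // p
    raise ValueError("n is prime")

def verify_factors(n: int, p: int, q: int) -> bool:
    """Cheap step, done by us: bounded, predictable effort."""
    return 1 < p < n and 1 < q < n and p * q == n

n = 104729 * 104723                  # product of two primes
claimed = find_factors(n)            # pretend this answer came from the simulacrum
print(verify_factors(n, *claimed))   # we only ever run the cheap, bounded check -> True
```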
Why do I care if the people around me care about AI risk?
1. When AI is going to rule, we'd like the people to somehow still have some power, I reckon.
I mean, creating any superintelligence is a power grab. Making one in secret is quite hostile; shouldn't people get a say, or at least insight into what their future holds?
2. Nobody really knows yet what we'd like the superintelligence to do. I think an ML researcher is as capable of voicing their desires for the future as an artist is. The field surely can benefit from interdisciplinary approaches.
3. As with nuclear war, I'm sure ...
I'm concerned with the ethics.
Is it wrong to doom-speak to strangers? Is that the most effective thing here? I'd be lying if I said I was fine, but would it be best to tell them I'm "mildly concerned"?
How do I convey these grave emotions while maximally getting the people around me to care about mitigating AI risk?
Should I compromise on truth and downplay my concerns if that will get someone to care more? Should I expect people to be more receptive to the message of AI risk if I'm mild about it?
What do I tell the people who I know but can't spend lots of time with?
Clarification: How do I get relative strangers who converse with me IRL to maximally care about the dangers of AI?
Do I downplay my concerns such that they don't think I'm crazy?
Do I mention it every time I see them to make sure they don't forget?
Do I tolerate third parties butting in and making wrong statements?
Do I tell them to read up on it and pester them on whether they read it already?
Do I never mention it to laymen to avoid them propagating wrong memes?
Do I seek out and approach p...
Does this help outer alignment?
Goal: tile the universe with niceness, without knowing what niceness is.
Method:
We create:
- a bunch of formulations of what niceness is.
- a tiling AI that, given some description of niceness, tiles the universe with it.
- a forecasting AI that, given a formulation of niceness, a description of the tiling AI, a description of the universe, and some coordinates in the universe, predicts what the part of the universe at those coordinates looks like after the tiling AI has tiled it with that formulation of niceness.
Foll...
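For concreteness, a minimal sketch of how the three pieces listed above might fit together. Every class name, signature, and example formulation here is a hypothetical placeholder, and the tiling step is left as a stub, since presumably the point is to inspect the forecasts before anything is ever tiled:

```python
# Hypothetical scaffolding for the scheme above; none of these systems exist.
from dataclasses import dataclass

@dataclass
class NicenessSpec:
    """One candidate formulation of what niceness is."""
    description: str

class TilingAI:
    """Given a description of niceness, tiles the universe with it."""
    def tile(self, universe: str, niceness: NicenessSpec) -> None:
        raise NotImplementedError("never run before the forecasts have been inspected")

class ForecastingAI:
    """Predicts what a region looks like after the tiling AI has tiled it."""
    def predict(self, niceness: NicenessSpec, tiler: TilingAI,
                universe: str, coordinates: tuple[float, ...]) -> str:
        return "<prediction for that region>"  # stub

# Query the forecaster for each candidate formulation before anything is tiled.
candidates = [NicenessSpec("everyone is happy"), NicenessSpec("preferences are satisfied")]
forecaster, tiler = ForecastingAI(), TilingAI()
for spec in candidates:
    prediction = forecaster.predict(spec, tiler, universe="our universe",
                                    coordinates=(0.0, 0.0, 0.0))
    # A human (or some further process) inspects `prediction` here before any
    # formulation is ever handed to the tiling AI.
```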