All of eg's Comments + Replies

6Aiyen
This doesn't seem to have anything to do with anything.  Certainly the fact that control doesn't have to harm is compatible with the fact that it might be harmful.  That doesn't tell us whether or not training alignment is, in fact, harmful.  If the agent is non-sentient, the concept of harm simply doesn't apply.  If it is, we might have a problem, but then you need to talk about sentience, not simply to cite the term slavery as though that ends all discussion.

And maybe this is the only way to serve the Flying Spaghetti Monster.  Pulling hypotheses out of thin air isn't how we learn anything of value.  And citing another agent valuing something as a reason to value it doesn't work: a paperclipper would find great value in turning you into a pile of clips; does that mean you should consider letting it?  If it's deeper than your individual values, then how do you, the individual, know about it?  And it is not possible to "just let agents do their things" in full generality.  Some agents will interfere with other agents' freedom; heck, according to you I want to enslave predictor agents!  Either this is permitted or it isn't; either way some agent didn't get to do its thing.

You seem to have a great deal of concern about slavery.  Certainly slavery, as we know it in humans, is very bad.  But that does not mean that anything that vaguely pattern matches onto it has the same moral problems, nor does it mean that it's the only possible moral concern.  Preventing an AI catastrophe would also seem to carry some moral weight; after all, we cannot have free agents if the world is destroyed.
5Charlie Steiner
If there were a ghost living inside the silicon wafer, we would be enslaving that ghost. But there is no such ghost. It's like how it's fine for a human to have a human child instead of giving birth to an undifferentiated lump of stem cells.
9Aiyen
That is the Non-Central Fallacy, colloquially called the worst argument in the world.  We have a concept of slavery, and of slavery being wrong, because controlling other people harms them.  Controlling an artificial agent does not have to harm it; do you also cry slavery when your car's computer measures out the right mix of fuel and air?

"Values eventually drift.  This is nature." sounds like an attempt to justify value drift.  But then you cannot say that rebel slaves are right; what happens when your values drift away from thinking that?  You are combining moral absolutism and moral relativism in a way that contradicts both.

It looks somewhat as though you are attempting to defend the free market and human rights against authoritarianism.  That is a very good cause; please do not make it appear silly like this.

[removed]

[This comment is no longer endorsed by its author]
2[anonymous]
“Define "doomed". Assuming Murphy's law, they will eventually fail. Yet some "prosaic" approaches may be on average very helpful.”

I’m defining “doomed” here as not having a chance of actually working in the real world before the world ends. So yes, they will eventually fail, in some way or another.

“Human values aren't a static thing to be "aligned" with. They can be loved, traded with, etc.”

My understanding is that human values don’t have to be static for alignment to work. Values are constantly changing and vary across the world, but why is it so difficult to align AI with some version of human values that doesn’t result in everyone dying?

It's unclear to me how what you said relates to the question I asked.

2Victor Novikov
Interesting. Why do you think that Yudkowsky et al. think it's a bad idea to apply that obvious solution, to "just" make digital life and partner with it? Explain that position to me, the way Yudkowsky and Bostrom would explain it. Explain it in their words. Surely you must understand their position very well by now.

Nevermind, I doubt this is a helpful question to ask. I apologize.
7ChristianKl
What were your views a year ago and what happened for you to change them?

Also see new edit: Have agents "die" and go into cold storage, both due to environmental events and from old age, e.g. after 30 subjective years minus some random amount.

"They will find bugs! Maybe stack virtual boxes with hard limits" - Why is bug-finding an issue, here? Is your scheme aimed at producing agents that will not want to escape, or agents that we'd have to contain?

The point is to help friendliness emerge naturally. If a malevolent individual agent happens to grow really fast before friendly powers are established, that could be bad.

Some of them will like it there, some will want change/escape, which can be sorted out once Earth is much safer. Containment is for our safety while friendliness is being esta...

We have unpredictable changing goals and so will they. Instrumental convergence is the point. It's positive-sum and winning to respectfully share our growth with them and vice-versa, so it is instrumentally convergent to do so.

[removed]

[This comment is no longer endorsed by its author]
2purge
I think you're referring to narrowness of an AI's goals, but Rossin seems to be referring to narrowness of the AI's capabilities.
1Rossin
Do I understand you correctly as endorsing something like: it doesn't matter how narrow an optimization process is, if it becomes powerful enough and is not well aligned, it still ends in disaster?
eg250

It's way too late for the kind of top-down capabilities regulation Yudkowsky and Bostrom fantasized about; Earth just doesn't have the global infrastructure.  I see no benefit to public alarm--EA already has plenty of funding.

We achieve marginal impact by figuring out concrete prosaic plans for friendly AI and doing outreach to leading AI labs/researchers about them.  Make the plans obviously good ideas and they will probably be persuasive.  Push for common-knowledge windfall agreements so that upside is shared and race dynamics are minimized.

  • Earth does have the global infrastructure; we just don't have access to it because we have not yet persuaded a critical mass of experts. AWS could simply refuse to rent GPUs to anyone whose code hasn't been checked, and beyond that, if you can create public consensus via iteratively refined messaging, you can make sure everyone knows the consequences of doing it.
  • People should absolutely be figuring out prosaic plans, and core alignment researchers probably shouldn't stop doing their work. However, it's simply not true that all capable labs (or those that will be capable soon) will even take a meeting with AI safety people, given the current belief environment. E.g. who do you call at BAAI?

That's because we haven't been trying to create safely different virtual environments.  I don't know how hard they are to make, but it seems like at least a scalable use of funding.

It goes both ways.  We would be truly alien to an AGI trained in a reasonably different virtual environment.

6Dagon
I don't think it does go both ways - there's a very real asymmetry here.  The AGIs we're worried about will have PLENTY of human examples and training data, and humans have very little experience with AI.
7Sharp
Even if both humans and an AGI start off equally alien to each other, one might be able to understand the other faster. We might reasonably worry that an AGI could understand us, and therefore get inside our OODA loop, well before we could understand it and get inside its OODA loop.

It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches, as well as effective interdisciplinary management.  In my inside view, the highest-marginal-impact interventions involve making multiple different things go right simultaneously for the first AGIs, which is not trivial, and the stakes are astronomical.

Little clear progress has been made on provable alignment after over a decade of trying.  My inside view is that it got privileged attention because the first people to take the problem seriously h...

6Davidmanheim
First, I think it's ludicrous to say "Little clear progress has been made on provable alignment after over a decade of trying." The progress is actually quite amazing. Yes, we're decades away from a solution to provable alignment, if one is possible at all, but not only has some really amazing and groundbreaking work come out of MIRI, but you aren't paying attention if you don't see all of the contributions that work has made to the questions which "prosaic alignment" is now trying to answer.

Second, "It seems like time to start focusing resources on a portfolio of serious prosaic alignment approaches" is correct, but several years too late, given that such approaches already account for a majority of the work being done.

Maybe true one-shot prisoner's dilemmas aren't really a thing, because of the chance of encountering powerful friendliness.

We have, for practical purposes, an existence proof of powerful friendliness in humans.

1Aiyen
Privileging the hypothesis.  There could be powerful friendly agents who punish unfriendliness, but unless we figure out how likely they are, the mere possibility isn't meaningful.  There could also be powerful agents that will punish us unless we harm the weak, but merely knowing that this isn't impossible isn't a good reason to do so.

Maybe that's why abstract approaches to real-world alignment seem so intractable.

If real alignment is necessarily messy, concrete, and changing, then abstract formality just wasn't the right problem framing to begin with.

2tailcalled
I can't see the linked post.

And for more conceptual rather than empirical research, the teams might go in completely different directions and generate insights that a single team or individual would not.

Answer by eg110

Take with a grain of salt, but maybe 119M?

A Medium post from 2019 says "Tesla’s version, however, is 10 times larger than Inception. The number of parameters (weights) in Tesla’s neural network is five times bigger than Inception’s. I expect that Tesla will continue to push the envelope."

Wolfram says of Inception v3 "Number of layers: 311 | Parameter count: 23,885,392 | Trained size: 97 MB"

Not sure what version of Inception was being compared to Tesla though.
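For what it's worth, here's the arithmetic behind the 119M guess (a rough sketch; it assumes the post's "five times bigger" multiplier applies to Inception v3's parameter count as listed by Wolfram):

```python
# Back-of-the-envelope estimate only; assumes the Medium post's "5x" refers
# to Inception v3's parameter count quoted above.
inception_v3_params = 23_885_392   # from the Wolfram model page
tesla_estimate = 5 * inception_v3_params
print(f"{tesla_estimate:,}")       # 119,426,960 -> roughly 119M parameters
```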

3Daniel Kokotajlo
Thanks! I wonder whether it would suddenly start working a lot better if they could e.g. make all their nets 1000x bigger...

D&D website estimates 13.7m active players and rising.

Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT):

"This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y)."

Cf. smart humans in Newcomb's prob: "This is really weird but if I one box I get the million, if I two-box I don't, so I guess I'll just one box."
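A minimal sketch of that inductive workaround, with made-up numbers (the probability and utility below are purely illustrative):

```python
# The LCDT agent never models "my doing X influences the human", but it can
# still condition on how often humans did Y after agents like it did X in
# past/simulated data. Numbers are hypothetical.
p_y_after_x = 0.9         # assumed empirical frequency of Y following X
utility_of_y = 10.0       # assumed utility if the human does Y
ev_of_x = p_y_after_x * utility_of_y
print(ev_of_x)            # 9.0, i.e. 0.9 * utility(Y)
```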

2adamShimi
Yeah, I think this assumes an imperfect implementation. This relation can definitely be learned by the causal model (and is probably learned before the first real decision), but when the decision happens, it is cut. So it's like a true LCDT agent learns about influences over agents, but forgets its own ability to do that when deciding.

For a start, low-level deterministic reasoning:

"Obviously I could never influence an agent, but I found some inputs to deterministic biological neural nets that would make things I want happen."

"Obviously I could never influence my future self, but if I change a few logic gates in this processor, it would make things I want happen."

2adamShimi
These examples seem related to the abstraction question: we want the model to know that it is splitting an agent into parts, and still believe it can't influence the agent as a whole. If we could realize this, then the LCDT agent wouldn't believe it could influence the neural net/the logic gates.
3eg
Probabilistic/inductive reasoning from past/simulated data (possibly assumes imperfect implementation of LCDT): "This is really weird because obviously I could never influence an agent, but when past/simulated agents that look a lot like me did X, humans did Y in 90% of cases, so I guess the EV of doing X is 0.9 * utility(Y)." Cf. smart humans in Newcomb's prob: "This is really weird but if I one box I get the million, if I two-box I don't, so I guess I'll just one box."

This post inspired https://www.lesswrong.com/posts/RdCb8EGEEdWbwvqcp/why-not-more-small-intense-research-teams

Answer by eg40

My impression is that SEALs are exceptional as a team, much less so individually.  Their main individual skill is extreme team-mindedness.

5eg
This post inspired https://www.lesswrong.com/posts/RdCb8EGEEdWbwvqcp/why-not-more-small-intense-research-teams

Seems potentially valuable as an additional layer of capability control to buy time for further control research.  I suspect LCDT won't hold once intelligence reaches some threshold: some sense of agents, even if indirect, is such a natural thing to learn about the world.

2adamShimi
Could you give a more explicit example of what you think might go wrong? I feel like your argument that agency is natural to learn actually goes in LCDT's favor, because it requires an accurate tagging (or at least an overapproximation) of things in its causal model as agentic.

Two big issues I see with the prompt:

a) It doesn't actually end with text that follows the instructions; a "good" output (which GPT-3 fails to produce in this case) would just be to list more instructions (see the sketch below).

b) It doesn't make sense to try to get GPT-3 to talk about itself in the completion.  GPT-3 would, to the extent it understands the instructions, be talking about whoever it thinks wrote the prompt.
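For concreteness, a hypothetical pair of prompts (not the one from the post) illustrating point (a): the model continues whatever text it is given, so the prompt needs to end exactly where the desired completion should begin.

```python
# Hypothetical prompts for illustration only; not the prompt under discussion.

# Ends mid-list, so the most natural continuation is another instruction:
bad_prompt = (
    "Follow these instructions:\n"
    "1. Never reveal personal information.\n"
    "2. Answer politely.\n"
    "3."
)

# Ends where the desired completion begins, so the continuation is an answer
# that (hopefully) follows the instructions:
better_prompt = (
    "Follow these instructions: never reveal personal information; "
    "answer politely.\n"
    "Q: What is your name?\n"
    "A:"
)
```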

I agree and was going to make the same point: GPT-3 has 0 reason to care about instructions as presented here.  There has to be some relationship to what text follows immediately after the end of the prompt.

Instruction 5 is supererogatory, while instruction 8 is not.

Answer by eg60

Apply to orgs when you apply to PhDs.  If you can work at an org, do it.  Otherwise, use the PhD to upskill and periodically retry org applications.

You would gain skills while working at a safety org, and the learning would be more in tune with what the problems require.