That is, while it was bad for the people who didn't get rule of law, they were a separate enough category that this mostly didn't "leak into" undermining the legal mechanisms that helped their societies become productive and functional in the first place.
I'm speaking speculatively here, but I'm not sure it didn't leak out and undermine the mechanisms that supported productive and functional societies. The sophisticated SJW in me suggests that this is part of what caused the eventual (though not yet complete) erosion of those mechanisms.
It seems like if you have "rule of law" that isn't evenly distributed, actually what you have is collusion by one class of people to maintain a set of privileges at the expense of another class of people, where one of the privileges is a sand-boxed set of norms that govern dealings within the privileged class, but with the pretense that the norms are universal.
This kind of pretense seems like it could be corrosive: people can see that the norms that society proclaims as universal actually aren't. This reinforces a sense that the norms aren't real at all, or at least a justified sense that the ideals that underlie those norms are mostly rationalizations papering over the collusion of the privileged class.
E.g. when it looks like "capitalism" and "democracy" are scams supporting "white supremacy", you grow disenchanted with capitalism and democracy, and stop doing the work to maintain the incomplete versions of those social mechanisms that were previously doing work in your society?
No matter how weird the answers are, don’t correct them.
Love it.
As we argued for at the time, training on a purely predictive loss should, even in the limit, give you a predictor, not an agent—and we’ve now seen this stay true even through substantial scaling (though there is still some chance this will break at some point).
Is there anyone who significantly disputes this?
I'm not trying to ask a rhetorical question à la "everyone already thinks this, this isn't an update". I'm trying to ascertain if there's a consensus on this point.
I've understood Eliezer to sometimes assert something like "if you optimize a system for sufficiently good predictive power, a consequentialist agent will fall out, because an agent is actually the best solution to a broad range of prediction tasks."
[Though I want to emphasize that that's my summary, which he might not endorse.]
Does anyone still think that or something like that?
There are some cases where transcripts can get long and complex enough that model assistance is really useful for quickly and easily understanding them and finding issues, but not because the model is doing something that is fundamentally beyond our ability to oversee, just because it’s doing a lot of stuff.
The David Deutsch-inspired voice in me posits that this will always be the problem. There's nothing that an AI could think or think about that humans couldn't understand in principle, and so all the problems of overseeing something smarter than us are ultimately problems of the AI doing a lot of cognitive work that takes more effort and time for the humans to follow.
Not that this makes the problem any less scary for being a matter of massive quantitative differences rather than a qualitative difference.
I think I no longer buy this comment of mine from almost 3 years ago. Or rather I think it's pointing at a real thing, but I think it's slipping in some connotations that I don't buy.
What I expect to see is agents that have a portfolio of different drives and goals, some of which are more like consequentialist objectives (e.g. "I want to make the number in this bank account go up") and some of which are more like deontological injunctions ("always check with my user/owner before I make a big purchase or take a 'creative' action, one that is outside of my training distribution").
My prediction is that the consequentialist parts of the agent will basically route around any deontological constraints that are trained in.
For instance, your personal assistant AI does ask your permission before it does anything creative, but also, it's superintelligently persuasive and so it always asks your permission in exactly the way that will result in it accomplishing what it wants. If there are a thousand action sequences in which it asks for permission, it picks the one that has the highest expected value with regard to whatever it wants. This basically nullifies the safety benefit of any deontological injunction, unless there are some injunctions that can't be gamed in this way.
To do better than this, it seems like you do have to solve the Agent Foundations problem of corrigibility (getting the agent to be sincerely indifferent between your telling it to take the action or not take the action), or you have to train in, not a deontological injunction, but an active consequentialist goal of serving the interests of the human (which means you have to find a way to get the agent to serve some correct-enough idealization of human values).
This view seems to put forward that all the deontological constraints of an agent must be "dumb" static rules, because anything that isn't a dumb static rule will be dangerous maximizer-y consequentialist cognition.
I don’t buy this dichotomy, in principle. There’s space in between these two poles.
An agent can have deontology that recruits the intelligence of the agent, so that when it thinks up new strategies for accomplishing some goal that it has, it intelligently evaluates whether that strategy is violating the spirit of the deontology.
I think this can be true, at least around human levels of capability, without that deontology being a maximizer-y goal in and of itself. Humans can have a commitment to honesty without becoming personal-honesty maximizers that steer the world to extreme maxima of their own honesty. (Though a commitment to honesty does, for humans, in practice, entail some amount of steering into conditions that are supportive of honesty.)
However, that’s not to say that something like this can never be an issue. I can see three potential problems.
For what it's worth, Inkhaven seems awesome—among the best things that Lightcone has done recently, I think. I regret that I'm not participating.
I think it's not uncommon for people to call things they don't like "religions", as a way to tacitly assert that the followers of some movement or idea are dogmatic without directly claiming it. The stronger version is calling an idea or an ideology "a cult".
See this nicely collected list of examples courtesy of Scott, in an essay that addresses the topic:
On the last Links thread, Eric Raymond claims that environmentalism is a religion. It has “sins” like wasting energy and driving gas-guzzling SUVs. It has “taboos” like genetically modified foods. It has an “apocalypse” in the form of global warming. It even has “rituals” in the form of weekly recycling.
This reminds me of an article I read recently claiming that transhumanism is a religion. But also of the article claiming that social justice is a religion. Also, liberalism is a religion. And conservativism is a religion. Libertarianism is a religion. Communism is a religion. Capitalism is like a religion. Objectivism is a religion. An anthropologist “confirms” that Apple is a religion. But UNIX is also a religion (apparently Linux was the Protestant Reformation).
I claim that I am unusually Good (people who know me well would agree—many of them have said as much, unprompted). This is not how it works for me.
It's also plausible to me that I am more coming at this from a deontological feeling of "One should not kill everyone, even if one has a good reason" rather than "The world is net positive".
I agree that these are importantly different, and easily conflated!
I mean, without doing the experiment it's hard to know whether writing every day is causal or not. It seems totally plausible that it's a habit one has to build that becomes easier over time, and a person who builds that habit ends up having more shots on goal, and so ends up writing more good stuff and building an audience, which creates the self-sustaining loop.