Comments

Many comments pointed out that NYT does not in fact have a consistent policy of always revealing people's true names. There's even a news editorial about this, which I point out in case you trust the NY Post's fact-checking more.

I think that leaves 3 possible explanations of what happened:

  1. NYT has a general policy of revealing people's true names, which it doesn't consistently apply but ended up applying in this case for no particular reason.
  2. There's an inconsistently applied policy, and Cade Metz's (and/or his editors') dislike of Scott contributed (consciously or subconsciously) to insistence on applying the policy in this particular case.
  3. There is no policy and it was a purely personal decision.

In my view, most rationalists seem to be operating under a reasonable probability distribution over these hypotheses, informed by evidence such as Metz's mention of Charles Murray, lack of a public written policy about revealing real names, and lack of evidence that a private written policy exists.

While reading this, I got a flash-forward of what my life (our lives) may be like in a few years, i.e., desperately trying to understand and evaluate complex philosophical constructs presented to us by superintelligent AI, which may or may not be actually competent at philosophy.

I gave this explanation at the start of the UDT1.1 post:

When describing UDT1 solutions to various sample problems, I've often talked about UDT1 finding the function S* that would optimize its preferences over the world program P, and then return what S* would return, given its input. But in my original description of UDT1, I never explicitly mentioned optimizing S as a whole, but instead specified UDT1 as, upon receiving input X, finding the optimal output Y* for that input, by considering the logical consequences of choosing various possible outputs. I have been implicitly assuming that the former (optimization of the global strategy) would somehow fall out of the latter (optimization of the local action) without having to be explicitly specified, due to how UDT1 takes into account logical correlations between different instances of itself. But recently I found an apparent counter-example to this assumption.
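
A minimal sketch of the "optimize the whole strategy S, then return S*(X)" procedure described in the quoted passage. The two-copy coordination game, the names `world_payoff`, `best_global_strategy`, and `act`, and the first-maximum tie-breaking rule are all illustrative assumptions, not taken from the post:

```python
from itertools import product

INPUTS = (1, 2)        # hypothetical: each copy of the agent sees a different number
OUTPUTS = ("A", "B")   # hypothetical: the options each copy can choose


def world_payoff(strategy):
    """Toy world program: pays 10 only if the two copies choose different outputs."""
    return 10 if strategy[1] != strategy[2] else 0


def best_global_strategy(inputs=INPUTS, outputs=OUTPUTS, payoff=world_payoff):
    """Enumerate every function from inputs to outputs and keep the best one.

    Ties are broken by enumeration order, so every copy finds the same S*.
    """
    candidates = (
        dict(zip(inputs, choice))
        for choice in product(outputs, repeat=len(inputs))
    )
    return max(candidates, key=payoff)


def act(observed_input):
    """Return what the globally optimal strategy S* prescribes for this input."""
    return best_global_strategy()[observed_input]


print(act(1), act(2))
```

Because every copy runs the same deterministic search, the copies land on complementary outputs in this toy game, which is the kind of coordination that choosing each output input-by-input does not obviously provide.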

Any thoughts on why, if it's obvious, it's seldom brought up around here (meaning rationalist/EA/AI safety circles)?

It’s difficult to trade with exponential agents

"Trade" between exponential agents could look like flipping a coin (biased to reflect relative power) and having the loser give all of their resources to the winner. It could also just look like ordinary trade, where each agent specializes in their comparative advantage, to gather resources/power to prepare for "the final trade".

"Trade" between exponential and less convex agents could look like making a bet on the size (or rather, potential resources) of the universe, such that the exponential agent gets a bigger share of big universes in exchange for giving up their share of small universes (similar to my proposed trade between a linear agent and a concave agent).

Maybe the real problem with convex agents is that their expected utilities do not converge, i.e., the probabilities of big universes can't possibly decrease enough with size that their expected utilities sum to finite numbers. (This is also a problem with linear agents, but you can perhaps patch the concept by saying they're linear in UD-weighted resources, similar to UDASSA. Is it also possible/sensible to patch convex agents in this way?)
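
The two trades described in this comment can be sketched numerically. Everything specific here is an assumption made for illustration: `math.exp` stands in for an "exponential" utility, "biased to reflect relative power" is read as a win probability equal to one's share of total resources, and the universe sizes, probabilities, and shares in `size_bet_value` are arbitrary toy numbers.

```python
import math


def exponential(resources: float) -> float:
    """Convex utility, standing in for an 'exponential agent' (assumption)."""
    return math.exp(resources)


def linear(resources: float) -> float:
    """Utility that is linear in resources."""
    return resources


def coin_flip_value(own: float, other: float, utility) -> float:
    """Expected utility of a winner-takes-all flip won with probability own / total."""
    total = own + other
    p_win = own / total
    return p_win * utility(total) + (1.0 - p_win) * utility(0.0)


def size_bet_value(utility, share_small: float, share_big: float,
                   p_small: float = 0.5, small: float = 2.0,
                   big: float = 10.0) -> float:
    """Expected utility of holding the given shares of a small or a big universe."""
    return (p_small * utility(share_small * small)
            + (1.0 - p_small) * utility(share_big * big))


# Trade 1: each exponential agent compares the biased flip with keeping what it has.
for own, other in [(3.0, 1.0), (1.0, 3.0)]:
    print(own, coin_flip_value(own, other, exponential) > exponential(own))

# Trade 2: starting from equal shares, the exponential agent gives up its half of
# the small universe in exchange for an extra 5% of the big one.
print(size_bet_value(linear, 1.0, 0.45) > size_bet_value(linear, 0.5, 0.5))
print(size_bet_value(exponential, 0.0, 0.55) > size_bet_value(exponential, 0.5, 0.5))
```

The first comparison is an instance of Jensen's inequality: the flip leaves expected resources unchanged, so any convex utility weakly prefers it, while a concave one (for example `math.sqrt`) does not. In the second, both sides come out ahead under these toy numbers, but nothing here addresses whether the expected utilities converge once arbitrarily large universes are allowed.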

However, convexity more closely resembles the intensity deltas needed to push reinforcement learning agent to take greater notice of small advances beyond the low-hanging fruit of its earliest findings, to counteract the naturally concave, diminishing returns that natural optimization problems tend to have.

I'm not familiar enough with RL to know how plausible this is. Can you expand on this, or anyone else want to weigh in?

Evil typically refers to an extraordinary immoral behavior, in the vicinity of purposefully inflicting harm to others in order to inflict harm intrinsically, rather than out of indifference, or as a byproduct of instrumental strategies to obtain some other goal.

Ok, I guess we just define/use it differently. I think most people we think of as "evil" probably justify inflicting harm to others as instrumental to some "greater good", or are doing it to gain or maintain power, not because they value it for its own sake. I mean if someone killed thousands of people in order to maintain their grip on power, I think we'd call them "evil" and not just "selfish"?

I just think it’s not clear it’s actually true that humans get more altruistic as they get richer.

I'm pretty sure that billionaires consume much less as a percentage of their income than the average person does. EA funding comes disproportionately from billionaires, AFAIK. I personally spend a lot more time/effort on altruistic causes than I would if I were poorer. (Not donating much, though, for a number of reasons.)

For example, is it the case that selfish consumer preferences have gotten weaker in the modern world, compared to centuries ago when humans were much poorer on a per capita basis?

I'm thinking that we just haven't reached that inflection point yet, where most people run out of things to spend selfishly on (like many billionaires have, and like I have to a lesser extent). As I mentioned in my reply to your post, a large part of my view comes from not being able to imagine what people would spend selfishly on, if each person "owned" something like a significant fraction of a solar system. Why couldn't 99% of their selfish desires be met with <1% of their resources? If you had a plausible story you could tell about this, that would probably change my mind a lot. One thing I do worry about is status symbols / positional goods. I tend to view that as a separate issue from "selfish consumption" but maybe you don't?

Are humans fundamentally good or evil? (By "evil" I mean something like "willing to inflict large amounts of harm/suffering on others in pursuit of one's own interests/goals (in a way that can't be plausibly justified as justice or the like)" and by "good" I mean "most people won't do that because they terminally care about others".) People say "power corrupts", but why isn't "power reveals" equally or more true? Looking at some relevant history (people thinking Mao Zedong was sincerely idealistic in his youth, early Chinese Communist Party looked genuine about wanting to learn democracy and freedom from the West, subsequent massive abuses of power by Mao/CCP lasting to today), it's hard to escape the conclusion that altruism is merely a mask that evolution made humans wear in a context-dependent way, to be discarded when opportune (e.g., when one has secured enough power that altruism is no longer very useful).

After writing the above, I was reminded of @Matthew Barnett's AI alignment shouldn’t be conflated with AI moral achievement, which is perhaps the closest previous discussion around here. (Also related are my previous writings about "human safety" although they still used the "power corrupts" framing.) Comparing my current message to his, he talks about "selfishness" and explicitly disclaims, "most humans are not evil" (why did he say this?), and focuses on everyday (e.g. consumer) behavior instead of what "power reveals".

At the time, I replied to him, "I think I’m less worried than you about “selfishness” in particular and more worried about moral/philosophical/strategic errors in general." I guess I wasn't as worried because it seemed like humans are altruistic enough, and their selfish everyday desires limited enough, that as they got richer and more powerful, their altruistic values would have more and more influence. In the few months since then, I've become more worried, perhaps due to learning more about Chinese history and politics...

I hear that there is an apparent paradox which economists have studied: If free markets are so great, why is it that the most successful corporations/businesses/etc. are top-down hierarchical planned economies internally?

Yeah, economists study this under the name "theory of the firm", dating back to a 1937 paper by Ronald Coase. (I see that jmh also mentioned this in his reply.) I remember liking Coase's "transaction cost" solution to this puzzle or paradox when I learned it, and it (and related ideas like "asymmetric information") has informed my views ever since (for example in AGI will drastically increase economies of scale).

Corporations grow bit by bit, by people hiring other people to do stuff for them.

I think this can't be a large part of the solution, because if market exchanges were more efficient (on the margin), people would learn to outsource more, or would be out-competed by others who were willing to delegate more to markets instead of underlings. In the long run, Coase's explanation that sizes of firms are driven by a tradeoff between internal and external transaction costs seemingly has to dominate.

Reading A Fire Upon the Deep was literally life-changing for me. How many Everett branches had someone like Vernor Vinge to draw people's attention to the possibility of a technological Singularity with such skillful writing, and to exhort us, at such an early date, to think about how to approach it strategically on a societal level or affect it positively on an individual level? Alas, the world has largely squandered the opportunity he gave us, and is rapidly approaching the Singularity with little forethought or preparation. I don't know which I feel sadder about, what this implies about our world and others, or the news of his passing.

In other words, I consider this counter-argument to be based on a linguistic ambiguity rather than replying to what I actually meant, and I’ll try to use more concrete language in the future to clarify what I’m talking about.

If I try to interpret "Current AIs are not able to “merge” with each other." with your clarified meaning in mind, I think I still want to argue with it, i.e., why is this meaningful evidence for how easy value handshakes will be for future agentic AIs?

In the absence of a solution to a hypothetical problem X (which we do not even know whether it will happen), it is better to try to become more intelligent to solve it.

But it matters how we get more intelligent. For example if I had to choose now, I'd want to increase the intelligence of biological humans (as I previously suggested) while holding off on AI. I want more time in part for people to think through the problem of which method of gaining intelligence is safest, in part for us to execute that method safely without undue time pressure.

If the alleged “problem” is that there might be a centralized agent in the future that can dominate the entire world, I’d intuitively reason that installing vast centralized regulatory controls over the entire world to pause AI is plausibly not actually helping to decentralize power in the way we’d prefer.

I wouldn't describe "the problem" that way, because in my mind there's a roughly equal chance that the future will turn out badly after proceeding in a decentralized way (see 13-25 in The Main Sources of AI Risk? for some ideas of how), and that instituting some kind of Singleton turns out to be the only way or one of the best ways to prevent that bad outcome.
