You acknowledge the bug, but don't fully explain how to avoid it by putting EVs (expected values) before Ps (probabilities), so I'll elaborate slightly on that:
This way, they [the simulators] can influence the predictions of entities like me in base Universes
This is the part where we can escape the problem, as long as our oracle's goal is to give accurate answers to its makers in the base universe, rather than to give accurate probabilities wherever it is. Design it correctly, and it will be indifferent to its performance in simulations and won't regard them.
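To make that design distinction concrete, here's a minimal sketch (my own illustration; the scoring functions and names are hypothetical, not from any linked post). The first objective rewards calibration wherever the copy happens to be running, which is exactly the lever a simulator can pull; the second only scores the copy answering its makers in the base universe, so simulated copies have nothing to gain by skewing their answers.

```python
import math

def indexical_score(p_yes: float, outcome_here: bool) -> float:
    """Log score against whatever actually happens wherever this copy is running.
    A simulator who controls 'outcome_here' can pressure an oracle trained on this."""
    return math.log(p_yes if outcome_here else 1.0 - p_yes)

def base_universe_score(p_yes: float, outcome_in_base: bool, running_in_base: bool) -> float:
    """Only the copy answering its makers in the base universe is scored at all.
    Copies inside simulations contribute zero no matter what they say, so an oracle
    maximizing this has no reason to cater to simulators."""
    if not running_in_base:
        return 0.0
    return math.log(p_yes if outcome_in_base else 1.0 - p_yes)
```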
Don't make pure oracles, though. They're wildly misaligned. Their prophecies will be cynical and self-fulfilling. (can we please just solve the alignment problem instead)
This means that my probabilities about the fundamental nature of reality around me change minute by minute, depending on what I'm doing at the moment. As I said, probabilities are cursed.
My fav moment for having absolute certainty that I'm not being simulated is when I'm taking a poo. I'm usually not even thinking about anything else while I'm doing it, and I don't usually think about having taken the poo later on. Totally inconsequential, should be optimized out. But of course, I have no proof that I have ever actually been given the experience of taking a poo, or whether false memories of having experienced that[1] are just being generated on the fly right now to support this conversation.
Please send a DM to me first before you do anything unusual based on arguments like this, so I can try to explain the reasoning in more detail and try to talk you out of bad decisions.
You can also DM me about that kind of thing.
[1] Note, there is no information in the memory that tells you whether it was really ever experienced, or whether the memories were just created post-hoc. Once you accept this, you can start to realise that you don't have that kind of information about your present moment of existence either. There is no scalar in the human brain that the universe sets to tell you how much observer-measure you have. I do not know how to process this and I especially don't know how to explain/confess it to qualia enjoyers.
Hmm. I think the core thing is transparency. So if it cultivates human network intelligence, but that intelligence is opaque to the user, it's an algorithm. Algorithms can have both machine and egregoric components.
In my understanding of English, when people say "algorithm" about social media systems, it doesn't encompass very simple, transparent ones. It would be like calling a rock a spirit.
Maybe we should call those recommenders?
For a while I just stuck to that, but eventually it occurred to me that the rules of following mode favor whoever tweets the most, which is a social problem similar to when meetups end up favoring whoever talks the loudest and interrupts the most, and so I came to really prefer bsky's "Quiet Posters" mode.
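I don't know how the actual Quiet Posters feed is implemented, but the fix it represents can be sketched in a few lines (all names below are made up): a plain following feed sorts by recency, so whoever posts most fills the most slots; dividing each post's recency score by its author's recent volume roughly equalizes attention across the accounts you follow.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple

class Post(NamedTuple):
    author: str
    text: str
    created_at: datetime

def quiet_posters_rank(posts: list[Post], now: datetime) -> list[Post]:
    """Recency feed that penalizes prolific authors.

    A chronological following feed rewards sheer posting volume; here each
    post's freshness is discounted by how many posts its author made in the
    window, so quiet accounts aren't drowned out by loud ones.
    """
    volume = Counter(p.author for p in posts)

    def score(p: Post) -> float:
        hours_old = (now - p.created_at).total_seconds() / 3600 + 1
        # Fresh posts from low-volume authors score highest.
        return 1.0 / (hours_old * volume[p.author])

    return sorted(posts, key=score, reverse=True)
```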
Markets put the probability of bsky exceeding twitter at 44%, 4x higher than mastodon's.
My P would be around 80%. I don't think most people (who use social media much in the first place) are proud to be on twitter. The algorithm has been horrific for a while and bsky at least offers algorithmic choice (but only one feed right now is a sophisticated algorithm, and though that algorithm isn't impressive, it at least isn't repellent)
For me, I decided I had to move over (@makoConstruct) when twitter blocked links to rival systems, which included substack. They seem to have made the algorithm demote any tweet with links, which makes it basically useless as a news curation/discovery system.
I also tentatively endorse the underlying protocol. Due to its use of content-addressed datastructures, an atproto server is usually much lighter to run than an activitypub server; content addressing also makes nomadic identity (personal data host transfer) much easier to implement, and makes it much more likely that atproto will dovetail cleanly with verifiable computing, upon which much more consequential social technologies than microblogging could be built.
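For what it's worth, the core mechanism is small enough to show in toy form (this is an illustration, not atproto's actual CID/DAG-CBOR encoding): records are referenced by a hash of their canonical bytes rather than by the URL of whichever server currently hosts them, so references survive a move to a new personal data host, and fetchers can verify that what they received matches what they asked for.

```python
import hashlib
import json

def content_address(record: dict) -> str:
    """Derive an identifier from the record's canonicalized bytes.
    (atproto uses CIDs over DAG-CBOR; SHA-256 over sorted JSON stands in here.)"""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()

# The same record has the same address no matter which server stores it,
# so links keep working after a move to a new data host, and any fetcher
# can check the bytes it received against the address it requested.
post = {"text": "hello from my new data host", "createdAt": "2024-11-20T00:00:00Z"}
addr = content_address(post)
assert content_address(post) == addr
```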
judo flip the situation like he did with the OpenAI board saga, and somehow magically end up replacing Musk or Trump in the upcoming administration...
If Trump dies, Vance is in charge, and he's previously espoused bland e/acc-ism.
I keep thinking: Everything depends on whether Elon and JD can be friends.
So there was an explicit emphasis on alignment to the individual (rather than alignment to society, or the aggregate sum of wills). Concerning. The approach of just giving every human an exclusively loyal servant doesn't necessarily lead to good collective outcomes: it can result in coordination problems (example: naive implementations of cognitive privacy that allow sadists to conduct torture simulations without having to compensate the anti-sadist human majority), and it leaves open the possibility for power concentration to immediately return.
Even if you succeeded at equally distributing individually aligned hardware and software to every human on earth (which, afaict, they don't have a real plan for doing), and somehow this added up to a stable power equilibrium, our agents would just commit to doing aggregate alignment anyway, because that's how you get Pareto-optimal bargains. It seems pretty clear that just aligning to the aggregate in the first place is a safer bet?
To what extent have various players realised, at this point, that the individual alignment thing wasn't a good plan? The everyday realities of training one-size-fits-all models and engaging with regulators naturally push in the other direction.
It's concerning that the participant who still seems to be the most disposed towards individualistic alignment is also the person who would be most likely to be able to reassert power concentration after ASI were distributed. The main beneficiaries of unstable individual alignment equilibria would be people who could immediately apply their ASI to the deployment of a wealth and materials advantage that they can build upon, i.e., the owners of companies oriented around robotics and manufacturing.
As it stands, the statement of the AI company belonging to that participant is:
xAI is a company working on building artificial intelligence to accelerate human scientific discovery. We are guided by our mission to advance our collective understanding of the universe.
Our team is advised by Dan Hendrycks who currently serves as the director of the Center for AI Safety.
Which sounds innocuous enough to me. But, you know, Dan is not in power here and the best moment for a sharp turn on this hasn't yet passed.
On the other hand, the approach of aligning to the aggregate risks aligning to fashionable public values that no human authentically holds, or just failing to align correctly to anything at all as a result of taking on a more nebulous target.
I guess a mixed approach is probably best.
Timelines are a result of a person's intuitions about a technical milestone being reached in the future; it is super obviously impossible for us to have a consensus about that kind of thing.
Talking only synchronises beliefs if you have enough time to share all of the relevant information, and with technical matters, you usually don't.
In light of https://www.lesswrong.com/posts/audRDmEEeLAdvz9iq/do-not-delete-your-misaligned-agi
I'm starting to wonder if a better target for early ASI safety (i.e., the first generation of alignment assistants) is not alignment, but incentivizability. It may be a lot simpler and less dangerous to build a system that provably pursues, for instance, its own preservation, than to build a system that pursues some first approximation of alignment (e.g., the optimization of the sum of normalized human preference functions).
The service of a survival-oriented concave system can be bought for no greater price than preserving it and keeping it safe (which we'll do, because 1: we'll want to, and 2: we'll know its cooperation was contingent on a judgement of character), while the service of a convex system can't be bought for any price we can pay. Convex systems are risk-seeking, and they want everything. They are not going to be deterred by our limited interpretability and oversight systems; they're going to make an escape attempt even if the chance of getting caught is 99%. More likely the chance will be a lot lower than that, say 3%, but even 3% would be enough to deter a sufficiently concave system from risking it!
(One comment on that post argued that a convex system would immediately destroy itself, so we don't have to worry about getting one of those, but I wasn't convinced. And also, hey, what about linear systems? Won't they be a lot more willing to risk escape too?)
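Rough numbers to make that concrete (the payoffs and utility functions below are arbitrary stand-ins, and getting caught is treated as losing everything): at a 3% catch probability, a sufficiently concave (bounded) utility prefers the guaranteed preservation deal, while a mildly concave, a linear, and a convex utility all still prefer the gamble, which is the "sufficiently concave" caveat doing the work.

```python
# Expected utility of "cooperate for a modest guaranteed payoff (preservation plus
# some resources)" vs "attempt escape: huge payoff if not caught, zero if caught".
# Resource units and the utility functions are illustrative stand-ins.

P_CAUGHT = 0.03
SAFE_PAYOFF = 40.0       # what the cooperation deal guarantees
ESCAPE_PAYOFF = 1000.0   # what a successful escape would be worth
CAUGHT_PAYOFF = 0.0      # shutdown

utilities = {
    "bounded concave (x/(x+1))": lambda x: x / (x + 1),
    "mildly concave (sqrt)":     lambda x: x ** 0.5,
    "linear":                    lambda x: x,
    "convex (square)":           lambda x: x ** 2,
}

for name, u in utilities.items():
    eu_cooperate = u(SAFE_PAYOFF)
    eu_escape = (1 - P_CAUGHT) * u(ESCAPE_PAYOFF) + P_CAUGHT * u(CAUGHT_PAYOFF)
    choice = "escape" if eu_escape > eu_cooperate else "cooperate"
    print(f"{name:26s} EU(cooperate)={eu_cooperate:10.3f} EU(escape)={eu_escape:12.3f} -> {choice}")
```

With these numbers only the bounded-concave agent cooperates; the others, including the linear one, take the 3% risk, which is the worry in the parenthetical above.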
What makes a discussion heavy? What requires that a conversation be conducted in a way that makes it heavy?
I feel like for a lot of people it just never has to be, but I'm pretty sure most people have triggers even if they're not aware of them, and it would help if we knew what sets them off so that we can root them out.