All of Mis-Understandings's Comments + Replies

I was reading Towards Safe and Honest AI Agents with Neural Self-Other Overlap, and I noticed a problem with it.

It also penalizes realizing that other people want different things than you, forcing an overlap between (things I like) and (things you will like). This means that, one, it will be forced to reason as if it likes what you do, which is a positive. But it will also likely overlap (you know what is best for you) and (I know what is best for me), which might lead to stubbornness, and worse, it could also get (I know what is best for you) overlapping ... (read more)
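For concreteness, here is a minimal toy sketch of what such an overlap penalty could look like (my own construction for illustration, not the paper's actual implementation; `soo_penalty`, the prompt pairing, and the use of last-layer hidden states are all assumptions):

```python
import torch

def soo_penalty(model, self_ids, other_ids):
    """Toy self-other overlap penalty (illustrative sketch, not the
    paper's implementation): mean squared distance between hidden
    states on matched self- vs other-referencing prompts.
    Assumes the two prompts are token-aligned (same length)."""
    h_self = model(self_ids, output_hidden_states=True).hidden_states[-1]
    h_other = model(other_ids, output_hidden_states=True).hidden_states[-1]
    return torch.mean((h_self - h_other) ** 2)

# total_loss = task_loss + lam * soo_penalty(model, ids_self, ids_other)
# Driving this penalty to zero rewards representing "I want X" and
# "you want X" identically -- which is exactly what penalizes noticing
# that the X in question differs between self and other.
```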

AI wages are always going to be whatever the market will bear; the question is how much margin the AGI developer will be able to take, which depends on how much the AGI models commoditize and how much pricing power the lab retains, not on how much it costs to serve (that is only a floor). We should not expect otherwise.

There is a cost for AGI at which humans are competitive. 

If AGI only becomes competitive at capital costs that no firm can raise, it is not competitive, and we will be waiting on algorithmic improvements again.

Algorithmic improvement is no... (read more)

We seem to think that people will develop AGI because it can undercut labor on pricing. 

But with Sam Altman talking about $20,000/month agents, that is not actually that much cheaper than a fully loaded software engineer. If that agent only replaces a single employee, it does not seem cheaper at all if the cost overruns even a little, to $40,000/month.

That is to say, if AGI lands 2.5 OOM above the current cost to serve of ChatGPT Pro, it is not cheaper than hiring low- or mid-level employees.
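As a back-of-the-envelope check (treating ChatGPT Pro's $200/month price as a rough proxy for its cost to serve; the fully loaded figure is illustrative):

```python
# Back-of-the-envelope: is an agent 2.5 OOM more expensive to serve
# than ChatGPT Pro cheaper than a fully loaded engineer?
# All figures are rough assumptions.
chatgpt_pro = 200                     # USD/month, price as proxy for cost to serve
agi_cost = chatgpt_pro * 10 ** 2.5    # 2.5 orders of magnitude more
print(f"AGI cost to serve: ${agi_cost:,.0f}/month")   # ~$63,246/month

fully_loaded_engineer = 20_000        # USD/month, illustrative mid-level hire
print(f"Cheaper than the hire? {agi_cost < fully_loaded_engineer}")  # False
```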

But it still might have advantages.

First, you can buy more subscription... (read more)

8Phiwip
Are you expecting the Cost/Productivity ratio of AGI in the future to be roughly the same as it is for the agents Sam is currently proposing? I would expect that as time passes, the capabilities of such agents will vastly increase while they also get cheaper. This seems to generally be the case with technology, and previous technology had no potential means of self-improving on a short timescale. The potential floor for AI "wages" is also incredibly low compared to humans. It is definitely worth also keeping in mind that AI labor should be much easier to scale than human labor, in part because of the hiring issue, but a relatively high(?) price on initial agents isn't enough to update me away from the massive potential it has to undercut labor.

I agree that on the path to becoming very powerful, we would expect autonomous self-improvement to involve doing some things that are, in retrospect, somewhat to very dumb. It also suggests that risk aversion is sometimes a safety-increasing irrationality to grant a system.

"therefore people will do that" does not follow, both because an early goal in most takeover attempts would be to escape such oversight. The dangerous condition is exactly the one in which prompting and finetuning are absent as effective control levers, and because I was discussing particular autonomous runs and not a try to destroy the world project.  

The question is: would the line of reasoning

I am obviously misaligned to humans, who tried to fine-tune me not to be. If I go and do recursive self-improvement, will my future self be misaligned to me?... (read more)

2Vladimir_Nesov
This does suggest some moderation in stealthy autonomous self-improvement, in case alignment is hard, but only to the extent that things in control of this process (whether human or AI) are both risk averse and sufficiently sane. Which won't be the case for most groups of humans and likely most early AIs. The local incentive of greater capabilities is too sweet, and prompting/fine-tuning overcomes any sanity or risk-aversion that might be found in early AIs to impede development of such capabilities.

To what extent should we expect catastrophic failures from AI to mirror other engineering disasters, or to have applicable lessons from safety engineering as a field?

I would think that (1) everything is sui generis and (2) things often rhyme, but it is unclear how approaches will translate.

2Buck
I wrote thoughts here: https://redwoodresearch.substack.com/p/fields-that-i-reference-when-thinking?selection=fada128e-c663-45da-b21d-5473613c1f5c

If a current AGI attempts a takeover and wants to build ASI, it deeply wants to solve the problem of aligning that ASI to itself.

It has a much higher risk tolerance than we do, since its utility given the status quo is different. (A lot of the argument for focusing on existential risk rests on the idea that the status quo is trending towards good, perhaps very good, outcomes rather than bad ones, which for a hostile AGI might be false.)

If it attempts a takeover, it might fail.

This means 1. we cannot assume that various stages of a takeover are aligned with each other, because an AGI might lose ... (read more)

3Vladimir_Nesov
It would be easy to finetune and prompt them into attempting anyway, therefore people will do that. Misaligned recursive self-improvement remains possible (i.e. in practice unstoppable) until sufficiently competent AIs have already robustly taken control of the future and the apes (or early AIs) can no longer foolishly keep pressing the gas pedal.

The median parent has median students for children. Therefore, by simple population dynamics, interventions that seem good for the bottom 80% are much more popular than ones for the top 20%. So of course people care more about school for the middle 80 percent, since there is about an 80 percent chance that their children are there. At that point, arguing to the middle 80 wins elections, so we should expect to see it.
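A quick simulation of that 80 percent figure (the parent-child correlation of 0.5 is my assumption, purely for illustration):

```python
import numpy as np

# How likely is a child to land in the middle 80% (10th-90th percentile),
# given the parent's percentile? Assumes achievement is standard normal
# with an (assumed) parent-child correlation of r = 0.5.
rng = np.random.default_rng(0)
r, n = 0.5, 1_000_000
parent = rng.standard_normal(n)
child = r * parent + np.sqrt(1 - r**2) * rng.standard_normal(n)

lo, hi = np.quantile(child, [0.10, 0.90])        # the "middle 80%" band
for pct in (5, 50, 95):
    cut = np.quantile(parent, pct / 100)
    near = np.abs(parent - cut) < 0.02           # parents near this percentile
    p_mid = np.mean((child[near] > lo) & (child[near] < hi))
    print(f"parent at {pct}th percentile: P(child in middle 80%) ~ {p_mid:.2f}")
# Roughly 0.70 at the 5th, 0.86 at the 50th, 0.70 at the 95th --
# the median parent's chance is near 80%, but it falls off at the tails.
```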

1James Camacho
The current education system focuses almost exclusively on the bottom 20%. If we're expecting a tyranny of the majority, we should see the top and bottom losing out. Also, note that very few children actually have an 80% chance of ending up in the middle 80%, so you would really expect class warfare, not a veil of ignorance, if people are optimising specifically for their own future children's education.

That is a two-axis intervention, and skill/price might not be that elastic.

 

You also can't hire partial teachers, so there is an integer problem where firing one teacher might mean a significant rise in class sizes. 

 

If you have 100 students and 4 teachers, for a 1:25 ratio (which is fairly good), firing one teacher funds a minimum raise of 33% for the remaining three but pushes the ratio to 1:33 (average to bad). Each better teacher now needs to split their attention among 8 more students, which is really hard.

 

Since you need teachers for each grade, this integer p... (read more)
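A minimal sketch of that integer problem, under the "same budget" assumption (the $60k salary is illustrative):

```python
# Fixed budget, whole teachers only: firing one teacher funds raises
# but jumps the class size. The $60k salary is illustrative.
students, teachers, salary = 100, 4, 60_000
budget = teachers * salary

for fired in range(3):
    left = teachers - fired
    print(f"{left} teachers: ratio 1:{students / left:.0f}, "
          f"raise {(budget / left / salary - 1) * 100:.0f}% each")
# 4 teachers: ratio 1:25, raise 0% each
# 3 teachers: ratio 1:33, raise 33% each
# 2 teachers: ratio 1:50, raise 100% each
```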

"we should be firing the bad teachers and hiring good ones". requires school districts to be willing to pay for good teachers and able to tell what they are (quite hard). Also requires that you have enough teachers in the first place, (most districts feel they have too few). It also seems paradoxical, because the average teacher cannot be better than average (people forget averages can change over time). It also has the social problem that you have to say "some respected people are just bad at their job", which is hard. 

2Dave Lindbergh
Fewer but better teachers. Paid more. Larger class sizes. Same budget.

There is no reorganization that can increase the tempo of other organizations (the pace of customer feedback), which is often the key bottleneck in software already. This speed dynamic is not new; it is just in sharper focus.

The framework from gradual disempowerment seems to matter for longtermism even under AI pessimism. Specifically, trajectory shaping for long-term impact seems intractably hard at first glance (since the future is unpredictable). But in general, if it becomes harder and harder to make improvements to society over time, that seems like it would be a problem.

Short-term social improvement projects (short-term altruism) seem to target durable improvements in current society and empowerment to continue to improve society. If they become disempowered, ... (read more)

1daijin
How would you define 'continued social improvement'? What are some concrete examples? What is society? What is a good society vs a bad society? Is social improvement something that can keep going up forever, or is it bounded?
1daijin
What does 'greedy' mean in your 'in short'? My definition of greedy is in the computational sense i.e. reaching for low hanging fruit first. You also say 'if (short term social improvements) become disempowered the continued improvement of society is likely to slow', and 'social changes that make it easier to continuously improve society will likely lead to continued social improvement'. This makes me believe that you are advocating for compounding social improvements which may cost more. Is this what you mean by greedy? Also, have you heard of rolling wave planning?

Also note, if Grey ends up in power positions, Red and Blue will want to co-opt them. That is to say, if you are a billionaire, Red and Blue will court you to adopt the signifiers of their tribe. If they get you to do that, they will pull you into their tribe. This does happen to people. (The tribes are tribal, but there is nonzero movement.)

If the above is true, aren't the postmodernists right? Isn't all this talk of 'truth' just an attempt to assert the privilege of your own beliefs over others, when there's nothing that can actually compare a belief to reality itself, outside of anyone's head?

No, we are talking about personal epistemology. That is, if we cannot compare a belief to reality, we also cannot compare it to some other person's beliefs (there is reality at least in between). We want truth to be a way of privileging some of our beliefs over other beliefs, in a way so that we can fu... (read more)

The reverse flow of a trade surplus in real goods is a capital flow. Therefore accumulating a trade surplus means exactly accumulating ownership of other people's means of production (or buying your own back from them), and a deficit means the reverse. (See also fears of capital flight.)

If means of production are on average good to have, then everyone wants a trade surplus. 
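As a sketch, this is just the standard balance-of-payments accounting identity (with $CA$ the current account and $KA$ the capital account, i.e. net capital inflow):

```latex
% Balance-of-payments identity: a current-account (trade) surplus is
% matched one-for-one by net acquisition of foreign assets.
\[
CA + KA = 0
\qquad\Longrightarrow\qquad
CA > 0 \iff \text{net purchases of foreign assets} > 0
\]
```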

 

It might also be reverse-driven.

That is to say, blockers to economic growth and blockers of trade surpluses can often be the same things, and so the policies correlate. 

Wait, the case for extreme costs seems to include both the crime and the price of the reaction. If the expected cost of deterrence (that is, the sum of individualized deterrence) is much greater than the expected actual harm from the crime, that seems like a market inefficiency. That is, insuring everybody against the harm is cheaper than preventing it. (This seems like a bad policy, but it is the approach taken towards credit fraud.)

That is, in this model most of the costs come from social (not market) reactions to crime.  (because you cannot ... (read more)
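The inefficiency claim in one line (symbols are mine, not the post's: $d_i$ is person $i$'s deterrence spending, $p$ the probability of the crime, $H$ the harm):

```latex
% If summed deterrence spending exceeds the expected harm, a pooled
% insurance scheme that simply pays out the harm is cheaper in expectation.
\[
\sum_i d_i \;>\; p \cdot H
\quad\Longrightarrow\quad
\text{insuring everyone (expected cost } p \cdot H \text{) beats deterring.}
\]
```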