Mis-Understandings

So I ran into this

https://www.youtube.com/watch?v=AF3XJT9YKpM

And I noticed a lot of talk about error taxonomy.

Which seems like an important idea in general, but especially in interpretability for safety.

Specifically, error taxonomy is a subset of action-by-consequence taxonomy, which is the main goal of interpretability for safety (as it allows us to act on the fact that the model will take actions with bad consequences).

Misrepresentation as a Barrier for Interp (Part I)

Mis-Understandings4d3-2

I think this is missing the mechanics of interpretability. Interpretability is about the opposite: "what it does".

So basically, interpretability cares about mixed features (a malfunction where the thing is not as labeled) only insofar as the feature does more than the label would lead us to think it does.

That is to say, in addition to labeling representational parts of the model, interp wants to know the relations between those parts, so that we ultimately know enough about what the model will do to either debug it (as capabilities research) or prove that it will not try to do x, y, and z that would kill us (for safety).

Basically, for an alignment researcher, the polysemanticity that comes from the model being wrong sometimes is basically okay, provided the wrongness really is in the model and so produces the same actions.

Even just plain polysemanticity is not the end of the world for your interp tools, because there is not one "right" semantics. You just want to span the model behaviour.

O O's Shortform

Mis-Understandings6d10

Global GDP growth over the same period was around 3 percent.

The question is how equities outperformed GDP growth.

I think that this has to do with changes in asset prices in general.

Vladimir_Nesov's Shortform

Mis-Understandings12d105

o3 has a different base model (presumably).

All of the figures hold the base model fixed between the RL and non-RL versions.

I would expect "this paper doesn't have the actual optimal methods" to be true; this is specifically a test of PPO on in-distribution actions. Concretely, there is a potential story here in which PPO reinforces traces that hit in self-play, so there is a sense in which we would expect it to only select actions that were already on-policy.

But if one has enough money, one can fine-tune GPT models and test that.

Also note that 10k submissions is about 2 OOM out of distribution for the charts in the paper.

Pass at inf k includes every path with nonzero probability (if there is a policy of discarding exact repeat paths).

We know that RL decreases model entropy, so the first k samples will differ more from one another for a high-variance model.
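A minimal sketch of the point about pass at k, assuming each sample is an independent attempt that solves a given problem with probability p, so that pass at k is 1 - (1 - p)^k. The per-problem success rates below are hypothetical, chosen only to illustrate a diffuse policy versus a sharpened one; they are not taken from the paper.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples succeeds."""
    return 1.0 - (1.0 - p) ** k


def mean_pass_at_k(rates, k):
    """Average pass at k over a set of problems with the given success rates."""
    return sum(pass_at_k(p, k) for p in rates) / len(rates)


# Hypothetical per-problem success rates (illustrative assumptions only).
base_model = [0.02, 0.02, 0.02, 0.02]  # diffuse: low but nonzero everywhere
rl_model = [0.60, 0.60, 0.00, 0.00]    # sharpened: high on some, zero on others

if __name__ == "__main__":
    for k in (1, 16, 256, 4096):
        print(k, mean_pass_at_k(base_model, k), mean_pass_at_k(rl_model, k))
```

Under these assumptions the sharpened policy wins at small k, but its score is capped by the problems it assigns zero probability, while the diffuse policy keeps climbing toward 1 as k grows.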

Pass at k is take-best, so for a normal distribution the expected best of k samples is roughly mean + standard deviation * sqrt(2 * ln(k)).

At very large k, we would expect variance to matter more than mean.
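To make the take-best argument concrete, here is a small simulation sketch. It assumes sample quality is normally distributed, and compares a Monte Carlo estimate of the expected best of k draws against the standard large-k approximation mu + sigma * sqrt(2 * ln(k)). The two (mu, sigma) settings are hypothetical stand-ins for a "higher mean, lower variance" policy and a "lower mean, higher variance" one.

```python
import math
import random


def mean_best_of_k(mu, sigma, k, trials=2000, seed=0):
    """Monte Carlo estimate of E[max of k draws from Normal(mu, sigma)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(mu, sigma) for _ in range(k))
    return total / trials


def asymptotic_best_of_k(mu, sigma, k):
    """Large-k approximation of the expected maximum: mu + sigma*sqrt(2 ln k)."""
    return mu + sigma * math.sqrt(2.0 * math.log(k))


# Hypothetical settings, for illustration only.
HIGH_MEAN = (1.0, 0.5)  # (mu, sigma): better on average, low spread
HIGH_VAR = (0.0, 2.0)   # (mu, sigma): worse on average, high spread

if __name__ == "__main__":
    for k in (1, 10, 100):
        print(k, mean_best_of_k(*HIGH_MEAN, k), mean_best_of_k(*HIGH_VAR, k))
```

With these settings the high-mean policy is ahead at k = 1, and the high-variance policy overtakes it as k grows, since the sigma term scales with sqrt(ln k) while the mean term stays fixed.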

How to end credentialism

Mis-Understandings13d50

No one cared.

You don't know what questions they did not ask you, or what assumptions of shared cultural background they made because they saw the degree. They would not tell you (unless you have comparisons to job searching before getting the degree).

Fundamentally, this is the expected phenomenology, since people do not tend to notice the sources of their own status.

How to end credentialism

Mis-Understandings13d52

Credentialism is good because, for most credential-requiring positions (white collar, business, and engineering work), the limiting factor on employment is trust, not talent.

Universities are bad at teaching skills, but generate trust and social capital.

Trust that allows the system to underwrite new white collar workers doing things that might lose businesses lots of money is important and expensive.

Consequently you get credential requirements, because there is no test other than years of being in social systems that can tell you that a person has the ability to go 4 years without crashing out (which is the key skill).

Additionally, going to university has become a class signifier, and all classes wish they were bigger and more prominent.

"The alternative to credentialism is selection, or real meritocracy."

The alternative to credentialism is not selection; it is hiring your buddies, hiring by visible factors, and hiring randomly. Most businesses are not in a position to run a competitive selective process (THOSE ARE REALLY EXPENSIVE).

"universities provide to employers is the ability to confirm you are clever, driven, and have relevant skills" is false. They provide confirmation that you are a member of the professional class who is not going to do stupid things that lose money or generate risk.

Fundamentally, this misunderstands the purpose of the degree to the hiring bureaucracy, and the political economy behind it.

Mis-Understandings's Shortform

Mis-Understandings14d21

In short, it seems like the current system unfairly kills drugs that take a long time to develop and do not have a patentable change in the last few years of that cycle.

Mis-Understandings's Shortform

Mis-Understandings14d30

If the story about drug prices and price controls is correct (that price controls are bad because the limiting factor for drug development is returns on capital, which this reduces), then we must rethink the political economy of drug development.

Basically, if that were the case, we would expect the sectoral return rates of biotech to match the risk-adjusted rate, but drug development is both risky and skewed, affecting costs of capital.

Most of a drug's price is capital cost, so interventions that lower the capital costs of pharmaceutical companies might produce more drugs.

Most of those capital costs come from the total raise required, which is driven basically by the costs of pharmaceutical research (which is probably mostly the labor of expensive professionals).

The expected rate of return is dominated by the risks of pharmaceutical companies.

Drug prices are what the market will bear/monopoly for a time, then drop to a very low level once a compound is generic.

There is a big problem here with out-of-patent molecules: if a drug is covered by a patent and stalls for 20 years, the return is no longer there to push it through the process. That means there might be zombie drugs around, from companies that fell apart and did a bad job of selling that asset, which neither finished the process nor failed it.

There seems to be space for the various approvals to become more IP-like (so that all drugs have the same exclusivity, regardless of how long they took to prove out).

Why Should I Assume CCP AGI is Worse Than USG AGI?

Mis-Understandings14d21

I don't think that people from the natsec version have made that update, since they have been talking this line for a while.

But the dead organization framing matters here.

In short, people think that democratic institutions are not dead (especially electoralism). If AGI is "Democratic", that live institution, in which they are a stakeholder, will have the power to choose to do fine stuff (and this might generalize to everybody being a stakeholder), which is +EV, especially for them.

They also expect that China as a live actor will try to kill all other actors if given the chance.

Why Should I Assume CCP AGI is Worse Than USG AGI?

Mis-Understandings15d61

"I am neither an American citizen nor a Chinese citizen" does not describe most people who make that argument.

Most of these people are US citizens, or could be. Under liberalism/democracy those sorts of people get a say in the future, so they think AGI will be better if it gives those sorts of people a say.

Most people talking about the USG AGI have structural investments in the US, which are better and give them more chances to bid on not destroying the world (many are citizens or are in the US bloc). Since the US government is expected to treat other stakeholders in its previous bloc better than China treats members of its bloc, it is better for people who are only US-aligned if the US gets more powerful, since it will probably support its traditional allies even when it is vastly more powerful, as it did during the early Cold War. (This was obvious last year and is no longer obvious.)

In short, the USG was committed to international liberalism, which is a great thing for AGI to have for various reasons which are hard to say, but basically of the form that liberals are committed to not doing crazy stuff.

People who can't reason well about the CCP's internal ideologies and political conflicts (like me), and who try to predict ideological alignment for AGI, think that USG AGI will use the frames of international liberalism (which don't let you get away with terrible things even if you are powerful), and worry about frames of international realism (which they assign to China, since they cannot tell, and which argue that if you have the power you must/should/can use it to do anything, including ruining everybody else).

As a summary, if you are not an American citizen, do not trust the US natsec framing. A lot of this is carryover from times when the US liberal international bloc (the global international order) was stronger, and so as a bloc framing it is better iff the US bloc is somehow bigger, which at the time it was.
