All of JustinShovelain's Comments + Replies

I agree.

Anthropic's marginal contribution to safety (compared to what we would have in a world without Anthropic) probably doesn't offset Anthropic's contribution to the AI race.

I think there are more worlds where Anthropic's contribution to the race is net negative than worlds where Anthropic's marginal safety improvement over OpenAI/DeepMind-ish orgs is critical for securing a good future with AGI (weighting worlds by impact size and probability).

More generally you can use the following typology to inspire creating more interventions. 

Intervention points for changing/forming an AGI company and its surroundings toward safer x-risk outcomes (I've used this in advising startups on AI safety; it is also related to my post on positions where people can be in the loop):

  • Type of organization: nonprofit, public benefit organization, have a partner non-profit, join the government
  • Rules of organization, event triggers:
    • Rules:
      • x-risk mission statement
      • x-risk strategic plan
    • Triggering events:
      • Gets very big: windfall
... (read more)

Thanks for asking the question!

Some things I'd especially like to see change (in as much as I know what is happening) are:

  • Making more use of available options to improve AI safety. (I think there are more available options than Anthropic seems to think. For instance, 30% of funds could be allocated to AI safety research if framed well, and it would probably be below the noise threshold/froth of VC investing. Also, there is probably a fair degree of freedom in socially promoting concern around unaligned AGI.)
  • Explicit ways to handle various types of events lik
... (read more)
1JustinShovelain
More generally you can use the following typology to inspire creating more interventions.

Intervention points for changing/forming an AGI company and its surroundings toward safer x-risk outcomes (I've used this in advising startups on AI safety; it is also related to my post on positions where people can be in the loop):

  • Type of organization: nonprofit, public benefit organization, have a partner non-profit, join the government
  • Rules of organization, event triggers:
    • Rules:
      • x-risk mission statement
      • x-risk strategic plan
    • Triggering events:
      • Gets very big: windfall clause
      • Gets sold to another party: ethics board, restrictions on potential sale
      • Value drift: reboot board and CEOs, shut it down, allocate more resources to safety, build a new company, put the ethics board in charge, build a monitoring system, some sort of line in the sand
      • AI safety isn't viable yet but dangerous AGI is: shut it down or pivot to sub-AGI research and product development
      • Hostile government tries to take it over: shut it down, change countries (see also: Soft Nationalization: How the US Government Will Control AI Labs)
  • Path decisions for organization: ethics board, aligned investors, good CEOs, giving x-risk orgs or people choice power, voting stock to aligned investors, periodic x-risk safety reminders
  • Resource allocation by organization: precommitting a varying percentage of money/time focused on x-risk reduction based on conditions, with some up front; commitment devices for funding allocation into the future
  • Owners of organization: aligned investors, voting stock for aligned investors, necessary percentage as aligned investors
  • Executive decision making: good CEOs, company mission statement?, company strategic plan?
  • Employees: select employees preferably by alignment, have only aligned people hire folks
  • Education of employees and/or investors by x-risk folks: employee training in x-risks and information hazar

Gotcha. What determines the "ratios" is some sort of underlying causal structure of which some aspects can be summarized by a tech tree. For thinking about the causal structure you may also like this post: https://forum.effectivealtruism.org/posts/TfRexamDYBqSwg7er/causal-diagrams-of-the-paths-to-existential-catastrophe

Complementary ideas to this article:

... (read more)
4Charlie Steiner
The tech tree picture is in large part a reaction to single-factor "capabilities/alignment ratio" pictures (as in FAI Research Constraints), or even two-factor pictures (as in State Space), because while they summarize the results of decision-making about research priorities, they didn't help me summarize my reasoning process.
2Charlie Steiner
Thanks a bunch!

Relatedly, here is a post going beyond the framework of a ratio of progress to the effect on the ratio of research that still needs to be done for various outcomes: https://www.lesswrong.com/posts/BfKQGYJBwdHfik4Kd/fai-research-constraints-and-agi-side-effects

Extending further one can examine higher order derivatives and curvature in a space of existential risk trajectories: https://forum.effectivealtruism.org/posts/TCxik4KvTgGzMowP9/state-space-of-x-risk-trajectories

2Charlie Steiner
Thanks! I'm writing up some thoughts on dual use and liked the links.

Roughly speaking, in terms of the actions you take, various timelines should be weighted as P(AGI in year t) * DifferenceYouCanProduceInAGIAlignmentAt(t). This produces a new, non-normalized distribution of how much to prioritize each time (you can renormalize it if you wish to make it more like a "probability").

Note that this is just a first approximation and there are additional subtleties.

  • This assumes you are optimizing for each time and possible world orthogonally, but much of the time optimizing for nearby times is very similar to optimizing for a p
... (read more)
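The weighting above can be sketched numerically. All yearly probabilities and marginal-impact values below are made-up illustrative assumptions, not figures from the comment:

```python
# Sketch of weighting timelines by P(AGI in year t) * difference
# you can produce in AGI alignment at t. Numbers are illustrative only.

years = [2030, 2035, 2040, 2045]
p_agi = [0.10, 0.20, 0.30, 0.25]        # assumed P(AGI arrives in year t)
marginal_impact = [0.9, 0.6, 0.4, 0.2]  # assumed difference you can make at t

# Non-normalized priority weights
weights = [p * d for p, d in zip(p_agi, marginal_impact)]

# Optional renormalization to a probability-like distribution
total = sum(weights)
priority = [w / total for w in weights]

for y, w, pr in zip(years, weights, priority):
    print(f"{y}: weight={w:.3f}, normalized priority={pr:.3f}")
```

Note how the renormalized distribution can front-load priority on earlier years even when P(AGI) peaks later, because the marginal difference you can make declines with time.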

I think causal diagrams naturally emerge when thinking about Goodhart's law and its implications. 

I came up with the concept of Goodhart's law causal graphs above because of a presentation someone gave at the EA Hotel in late 2019 on Scott's Goodhart Taxonomy. I thought causal diagrams were a clearer way to describe some parts of the taxonomy, but their relationship to the taxonomy is complex. I also just encountered the paper you and Scott wrote a couple of weeks ago when getting ready to write this Good Heart Week prompted post, and I was planning in th... (read more)

I like the distinction that you're making and that you gave it a clear name.

Relatedly, there is the method of Lagrangian multipliers for solving things in the subspace.

On a side note: there is a way to partially unify the subspace optimum and the local optimum by saying that the subspace optimum is a local optimum with respect to the local set of parameters you're using to define the subspace. You're at a local optimum both with respect to how you define the underlying space to optimize over (the subspace) and within that subspace itself. (Relatedly, moduli spaces.)
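As a toy instance (not from the original comment) of the Lagrange-multiplier method for finding an optimum within a constrained subspace: maximize f(x, y) = xy subject to x + y = 1.

```latex
% Lagrangian for maximizing f(x,y) = xy on the subspace x + y = 1
\mathcal{L}(x, y, \lambda) = xy - \lambda \, (x + y - 1)

% Stationarity conditions:
\frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \qquad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0

% Hence x = y = \lambda, and x + y = 1 gives
x = y = \tfrac{1}{2}, \qquad f\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}
```

The multiplier λ turns the constrained (subspace) problem into an unconstrained stationarity problem over the enlarged space, which is the unification the comment gestures at.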

I've decided to try modelling testing and contact tracing over the weekend. If you wish to join and want to ping me, my contact details are in the doc.

I think virus inactivation is a normal vaccination approach and is probably being pursued here? The hardest part is probably growing the virus in vitro at scale and perhaps ensuring that all of the particles are inactivated.

2ChristianKl
You don't need to grow it at scale, and of the two vaccines we have in trials, one is adenovirus-based while the other is mRNA-based.

Nice deduction about the relationship between this and conflict vs mistake theory! Similar and complementary to this post is the one I wrote on Moloch and the Pareto optimal frontier.

4romeostevensit
+1 I think this area of investigation is underexplored and potentially very fruitful.
2romeostevensit
I want some latitude to commit to things to that are only partially entangled with my values since that lets me coordinate with a broader range of people. If there is mutual knowledge about this necessity then the coordination is also more metastable.

By new "term" I meant to make the clear that this statement points to an operation that cannot be done with the original machine. Instead it calls this new module (say a halting oracle) that didn't exist originally.

0Vladimir_Nesov
What machine???

Are you trying to express the idea of adding new fundamental "terms" to your language describing things like halting oracles and such, and then discounting their weight by the shortest statement of said term's properties expressed in the language that existed prior to including this additional "term"? If so, I agree that this is the natural way to extend priors to handle arbitrary describable objects such as halting oracles.

Stated another way. You start with a language L. Let the definition of an esoteric mathematical object (... (read more)
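The discounting scheme described above can be sketched as a simple length-based prior. The encoding (raw character count) and the example statements are illustrative assumptions, not a faithful Kolmogorov measure:

```python
# Sketch: give each finitely expressible statement an unnormalized
# plausibility of 2^(-description length), mirroring the idea of
# discounting a new "term" (e.g. a halting oracle) by the shortest
# statement of its properties in the pre-existing language L.
# Statements and the crude length measure are toy assumptions.

def length_prior(description: str) -> float:
    """Unnormalized plausibility: 2^(-length in characters)."""
    return 2.0 ** (-len(description))

statements = {
    "x+1=2": "x+1=2",                           # already expressible in L
    "halting_oracle": "H(p) = 1 iff p halts",   # new term, defined by its properties
}

weights = {name: length_prior(desc) for name, desc in statements.items()}
total = sum(weights.values())
normalized = {name: w / total for name, w in weights.items()}
```

Under any such scheme, a new term is penalized exactly by how verbose its defining properties are in the old language, which matches the extension proposed in the comment.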

0Vladimir_Nesov
Why "new terms"? If the language can finitely express a concept, my scheme gives that concept plausibility. Maybe this could be extended to lengths of programs that generate axioms for a given theory (even enumerable sets of axioms), rather than lengths of individual finite statements, but I guess that can be stated within some logical language just as well.

Interesting idea.

I agree that trusting newly formed ideas is risky, but there are several reasons to convey them anyway (non-comprehensive listing):

  • To recruit assistance in developing and verifying them

  • To convey an idea that is obvious in retrospect, an idea you can be confident in immediately

  • To signal cleverness and ability to think on one's feet

  • To socially play with the ideas

What we are really after, though, is to assess how much weight to assign to an idea off the bat so we can calculate the opportunity costs of thinking about the idea in great... (read more)

2Erik
Solutions to hard puzzles are good examples of these. NP problems, where finding a solution is (believed to be) exponentially harder than checking its correctness, are the extreme case.
2AdeleneDawner
Such ideas are prone to being flawed because they fail to take into account relevant information that has been temporarily forgotten.

Vote this up if you are the oldest child with siblings.

Vote this up if you are an only child.

Vote this up if you have older siblings.

Poll: Do you have older siblings or are an only child?

karma balance

5steven0461
I'm pretty sure that in the general population, there are at least as many people with older siblings as there are people with only younger siblings. But in this poll, it's 6 vs 19. That looks like a humongous effect (which we also found in SIAI-associated people, and which this poll was intended to further check). I could see some sort of self-selection bias and the like, and supposedly oldest children have slightly higher IQs on average, but on the whole I'm stumped for an explanation. Anyone? ETA: Here's a claim that "it is consistently found that being first-born is particularly favourable to high levels of scientific creativity". See also this.
-47JustinShovelain


I'm thinking of writing up a post clearly explaining updateless decision theory. I have a somewhat different way of looking at things than Wei Dai and will give my interpretation of his idea if there is demand. I might also need to do this anyway in preparation for some additional decision theory I plan to post to LessWrong. Is there demand?

0Will_Newsome
If and only if you can explain UDT in text at least as clearly as you explained it to me in person; I don't think that would take a very long post.

Closely related to your point is the paper, "The Epistemic Benefit of Transient Diversity"

It describes and models the costs and benefits of independent invention and transient disagreement.

7JenniferRM
Having read this paper in the past, I'd encourage people to look into it. It offers the case of stomach ulcer etiology: a study many decades ago concluded that bacteria were not the cause of ulcers (the study was reasonably thorough, it just missed some details), and that led no one to do research in the area, because the payoff of confirming a theory that was very likely to be right was so low. This affected many, many people. Ulcers caused by H. pylori can generally be treated simply with antibiotics and some pepto for the symptoms, but for lack of this treatment many people suffered chronic ulcers for decades.

After the example, the paper develops a model for both the speed of scientific progress and the likelihood of a community settling on a wrong conclusion, based on the social graph of the researchers. It shows that communities where everyone knows of everyone else's research results converge more swiftly but are more likely to make group errors. By contrast, sparsely connected communities converge more slowly but are less likely to make substantive errors.

Part of the trick here (not really highlighted in the paper) is the way that hearing everyone's latest results is selfishly beneficial for researchers, who are rewarded for personally answering "the biggest open question in their field," whereas people whose position in a social graph of knowledge workers is more marginal are likely to be working on questions where the ratio of social utility to personal career utility is more pro-social than is usual. Most marginal researchers will gain no significant benefits, of course, because they'll simply confirm the answers that central researchers were already assuming based on a single study they heard about once. Romantically considered, these people are sort of the unsung heroes of science... the patent clerks who didn't come up with a theory of relativity even if they were looking in plausible places. But the big surprises and big career boosts are

Why are you more concerned about something with unlimited ability to self reflect making a calculation error than about the above being a calculation error? The AI could implement the above if the calculation implicit in it is correct.

What keeps the AI from immediately changing itself to only care about the people's current utility function? That's a change with very high expected utility defined in terms of their current utility function and one with little tendency to change their current utility function.

Will you believe that a simple hack will work with lower confidence next time?

0Stuart_Armstrong
Slightly. I was counting on this one getting bashed into shape by the comments; it wasn't, so in future I'll try to do more of the bashing myself.

Hmm, darn. When I write, I do have a tendency to see the ideas I meant to describe instead of my actual exposition; I don't like grammar-checking my writing until I've had some time to forget the details, since otherwise I read right over my errors unless I pay special attention.

I did have three LWers look over the article before I posted it, and got the general criticism that it was a bit obscure and dense but understandable and interesting. I was probably too ambitious in trying to include everything within one post, though; a length vs. clarity tradeoff.

To address you... (read more)

I think he meant that even if we are not religious, society tends to pull us into moral realism even though of course moral realism is an illusion.

You are correct, though I don't go as far as calling moral realism an illusion because of unknown unknowns (though I would be very surprised to find it isn't illusory).

Addressing your reification point:

"By means of reification, something that was previously implicit, unexpressed and possibly inexpressible is explicitly formulated and made available to conceptual (logical or computational) manipulation." -- Reification (computer science), Wikipedia

I don't think I did abuse vocabulary outside of possibly generalizing meanings in straightforward ways and taking words and meanings common in one topic and using them in a context where they are rather uncommon (e.g. computer science to philosophy). I rely on contex... (read more)

Some things I use to test mental ability as well as train it are: Brain Workshop (a free dual n-back program), Cognitivefun.net (a site with assorted tests and profiles, including everything from reaction time to subitizing to visual backward digit span), Posit Science's jewel diver demo (a multi-object tracking test), and Lumosity.com (brainshift, memory matrix, speed match, top chimp). All of these tests can be found for free on the internet.

Subjectively the regular use of these tests has increased my metacognitive and self monitoring ability. Anyone have ... (read more)

1gwern
Seconding Brain Workshop; the dual n-back task is one of the few things with any research suggesting it generalizes/transfers, and so is really interesting compared to random mental trivia tests like most of the stuff on Lumosity.

I do not agree with all interpretations of the quote but primed by:

That's not right. It's not even wrong. -- Wolfgang Pauli

I interpreted it charitably, with "critical" loosely implying "worth thinking about", in contrast to vague ideas that are not even wrong. Furthermore, from thefreedictionary.com's definition of critical ("1. Inclined to judge severely and find fault."), vague statements may be considered useless and so judged severely, but much of the time they are also slippery in that they must be broken down into precis... (read more)

1orthonormal
Given what I've read of The Tao is Silent, I'm inclined to take a more literal (and less agreeable) interpretation of his quote here.

Make everything as simple as possible, but not simpler.

-- Albert Einstein

"It is the mark of an instructed mind to rest assured with that degree of precision that the nature of the subject admits, and not to seek exactness when only an approximation of the truth is possible."

--Aristotle

Many highly intelligent people are poor thinkers. Many people of average intelligence are skilled thinkers. The power of a car is separate from the way the car is driven.

-- Edward de Bono

0spriteless
Reminds me of the Geography teacher I had with cerebral palsy, compared to college kids with no aspirations beyond working a coffee shop.
1anonym
Isn't this obvious and also arguing against only a straw-man position with respect to intelligence and intellectual skill -- namely that intellectual skill is a function of one variable (intelligence) and that all other factors (such as industriousness & creativity) have no impact on intellectual skill? It's phrased as if it conveys some deep wisdom, when the reality is that almost all reasonably intelligent people already believe this.

In a sense, words are encyclopedias of ignorance because they freeze perceptions at one moment in history and then insist we continue to use these frozen perceptions when we should be doing better.

-- Edward de Bono

3anonym
Except that language is a living, breathing thing, and words are constantly being invented, falling out of favor, taking on new meanings and losing old ones.

Some people are always critical of vague statements. I tend rather to be critical of precise statements; they are the only ones which can correctly be labeled 'wrong'.

-- Raymond Smullyan

Surely, to label a statement "vague" is a higher order of insult than to call it "wrong". Newton was wrong but at least he was not vague.

1MBlume
Statements should be as precise as possible, but no more precise.

From pwno: "Aren't true theories defined by how useful they are in some application?"

My definition of "usefulness" was built with the express purpose of relating the truth of theories to how useful they are and is very much a context specific temporary definition (hence "define:"). If I had tried to deal with it directly I would have had something uselessly messy and incomplete, or I could have used a true but also uninformative expectation approach and hid all of the complexity. Instead, I experimented and tried to force the ... (read more)

define: A theory's "truthfulness" as how much probability mass it has after appropriate selection of prior and applications of Bayes' theorem. It works as a good measure for a theory's "usefulness" as long as resource limitations and psychological side effects aren't important.

define: A theory's "usefulness" as a function of resources needed to calculate its predictions to a certain degree of accuracy, the "truthfulness" of the theory itself, and side effects. Squinting at it, I get something roughly like: usefulne... (read more)

3jimrandomh
Your definition of usefulness fails to include the utility of the predictions made, which is the most important factor. A theory is useful if there is a chain of inference from it to a concrete application, and its degree of usefulness depends on the utility of that application, whether it could have been reached without using the theory, and the resources required to follow that chain of inference. Measuring usefulness requires entangling theories with applications and decisions, whereas truthfulness does not. Consequently, it's incorrect to treat truthfulness as a special case of usefulness or vice versa.

I agree that it may plausibly be argued that the difference should rarely fall into the small margin U(good name) - U(bad name) (up to varying priors, utility functions, ...). However, should people calculate to the point that they can resolve differences of that order of magnitude? A quick-and-dirty heuristic may be the way to go practically speaking; the difference in utility would be less than the utility lost in calculating it.

Is this whole bias caused by the exposure effect? Would there be any obstacle in unifying the two? Do people also prefer to live in towns that are associated with their parents' names? Do people who fall for this effect also name their pets or children after themselves to a greater extent?

1Scott Alexander
The second link in the article, the one with the words "they find", is a paper called "Name Letter Preferences Are Not Merely Mere Exposure". You should find some useful studies and stuff there. If you want the paper but can't access it, tell me your email and I'll send it to you.