This does not feel super cruxy, as the power incentive still remains.

"This grant was obviously ex ante bad. In fact, it's so obvious that it was ex ante bad that we should strongly update against everyone involved in making it."

This is an accurate summary. 

"arguing about the impact of grants requires much more thoroughness than you're using here"

We might not agree on the level of effort appropriate for a quick take. I do not currently have the time to expand this into a full write-up on the EA Forum, but I am still interested in discussing this with the community.

"you're making a provocative claim but not really spelling out why you believe the premises."

I think this is a fair criticism and something I hope I can improve on.

I feel frustrated that your initial comment (which is now the top reply) implies that I either hadn't read the 1,700-word grant justification at the core of my argument, or was intentionally misrepresenting it to make my point. This seems like an extremely uncharitable interpretation of my initial post.

Your reply has been quite meta, which makes it difficult to convince you on specific points.

Your argument on betting markets has updated me slightly towards your position, but I am not particularly convinced. My understanding is that Open Phil and OpenAI had a close relationship, and hence Open Phil had substantially more information to work with than the average Manifold punter.
 

"So the case for the grant wasn't 'we think it's good to make OAI go faster/better'."

I agree. My intended meaning is not that the grant is bad because its purpose was to accelerate capabilities. I apologize that the original post was ambiguous.

Rather, the grant was bad for numerous reasons, including but not limited to:

  • It appears to have had an underwhelming governance impact (as demonstrated by the board being unable to remove Sam Altman).
  • It enabled OpenAI to "safety-wash" their product (although how important this has been is unclear to me).
  • From what I've seen at conferences and on job boards, it seems reasonable to assert that the relationship between Open Phil and OpenAI has led people to work at OpenAI.
  • Less importantly, the grant justification appears to take seriously the idea that making AGI open source is compatible with safety. I might be missing some key insight, but this seems like an obviously terrible idea even if you're only concerned with human misuse and not misalignment.
  • Finally, it gave money directly to an organisation with the stated goal of producing an AGI. If the grant sped up timelines, it was substantially negative-EV.

This last claim seems very important. I have not been able to find data that would let me confidently estimate OpenAI's value at the time the grant was given. However, Wikipedia mentions that "In 2017 OpenAI spent $7.9 million, or a quarter of its functional expenses, on cloud computing alone." That implies total functional expenses of roughly $32 million that year, which certainly makes it seem that the grant provided OpenAI with a significant amount of capital, enough to have increased its research output.

Keep in mind that the grant needed to generate $30 million in EV just to break even. I'm now going to suggest some other uses for the money; these are rough estimates that I haven't adjusted for inflation, and I'm not claiming they are the best possible uses of $30 million.

The money could have funded an organisation the size of MIRI for roughly a decade (basing my estimate on MIRI's 2017 fundraiser; using 2020 numbers gives an estimate of ~4 years).

Imagine the shift in public awareness if there had been an AI safety Super Bowl ad every year for 3-5 years.

Or it could have saved the lives of ~1,300 children.
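For concreteness, here is a minimal back-of-envelope sketch of the arithmetic behind these comparisons. The annual-budget figures are rough assumptions chosen only to reproduce the ratios stated above, not authoritative accounting; the cost per life is simply what the ~1,300 figure implies.

```python
# Back-of-envelope sketch of the opportunity-cost comparisons above.
# The MIRI budget figures are assumptions chosen to match the rough
# "decade" / "~4 years" ratios in the text, not authoritative numbers.

GRANT_USD = 30_000_000                 # 2017 Open Phil grant to OpenAI

MIRI_ANNUAL_BUDGET_2017 = 3_000_000    # assumed, order of magnitude only
MIRI_ANNUAL_BUDGET_2020 = 7_500_000    # assumed, order of magnitude only

years_at_2017_budget = GRANT_USD / MIRI_ANNUAL_BUDGET_2017  # ~10 years
years_at_2020_budget = GRANT_USD / MIRI_ANNUAL_BUDGET_2020  # ~4 years

LIVES_SAVED = 1_300
implied_cost_per_life = GRANT_USD / LIVES_SAVED             # ~$23,000 per life

print(f"MIRI-years at 2017 budget: {years_at_2017_budget:.0f}")
print(f"MIRI-years at 2020 budget: {years_at_2020_budget:.0f}")
print(f"Implied cost per life:     ${implied_cost_per_life:,.0f}")
```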

This analysis is obviously much worse if in fact the grant was negative EV.

That's a good point. You have pushed me towards thinking that this is an unreasonable statement and that "predicted this problem at the time" is better.

Very Spicy Take

Epistemic Note: 
Many highly respected community members with substantially greater decision-making experience (and LessWrong karma) presumably disagree strongly with my conclusion.

Premise 1: 
It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2:
This was the default outcome. 

Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule. 

Premise 3:
Without repercussions for terrible decisions, decision-makers have no skin in the game.

Conclusion:
Anyone and everyone involved in Open Phil's recommendation of the $30 million grant to OpenAI in 2017 shouldn't be allowed anywhere near AI safety decision-making in the future.

To go one step further, potentially any and every major decision they have played a part in needs to be reevaluated by objective third parties. 

This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. 

To quote Open Phil:
"OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."

This is your second post and you're still being vague about the method. I'm updating strongly towards this being a hoax and I'm surprised people are taking you seriously.

Edit: I'll offer you a $50 even-money bet that your method won't replicate when tested by a third party with more subjects and a proper control group.

You are given a string s corresponding to the Instructions for the construction of an AGI which has been correctly aligned with the goal of converting as much of the universe into diamonds as possible. 

What is the conditional Kolmogorov complexity of the string s' that produces an AGI aligned with "human values" or any other suitable alignment target?

To convert an abstract string into a physical object, the "Instructions" are read by a finite state automaton, with the state of the FSA at each step dictating the behavior of a robotic arm (with appropriate mobility and precision) with access to a large collection of physical materials.
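For clarity, the quantity being asked about is the standard conditional Kolmogorov complexity relative to some fixed universal machine U (by the invariance theorem, the choice of machine only changes the answer by an additive constant):

$$K(s' \mid s) = \min \{\, |p| \;:\; U(p, s) = s' \,\}$$

that is, the length of the shortest program that outputs s' when given s as auxiliary input.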

Answer by Stephen Fowler

Tangential. 

Is part of the motivation behind this question to think about the level of control that a superintelligence could have over a complex system if it were only able to influence a small part of that system?

I was not precise enough in my language, and I agree with you that what "alignment" means for LLMs is a bit vague. While people felt Sydney Bing was cool, if it had not been possible to rein it in, it would have been very difficult for Microsoft to gain any market share. An LLM that doesn't do what it's asked, or that regularly expresses toxic opinions, is ultimately bad for business.

In the above paragraph, understand "aligned" in the concrete sense of "behaves in a way that is aligned with its parent company's profit motive", rather than "acting in line with humanity's CEV". To rephrase the point I was making above, I feel much (a majority, even) of today's alignment research is focused on the first definition of alignment, whilst neglecting the second.

A concerning amount of alignment research is focused on fixing misalignment in contemporary models, with limited justification for why we should expect these techniques to extend to more powerful future systems.

By improving the performance of today's models, this research makes investing in AI capabilities more attractive, increasing existential risk.

Imagine an alternative history in which GPT-3 had been wildly unaligned. It would not have posed an existential risk to humanity, but it would have made putting money into AI companies substantially less attractive to investors.
