A book review examining Elinor Ostrom's "Governing the Commons" in light of Eliezer Yudkowsky's "Inadequate Equilibria." Are successful local institutions for governing common pool resources possible without government intervention? Under what circumstances can such institutions emerge spontaneously to solve coordination problems?

robo
Our current big stupid: not preparing for 40% agreement

Epistemic status: lukewarm take from the gut (not brain) that feels rightish.

The "Big Stupid" of the AI doomers 2013-2023 was that AI nerds' solution to the problem "How do we stop people from building dangerous AIs?" was "research how to build AIs." Methods normal people would consider to stop people from building dangerous AIs, like asking governments to make it illegal to build dangerous AIs, were considered gauche. When the public turned out to be somewhat receptive to the idea of regulating AIs, doomers were unprepared.

Take: the "Big Stupid" of right now is still the same thing. (We've not corrected enough.) Between now and transformative AGI we are likely to encounter a moment where 40% of people realize AIs really could take over (say, if every month another 1% of the population loses their job). If 40% of the world were as scared of AI loss-of-control as you, what could the world do? I think a lot! Do we have a plan for then?

Almost every LessWrong post on AI is about analyzing AIs. Almost none are about how, given widespread public support, people and governments could stop bad AIs from being built. [Example: if 40% of people were as worried about AI as I am, the US would treat GPU manufacture like uranium enrichment. And fortunately GPU manufacture is hundreds of times harder than uranium enrichment! We should be nerding out researching integrated circuit supply chains, choke points, foundry logistics in jurisdictions the US can't unilaterally sanction, that sort of thing.]

TL;DR: stopping deadly AIs from being built needs less research on AIs and more research on how to stop AIs from being built.*

*My research included 😬
On the OpenPhil / OpenAI Partnership

Epistemic note: the implications of this argument being true are quite substantial, and I do not have any knowledge of the internal workings of Open Phil. (Both the title and this note have been edited; cheers to Ben Pace for very constructive feedback.)

Premise 1: It is becoming increasingly clear that OpenAI is not appropriately prioritizing safety over advancing capabilities research.

Premise 2: This was the default outcome. Instances in history in which private companies (or any individual humans) have intentionally turned down huge profits and power are the exception, not the rule.

Premise 3: Without repercussions for terrible decisions, decision makers have no skin in the game.

Conclusion: Anyone and everyone involved in Open Phil's recommendation of a $30 million grant to OpenAI in 2017 shouldn't be allowed anywhere near AI Safety decision making in the future. To go one step further, potentially every major decision they have played a part in needs to be reevaluated by objective third parties. This must include Holden Karnofsky and Paul Christiano, both of whom were closely involved. To quote Open Phil: "OpenAI researchers Dario Amodei and Paul Christiano are both technical advisors to Open Philanthropy and live in the same house as Holden. In addition, Holden is engaged to Dario’s sister Daniela."
Please help me find research on aspiring AI Safety folk!

I am two weeks into the strategy development phase of my movement building and almost ready to start ideating some programs for the year. But I want these programs to solve the biggest pain points people experience when trying to have a positive impact in AI Safety. Has anyone seen any research that looks at this in depth? For example, through an interview process and then a survey to quantify how painful the pain points are?

Some examples of pain points I've observed so far through my interviews with technical folk:

  • I often felt overwhelmed by the vast amount of material to learn.
  • I felt there wasn’t a clear way to navigate learning the required information.
  • I lacked an understanding of my strengths and weaknesses in relation to different AI Safety areas (i.e. personal fit / comparative advantage).
  • I lacked an understanding of my progress after I got started (e.g. am I doing well? Poorly? Fast enough?).
  • I regularly experienced fear of failure.
  • I regularly experienced fear of wasted effort / sunk cost.
  • Fear of admitting mistakes or starting over might prevent people from making necessary adjustments.
  • I found it difficult to identify my desired role / job (i.e. the end goal).
  • Even when I thought I knew my desired role, identifying the specific skills and knowledge it required was difficult.
  • There is no clear career pipeline: do X and then Y and then Z and then you have an A% chance of getting role B.
  • Finding time to upskill while working is difficult.
  • I found the funding ecosystem opaque.
  • Upskilling required a lot of discipline and motivation over potentially long periods.
  • I felt like nobody gave me realistic expectations as to what the journey would be like.
Why no prediction markets for large infrastructure projects?

I've been reading this excellent piece on why prediction markets aren't popular. It argues that without subsidies prediction markets won't be large enough, because the information value of prediction markets is often not high enough.

Large infrastructure projects undertaken by governments and other large actors often go over budget, often hilariously so: 3x, 5x, 10x or more is not uncommon, and indeed often even the standard. One reason is that government officials deciding on billion-dollar infrastructure projects don't have enough skin in the game. Politicians are often not in office long enough to care on the time horizons of large infrastructure projects. Contractors don't gain by being efficient or delivering on time; to the contrary, infrastructure projects are huge cash cows. Another problem is that there are often far too many veto-stakeholders. All too often the initial bid is wildly overoptimistic. Similar considerations apply to other government projects like defense procurement or IT projects.

Okay, how to remedy this situation? Internal prediction markets could theoretically prove beneficial: all stakeholders and decision-makers are endowed with vested equity with which they are forced to bet on building timelines and other key performance indicators. External traders may also enter the market, buying and selling the contracts. The effective subsidy could be quite large, and key decisions could save billions. In this world, government officials could gain a large windfall, which may be difficult to explain to voters; this is a legitimate objection.

A very simple mechanism would ask people to estimate the cost C and the timeline T for completion. Your eventual payout would be proportional to how close you ended up to the realized C, T compared to the other bettors. [Something something log scoring rule is proper.]
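A minimal sketch of that distance-based payout, assuming a fixed prize pool shared in proportion to inverse distance from the realized outcome. The names and the 1/(1 + distance) weighting are illustrative assumptions, and a fully proper scoring rule (the bracketed log-score note above) would elicit full distributions rather than point estimates:

```python
# Minimal sketch: bettors submit point estimates (cost C, timeline T) and
# share a prize pool, with closer guesses earning larger shares. The
# 1/(1 + distance) weighting is an illustrative choice, not a proper
# scoring rule.

def payouts(bets, realized_cost, realized_time, pool=1_000_000.0):
    """bets: bettor -> (cost_estimate, time_estimate)."""
    def distance(cost_est, time_est):
        # Relative error on each axis, so dollars and years are comparable.
        return (abs(cost_est - realized_cost) / realized_cost
                + abs(time_est - realized_time) / realized_time)

    weights = {who: 1.0 / (1.0 + distance(c, t))
               for who, (c, t) in bets.items()}
    total = sum(weights.values())
    return {who: pool * w / total for who, w in weights.items()}

# Example: a project bid at $1B / 4 years that realizes at $3.2B / 7 years.
print(payouts({"official_a": (1.0e9, 4.0), "trader_b": (3.0e9, 8.0)},
              realized_cost=3.2e9, realized_time=7.0))
```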
If your endgame strategy involves relying on OpenAI, DeepMind, or Anthropic to implement your alignment solution that solves science / super-cooperation / nanotechnology, consider figuring out another endgame plan.

Popular Comments

Recent Discussion

Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands.

Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman’s temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI.

Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost sight of its mission on safety and has become increasingly culturally hostile to it. He says the superalignment team was starved for resources, with its explicit public compute commitments dishonored, and...

Go all-in on lobbying the US and other governments to fully prohibit the training of frontier models beyond a certain level, in a way that OpenAI can't route around (so probably block Altman's foreign chip factory initiative, for instance).

Linch
This has been OpenAI's line (whether implicitly or explicitly) for a while iiuc. I referenced it on my Open Asteroid Impact website, under "Operation Death Star." 
HiddenPrior
Super helpful! Thanks!
HiddenPrior
I am limited in my means, but I would commit to a fund for strategy 2. My thoughts were on strategy 2, and it seems likely to do the most damage to OpenAI's reputation (and therefore funding) out of the above options. If someone is really protective of something, like their public image/reputation, that probably indicates that it is the most painful place to hit them.

This is a quickly-written opinion piece on what I understand about OpenAI. I first posted it to Facebook, where it had some discussion.

 

Some arguments that OpenAI is making, simultaneously:

  1. OpenAI will likely reach and own transformative AI (useful for attracting talent to work there).
  2. OpenAI cares a lot about safety (good for public PR and government regulations).
  3. OpenAI isn’t making anything dangerous and is unlikely to do so in the future (good for public PR and government regulations).
  4. OpenAI doesn’t need to spend many resources on safety, and implementing safe AI won’t put it at any competitive disadvantage (important for investors who own most of the company).
  5. Transformative AI will be incredibly valuable for all of humanity in the long term (for public PR and developers).
  6. People at OpenAI have thought long and
...

Meta’s messaging is clearer.

“AI development won’t get us to transformative AI, we don’t think that AI safety will make a difference, we’re just going to optimize for profitability.”

 

So, Meta's messaging is actually quite inconsistent. Yann LeCun says (when speaking to certain audiences, at least) that current AI is very dumb, and AGI is so far away it's not worth worrying about all that much. Mark Zuckerberg, on the other hand, is quite vocal that their goal is AGI and that they're making real progress towards it, suggesting 5+ year timelines.

You can think of a scorable function as a "box" that can produce large clusters of complex forecasts. Credit to Dall-E 3.

Introduction

Imagine if a forecasting platform had estimates for things like:

  1. "For every year until 2100, what will be the probability of a global catastrophic biological event, given different levels of biosecurity investment and technological advancement?"
  2. "What will be the impact of various AI governance policies on the likelihood of developing safe and beneficial artificial general intelligence, and how will this affect key indicators of global well-being over the next century?"
  3. "How valuable is every single project funded by Open Philanthropy, according to a person with any set of demographic information, if they would spend 1000 hours reflecting on it?"

These complex, multidimensional questions are useful for informing decision-making and resource...
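As a toy illustration of what one such "box" might look like, here is a guess at a minimal scorable function for question 1 above. The function shape, its parameters, and the log-score details are illustrative assumptions, not the post's actual design:

```python
import math

# One submitted "box": a function covering an entire parameter space of
# forecasts, rather than a single number per question.
def my_forecast(year: int, biosecurity_investment_usd: float) -> float:
    """Annual probability of a global catastrophic biological event."""
    base = 0.01 * (1 + (year - 2024) / 100)              # slow secular rise
    mitigation = math.exp(-biosecurity_investment_usd / 50e9)
    return min(max(base * mitigation, 1e-6), 1 - 1e-6)

# The platform can then score the whole function on whichever parameter
# combinations eventually resolve, e.g. with a log score.
def log_score(forecast_fn, resolved):
    """resolved: list of ((args...), outcome_bool) pairs; higher is better."""
    return sum(math.log(forecast_fn(*args) if happened
                        else 1 - forecast_fn(*args))
               for args, happened in resolved)

# Two hypothetical resolved data points.
print(log_score(my_forecast, [((2030, 10e9), False), ((2040, 2e9), False)]))
```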

This post was written by Peli Grietzer, inspired by internal writings by TJ (tushant jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment.

The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing. In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we’ve come to notice what we think are serious conceptual problems with the prevalent vocabulary of ‘AI alignment.’ This essay will discuss some of the major ways in which we think the concept of ‘alignment’ creates bias and confusion, as well as our own search for clarifying concepts. 

At AOI, we try to...

This is a linkpost for https://ailabwatch.org

I'm launching AI Lab Watch. I collected actions for frontier AI labs to improve AI safety, then evaluated some frontier labs accordingly.

It's a collection of information on what labs should do and what labs are doing. It also has some adjacent resources, including a list of other safety-ish scorecard-ish stuff.

(It's much better on desktop than mobile — don't read it on mobile.)

It's in beta—leave feedback here or comment or DM me—but I basically endorse the content and you're welcome to share and discuss it publicly.

It's unincorporated, unfunded, not affiliated with any orgs/people, and is just me.

Some clarifications and disclaimers.

How you can help:

  • Give feedback on how this project is helpful and how it could change to be much more helpful
  • Tell me what's wrong/missing; point me to sources
...
Wei Dai
Suggest having a row for "Transparency", to cover things like whether the company encourages or discourages whistleblowing, does it report bad news about alignment/safety (such as negative research results) or only good news (new ideas and positive results), does it provide enough info to the public to judge the adequacy of its safety culture and governance, etc.

Thanks.

Any takes on what info a company could publish to demonstrate "the adequacy of its safety culture and governance"? (Or recommended reading?)

Ideally criteria are objectively evaluable / minimize illegible judgment calls.

Martin Vlach
So is the Alignment program to be updated to 0 for OpenAI now that the Superalignment team is no more? ( https://docs.google.com/document/d/1uPd2S00MqfgXmKHRkVELz5PdFRVzfjDujtu8XLyREgM/edit?usp=sharing )

Thanks to Taylor Smith for doing some copy-editing on this.

In this article, I tell some anecdotes and present some evidence, in the form of research artifacts, about how easy it is for me to work hard when I have collaborators. If you are in a hurry, I recommend skipping to the research artifact section.

Bleeding Feet and Dedication

During AI Safety Camp (AISC) 2024, I was working with somebody on how to use binary search to approximate a hull that would contain a set of points, only to knock a glass off of my table. It splintered into a thousand pieces all over my floor.

A normal person might stop and remove all the glass splinters. I just spent 10 seconds picking up some of the largest pieces and then decided...

Emrik

I think both "jog, don't sprint" and "sprint, don't jog" are too low-dimensional as advice. It's good to try to spend 100% of one's resources on doing good—sorta tautologically. What allows Johannes to work as hard as he does, I think, is not (just) that he's obsessed with the work; it's rather that he understands his own mind well enough to navigate around its limits. And that self-insight is also what enables him to aim his cognition at what matters—which is a trait I care more about than the ability to work hard.

People who are good at aiming their cognition at ... (read more)

mesaoptimizer
I think what quila is pointing at is their belief in the supposed fragility of thoughts at the edge of research questions. From that perspective I think their rebuttal is understandable, and your response completely misses the point: you can be someone who spends only four hours a day working and the rest of the time relaxing, but also care a lot about not losing the subtle and supposedly fragile threads of your thought when working. Note: I have a different model of research thought, one that involves a systematic process towards insight, and because of that I also disagree with Johannes' decisions.
Mo Putera
I think you're right that I missed their point, thanks for pointing it out. I have had experiences similar to Johannes' anecdote re: ignoring broken glass to not lose fragile threads of thought; they usually entailed extended deep work periods past healthy thresholds for unclear marginal gain, so the quotes above felt personally relevant as guardrails. But also my experiences don't necessarily generalize (as your hypothetical shows). I'd be curious to know your model, and how it compares to some of John Wentworth's posts on the same IIRC.

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

Linch
Do we know if @paulfchristiano or other ex-lab people working on AI policy have non-disparagement agreements with OpenAI or other AI companies? I know Cullen doesn't, but I don't know about anybody else. I know NIST isn't a regulatory body, but it still seems like standards-setting should be done by people who have no unusual legal obligations.  To be clear, I want to differentiate between Non-Disclosure Agreements, which are perfectly sane and reasonable in at least a limited form as a way to prevent leaking trade secrets, and non-disparagement agreements, which prevents you from saying bad things about past employers. The latter seems clearly bad to have for anybody in a position to affect policy. Doubly so if the existence of the non-disparagement agreement itself is secretive.
gwern
Sam Altman appears to have been using non-disparagements at least as far back as 2017-2018, even for things that really don't seem to have needed such things at all, like a research nonprofit arm of YC. It's unclear if that example is also a lifetime non-disparagement (I've asked), but nevertheless, given that track record, you should assume the OA research non-profit also tried to impose it, followed by the OA LLC (obviously), and so Paul F. Christiano (and all of the Anthropic founders) would presumably be bound. This would explain why Anthropic executives never say anything bad about OA, and refuse to explain exactly what broken promises by Altman triggered their failed attempt to remove Altman and subsequent exodus. (I have also asked Sam Altman on Twitter, since he follows me, apropos of his vested equity statement, how far back these NDAs go, if the Anthropic founders are still bound, and if they are, whether they will be unbound.)
habryka
It seems really quite bad for Paul to work in the U.S. government on AI legislation without having disclosed that he is under a non-disparagement clause with the biggest entity in the space that the regulator he is working at oversees. And if he signed an NDA that prevents him from disclosing it, then it was IMO his job not to accept a position in which such a disclosure would obviously be required. I am currently at something like 50% that Paul has indeed signed such a lifetime non-disparagement agreement, so I don't think a "presumably" is appropriate here (though I am not that far away from it).
gwern

It would be bad, I agree. (An NDA about what he worked on at OA, sure, but then being required to never say anything bad about OA forever, as a regulator who will be running evaluations etc...?) Fortunately, this is one of those rare situations where it is probably enough for Paul to simply say his OA NDA does not cover that - then either it doesn't and can't be a problem, or he has violated the NDA's gag order by talking about it and when OA then fails to sue him to enforce it, the NDA becomes moot.

by Lucius Bushnaq, Jake Mendel, Kaarel Hänni, Stefan Heimersheim.

A short post laying out our reasoning for using integrated gradients as an attribution method. It is intended as a stand-alone post based on our LIB papers [1] [2]. This work was produced at Apollo Research.

Context

Understanding circuits in neural networks requires understanding how features interact with other features. There are a lot of features, and their interactions are generally non-linear. A good starting point for understanding the interactions might be to just figure out how strongly each pair of features in adjacent layers of the network interacts. But since the relationships are non-linear, how do we quantify their 'strength' in a principled manner that isn't vulnerable to common and simple counterexamples? In other words, how do we quantify how much the...
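For concreteness, below is a minimal generic sketch of integrated gradients (the standard Sundararajan et al. 2017 formulation), which attributes the change in a function's output to its inputs by averaging gradients along a straight path from a baseline. This is an illustrative sketch, not the LIB papers' implementation; in their setting the "inputs" would be features in one layer and the "output" a feature in the next.

```python
import numpy as np

def integrated_gradients(f, grad_f, x, baseline, steps=100):
    """Attribute f(x) - f(baseline) to the coordinates of x by averaging
    grad_f along the straight path from baseline to x (midpoint rule)."""
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad  # one attribution per input coordinate

# Toy non-linear interaction: f(a, b) = a*b + a^2, with its exact gradient.
f = lambda v: v[0] * v[1] + v[0] ** 2
grad_f = lambda v: np.array([v[1] + 2 * v[0], v[0]])
attr = integrated_gradients(f, grad_f, x=np.array([2.0, 3.0]),
                            baseline=np.zeros(2))
print(attr, attr.sum(), f(np.array([2.0, 3.0])))  # attributions sum to f(x)-f(0)
```

The final check illustrates the Completeness axiom (attributions summing exactly to f(x) - f(baseline)), the property questioned in the comments below.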

[Not very confident, but just saying my current view.]

I'm pretty skeptical about integrated gradients.

As far as why, I don't think we should care about the derivative at the baseline (zero or the mean).

As far as the axioms, I think I get off the train on "Completeness" which doesn't seem like a property we need/want.

I think you just need to eat that there isn't any sensible way to do something reasonable that gets Completeness.

The same applies with attribution in general (e.g. in decision making).

ryan_greenblatt
Maybe I'm confused, but isn't integrated gradients strictly slower than an ablation to a baseline?
tailcalled
Maybe? One thing I've been thinking about a lot recently is that building tools to interpret networks on individual datapoints might be more relevant than attributing over a dataset. This applies if the goal is to make statistical generalizations, since a richer structure on an individual datapoint gives you more to generalize with, but it also applies if the goal is the inverse, to go from general patterns to particulars, since this would provide a richer method for debugging, noticing exceptions, etc. And basically the trouble with a lot of work that attempts to generalize is that some phenomena are very particular to specific cases, so one risks losing a lot of information by only focusing on the generalizable findings. Either way, cool work; it seems like we've thought along similar lines, but you've put in more work.

Scarlett Johansson makes a statement about the "Sky" voice, a voice for GPT-4o that OpenAI recently pulled after less than a week of prime time.

tl;dr: OpenAI made an offer last September to Johansson; she refused. They offered again two days before the public demo. Scarlett Johansson claims that the voice was so similar that even friends and family noticed. She hired legal counsel to ask OpenAI to "detail the exact process by which they created the ‘Sky’ voice," which resulted in OpenAI taking the voice down.

Full statement below:

Last September, I received an offer from Sam Altman, who wanted to hire me to voice the current ChatGPT 4.0 system. He told me that he felt that by my voicing the system, I could bridge the gap between tech...

gwern

Sam Altman has apparently provided a statement to NPR apropos of https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her , quoted on Twitter by the NPR journalist (second):

...In response, Sam Altman has now issued a statement saying “Sky is not Scarlett Johansson’s, and it was never intended to resemble hers.”

“We cast the voice actor behind Sky’s voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky’s voice in our products. We are sorry to Ms

... (read more)

LessOnline Festival

May 31st to June 2nd, Berkeley CA