Quick Takes

I thought Superalignment was a positive bet by OpenAI, and I was happy when they committed to putting 20% of their current compute (at the time) towards it. I stopped thinking about that kind of approach because OAI already had competent people working on it. Several of them are now gone.

It seems increasingly likely that the entire effort will dissolve. If so, OAI has now made the business decision to invest its capital in keeping its moat in the AGI race rather than basic safety science. This is bad and likely another early sign of what's to come.

I think ... (read more)

kromem

It's going to have to.

Ilya is brilliant and seems to really see the horizon of the tech, but maybe isn't the best on the business side at seeing how to sell it.

But this is often the curse of the ethically pragmatic. There is such a focus on the ethics part by the participants that the business side of things only sees that conversation and misses the rather extreme pragmatism.

As an example, would superaligned CEOs in the oil industry fifty years ago have still only kept their eye on quarterly share prices or considered long term costs of their choices? There'... (read more)

Bogdan Ionut Cirstea
Strongly agree; I've been thinking for a while that something like a public-private partnership involving at least the US government and the top US AI labs might be a better way to go about this. Unfortunately, recent events seem in line with it not being ideal to only rely on labs for AI safety research, and the potential scalability of automating it should make it even more promising for government involvement. [Strongly] oversimplified, the labs could provide a lot of the in-house expertise, the government could provide the incentives, public legitimacy (related: I think of a solution to aligning superintelligence as a public good) and significant financial resources.

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI*. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence (davidad, dmurfet); on the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their own jobs. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthe... (read more)

Showing 3 of 21 replies
Nathan Helm-Burger
My view is that there are huge algorithmic gains in peak capability, training efficiency (less data, less compute), and inference efficiency waiting to be discovered, and available to be found by a large number of parallel research hours invested by a minimally competent multimodal-LLM-powered research team. So it's not that scaling leads to ASI directly, it's:

1. Scaling leads to brute-forcing the LLM agent across the threshold of AI research usefulness.
2. Using these LLM agents in a large research project can lead to rapidly finding better ML algorithms and architectures.
3. Training these newly discovered architectures at large scales leads to much more competent automated researchers.
4. This process repeats quickly over a few months or years.
5. This process results in AGI.
6. AGI, if instructed (or allowed, if it's agentically motivated on its own to do so) to improve itself, will find even better architectures and algorithms.
7. This process can repeat until ASI. The resulting intelligence / capability / inference speed goes far beyond that of humans.

Note that this process isn't inevitable; there are many points along the way where humans can (and should, in my opinion) intervene. We aren't disempowered until near the end of this.
Alexander Gietelink Oldenziel
Why do you think there are these low-hanging algorithmic improvements?

My answer to that is currently in the form of a detailed 2 hour lecture with a bibliography that has dozens of academic papers in it, which I only present to people that I'm quite confident aren't going to spread the details. It's a hard thing to discuss in detail without sharing capabilities thoughts. If I don't give details or cite sources, then... it's just, like, my opinion, man. So my unsupported opinion is all I have to offer publicly. If you'd like to bet on it, I'm open to showing my confidence in my opinion by betting that the world turns out how I expect it to.

Yesterday Greg Sadler and I met with the President of the Australian Association of Voice Actors. Like us, they've been lobbying for more and better AI regulation from government. I was surprised how much overlap we had in concerns and potential solutions:
1. Transparency and explainability of AI model data use (concern)

2. Importance of interpretability (solution)

3. Mis/dis information from deepfakes (concern)

4. Lack of liability for the creators of AI if any harms eventuate (concern + solution)

5. Unemployment without safety nets for Australians (concern)

6.... (read more)

Problem of Old Evidence, the Paradox of Ignorance, and Shapley Values

Paradox of Ignorance

Paul Christiano presents the "paradox of ignorance" where a weaker, less informed agent appears to outperform a more powerful, more informed agent in certain situations. This seems to contradict the intuitive desideratum that more information should always lead to better performance.

The example given is of two agents, one powerful and one limited, trying to determine the truth of a universal statement ∀x:ϕ(x) for some Δ0 formula ϕ. The limited agent treats each new valu... (read more)
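For concreteness, here is a toy numerical sketch of one way to read the example (the prior, the per-instance probability, and both agent models below are my own illustrative assumptions, not Christiano's formalism): the limited agent treats each confirmed instance as genuine evidence and grows more confident in the universal claim, while the powerful agent, which could already derive each instance it checks, learns nothing from the confirmations.

```python
# Toy sketch (my framing, not Christiano's): a "limited" agent that treats each
# checked instance phi(1), phi(2), ... as an initially-uncertain observation
# raises its credence in "forall x: phi(x)" with every confirmation, while a
# "powerful" agent that could already derive those instances gains no evidence.

def limited_agent_credence(n_checked: int, prior: float = 0.5, p_instance: float = 0.9) -> float:
    """Credence in the universal claim after n confirmed instances.

    The limited agent models each instance as certain if the universal claim is
    true, and as holding only with probability p_instance otherwise, then
    updates by Bayes on n confirmations.
    """
    likelihood_true = 1.0                        # every instance holds if the forall is true
    likelihood_false = p_instance ** n_checked   # each instance only probably holds otherwise
    return prior * likelihood_true / (prior * likelihood_true + (1 - prior) * likelihood_false)


def powerful_agent_credence(n_checked: int, prior: float = 0.5) -> float:
    """The powerful agent already knew each checked instance would hold (it can
    derive them), so the checks carry no information and it stays at its prior."""
    return prior


for n in (0, 10, 100):
    print(n, round(limited_agent_credence(n), 3), powerful_agent_credence(n))
```

Running it, the limited agent's credence climbs toward 1 as instances are checked while the powerful agent's stays at its prior, which is the flavor of the apparent paradox: the less informed agent ends up (rightly or wrongly) more confident.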

Showing 3 of 8 replies
kromem

While I agree that the potential for AI (we probably need a better term than LLMs or transformers as multimodal models with evolving architectures grow beyond those terms) to explore less testable topics as more testable is quite high, I'm not sure the air-gapping of information can be as clean as you might hope.

Does the AI generating the stories of Napoleon's victory know about the historical reality of Waterloo? Is it using something like SynthID where the other AI might inadvertently pick up on a pattern across the stories of victories distinct from t... (read more)

Alexander Gietelink Oldenziel
Beautifully illustrated and amusingly put, sir! A variant of what you are saying is that AI may once and for all allow us to calculate the true counterfactual Shapley value of scientific contributions. (Re: ancestor simulations, I think you are onto something here. Compare the Q hypothesis: https://twitter.com/dalcy_me/status/1780571900957339771; see also speculations about the Zhuangzi hypothesis here.)
gwern
Yup. Who knows but we are all part of a giant leave-one-out cross-validation computing counterfactual credit assignment on human history? Schmidhuber-em will be crushed by the results.
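To make the counterfactual-credit-assignment idea concrete, here is a minimal sketch of a Shapley-value computation over a toy value function (the contributors and coalition values are invented purely for illustration). The leave-one-out credit gwern mentions is each contributor's marginal value to the full group; the Shapley value averages marginal contributions over all possible arrival orders.

```python
from itertools import permutations

# Toy value function over coalitions of "contributors" (entirely made up):
# the value of a coalition is what its members could have produced together.
VALUES = {
    frozenset(): 0, frozenset({"A"}): 10, frozenset({"B"}): 0, frozenset({"C"}): 0,
    frozenset({"A", "B"}): 30, frozenset({"A", "C"}): 10, frozenset({"B", "C"}): 0,
    frozenset({"A", "B", "C"}): 36,
}

def shapley_values(players):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering in which the players could have 'arrived'."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        so_far = frozenset()
        for p in order:
            totals[p] += VALUES[so_far | {p}] - VALUES[so_far]
            so_far = so_far | {p}
    return {p: totals[p] / len(orderings) for p in players}

print(shapley_values(["A", "B", "C"]))  # credits sum to the full coalition's value (36)
```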

I've put together a large set of expert opinions on AI, along with my inferred percentages from them. I expect some people will disagree with them.

I'd appreciate hearing your criticisms so I can improve them or fill in entries I'm missing. 

https://docs.google.com/spreadsheets/d/1HH1cpD48BqNUA1TYB2KYamJwxluwiAEG24wGM2yoLJw/edit?usp=sharing

No data wall blocking GPT-5. That seems clear. For future models, will there be data limitations? Unclear.

https://youtube.com/clip/UgkxPCwMlJXdCehOkiDq9F8eURWklIk61nyh?si=iMJYatfDAZ_E5CtR 

The first thing I noticed with GPT-4o is that “her” appears ‘flirty’, especially in the interview video demo. I wonder if it was done on purpose.

(This is the tale of a potentially reasonable CEO of the leading AGI company, not the one we have in the real world. Written after a conversation with @jdp.)

You’re the CEO of the leading AGI company. You start to think that your moat is not as big as it once was. You need more compute and need to start accelerating to give yourself a bigger lead, otherwise this will be bad for business.

You start to look around for compute, and realize you have 20% of your compute you handed off to the superalignment team (and even made a public commitment!). You end up ma... (read more)

So, you go to government and lobby. Except you never intended to help the government get involved in some kind of slow-down or pause. Your intent was to use this entire story as a mirage for getting rid of those who didn’t align with you and lobby the government in such a way that they don’t think it is such a big deal that your safety researchers are resigning.

You were never the reasonable CEO, and now you have complete power.

For anyone interested in Natural Abstractions type research: https://arxiv.org/abs/2405.07987

Claude summary:

Key points of "The Platonic Representation Hypothesis" paper:

  1. Neural networks trained on different objectives, architectures, and modalities are converging to similar representations of the world as they scale up in size and capabilities.

  2. This convergence is driven by the shared structure of the underlying reality generating the data, which acts as an attractor for the learned representations.

  3. Scaling up model size, data quantity, and task dive

... (read more)
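As a rough operational illustration of what "converging representations" can mean, here is a sketch that compares two models' embeddings of the same inputs using linear CKA, one common representational-similarity metric (the metric choice and the random stand-in embeddings are my own illustration, not necessarily the paper's methodology):

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representations of the same n inputs.

    X: (n, d1) activations from one model; Y: (n, d2) activations from another.
    Returns a similarity in [0, 1]; higher means more aligned representations.
    """
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))

# Illustrative stand-ins: two "models" whose embeddings are different linear views
# of the same underlying structure score high; unrelated embeddings score low.
rng = np.random.default_rng(0)
shared = rng.normal(size=(512, 32))               # pretend latent structure of the data
text_emb = shared @ rng.normal(size=(32, 768))    # "text model" view of it
image_emb = shared @ rng.normal(size=(32, 1024))  # "image model" view of it
unrelated = rng.normal(size=(512, 1024))
print(linear_cka(text_emb, image_emb), linear_cka(text_emb, unrelated))
```

The hypothesis, as I read the summary, is roughly that sufficiently capable models trained on different data and modalities would score increasingly high on this kind of comparison.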

Epistemic status: not a lawyer, but I've worked with a lot of them.

As I understand it, an NDA isn't enforceable against a subpoena (though the former employer can seek a protective order for the testimony).   Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...

simeon_c

Idea: Daniel Kokotajlo probably lost quite a bit of money by not signing an OpenAI NDA before leaving, which I consider a public service at this point. Could some of the funders of the AI safety landscape give some money or social reward for this?

I guess reimbursing everything Daniel lost might be a bit too much for funders, but providing some money, both to reward the act and to incentivize future safety people not to sign NDAs, would have very high value.

Showing 3 of 20 replies
Isaac King
They didn't change their charter. https://forum.effectivealtruism.org/posts/2Dg9t5HTqHXpZPBXP/ea-community-needs-mechanisms-to-avoid-deceptive-messaging

Thanks, I hadn't seen that; I find it convincing.

wassname
Notably, there are some lawyers here on LessWrong who might help (possibly even for the lols, you never know). And you can look at case law and guidance to see if clauses are actually enforceable or not (many are not). To anyone reading, here's habryka doing just that.
William_S

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)

Showing 3 of 33 replies
wassname
Are you familiar with US NDAs? I'm sure there are lots of clauses that have been ruled invalid by case law. In many cases, non-lawyers have no idea about these, so you might be able to make a difference with very little effort. There is also the possibility that valuable OpenAI shares could be rescued. If you haven't seen it, check out this thread where one of the OpenAI leavers did not sign the gag order.
PhilosophicalSoul
I have reviewed his post. Two (2) things to note:

(1) Invalidity of the NDA does not guarantee William will be compensated after the trial. Even if he is, his job prospects may be hurt long-term.

(2) States have different laws on whether the NLRA trumps internal company memoranda. More importantly, labour disputes are traditionally solved through internal bargaining. Presumably, the collective bargaining 'hand-off' involving NDAs and gag orders at this level will waive subsequent litigation in district courts. The precedent Habryka offered refers to hostile severance agreements only, not the waiving of the dispute mechanism itself.

I honestly wish I could use this dialogue as a discreet communication to William on a way out, assuming he needs help, but I re-affirm my previous worries about the costs.

I also add here, rather cautiously, that there are solutions. However, it would depend on whether William was an independent contractor, how long he worked there, whether it actually involved a trade secret (as others have mentioned), and so on. The whole reason NDAs tend to be so effective is that they obfuscate the material needed to even know or be aware of what remedies are available.

Interesting! For most of us, this is outside our area of competence, so I appreciate your input.

exanova

New funding idea: We need an AI rationalist-adjacent girlfriend!

O O

Is this paper essentially implying the scaling hypothesis will converge to a perfect world model? https://arxiv.org/pdf/2405.07987

It says models trained on text modalities and image modalities both converge to the same representation with each training step. It also hypothesizes this is a brain-like representation of the world. Ilya liked this paper, so I'm giving it more weight. Am I reading too much into it, or is it basically fully validating the scaling hypothesis?

Ruby

As noted in an update on LW Frontpage Experiments! (aka "Take the wheel, Shoggoth!"), yesterday we started an AB test on some users automatically being switched over to the Enriched [with recommendations] Latest Posts feed.

The first ~18 hours' worth of data does seem to show a real uptick in clickthrough rate, though some of that could be novelty.

(Examining members of the test group (n=921) and control group (n≈3000) for the last month, the test group seemed to have a slightly (~7%) lower clickthrough-rate baseline; I haven't investigated this.)

However the specific... (read more)
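For anyone wanting to sanity-check an uptick like this, here is a sketch of the standard two-proportion z-test one might run on such data. The click counts below are placeholders (only the group sizes echo the post), and it ignores the lower clickthrough-rate baseline of the test group mentioned above.

```python
from statistics import NormalDist

def two_proportion_ztest(clicks_a: int, n_a: int, clicks_b: int, n_b: int):
    """Two-sided z-test for a difference in clickthrough rates between two groups."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Placeholder numbers: group sizes roughly match the post (n=921 test, n≈3000 control),
# but the click counts are invented purely to show the computation.
print(two_proportion_ztest(clicks_a=120, n_a=921, clicks_b=300, n_b=3000))
```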

Ilya Sutskever has left OpenAI https://twitter.com/ilyasut/status/1790517455628198322

This is a brief follow-up to my post “Redirecting one’s own taxes as an effective altruism method.” Since I wrote that post:

  1. Scott Alexander boosted (not to be interpreted as endorsed) my post on Astral Codex Ten, which helped to give it more than typical reach.
  2. In a flinchy spasm of post-SBF timidity, GiveWell explicitly told me they did not want to get their hands dirty with my donations of redirected taxes any more.
  3. My tax arrears for 2013 ($5,932 original tax + ~$5,467 in interest & penalties) were annulled by the statute of limitations.
  4. I made a $5,93
... (read more)

[PHOTO] I sent 19 emails to politicians, had 4 meetings, and now I get emails like this. There is SO MUCH low-hanging fruit in just doing this for 30 minutes a day (I would do it, but my LTFF funding does not cover this). Someone should do this!

niplav
The image is not showing.

Thanks for letting me know!

There are things I would buy if they existed. Is there any better way to signal this to potential sellers, other than tweeting it and hoping they hear? Is there some reason to believe that sellers are already gauging demand so completely that they wouldn't start selling these things even if I could get through to them? 

elifland

The word "overconfident" seems overloaded. Here are some things I think that people sometimes mean when they say someone is overconfident:

  1. They gave a binary probability that is too far from 50% (I believe this is the original one)
  2. They overestimated a binary probability (e.g. they said 20% when it should be 1%)
  3. Their estimate is arrogant (e.g. they say there's a 40% chance their startup fails when it should be 95%), or maybe they give an arrogant vibe
  4. They seem too unwilling to change their mind upon arguments (maybe their credal resilience is too high)
  5. They g
... (read more)

When I accuse someone of overconfidence, I usually mean they're being too hedgehogy when they should be more foxy.
