Quick Takes

The range of acceptable tone of voice here feels about 3mm wide to me; I always end up having bad manners.

A list of some contrarian takes I have:

  • People are currently predictably too worried about misuse risks

  • What people really mean by "open source" vs "closed source" labs is actually "responsible" vs "irresponsible" labs, which is not affected by regulations targeting open source model deployment.

  • Neuroscience as an outer alignment[1] strategy is embarrassingly underrated.

  • Better information security at labs is not clearly a good thing, and if we're worried about great power conflict, probably a bad thing.

  • Much research on deception (Anthropic's re

... (read more)

My timelines are lengthening. 

I've long been a skeptic of scaling LLMs to AGI*. I fundamentally don't understand how this is even possible. It must be said that very smart people give this view credence: davidad, dmurfet. On the other side are Vanessa Kosoy and Steven Byrnes. When pushed, proponents don't actually defend the position that a large enough transformer will create nanotech or even obsolete their job. They usually mumble something about scaffolding.

I won't get into this debate here but I do want to note that my timelines have lengthe... (read more)

Showing 3 of 9 replies.
Alexander Gietelink Oldenziel
Yes, agreed. What I don't get about this position: if it were indeed just scaling, what would AI research be for? There is nothing to discover, just scale more compute. Sure, you can maybe improve the speed of deploying compute a little, but at its core it seems like a story that's in conflict with itself?
Adam Shai
Lengthening from what to what?

I've never made explicit timeline estimates before, so there's nothing to compare to. But since it's a gut feeling anyway, I'm saying my gut feeling is lengthening.

Nisan

12 years ago, in "The state of Computer Vision and AI: we are really, really far away", Andrej Karpathy wrote:

The picture above is funny.

But for me it is also one of those examples that make me sad about the outlook for AI and for Computer Vision. What would it take for a computer to understand this image as you or I do? [...]

In any case, we are very, very far and this depresses me. What is the way forward? :(

I just asked gpt-4o what's going on in the picture, and it understood most of it:

In this image, a group of men in business attire are seen in a l

... (read more)
Nisan

Of course, Karpathy's post could be in the multimodal training data.

The word "overconfident" seems overloaded. Here are some things I think that people sometimes mean when they say someone is overconfident:

  1. They gave a binary probability that is too far from 50% (I believe this is the original one)
  2. They overestimated a binary probability (e.g. they said 20% when it should be 1%)
  3. Their estimate is arrogant (e.g. they say there's a 40% chance their startup fails when it should be 95%), or maybe they give an arrogant vibe
  4. They seem too unwilling to change their mind upon arguments (maybe their credal resilience is too high)
  5. They g
... (read more)

I expect (~ 75%) that the decision to "funnel" EAs into jobs at AI labs will become a contentious community issue in the next year. I think that over time more people will think it is a bad idea. This may have PR and funding consequences too.

Garrett Baker
This has been a disagreement people have had for many years. Why expect it to come to a head this year?

More people are going to quit labs / OpenAI. Will EA refill the leaky funnel?

[PHOTO] I sent 19 emails to politicians, had 4 meetings, and now I get emails like this. There is SO MUCH low-hanging fruit in just doing this for 30 minutes a day (I would do it, but my LTFF funding does not cover this). Someone should do this!

RobertM

Vaguely feeling like OpenAI might be moving away from the GPT-N+1 release model, for some combination of "political/frog-boiling" reasons and "scaling actually hitting a wall" reasons. Seems relevant to note, since in the worlds where they hadn't been drip-feeding people incremental releases of slight improvements over the original GPT-4's capabilities, and had instead just dropped GPT-5 (and it was as much of an improvement over 4 as 4 was over 3, or close), that might have prompted people to do an explicit orientation step. As it is, I expect less of t... (read more)

Showing 3 of 6 replies.
Matthew Barnett
I'm not sure if you'd categorize this under "scaling actually hitting a wall", but the main possibility that feels relevant in my mind is that progress simply is incremental in this case, as a fact about the world, rather than being a strategic choice on the part of OpenAI. When underlying progress is itself incremental, it makes sense to release frequent small updates. This is common in the software industry, and it would not be at all surprising if what's often true for most software development holds for OpenAI as well. (Though I also expect GPT-5 to be a medium-sized jump, once it comes out.)
Aaron_Scher
Sam Altman and OpenAI have both said they are aiming for incremental releases/deployment for the primary purpose of allowing society to prepare and adapt, as opposed to, say, dropping large capability jumps out of the blue which surprise people. I think "they believe incremental release is safer because it promotes societal preparation" should certainly be in the hypothesis space for the reasons behind these actions, along with scaling slowing and frog-boiling. My guess is that it is more likely than both of those other reasons (they have stated it as their reasoning multiple times; I don't think scaling is hitting a wall).
jmh

I wonder if that is actually a sound view, though. I just started reading Like War (interesting and seems correct/on target so far, but I'm really just starting it). Given its subject area (the impact, reactions to, and use of social media and networking technologies, and the general social results), it seems like society generally is not yet prepared and adapted even for that innovation. If all the fears about AI are even close to getting things right, I suspect "allowing society to prepare and adapt" suggests putting everything on hold, freezing in place, for a... (read more)

How can you make the case that a model is safe to deploy? For now, you can do risk assessment and notice that it doesn't have dangerous capabilities. What about in the future, when models do have dangerous capabilities? Here are four options:

  1. Implement safety measures as a function of risk assessment results, such that the measures feel like they should be sufficient to abate the risks
    1. This is mostly what Anthropic's RSP does (at least so far — maybe it'll change when they define ASL-4)
  2. Use risk assessment techniques that evaluate safety given deployment safe
... (read more)

I made this collage of people I think are cool and put it in my room. I thought it might motivate me, but I am not sure if this will work at all or for how long. Feel free to steal. Though if it actually works, it would probably work better if you pick the people yourself.

Emrik

I did nearly this in ~2015. I made a folder with pictures of inspiring people (it had Eliezer Yudkowsky, Brian Tomasik, David Pearce, Grigori Perelman, Feynman, more idr), and used it as my desktop background or screensaver or both (idr).

I say this because I am surprised at how much our thoughts/actions have converged, and wish to highlight examples that demonstrate this. And I wish to communicate that because basically senpai notice me. kya.

Be Confident in your Processes

I thought a lot about what kinds of things make sense for me to do to solve AI alignment. That did not make me confident that any particular narrow idea that I have will eventually lead to something important.

Rather, I'm confident that executing my research process will over time lead to something good. The research process is:

  1. Take some vague intuitions
  2. Iteratively unroll them into something concrete
  3. Update my models based on new observations I make during this overall process.

I think being confident, i.e. not feeling hop... (read more)

Emrik
This exact thought, from my diary in ~June 2022: "I advocate keeping a clear separation between how confident you are that your plan will work with how confident you are that pursuing the plan is optimal."
Johannes C. Mayer
I think this is a useful model. If I understand correctly what you're saying, it is that for any particular thing, we can think separately about whether that thing is optimal to do and whether I could get that thing to work. I think what I was saying is different. I was advocating confidence not at the object level of some concrete things you might do. Rather, the overall process that you engage in to make progress is a thing that you can have confidence in. Imagine there is a really good researcher, but now this person forgets everything that they ever researched, except for their methodology. In some sense they still know how to do research. If they fill in some basic factual knowledge in their brain, which I expect wouldn't take that long, I expect they would be able to continue being an effective researcher.
Emrik

I wrote the entry in the context of the question "how can I gain the effectiveness-benefits of confidence and extreme ambition, without distorting my world-model/expectations?"

I had recently been discovering abstract arguments that seemed to strongly suggest it would be most altruistic/effective for me to pursue extremely ambitious projects;  both because 1) the low-likelihood high-payoff quadrant had highest expected utility, but also because 2) the likelihood of success for extremely ambitious projects seemed higher than I thought.  (Plus some other reasons.)  I figured that I needn't feel confident about success in order to feel confident about the approach.

Wildlife Welfare Will Win

The long arc of history bends towards gentleness and compassion. Future generations will look with horror on factory farming. And already young people are following this moral thread to its logical conclusion, turning their eyes in disgust to mother nature, red in tooth and claw. Wildlife welfare done right, compassion towards our pets followed to its forceful conclusion, would entail the forced uploading of all higher animals, and, judging by the memetic virulence of shrimp welfare, of lower animals as well.

Morality-upon-reflex... (read more)

In Magna Alta Doctrina, Jacob Cannell talks about exponentiated gradient descent as a way of approximating Solomonoff induction using ANNs:

While that approach is potentially interesting by itself, it's probably better to stay within the real algebra. The Solomonoff-style partial continuous update for real-valued weights would then correspond to a multiplicative weight update rather than an additive weight update as in standard SGD.

Has this been tried/evaluated? Why, actually, yes: it's called exponentiated gradient descent, as exponentiating the result of addi

... (read more)
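To make the additive/multiplicative contrast concrete, here is a minimal sketch (mine, not from Cannell's post) comparing a plain SGD update with an exponentiated-gradient update on a toy linear-regression problem. The problem size, learning rate, and simplex normalization are illustrative assumptions, not anything from the original post.

```python
import numpy as np

# Illustrative comparison (an assumption-laden sketch, not from the post):
# a standard additive SGD update vs. an exponentiated-gradient
# (multiplicative) update on a toy linear-regression problem whose true
# weights lie on the probability simplex.

rng = np.random.default_rng(0)
n_features = 10

w_true = rng.random(n_features)
w_true /= w_true.sum()                  # target weights sum to one

X = rng.normal(size=(2000, n_features))
y = X @ w_true

def grad(w, x, t):
    """Gradient of the squared error 0.5 * (x @ w - t)**2 with respect to w."""
    return (x @ w - t) * x

w_sgd = np.full(n_features, 1.0 / n_features)   # uniform initialization
w_eg = np.full(n_features, 1.0 / n_features)
lr = 0.05                                       # illustrative learning rate

for x, t in zip(X, y):
    # Additive update (standard SGD).
    w_sgd = w_sgd - lr * grad(w_sgd, x, t)

    # Multiplicative update (exponentiated gradient), then renormalize.
    w_eg = w_eg * np.exp(-lr * grad(w_eg, x, t))
    w_eg = w_eg / w_eg.sum()

print("SGD distance to w_true:", np.linalg.norm(w_sgd - w_true))
print("EG  distance to w_true:", np.linalg.norm(w_eg - w_true))
```

The renormalization keeps the weights positive and summing to one, which is the setting where the multiplicative update behaves most differently from the additive one.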

What are you Doing? What did you Plan?

[Suno]

What are you doing? What did you plan? Are they aligned? If not then comprehend, if what you are doing now is better than the original thing. Be open-minded about, what is the optimal thing.

Don't fix the bottom line too: "Whatever the initial plan was is the best thing to do."

There are sub-agents in your mind. You don't want to fight, with them, as usually they win in the end. You might then just feel bad and don't even understand why. As a protective skin your sub-agent hides, the reasons for why, you feel so b... (read more)

Project idea: virtual water coolers for LessWrong

Previous: Virtual water coolers

Here's an idea: what if there was a virtual water cooler for LessWrong?

  • There'd be Zoom chats with three people per chat. Each chat is a virtual water cooler.
  • The user journey would begin by the user expressing that they'd like to join a virtual water cooler.
  • Once they do, they'd be invited to join one.
  • I think it'd make sense to restrict access to users based on karma. Maybe only 100+ karma users are allowed.
  • To start, that could be it. In the future you could do some investigation
... (read more)
Yitz

Personally I think this would be pretty cool!

Linch

(x-posted from the EA Forum)

We should expect that the incentives and culture of AI-focused companies make them uniquely terrible for producing safe AGI.

From a “safety from catastrophic risk” perspective, I suspect an “AI-focused company” (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:

  1. Incentives
  2. Culture

From an incentives perspective, consider realistic alternative organizational structures to “AI-focused company” tha... (read more)

Linch

I'm interested in what people think are the strongest arguments against this view. Here are a few counterarguments that I'm aware of:

1. Empirically the AI-focused scaling labs seem to care quite a lot about safety, and make credible commitments for safety. If anything, they seem to be "ahead of the curve" compared to larger tech companies or governments.

2. Government/intergovernmental agencies, and to a lesser degree larger companies, are bureaucratic and sclerotic and generally less competent. 

3. The AGI safety issues that EAs worry about th... (read more)

mesaoptimizer
On the other hand, institutional scars can cause what effectively looks like institutional traumatic responses, ones that block the ability to explore and experiment and to try to make non-incremental changes or improvements to the status quo, to the system that makes up the institution, or to the system that the institution is embedded in. There's a real and concrete issue with the number of roadblocks that seem to be in place to prevent people from doing things that make gigantic changes to the status quo. Here's a simple example: would it be possible for people to get a nuclear plant set up in the United States within the next decade, barring financial constraints? Seems pretty unlikely to me. What about the FDA response to the COVID crisis? That sure seemed like a concrete example of how 'institutional memories' serve as gigantic roadblocks to our civilization's ability to orient and act fast enough to deal with the sort of issues we are and will be facing this century. In the end, capital flows towards AGI companies for the sole reason that they are the least bottlenecked/regulated way to multiply your capital and seem to have the highest upside for investors. If you could modulate this, you wouldn't need to worry as much about the incentives and culture of these startups.
dr_s
You're right, but while those heuristics of "better safe than sorry" might be too conservative for some fields, they're pretty spot on for powerful AGI, where the dangers of failure vastly outstrip opportunity costs.
William_S

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)

Showing 3 of 32 replies.
wassname
Are you familiar with US NDAs? I'm sure there are lots of clauses that have been ruled invalid by case law. In many cases, non-lawyers have no idea about these, so you might be able to make a difference with very little effort. There is also the possibility that valuable OpenAI shares could be rescued. If you haven't seen it, check out this thread where one of the OpenAI leavers did not sign the gag order.

I have reviewed his post. Two (2) things to note: 

(1) Invalidity of the NDA does not guarantee William will be compensated after the trial. Even if he is, his job prospects may be hurt long-term. 

(2) States have different laws on whether the NLRA trumps internal company memorandums. More importantly, labour disputes are traditionally solved through internal bargaining. Presumably, the collective bargaining 'hand-off' involving NDAs and gag orders at this level will waive subsequent litigation in district courts. The precedent Habryka offered re... (read more)

wassname
It could just be because it reaches a strong conclusion on anecdotal/clustered evidence (e.g. it might say more about her friend group than anything else), along with claims to being better calibrated for weak reasons, which could be true but seems not very epistemically humble. Full disclosure: I downvoted karma because I don't think it should be the top reply, but I did not agree or disagree. But Jen seems cool, I like weird takes, and downvotes are not a big deal, just part of a healthy contentious discussion.

I had the impression that SPAR was focused on UC Berkeley undergrads and had therefore dismissed the idea of being a SPAR mentor or mentee. It was only recently, when someone mentioned that they wanted to learn from one particular SPAR mentor, that I looked at the website, and SPAR now seems to focus on the same niche as AI Safety Camp.

Did SPAR pivot in the past six months, or did I just misinterpret SPAR when I first encountered it?

I'm confused: if dating apps keep getting worse, how come nobody has come up with a good one, or at least a clone of OkCupid? As far as I can tell, neither "a good matching system is somehow less profitable than making people swipe all the time (surely it'd still be profitable on an absolute scale)" nor "it requires a decently big initial investment" can explain a complete lack of good products in an area with so much demand. Has anyone dug into it / tried to start a good dating app as a summer project?

Showing 3 of 11 replies.
Shoshannah Tekofsky
These are quizzes you make yourself. Did OKC ever have those? It's not for a matching percentage. A quiz in paiq is 6 questions, 3 multiple choice and 3 open. If someone gets the right answers on the multiple choice, you get to see their open question answers as a match request, and you can accept or reject the match based on that. I think it's really great. You can also browse other people's tests and see if you want to take any. The tests seem more descriptive of someone than most written profiles I've read, because it's much harder to misrepresent personal traits in a quiz than in a self-declared profile.

Hm. You could make quizzes yourself, but that was some effort. It seems the paiq quizzes are standardized and easy to make. Nice. Many OkCupid tests were more like MBTI tests. Here is where people are discussing one of the bigger ones.

Gunnar_Zarncke
People try new dating platforms all the time. It's what Y Combinator calls a tarpit: the problem sounds solvable, but the solution is elusive. As I have said elsewhere: dating apps are broken because the incentives of the usual core approach don't work. On the supply side, misaligned incentives (keep users on the platform) and opaque algorithms lead to bad matches. On the demand side, misaligned incentives (first impressions, low cost to exit) and no plausible deniability lead to predators being favored.

It's not obvious to me why training LLMs on synthetic data produced by other LLMs wouldn't work (up to a point).  Under the model where LLMs are gradient-descending their way into learning algorithms that predict tokens that are generated by various expressions of causal structure in the universe, tokens produced by other LLMs don't seem redundant with respect to the data used to train those LLMs.  LLMs seem pretty different from most other things in the universe, including the data used to train them!  It would surprise me if the algorithms... (read more)
