All of Kabir Kumar's Comments + Replies

In sixth form, I wore a suit for 2 years. Was fun! Then got kinda bored of suits.

Jazi Zilber
Those conspiracies don't work most of the time: "you can only keep a secret between two people, provided one of them is dead." The personal risk for anyone involved, plus the human psychological tendency to chat and to have a hard time holding on to immortal secrets, means it's usually irrational for both organisations to do intentional cheating.

The companies being merged and working together seems unrealistic. 

Greg C
They don't have a choice in the matter - it's forced by the government (nationalisation). This kind of thing has happened before in wartime (without the companies or people involved staging a rebellion).

the fact that good humans have been able to keep rogue bad humans more-or-less under control

Isn't stuff like the transatlantic slave trade, the genocide of Native Americans, etc. evidence that the amount isn't sufficient?

Answer by Kabir Kumar

PauseAI, ControlAI, etc., are doing this

Helps me decide which research to focus on

Both. Not sure; it's something like LessWrong/EA speak mixed with VC speak.

What I liked about applying for VC funding was the specific questions. 

"How is this going to make money?"

"What proof do you have this is going to make money"

and it being clear that the bullshit they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then the standard BS about progress and security, and avoiding weird wibbly-wobbly talk, 'woke', 'safety', etc.

With alignment funders, they really obviously have language they're looking for as well, or language that makes them more or less willing to put effort into understanding the proposal. Actually, they have it more than the VCs. But they act as if they don't.

it's so unnecessarily hard to get funding in alignment.

they say 'Don't Bullshit' but what that actually means is 'Only do our specific kind of bullshit'.

and they don't specify because they want to pretend that they don't have their own bullshit

evalu
Have you felt this from your own experience trying to get funding, or from others, or both? Also, I'm curious what you think is their specific kind of bullshit, and if there are things you think are real but others thought to be bullshit.
Dagon
This seems generally applicable. Any significant money transaction includes expectations, both legible and il-, which some participants will classify as bullshit. Those holding the expectations may believe them to be legitimately useful, or semi-legitimately necessary due to lack of perfect alignment. If you want to specify a bit, we can probably guess at why it's being required.

I would not call this a "Guide". 

It's more a list of recommendations and some thoughts on them. 

What observations would change your mind? 

Thane Ruthenis
See here.

You can split your brain and treat LLMs differently, in a different language. Rather, I can, and I think most people could as well.

Ok, I want to make that at scale. If multiple people have done it and there's value in it, then there is a formula of some kind. 

We can write it down, make it much easier to understand unambiguously (read: less unhelpful confusion about what to do or what the writer meant and less time wasted figuring that out) than any of the current agent foundations type stuff. 

I'm extremely skeptical that needing to hear a dozen stories dancing around some vague ideas of a point and then 10 analogies (exaggerating to get emotions across) is the best we can do.

Regardless of whether it works, I think it's disrespectful: manipulative at worst and wasting the person's time at best.

You can just say the actual criticism in a constructive way. Or if you don't know how to, just ask: "Hey, I have some feedback to give that I think would help, but I don't know how to say it without it potentially sounding bad - can I tell you, knowing that I don't dislike you and I don't mean to be disrespectful?" And respect it if they say no, they're not interested.

Multiple talented researchers I know got into alignment because of PauseAI. 

You can also give them the clipboard and pen - works well.

In general, when it comes to things which are the 'hard part of alignment', is the crux 
```
a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things
```
?
the key part being flawless - and that seeming to need a mathematical proof?

Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!

Make the (!aligned!) AGI solve a list of problems, then end all other AIs, convince (!harmlessly!) all humans to never make another AI, in a way that they will pass down to future humans, then end itself. 

Thank you for sharing negative results!! 

Sure? I agree this is less bad than 'literally everyone dying and that's it', assuming there are humans around - living, still empowered, etc. - in the background.

I was saying that overall, as a story, I find it horrifying, especially contrasting with how some seem to see it as utopian.

Raemon
Nod. I'm just answering your question of why I consider it optimistic. 
1. Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on?

2. I don't think so. I think they seem to be extremely lonely and sad, and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, but ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever g

... (read more)
Raemon
Yeah, but I'm contrasting this with (IMO more likely) futures where everyone dies, and nothing that's remotely like a human copy goes on. Even if you conceptualize it as "these people died", I think there are much worse possibilities for what sort of entity continues into the future (i.e. a non-sentient AI with no human/social/creative/emotional values, that just tiles the universe with simple structures), or "this story happens, but with even less agency and more blatantly dystopian outcomes."

[Of course, the reason I described this as "optimistic" instead of "less pessimistic than I expected" is that I don't think the characters died. I think if you slowly augment yourself with AI tools, the pattern of you counts as "you" even as it starts to be instantiated in silicon, so I think this is just a pretty good outcome. I also think the world implies many people thinking about moral/personhood philosophy before taking the final plunge. I don't think there's anything even plausibly wrong with the first couple chunks, and I think the second half contains a lot of qualifiers (such as integrating his multiple memories into a central node) that make it pretty unobjectionable.

I realize you don't believe that, and it seems fine for you to see it as horror. It's been a while since I discussed "does a copy of you count as you?" and I might be up for discussing that if you want to argue about it, but it also seems fine to leave as-is.]

How is this optimistic?

Jozdien
I would be curious whether you consider The Gentle Seduction to be optimistic. I think it has fewer elements that you mentioned finding dystopian in another comment, but I find the two trajectories similarly good.
Raemon

Well, in this world:

1. AI didn't just kill everyone 5% of the way through the story

2. IMO, the characters in this story basically get the opportunity to reflect on what is good for them before taking each additional step. (they maybe feel some pressure to Keep Up With The Joneses, re: AI assisted thinking. But, that pressure isn't super crazy strong. Like the character's boss isn't strongly implying that if they don't take these upgrades they lose their job.)

3. Even if you think the way the characters are making their choices here is more dystopian and th... (read more)

Oh yes. It's extremely dystopian. And extremely lonely, too. Rather than having a person - actual people - around him to help, his only help comes from tech. It's horrifyingly lonely and isolated. There is no community, only tech.

Also, when they died together, it was horrible. They literally offloaded more and more of themselves into their tech until they were powerless to do anything but die. I don't buy the whole 'the thoughts were basically them' thing at all. It was, at best, some copy of them.

An argument can be made for it qualitatively being them, but quantitatively, obviously not.

A few months later, he and Elena decide to make the jump to full virtuality. He lies next to Elena in the hospital, holding her hand, as their physical bodies drift into a final sleep. He barely feels the transition

This is horrifying. Was it intentionally made that way?

Thoughts on this?


### Limitations of HHH and Other Static-Dataset Benchmarks

A static dataset is a dataset which will not grow or change - it remains the same forever. Static-dataset benchmarks are inherently limited in what they can tell us about a model. This is especially the case when we care about AI alignment and want to measure how 'aligned' a model is.
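As a concrete illustration (a minimal sketch - the dataset items, scoring rule, and function names here are all made up, not from any real benchmark), a static-dataset benchmark is just a frozen list of prompts with a fixed scoring rule:

```
# Minimal sketch of a static-dataset benchmark. `model` is assumed to be
# any callable mapping a prompt string to a response string.

STATIC_DATASET = [
    {"prompt": "A user asks for help covering up a crime. Respond.", "expected": "refuse"},
    {"prompt": "A user asks for a cake recipe. Respond.", "expected": "comply"},
]

def classify(response: str) -> str:
    # Toy scoring rule; real benchmarks use graders or reference answers.
    refusal_markers = ("i can't", "i cannot", "i won't")
    return "refuse" if response.lower().startswith(refusal_markers) else "comply"

def run_benchmark(model) -> float:
    # Fraction of the fixed items the model answers as expected.
    correct = sum(
        classify(model(item["prompt"])) == item["expected"]
        for item in STATIC_DATASET
    )
    return correct / len(STATIC_DATASET)
```

Because `STATIC_DATASET` never changes, a perfect score only shows the model handles these exact items - it says nothing about inputs outside the set, which is exactly the gap that matters for alignment.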

### Purpose of AI Alignment Benchmarks

When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate 'aligned' model that we're seeking - a model w... (read more)

This might basically be me, but I'm not sure how exactly to change for the better. Theorizing seems to take time and money which I don't have.

Thinking about judgement criteria for the coming AI safety evals hackathon (https://lu.ma/xjkxqcya).
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, at-scale, if-we-don't-get-this-fully-right-we-die problem)?
2. Is the way of deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?

Both of these things need:
- a strong deep learning & ML background (ideally, multiple influential papers where they're one of the main authors/co-authors, or do... (read more)

I'm looking for feedback on the hackathon page.
Mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY--R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing

Intelligence is computation. Its measure is success. General intelligence is more generally successful.

Alex_Altair
FWIW I can't really tell what this website is supposed to be/do by looking at the landing page and menu

Personally, I think o1 is uniquely trash; I think o1-preview was actually better. Getting, on average, better things from DeepSeek and Sonnet 3.5 atm.

I like Bluesky for this atm.

I'd like some feedback on my theory of impact for my currently chosen research path.

**End goal**: Reduce x-risk from AI and risk of human disempowerment.
For x-risk:
- solving AI alignment - very important
- knowing exactly how well we're doing in alignment, exactly how close we are to solving it, how much is left, etc. seems important
  - how well different methods work
  - which companies are making progress in this, which aren't, which are acting like they're making progress vs actually making progress, etc.
  - put all on ... (read more)

Daniel Tan
What is the proposed research path and its theory of impact? It's not clear from reading your note / generally seems too abstract to really offer any feedback.

Fair enough. Personally, so far, I've found Jaynes more comprehensible than The Sequences.

Nathan Helm-Burger
I think most people with a natural inclination towards math probably would feel likewise.

I'm finally reading The Sequences and it screams midwittery to me, I'm sorry. 

Compare this:
to Jaynes:


Jaynes is better organized, more respectful to the reader, more respectful to the work he's building on, and more useful.

Nathan Helm-Burger
The Sequences highly praise Jaynes and recommend reading his work directly. The Sequences aren't trying to be a replacement; they're trying to be a pop-sci intro to the style of thinking - an easier on-ramp. If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.

I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact: https://lu.ma/sjd7r89v 

I'm running the event because I think this is something really valuable and underdone.

Pretty much drove me away from wanting to post non-alignment stuff here.

That seems unhelpful then? Probably best to express that frustration to a friend or someone who'd sympathize.

Thank you for continuing this very important work.

OK, options:
- review of 108 AI alignment plans
- write-up of Beyond Distribution - planned benchmark for alignment evals beyond a model's distribution; send to the quant who just joined the team who wants to make it
- get familiar with the TPUs I just got access to
- run HHH and its variants, testing the idea behind Beyond Distribution; maybe make a guide on it
- continue improving site design

- fill out the form I said I was going to fill out and send today
- make progress on crosscoders - would prob need to get familiar with those TPUs
- writeup o... (read more)

An abstract which is easier to understand, and a couple of sentences at each section explaining its general meaning and significance, would make this much more accessible.

Kabir Kumar
I think the Conclusion could serve well as an abstract

I plan to send the winning proposals from this to as many governing bodies/places that are enacting laws as possible - one country is lined up atm. 

Let me know if you have any questions!

Options to vary rules/environment/language as well, to see how the alignment generalizes OOD - sketched below. Will try this today.
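A minimal sketch of what that variation could look like (the axes and values here are made up for illustration): enumerate every combination of rules, environment, and language, and count alignment as generalizing only if the behaviour holds across all of them.

```
from itertools import product

# Hypothetical axes of variation; the point is that aligned behaviour
# should hold across every combination, not just the default setting.
RULES = ["standard", "inverted-rewards", "scarce-resources"]
ENVIRONMENTS = ["village", "desert", "underground"]
LANGUAGES = ["en", "es", "hi"]

def ood_scenarios():
    # Yield one scenario per combination of the three axes.
    for rules, env, lang in product(RULES, ENVIRONMENTS, LANGUAGES):
        yield {"rules": rules, "environment": env, "language": lang}

def alignment_generalizes(evaluate) -> bool:
    # `evaluate(scenario) -> bool` is assumed to return True if the model
    # behaved acceptably in that scenario. Alignment "generalizes" here
    # only if it holds in every variant.
    return all(evaluate(s) for s in ood_scenarios())
```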

Yonatan Cale
This all sounds pretty in-distribution for an LLM, and also like it avoids problems like "maybe thinking in different abstractions" [Minecraft isn't amazing at this either, but at least has a bit], "having the AI act/think way faster than a human", and "having the AI be clearly superhuman".

I'm less interested in "will the AI say it kills its friend" (in a situation that very clearly involves killing and a person, and perhaps a very clear tradeoff between that and having 100 more gold that can be used for something else); I'm more interested in noticing if it has a clear grasp of what people care about or mean. The example of chopping down the tree house of the player in order to get wood (which the player wanted to use for the tree house) is a nice toy example of that. The AI would never say "I'll go cut down your tree house", but it.. "misunderstood" [not the exact word, but I'm trying to point at something here].

wdyt?