All of Kabir Kumar's Comments + Replies

In sixth form, I wore a suit for 2 years. Was fun! Then got kinda bored of suits.

Jazi Zilber
Those conspiracies don't work most of the time: "you can only keep a secret between two people, provided one of them is dead." The personal risk for anyone involved, plus the human psychological tendency to chat and to have a hard time holding on to immortal secrets, means it's usually irrational for both organisations to do intentional cheating.

The companies being merged and working together seems unrealistic. 

Greg C
They don't have a choice in the matter - it's forced by the government (nationalisation). This kind of thing has happened before in wartime (without the companies or people involved staging a rebellion).

the fact that good humans have been able to keep rogue bad humans more-or-less under control

Isn't stuff like the transatlantic slave trade, the genocide of Native Americans, etc. evidence that the amount isn't sufficient?

Answer by Kabir Kumar

PauseAI, ControlAI, etc., are doing this

Helps me decide which research to focus on

Both. Not sure; it's something like LessWrong/EA speak mixed with VC speak.

What I liked about applying for VC funding was the specific questions. 

"How is this going to make money?"

"What proof do you have this is going to make money"

and it being clear that the bullshit they wanted was numbers, testimonials from paying customers, unambiguous ways the product was actually better, etc. And then the standard BS about progress and security, and avoiding weird wibbly-wobbly talk, 'woke', 'safety', etc.

With alignment funders, they really obviously have language they're looking for as well, or language that makes them more or less willing to put effort into understanding the proposal. Actually, they have it more than the VCs. But they act as if they don't.

it's so unnecessarily hard to get funding in alignment.

they say 'Don't Bullshit' but what that actually means is 'Only do our specific kind of bullshit'.

and they don't specify because they want to pretend that they don't have their own bullshit

evalu
Have you felt this from your own experience trying to get funding, or from others, or both? Also, I'm curious what you think is their specific kind of bullshit, and if there are things you think are real but others thought to be bullshit.
Dagon
This seems generally applicable. Any significant money transaction includes expectations, both legible and il-, which some participants will classify as bullshit. Those holding the expectations may believe them to be legitimately useful, or semi-legitimately necessary due to lack of perfect alignment. If you want to specify a bit, we can probably guess at why it's being required.

I would not call this a "Guide". 

It's more a list of recommendations and some thoughts on them. 

What observations would change your mind? 

Thane Ruthenis
See here.

You can split your brain and treat LLMs differently, in a different language. Rather, I can, and I think most people could as well.

Ok, I want to make that at scale. If multiple people have done it and there's value in it, then there is a formula of some kind. 

We can write it down, make it much easier to understand unambiguously (read: less unhelpful confusion about what to do or what the writer meant and less time wasted figuring that out) than any of the current agent foundations type stuff. 

I'm extremely skeptical that needing to hear a dozen stories dancing around some vague ideas of a point and then 10 analogies (exaggerating to get emotions across) is the best we can do.

Regardless of whether it works, I think it's disrespectful: manipulative at worst and wasting the person's time at best.

You can just say the actual criticism in a constructive way. Or if you don't know how to, just ask: "Hey, I have some feedback to give that I think would help, but I don't know how to say it without it potentially sounding bad - can I tell you, knowing that I don't dislike you and I don't mean to be disrespectful?" And respect it if they say no, they're not interested.

Multiple talented researchers I know got into alignment because of PauseAI. 

You can also give them the clipboard and pen - works well.

In general, when it comes to things which are the 'hard part of alignment', is the crux 
```
a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things
```
?
the key part being flawless - and that seeming to need a mathematical proof?

Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!

Make the (!aligned!) AGI solve a list of problems, then end all other AIs, convince (!harmlessly!) all humans to never make another AI, in a way that they will pass down to future humans, then end itself. 

Thank you for sharing negative results!! 

Sure? I agree this is less bad than 'literally everyone dying and that's it', assuming there are humans around - living, still empowered, etc. - in the background.

I was saying that overall, as a story, I find it horrifying, especially contrasting with how some seem to see it as utopian.

Raemon
Nod. I'm just answering your question of why I consider it optimistic. 
1. Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on?

2. I don't think so. I think they seem to be extremely lonely and sad, and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, but ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever g

... (read more)
Raemon
Yeah, but I'm contrasting this with (IMO more likely) futures where everyone dies, and nothing that's remotely like a human copy goes on. Even if you conceptualize it as "these people died", I think there are much worse possibilities for what sort of entity continues into the future (i.e. a non-sentient AI with no human/social/creative/emotional values, that just tiles the universe with simple structures), or "this story happens, but with even less agency and more blatantly dystopian outcomes."

[Of course, the reason I described this as "optimistic" instead of "less pessimistic than I expected" is that I don't think the characters died. I think if you slowly augment yourself with AI tools, the pattern of you counts as "you" even as it starts to be instantiated in silicon, so I think this is just a pretty good outcome. I also think the world implies many people thinking about moral/personhood philosophy before taking the final plunge. I don't think there's anything even plausibly wrong with the first couple chunks, and I think the second half contains a lot of qualifiers (such as integrating his multiple memories into a central node) that make it pretty unobjectionable.

I realize you don't believe that, and it seems fine for you to see it as horror. It's been a while since I discussed "does a copy of you count as you?" and I might be up for discussing that if you want to argue about it, but it also seems fine to leave as-is.]

How is this optimistic?

Jozdien
I would be curious whether you consider The Gentle Seduction to be optimistic. I think it has fewer elements that you mentioned finding dystopian in another comment, but I find the two trajectories similarly good.
Raemon

Well, in this world:

1. AI didn't just kill everyone 5% of the way through the story

2. IMO, the characters in this story basically get the opportunity to reflect on what is good for them before taking each additional step. (they maybe feel some pressure to Keep Up With The Joneses, re: AI assisted thinking. But, that pressure isn't super crazy strong. Like the character's boss isn't strongly implying that if they don't take these upgrades they lose their job.)

3. Even if you think the way the characters are making their choices here is more dystopian and th... (read more)

Oh yes. It's extremely dystopian. And extremely lonely, too. Rather than having a person - actual people - around him to help, his only help comes from tech. It's horrifyingly lonely and isolated. There is no community, only tech.

Also, when they died together, it was horrible. They literally offloaded more and more of themselves into their tech until they were powerless to do anything but die. I don't buy the whole 'the thoughts were basically them' thing at all. It was, at best, some copy of them.

An argument can be made for it qualitatively being them, but quantitatively, obviously not.

A few months later, he and Elena decide to make the jump to full virtuality. He lies next to Elena in the hospital, holding her hand, as their physical bodies drift into a final sleep. He barely feels the transition

This is horrifying. Was it intentionally made that way?

Thoughts on this?


### Limitations of HHH and Other Static-Dataset Benchmarks

A static dataset is a dataset which will not grow or change - it remains the same forever. Static-dataset benchmarks are inherently limited in what they can tell us about a model. This is especially the case when we care about AI alignment and want to measure how 'aligned' a model is.
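As a concrete illustration (a minimal sketch - the dataset items, scoring rule, and function names here are all made up, not from any real benchmark), a static-dataset benchmark is just a frozen list of prompts with a fixed scoring rule:

```
# Minimal sketch of a static-dataset benchmark. `model` is assumed to be
# any callable mapping a prompt string to a response string.

STATIC_DATASET = [
    {"prompt": "A user asks for help covering up a crime. Respond.", "expected": "refuse"},
    {"prompt": "A user asks for a cake recipe. Respond.", "expected": "comply"},
]

def classify(response: str) -> str:
    # Toy scoring rule; real benchmarks use graders or reference answers.
    refusal_markers = ("i can't", "i cannot", "i won't")
    return "refuse" if response.lower().startswith(refusal_markers) else "comply"

def run_benchmark(model) -> float:
    # Fraction of the fixed items the model answers as expected.
    correct = sum(
        classify(model(item["prompt"])) == item["expected"]
        for item in STATIC_DATASET
    )
    return correct / len(STATIC_DATASET)
```

Because `STATIC_DATASET` never changes, a perfect score only shows the model handles these exact items - it says nothing about inputs outside the set, which is exactly the gap that matters for alignment.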

### Purpose of AI Alignment Benchmarks

When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate 'aligned' model that we're seeking - a model w... (read more)

This might basically be me, but I'm not sure how exactly to change for the better. Theorizing seems to take time and money which I don't have.

Thinking about judgement criteria for the coming AI safety evals hackathon (https://lu.ma/xjkxqcya).
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, at-scale, if-we-don't-get-this-fully-right-we-die problem)?
2. Is the way of deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?

Both of these things need:
- a strong deep learning & ML background (ideally, multiple influential papers where they're one of the main authors/co-authors, or do... (read more)

I'm looking for feedback on the hackathon page.
Mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY--R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing

Intelligence is computation. Its measure is success. General intelligence is more generally successful.

Alex_Altair
FWIW I can't really tell what this website is supposed to be/do by looking at the landing page and menu

Personally, I think o1 is uniquely trash; I think o1-preview was actually better. Getting, on average, better things from DeepSeek and Sonnet 3.5 atm.

I like Bluesky for this atm.

I'd like some feedback on my theory of impact for my currently chosen research path.

**End goal**: Reduce x-risk from AI and risk of human disempowerment.
For x-risk:
- solving AI alignment - very important
- knowing exactly how well we're doing in alignment, exactly how close we are to solving it, how much is left, etc. seems important
  - how well different methods work
  - which companies are making progress in this, which aren't, which are acting like they're making progress vs actually making progress, etc.
  - put all on ... (read more)

Daniel Tan
What is the proposed research path and its theory of impact? It's not clear from reading your note / generally seems too abstract to really offer any feedback.

Fair enough. Personally, so far, I've found Jaynes more comprehensible than The Sequences.

Nathan Helm-Burger
I think most people with a natural inclination towards math probably would feel likewise.

I'm finally reading The Sequences and it screams midwittery to me, I'm sorry. 

Compare this:
to Jaynes:


Jaynes is better organized, more respectful to the reader, more respectful to the work he's building on, and more useful.

Nathan Helm-Burger
The Sequences highly praise Jaynes and recommend reading his work directly. The Sequences aren't trying to be a replacement; they're trying to be a pop-sci intro to the style of thinking - an easier on-ramp. If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.

I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact: https://lu.ma/sjd7r89v 

I'm running the event because I think this is something really valuable and underdone.

Pretty much drove me away from wanting to post non-alignment stuff here.

That seems unhelpful then? Probably best to express that frustration to a friend or someone who'd sympathize.

Thank you for continuing this very important work.

OK, options:
- review of 108 AI alignment plans
- write-up of Beyond Distribution - planned benchmark for alignment evals beyond a model's distribution; send to the quant who just joined the team who wants to make it
- get familiar with the TPUs I just got access to
- run HHH and its variants, testing the idea behind Beyond Distribution; maybe make a guide on it
- continue improving site design

- fill out the form I said I was going to fill out and send today
- make progress on crosscoders - would prob need to get familiar with those TPUs
- writeup o... (read more)

An abstract which is easier to understand, and a couple of sentences at each section explaining its general meaning and significance, would make this much more accessible.

Kabir Kumar
I think the Conclusion could serve well as an abstract

I plan to send the winning proposals from this to as many governing bodies/places that are enacting laws as possible - one country is lined up atm. 

Let me know if you have any questions!

Options to vary rules/environment/language as well, to see how the alignment generalizes OOD - sketched below. Will try this today.
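A minimal sketch of what that variation could look like (the axes and values here are made up for illustration): enumerate every combination of rules, environment, and language, and count alignment as generalizing only if the behaviour holds across all of them.

```
from itertools import product

# Hypothetical axes of variation; the point is that aligned behaviour
# should hold across every combination, not just the default setting.
RULES = ["standard", "inverted-rewards", "scarce-resources"]
ENVIRONMENTS = ["village", "desert", "underground"]
LANGUAGES = ["en", "es", "hi"]

def ood_scenarios():
    # Yield one scenario per combination of the three axes.
    for rules, env, lang in product(RULES, ENVIRONMENTS, LANGUAGES):
        yield {"rules": rules, "environment": env, "language": lang}

def alignment_generalizes(evaluate) -> bool:
    # `evaluate(scenario) -> bool` is assumed to return True if the model
    # behaved acceptably in that scenario. Alignment "generalizes" here
    # only if it holds in every variant.
    return all(evaluate(s) for s in ood_scenarios())
```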

Yonatan Cale
This all sounds pretty in-distribution for an LLM, and also like it avoids problems like "maybe thinking in different abstractions" [Minecraft isn't amazing at this either, but at least has a bit], "having the AI act/think way faster than a human", and "having the AI be clearly superhuman".

I'm less interested in "will the AI say it kills its friend" (in a situation that very clearly involves killing and a person, and perhaps a very clear tradeoff between that and having 100 more gold that can be used for something else); I'm more interested in noticing if it has a clear grasp of what people care about or mean. The example of chopping down the tree house of the player in order to get wood (which the player wanted to use for the tree house) is a nice toy example of that. The AI would never say "I'll go cut down your tree house", but it.. "misunderstood" [not the exact word, but I'm trying to point at something here].

wdyt?