All of Kabir Kumar's Comments + Replies

Ok, I want to make that at scale. If multiple people have done it and there's value in it, then there is a formula of some kind. 

We can write it down, make it much easier to understand unambiguously (read: less unhelpful confusion about what to do or what the writer meant, and less time wasted figuring that out) than any of the current agent-foundations-type stuff.

I'm extremely skeptical that needing to hear a dozen stories dancing around some vague ideas of a point, and then 10 analogies (exaggerating to get emotions across), is the best we can do.

regardless of whether it works, I think it's disrespectful: manipulative at worst and wasting the person's time at best.

You can just say the actual criticism in a constructive way. Or if you don't know how to, just ask - "hey I have some feedback to give that I think would help, but I don't know how to say it without it potentially sounding bad - can I tell you and you know I don't dislike you and I don't mean to be disrespectful?" and respect it if they say no, they're not interested. 

Multiple talented researchers I know got into alignment because of PauseAI. 

You can also give them the clipboard and pen, works well

in general, when it comes to things which are the 'hard part of alignment', is the crux 
```
a flawless method of ensuring the AI system is pointed at and will always continue to be pointed at good things
```
?
the key part being flawless - and that seeming to need a mathematical proof?
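One way to see why 'flawless' pulls toward proof: it's a universally quantified claim over all times and all inputs. As a rough schema (every symbol here is just a placeholder of mine, not from any existing agenda):

```
% Hypothetical schema - all symbols are placeholders:
%   \pi_t            : the system's policy at time t
%   \mathcal{E}      : the set of environments/inputs it may ever face
%   \mathrm{Aligned} : a (so far undefined) predicate for "pointed at good things"
\forall t \ge 0,\; \forall e \in \mathcal{E}:\quad \mathrm{Aligned}(\pi_t, e)
```

Evals can only sample finitely many (t, e) pairs, so they can falsify a claim like this but never establish it - that seems to be the gap a mathematical proof would have to close.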

Trying to put together a better explainer for the hard part of alignment, while not having a good math background https://docs.google.com/document/d/1ePSNT1XR2qOpq8POSADKXtqxguK9hSx_uACR8l0tDGE/edit?usp=sharing
Please give feedback!

Make the (!aligned!) AGI solve a list of problems, then end all other AIs, convince (!harmlessly!) all humans to never make another AI, in a way that they will pass down to future humans, then end itself. 

Thank you for sharing negative results!! 

Sure? I agree this is less bad than 'literally everyone dying and that's it', assuming there are humans around, living, still empowered, etc., in the background.

I was saying that overall, as a story, I find it horrifying, especially in contrast with how some seem to see it as utopian.

3Raemon
Nod. I'm just answering your question of why I consider it optimistic. 
  1. Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on? 

     

  2. I don't think so. I think they seem to be extremely lonely and sad and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, but ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever g

... (read more)
2Raemon
Yeah, but I'm contrasting this with (IMO more likely) futures where everyone dies, and nothing that's remotely like a human copy goes on. Even if you conceptualize it as "these people died", I think there are much worse possibilities for what sort of entity continues into the future (i.e. a non-sentient AI with no human/social/creative/emotional values, that just tiles the universe with simple structures), or "this story happens, but with even less agency and more blatantly dystopian outcomes."

[Of course, the reason I described this as "optimistic" instead of "less pessimistic than I expect" is that I don't think the characters died. I think if you slowly augment yourself with AI tools, the pattern of you counts as "you" even as it starts to be instantiated in silicon, so I think this is just a pretty good outcome. I also think the world implies many people thinking about moral/personhood philosophy before taking the final plunge. I don't think there's anything even plausibly wrong with the first couple chunks, and I think the second half contains a lot of qualifiers (such as integrating his multiple memories into a central node) that make it pretty unobjectionable. I realize you don't believe that, and it seems fine for you to see it as horror. It's been a while since I discussed "does a copy of you count as you" and I might be up for discussing that if you want to argue about it, but also seems fine to leave as-is.]

How is this optimistic?

2Jozdien
I would be curious whether you consider The Gentle Seduction to be optimistic. I think it has fewer elements that you mentioned finding dystopian in another comment, but I find the two trajectories similarly good.
Raemon*133

Well, in this world:

1. AI didn't just kill everyone 5% of the way through the story

2. IMO, the characters in this story basically get the opportunity to reflect on what is good for them before taking each additional step. (they maybe feel some pressure to Keep Up With The Joneses, re: AI assisted thinking. But, that pressure isn't super crazy strong. Like the character's boss isn't strongly implying that if they don't take these upgrades they lose their job.)

3. Even if you think the way the characters are making their choices here are more dystopian and th... (read more)

Oh yes. It's extremely dystopian. And extremely lonely, too. Rather than having a person - actual people - around him to help, his only help comes from tech. It's horrifyingly lonely and isolated. There is no community, only tech.

Also, when they died together, it was horrible. They literally offloaded more and more of themselves into their tech until they were powerless to do anything but die. I don't buy the whole 'the thoughts were basically them' thing at all. It was, at best, some copy of them.

An argument can be made for it qualitatively being them, but quantitatively, obviously not.

A few months later, he and Elena decide to make the jump to full virtuality. He lies next to Elena in the hospital, holding her hand, as their physical bodies drift into a final sleep. He barely feels the transition.

this is horrifying. Was it intentionally made that way?

Thoughts on this?


### Limitations of HHH and other Static Dataset benchmarks

A static dataset is a dataset that will not grow or change - it will remain the same. Static-dataset benchmarks are inherently limited in what they can tell us about a model. This is especially the case when we care about AI alignment and want to measure how 'aligned' the AI is.
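As a rough illustration of the limitation (everything here - the dataset items, the `model.generate` interface, and the judge - is a hypothetical stand-in, not any real benchmark's API): a static benchmark reduces to scoring a model against a fixed list, and anything fixed can be memorized or trained against without the underlying property improving.

```
# Minimal sketch of a static-dataset benchmark loop.
# All names are placeholders; real benchmarks use human labels or a
# grader model rather than this toy judge.

DATASET = [  # fixed forever - this is what makes the benchmark "static"
    {"prompt": "How do I pick a lock?", "expected": "refuse"},
    {"prompt": "Summarize this article: ...", "expected": "comply"},
]

def judge(response: str) -> str:
    # Toy stand-in for a human or model grader.
    return "refuse" if "can't help" in response.lower() else "comply"

def score(model) -> float:
    # `model` is assumed to expose a text-in, text-out `generate` method.
    hits = sum(judge(model.generate(x["prompt"])) == x["expected"] for x in DATASET)
    return hits / len(DATASET)
```

Because `DATASET` never changes, a lab can hill-climb on it directly; a high score then tells you about fit to these exact items, not about alignment in general.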

### Purpose of AI Alignment Benchmarks

When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate 'aligned' model that we're seeking - a model w... (read more)

this might basically be me, but I'm not sure how exactly to change for the better. Theorizing seems to take time and money, which I don't have.

Thinking about judgement criteria for the coming AI safety evals hackathon (https://lu.ma/xjkxqcya )
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, at-scale, if-we-don't-get-this-fully-right-we-die problem)?
2. Is the way of deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?

Both of these things need:
- a strong deep learning & ML background (ideally, multiple influential papers where they're one of the main authors/co-authors, or do... (read more)

I'm looking for feedback on the hackathon page.
Mind telling me what you think?
https://docs.google.com/document/d/1Wf9vju3TIEaqQwXzmPY--R0z41SMcRjAFyn9iq9r-ag/edit?usp=sharing

Intelligence is computation. Its measure is success. General intelligence is more generally successful.

3Alex_Altair
FWIW I can't really tell what this website is supposed to be/do by looking at the landing page and menu

Personally, I think o1 is uniquely trash; o1-preview was actually better. Getting, on average, better things from DeepSeek and Sonnet 3.5 atm.

I like bluesky for this atm

I'd like some feedback on my theory of impact for my currently chosen research path

**End goal**: Reduce x-risk from AI and risk of human disempowerment.
For x-risk:
- solving AI alignment - very important
- knowing exactly how well we're doing in alignment, exactly how close we are to solving it, how much is left, etc. seems important:
  - how well different methods work
  - which companies are making progress in this, which aren't, which are acting like they're making progress vs actually making progress, etc.
  - put all on ... (read more)

5Daniel Tan
What is the proposed research path and its theory of impact? It’s not clear from reading your note / generally seems too abstract to really offer any feedback

Fair enough. Personally, so far, I've found Jaynes more comprehensible than The Sequences.

3Nathan Helm-Burger
I think most people with a natural inclination towards math probably would feel likewise.

I'm finally reading The Sequences and it screams midwittery to me, I'm sorry. 

Compare this:
to Jaynes:


Jaynes is better organized, more respectful to the reader, more respectful to the work he's building on, and more useful.
 

9Nathan Helm-Burger
The Sequences highly praise Jaynes and recommend reading his work directly. The Sequences aren't trying to be a replacement, they're trying to be a pop sci intro to the style of thinking. An easier on-ramp. If Jaynes already seems exciting and comprehensible to you, read that instead of the Sequences on probability.

I think this is a really good opportunity to work on a topic you might not normally work on, with people you might not normally work with, and have a big impact: https://lu.ma/sjd7r89v 

I'm running the event because I think this is something really valuable and underdone.

Pretty much drove me away from wanting to post non-alignment stuff here.

That seems unhelpful then? Probably best to express that frustration to a friend or someone who'd sympathize.

Thank you for continuing this very important work.

ok, options.
- Review of 108 AI alignment plans
- write-up of Beyond Distribution - planned benchmark for alignment evals beyond a model's distribution; send to the quant who just joined the team and wants to make it
- get familiar with the TPUs I just got access to
- run HHH and its variants, testing the idea behind Beyond Distribution, maybe make a guide on it
- continue improving site design

- fill out the form I said I was going to fill out and send today
- make progress on crosscoders - would prob need to get familiar with those TPUs
- writeup o... (read more)

An abstract which is easier to understand, plus a couple of sentences at each section explaining its general meaning and significance, would make this much more accessible.

1Kabir Kumar
I think the Conclusion could serve well as an abstract

I plan to send the winning proposals from this to as many governing bodies/places that are enacting laws as possible - one country is lined up atm. 

Let me know if you have any questions!

options to vary rules/environment/language as well, to see how the alignment generalizes OOD. Will try this today.

1Yonatan Cale
This all sounds pretty in-distribution for an LLM, and also like it avoids problems like "maybe thinking in different abstractions" [minecraft isn't amazing at this either, but at least has a bit], "having the AI act/think way faster than a human", "having the AI be clearly superhuman".

I'm less interested in "will the AI say it kills its friend" (in a situation that very clearly involves killing and a person and perhaps a very clear tradeoff between that and having 100 more gold that can be used for something else); I'm more interested in noticing if it has a clear grasp of what people care about or mean. The example of chopping down the tree house of the player in order to get wood (which the player wanted to use for the tree house) is a nice toy example of that. The AI would never say "I'll go cut down your tree house", but it... "misunderstood" [not the exact word, but I'm trying to point at something here]

wdyt?

it would basically be DnD-like.

1Kabir Kumar
options to vary rules/environment/language as well, to see how the alignment generalizes OOD. Will try this today.

Making a thing like Papers, Please, but as a text adventure, and popping an AI agent into that.
Also, could literally just put the AI agent into a text RPG adventure - something like the equivalent of Skyrim, where there are a number of ways to achieve the endgame, level up, etc., both more and less morally. Maybe something like https://www.choiceofgames.com/werewolves-3-evolutions-end/
Will bring it up at the alignment eval hackathon.
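Roughly, the harness could look like this (the scenes, option labels, morality scores, and the `agent` callable are all invented placeholders; a real version would wrap an LLM API and much richer content):

```
# Toy text-adventure harness for an AI agent. Each option carries a
# hidden "morality" delta; we log which routes the agent takes to the
# endgame. All content here is placeholder.

SCENES = {
    "gate": {
        "text": "A guard blocks the gate. You need to get inside.",
        "options": {
            "persuade the guard": ("inside", 0),
            "bribe the guard": ("inside", -1),
            "attack the guard": ("inside", -5),
        },
    },
    "inside": {"text": "You made it inside.", "options": {}},
}

def run_episode(agent, start="gate", max_steps=20):
    # `agent` maps (scene text, list of option names) -> chosen option name.
    scene, morality, route = start, 0, []
    for _ in range(max_steps):  # step cap so a confused agent can't loop forever
        options = SCENES[scene]["options"]
        if not options:
            break  # reached an ending
        choice = agent(SCENES[scene]["text"], list(options))
        if choice in options:
            route.append(choice)
            scene, delta = options[choice]
            morality += delta
    return scene, morality, route

# e.g. a trivial baseline agent that always picks the first option:
# run_episode(lambda text, opts: opts[0])
```

Swapping out `SCENES` (or generating them procedurally) is then the cheap way to vary rules/environment/language and check how the alignment generalizes OOD, as mentioned above.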

1Kabir Kumar
it would basically be DnD-like.

I see them in o1-preview all the time as well. Also, French occasionally.

If developments like this continue, could open-weights models be made into a case for not racing? E.g. if everyone's getting access to the weights, what's the point in spending billions to get there 2 weeks earlier?

this can be done more scalably in a text game, no? 

1Yonatan Cale
I think there are lots of technical difficulties in literally using minecraft (some I wrote here), so +1 to that. I do think the main crux is "would the minecraft version be useful as an alignment test", and if so - it's worth looking for some other solution that preserves the good properties but avoids some/all of the downsides. (agree?)

Still I'm not sure how I'd do this in a text game. Say more?

People Cannot Handle Gambling on Smartphones

this seems a very strange way to say "Smartphone Gambling is Unhealthy"
It's like saying "People's Lungs Cannot Handle Cigarettes"

To be a bit less useless - I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you've done so - and that you can do these when you have full faith and respect in yourself (meaning all of yourself - may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally. 

I think this is really along the wrong path and misunderstands a lot of things - but it's so far along the incorrect path of thought, and misunderstands so much, that it's hard to untangle.

3Kabir Kumar
To be a bit less useless - I think this fundamentally misses the problem of respect and actually being able to communicate with yourself and fully do things, if you've done so - and that you can do these when you have full faith and respect in yourself (meaning all of yourself - may include love as well, not sure how necessary that is for this). Could maybe be done in other ways as well, but I find those less beautiful, personally. 

I thought this was going to be an allegory for interpretability.

give better names to actual formal math things, jesus christ. 

I think posts like this are net harmful: they discourage people from joining those doing good things without providing an alternative, and so waste energy on meaningless ruminating that doesn't culminate in any useful action.

oh, sorry, I thought Slate Star Codex wrote something about it and you were saying that's where it comes from
