Sure? I agree this is less bad than 'literally everyone dying and that's it', assuming there's humans around, living, still empowered, etc in the background.
I was saying overall, as a story, I find it horrifying, especially contrasting with how some seem to see it utopic.
Sure, but it seems like everyone died at some point anyway, and some collective copies of them went on?
I don't think so. I think they seem to be extremely lonely and sad and the AIs are the only way for them to get any form of empowerment. And each time they try to inch further with empowering themselves with the AIs, it leads to the AI actually getting more powerful and themselves only getting a brief moment of more power, but ultimately degrading in mental capacity. And needing to empower the AI more and more, like an addict needing an ever greater high. Until there is nothing left for them to do, but Die and let the AI become the ultimate power.
Even if offscreen all of humanity didn't die, these people dying, killing themselves and never realizing what's actually happening is still insanely horrific and tragic.
How is this optimistic.
Oh yes. It's extremely dystopian. And extremely lonely, too. Rather than having a person, actual people around him to help, his only help comes from tech. It's horrifyingly lonely and isolated. There is no community, only tech.
Also, when they died together, it was horrible. They literally offloaded more and more of themselves into their tech until they were powerless to do anything but die. I don't buy the whole 'the thoughts were basically them' thing at all. It was at best, some copy of them.
There can be made an argument for it qualitatively being them, but quantitatively, obviously not.
A few months later, he and Elena decide to make the jump to full virtuality. He lies next to Elena in the hospital, holding her hand, as their physical bodies drift into a final sleep. He barely feels the transition
this is horrifying. Was it intentionally made that way?
Thoughts on this?
### Limitations of HHH and other Static Dataset benchmarks
A Static Dataset is a dataset which will not grow or change - it will remain the same. Static dataset type benchmarks are inherently limited in what information they will tell us about a model. This is especially the case when we care about AI Alignment and want to measure how 'aligned' the AI is.
### Purpose of AI Alignment Benchmarks
When measuring AI Alignment, our aim is to find out exactly how close the model is to being the ultimate 'aligned' model that we're seeking - a model whose preferences are compatible with ours, in a way that will empower humanity, not harm or disempower it.
### Difficulties of Designing AI Alignment Benchmarks
What preferences those are, could be a significant part of the alignment problem. This means that we will need to frequently make sure we know what preferences we're trying to measure for and re-determine if these are the correct ones to be aiming for.
### Key Properties of Aligned Models
These preferences must be both robustly and faithfully held by the model:
Robustness:
- They will be preserved over unlimited iterations of the model, without deterioration or deprioritization.
- They will be robust to external attacks, manipulations, damage, etc of the model.
Faithfulness:
- The model 'believes in', 'values' or 'holds to be true and important' the preferences that we care about .
- It doesn't just store the preferences as information of equal priority to any other piece of information, e.g. how many cats are in Paris - but it holds them as its own, actual preferences.
Comment on the Google Doc here: https://docs.google.com/document/d/1PHUqFN9E62_mF2J5KjcfBK7-GwKT97iu2Cuc7B4Or2w/edit?usp=sharing
This is for the AI Alignment Evals Hackathon: https://lu.ma/xjkxqcya by AI-Plans
Thinking about judgement criteria for the coming ai safety evals hackathon (https://lu.ma/xjkxqcya )
These are the things that need to be judged:
1. Is the benchmark actually measuring alignment (the real, scale, if we dont get this fully right right we die, problem)
2. Is the way of Deceiving the benchmark to get high scores actually deception, or have they somehow done alignment?
Both of these things need:
- a strong deep learning & ml background (ideally, muliple influential papers where they're one of the main authors/co-authors, or doing ai research at a significant lab, or they have, in the last 4 years)
- a good understanding of what the real alignment problem actually means - can judge this by looking at their papers, activity on lesswrong, alignmentforum, blog, etc
- a good understanding of evals/benchmarks (1 great or two pretty good papers/repos/works on this, ideally for alignment)
Do these seem loose? Strict? Off base?
Thank you for sharing negative results!!