Making a research platform for AI Alignment at https://ai-plans.com/
Come critique AI Alignment plans and get feedback on your alignment plan!
Yup, that's definitely something that can be argued by people on the Against side during the Debate Stage!
And they might come to the same conclusion!
I'd also read Elementary Analysis before.
I'm not a grad physics student (I don't have a STEM degree or the equivalent), but I found the book very readable nonetheless. It's by far my favourite textbook; it feels like it was actually written by someone sane, unlike most.
I'm really glad you wrote this!
I think you address an important distinction there, but I think there's a further one to be made: how we measure or tell whether a model is aligned in the first place.
There seems to be a growing view that if a model's output looks like the output we might expect from an aligned AI, then the model is aligned.
I think it's important to distinguish that from the idea that the model is aligned if you actually have a strong idea of what its values are, how it acquired them, etc.
I'm really excited to see this!!
I'd like it if this became embeddable, so it could be used on ai-plans.com and on other sites!!
Goodness knows, I'd like to be able to get summaries and answers to obscure questions about some Alignment Forum posts!
What do you think someone who knows about PDP knows that someone with a good knowledge of DL doesn't?
And why would it be useful?
I think folks in AI Safety tend to underestimate how powerful and useful liability and an established duty of care would be for this.
I think calling things a 'game' makes sense to LessWrongers, but it just seems unserious to non-LessWrongers.
I don't think a lack of IQ is the reason we've been failing to develop AI sensibly. Rather, it's a lack of good incentive design.
Making AI recklessly is currently much more profitable than not doing so, which imo shows a flaw in the efforts that have gone towards making AI safe: they haven't accepted that some people have very different mindsets, beliefs, and core values, and haven't worked out a structure or argument that would incentivize people across a broad range of mindsets.
Thank you! Changed it to that!