Mod note, in the spirit of our experiment in more involved moderation:
This seems a bit more grounded than many first posts about AI on LessWrong, but my guess is this doesn't meet the quality bar. See the advice in the AI Takes section of my linked comment.
Seems like a reasonable take; it's been discussed in papers already. I'll put some resources here in a few hours.
Assuming that the AGI agent is being built with Q-learning and an LLM for the world model, we can do the following to emulate empathy:
As I say, this is emulated empathy. It doesn't solve alignment, obviously, but it would help a bit against immediate danger. If people are going to be stupid and create agents, we can at least make sure those agents aren't electronic psychopaths. This solution assumes a lot about how the first AGI will be built, but I believe my assumptions are in the right direction. I base this judgement on proposals such as JEPA and on my general understanding of the human mind. A rough sketch of what I mean follows below.
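To make the shape of this concrete, here is a minimal sketch, under the assumption that the empathy emulation is implemented as reward shaping: the LLM world model is asked how much harm a candidate action is predicted to cause to others, and that penalty is folded into the Q-learning update. The `WorldModel` class and its `predicted_harm` method are hypothetical stand-ins for an actual LLM query, not any existing API, and this is one possible reading of the idea rather than a definitive implementation.

```python
# Sketch: tabular Q-learning agent whose reward is shaped by an "empathy" penalty
# from an LLM world model that predicts how much an action harms other agents.
# WorldModel is a stub; a real system would prompt an LLM here. Names hypothetical.
import random
from collections import defaultdict


class WorldModel:
    """Stand-in for an LLM world model."""

    def predicted_harm(self, state, action) -> float:
        # Placeholder: return the model's estimate, in [0, 1], of harm caused
        # to other agents if `action` is taken in `state`.
        return 0.0


class EmpathicQAgent:
    def __init__(self, actions, world_model, alpha=0.1, gamma=0.95,
                 epsilon=0.1, empathy_weight=5.0):
        self.q = defaultdict(float)           # Q[(state, action)] -> value
        self.actions = actions
        self.wm = world_model
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.empathy_weight = empathy_weight  # how strongly predicted harm is penalized

    def act(self, state):
        # Epsilon-greedy over the learned values, which already reflect the
        # empathy penalty because it is baked into the updates below.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, env_reward, next_state):
        # Reward shaping: subtract a penalty proportional to the harm the world
        # model predicts this action causes to others.
        shaped = env_reward - self.empathy_weight * self.wm.predicted_harm(state, action)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = shaped + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

Everything here hangs on how well the world model actually predicts harm and on the choice of `empathy_weight`, which trades task reward against predicted harm; those are exactly the parts I'm assuming away in this sketch.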
Now, I'm a nobody in this field, but I haven't heard this approach discussed, so I'm hoping someone with actual media presence (Eliezer) can get the idea over to the people developing AI. If this has already been suggested internally at OpenAI/Anthropic/MS/Google, good; if not, it's paramount that I get it to them. Any suggestions on how to contact important people in the field, and actually get them to pay attention to a literal who, are welcome. Also feel free to rephrase my idea and disseminate it yourself.
If the first AGI isn't created with some form of friendliness baked into the architecture, we'll all be dead before the 2030s, so I believe anything that can help is of infinite importance. And it's evident that people are trying to create AGI right now.