Thanks, this is helpful!
Thanks! This is interesting.
Thanks!
I think my question is deeper: why do machines 'want' or 'have a goal to' follow the algorithm to maximize reward? How can machines 'find stuff rewarding'?
This might be a crux, because I'm inclined to think that wanting, having goals and finding things rewarding depend on qualia.
Why does AI 'behave' in that way? How do engineers make it 'want' to do things?
My comment box got glitchy, but just to add: this category of intervention might be a good thing to do for people who care about AI safety and don't have ML/programming skills, but do have people skills/comms skills/political skills/etc.
Maybe lots of people are indeed working on this sort of thing; I've just heard much less discussion of this kind of solution than of technical solutions.
To be clear, I'm not claiming that this will be easy; this is not a "why don't we just-" point. I agree with the things Yudkowsky says in that paragraph about why it would be difficult. I'm just saying that it's not obvious to me that this is fundamentally intractable or harder than solving the technical alignment problem. Reasons for relative optimism:
This is very basic/fundamental compared to many questions in this thread, but I am taking 'all dumb questions allowed' hyper-literally, lol. I have little technical background, and though I've absorbed some stuff about AI safety by osmosis, I've only recently been trying to dig deeper into it (and there are lots of basic/fundamental texts I haven't read).
Writers on AGI often talk about AGI in anthropomorphic terms: they talk about it having 'goals', being an 'agent', 'thinking', 'wanting', 'rewards', etc. As I understand it, most AI researchers don't think that AIs will have human-style qualia, sentience, or consciousness.
But if AIs don't have qualia or sentience, how can they 'want things', 'have goals', 'be rewarded', etc.? (In humans, these things seem to depend on our qualia, and specifically on our ability to feel pleasure and pain.)
I first realised that I was confused about this when reading Richard Ngo's introduction to AI safety, where he talks about reward functions and reinforcement learning. I realised that I don't understand how reinforcement learning works in machines. I understand how it works in humans and other animals: give the animal something pleasant when it does the desired behaviour and/or something painful when it does the bad behaviour. But how can you make a machine without qualia "feel" pleasure or pain?
When I talked to some friends about this, I came to the conclusion that this is just a subset of 'not knowing how computers work', and it might be addressed by me getting more knowledge about how computers work (on a hardware, or software-communicating-with-hardware, level). But I'm interested in people's answers here.
I got a Fatebook account thanks to this post!