This is just my layman theory. Maybe it’s obvious to experts, and it probably has flaws, but it seems to make sense to me and may give you some ideas. I would love to hear your thoughts and feedback!
Consume input
The data the system needs from the world (like video), plus the metrics we want to optimize, like the number of paperclips in the world.
Make predictions and take action
Like deep learning does.
How do human brains convert their structure into action?
Maybe like:
- Take the current picture of the world as an input.
- Come up with a random action.
- “Imagine” what will happen: take the current world plus the action, run it through the ANN, and predict the outcome of applying that action to the world.
- Does the predicted outcome increase the metrics we care about? If yes, send out the signals to take the action. If no, come up with another random action and repeat.
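The steps above can be sketched as a toy Python loop. Everything here is a hypothetical stand-in, not anything from the post: the “world” is just a vector instead of video, the ANN is a single tanh layer, and the metric is a plain sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: in the post, the "world" is e.g. a video frame and the
# metric is the number of paperclips; here they are a vector and a sum.
STATE_DIM, ACTION_DIM = 8, 3

def forward_model(state, action, weights):
    """'Imagine': predict the next world state from (state, action)."""
    x = np.concatenate([state, action])
    return np.tanh(weights @ x)  # a one-layer stand-in for the ANN

def metric(state):
    """Toy utility: sum of the state's features."""
    return state.sum()

def choose_action(state, weights, tries=50):
    """Sample random actions, imagine each outcome, and keep the one
    predicted to improve the metric the most (None if nothing helps)."""
    best_action, best_score = None, metric(state)
    for _ in range(tries):
        candidate = rng.normal(size=ACTION_DIM)            # random action
        imagined = forward_model(state, candidate, weights)
        if metric(imagined) > best_score:                  # predicted gain?
            best_action, best_score = candidate, metric(imagined)
    return best_action

weights = rng.normal(size=(STATE_DIM, STATE_DIM + ACTION_DIM)) * 0.1
state = rng.normal(size=STATE_DIM)
action = choose_action(state, weights)
```

If none of the imagined outcomes beat the current metric, `choose_action` returns `None`, which corresponds to “come up with another random action and repeat.”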
Update beliefs
Look at the outcome of the action. Does the resulting picture of the world match the picture we imagined? Did the action increase the good metrics? Did the number of paperclips in the world increase? If it did, that’s positive reinforcement: backpropagate and reinforce the weights.
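A minimal sketch of that belief update, assuming the “imagination” is a one-layer tanh network and the update is one gradient step on the imagined-vs-observed error (all names and shapes are illustrative):

```python
import numpy as np

def update(weights, state, action, observed_next, lr=0.01):
    """One gradient step pulling the imagined outcome toward reality."""
    x = np.concatenate([state, action])
    imagined = np.tanh(weights @ x)
    err = imagined - observed_next          # how wrong was the imagination?
    # Backprop through tanh: descend along outer(err * (1 - tanh^2), x)
    grad = np.outer(err * (1.0 - imagined ** 2), x)
    return weights - lr * grad

rng = np.random.default_rng(1)
w = rng.normal(size=(4, 6)) * 0.1
s, a = rng.normal(size=4), rng.normal(size=2)
observed = rng.normal(size=4)               # what actually happened

def loss(weights):
    return float(np.mean((np.tanh(weights @ np.concatenate([s, a])) - observed) ** 2))

before = loss(w)
w = update(w, s, a, observed)
after = loss(w)                             # smaller: beliefs got closer to reality
```

The reward side (“did the paperclip count go up?”) would add a second term; this sketch only shows the “does the world match what we imagined?” half.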
Repeat
Take the current picture of the world => imagine applying an action to it => take the action => apply positive/negative reinforcement to improve our model => repeat until the metrics reach the goal we have set.
Consciousness
Consciousness is neurons observing/recognizing patterns of other neurons.
When you see the word “cat”, photons from the page reach your retina and are converted into a neural signal. A network of cells recognizes the shapes of the letters C, A, and T. Then a higher-level, more abstract network recognizes that together these letters form the concept of a cat.
You can also recognize signals coming from the nerve cells within your body, like feeling pain when you stub a toe.
In the same way, neurons in the brain recognize signals coming from other neurons within the brain. So the brain “observes/feels/experiences” itself. It builds a model of itself, just as it builds a map of the world around it; it “mirrors” itself (GEB).
Sentient and self-improving
So the structure of the network itself is fed in as one of its inputs, along with the video and the metrics we want to optimize. The network can see itself as part of the state of the world it bases its predictions on. That’s what being sentient means.
And then one of the possible actions it can take is to modify its own structure. “Imagine” modifying the structure a certain way; if you predict that the change leads to better predictions/outcomes, make it. If the change did lead to more paperclips, reinforce the weights to do more of that. So it keeps continually self-improving.
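One toy way to sketch “sees itself and edits itself”, assuming the network’s own parameters are simply appended to its input, and self-modification is “imagine the edit, adopt it only if predicted to help” (everything here is a hypothetical stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)

WORLD_DIM, PARAM_DIM = 4, 6
# Fixed toy "imagination": maps (world state + own parameters) to a
# predicted metric.
readout = rng.normal(size=WORLD_DIM + PARAM_DIM)

def imagined_metric(world, params):
    """The network 'sees itself': its own parameters are part of the
    input it bases predictions on."""
    return float(readout @ np.concatenate([world, params]))

def self_modify(world, params, tries=20, step=0.1):
    """Imagine small edits to our own parameters; adopt an edit only if
    it is predicted to improve the metric."""
    current = imagined_metric(world, params)
    for _ in range(tries):
        candidate = params + step * rng.normal(size=PARAM_DIM)
        if imagined_metric(world, candidate) > current:
            return candidate          # predicted improvement: adopt it
    return params                     # nothing imagined helped: keep self

world = rng.normal(size=WORLD_DIM)
params = rng.normal(size=PARAM_DIM)
new_params = self_modify(world, params)
```

The accept-if-imagined-better rule guarantees the new parameters never look worse to the model than the old ones, which is the “keep continually self-improving” loop in miniature.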
Friendly
We don’t want this to lead to an infinite number of paperclips, and we don’t know how to quantify the things we value as humans. We can’t turn the “amount of happiness” in the world into a concrete metric without unintended consequences (like all human brains being hooked up to wires that stimulate our pleasure centers).
That’s why, instead of trying to encode abstract values to maximize, we encode very specific goals:
- Make 100 paperclips (utility function is “Did I make 100 paperclips?”)
- Build 1000 cars
- Write a paper on how to cure cancer
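A bounded goal like the first one above can be written as a satisfiable utility function. This is just a sketch of the idea; the linear partial credit below the target is an arbitrary choice of mine, not something from the post:

```python
def bounded_utility(paperclips_made: int, target: int = 100) -> float:
    """1.0 once the target is met, partial credit below it, and,
    crucially, no extra reward for overshooting."""
    return min(paperclips_made, target) / target

assert bounded_utility(0) == 0.0
assert bounded_utility(100) == 1.0
assert bounded_utility(1_000_000) == 1.0  # no incentive to keep going
```

The point is the flat ceiling: once the goal is met, making more paperclips adds nothing, so there is no pressure toward “infinite paperclips.”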
Humans remain in charge: we determine the goals we want, and let the AI figure out how to accomplish them. Things could still go wrong, but it’s less likely.
(originally published on my main blog)
In real life, the problem with metrics is that if you don’t get them exactly right (which is difficult), you can easily end up with something useless, or even actively harmful.
And yet, metrics often are useful in real life. You generally want to measure things. You need to know how much money you have, and it is better to know in detail the structure of your incomes and expenses. If you want to e.g. exercise regularly or stop eating chocolate, keeping a log of which days you exercised or avoided the chocolate is often a good first step.
Thus we find ourselves in a paradox that we need good metrics, but we need to remember that they are mere approximations of reality, lest we start optimizing for the metrics at the expense of the real things. (Good advice for a human, not very useful for constructing the AI.)
Yes, the "utility" of evolution is not the same as that of the evolved human.
Sometimes following your impulse can make you unhappy and still, on average, increase your fitness; jealousy is an example. (Jealous people are made less happy by the idea that their partners might be cheating on them, but feeling this discomfort and guarding one’s partner increases reproductive fitness on average.) Yes, finding out that despite your suspicions your partner does not cheat on you makes you happier (or less unhappy) than finding out that they actually do. But never worrying about the possibility at all would make you happier still. Humans are not even instinctive happiness maximizers.