In addition, the entire network somehow needs to learn which of its parts were responsible in the past for the current reward signals, which are delayed and noisy.
This is a well-known problem, called credit assignment, and it is central to reinforcement learning. Reinforcement learning is a significant component of the reported results. (In practice, a network's ability to assign "credit" or "blame" for a reward signal falls off exponentially with increasing delay. This is a significant limitation, but reinforcement learning is nevertheless very helpful given tight feedback loops.)
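To make the exponential falloff concrete, here is a minimal sketch in Python. It assumes the falloff comes from a standard discount factor gamma, which is one common mechanism and not necessarily the one used in the reported results. The function names and the default gamma=0.9 are made up for illustration.

```python
def credit_weights(max_delay, gamma=0.9):
    """Weight given to an action taken k steps before the reward, for k = 0..max_delay.

    Assumption: credit decays as gamma**k (plain exponential discounting).
    """
    return [gamma ** k for k in range(max_delay + 1)]


def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # A single reward of 1 arrives at the last step; earlier steps see less of it.
    print(discounted_returns([0, 0, 0, 1], gamma=0.5))
```

With a reward that arrives only at the final step, each earlier step receives half the credit of the step after it when gamma is 0.5, so an action ten steps back gets under a thousandth of the signal. That is why tight feedback loops matter so much.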
Yes, but as I wrote above, the problems of credit assignment, reward delay and noise are non-existent in this setting, and hence their work does not contribute at all to solving AI.
DeepMind's Go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against a top-ranked human player, Lee Se-dol, is scheduled for March.