Credit assignment and reward delay are nonexistent? What do you think happens when one diffs the board strength of two potential boards?
"Nonexistent problems" was meant as a hyperbole to say that they weren't solved in interesting ways and are extremely simple in this setting because the states and rewards are noise-free. I am not sure what you mean by the second question. They just apply gradient descent on the entire history of moves of the current game such that expected reward is maximized.
DeepMind's go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against top ranked human, Lee Se-dol, is scheduled for March.