For the SL phase, they trained 340 million updates with a batch size of 16, so 5.4 billion position-updates. However the database had only 29 million unique positions. That's about 200 gradient iterations per unique position.
The self-play RL phase for AlphaGo consisted of 10,000 minibatches of 128 games each, so about 1 million games total. They only trained that part for a day.
They spent more time training the value network: 50 million minibatches of 32 board positions, so about 1.6 billion positions. That's still much smaller than the SL training phase.
DeepMind's go AI, called AlphaGo, has beaten the European champion with a score of 5-0. A match against top ranked human, Lee Se-dol, is scheduled for March.