SebNickel comments on Model Combination and Adjustment - Less Wrong
(cont'd from previous comment)
Did the industry predict these problems and their consequences?
People in the industry were well aware of these limitations, long before they actually became critical. However, whether solutions and workarounds would be found was a matter of much greater uncertainty.
Robert Dennard et al.'s seminal 1974 paper Design of Ion-Implanted MOSFETs with Very Small Physical Dimensions, which described the very favourable scaling properties of MOSFETs and gave rise to the term "Dennard scaling", explicitly mentions the scaling limitations posed by subthreshold leakage:
This influential 1995 paper by Davari, Dennard and Shahidi presents guidelines for transistor scaling for the years up to 2004. This paper contains a subsection titled "Performance/Power Tradeoff and Nonscalability of the Threshold Voltage", which explains the problems described above in a lot more detail than I have. The paper also mentions tunnelling through the gate oxide layer, concluding on both issues that they would remain relatively unproblematic up until 2004.
Subthreshold leakage is textbook material. The main textbook I have consulted is Digital Integrated Circuits by Jan Rabaey, and I have compared some aspects of the 1995 and 2003 editions. Both contained this sentence in the "History of…" chapter:
Regarding gate oxide leakage, this comparatively accessible 2007 article from IEEE Spectrum recounts how engineers at Intel and elsewhere developed transistors that use high-κ dielectrics as a way of maintaining the gate's capacitance once the gate oxide could no longer be thinned without causing excessive tunnelling. According to this article, work on such solutions began in the mid-1990s, and Intel eventually launched new chips that made use of this technology in 2007. The main impression this article leaves me with is that the problem was very easy to foresee, but that finding out which solutions might work was a matter of extensive tinkering with highly unpredictable results.
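To make the capacitance argument concrete, here is a minimal sketch using the standard parallel-plate formula, capacitance per unit area = κ · ε0 / thickness. The permittivity of 20 for the high-κ material and the 1.2 nm oxide thickness are illustrative assumptions, not figures from the article; the point is only that a material with a higher κ can be physically thicker (and so leak less through tunnelling) while giving the same capacitance.

```python
import math

EPSILON_0 = 8.854e-12   # vacuum permittivity in F/m
KAPPA_SIO2 = 3.9        # relative permittivity of silicon dioxide

def capacitance_per_area(kappa: float, thickness_m: float) -> float:
    """Parallel-plate capacitance per unit area: kappa * eps0 / thickness."""
    return kappa * EPSILON_0 / thickness_m

# Illustrative values only (assumptions, not taken from the article).
kappa_high = 20.0        # hypothetical high-kappa dielectric
t_sio2 = 1.2e-9          # hypothetical SiO2 thickness in metres

# Thickness of the high-kappa layer that gives the same capacitance per area.
t_high = t_sio2 * kappa_high / KAPPA_SIO2

c_sio2 = capacitance_per_area(KAPPA_SIO2, t_sio2)
c_high = capacitance_per_area(kappa_high, t_high)

print(f"SiO2 layer:       {t_sio2 * 1e9:.2f} nm, {c_sio2:.4f} F/m^2")
print(f"high-kappa layer: {t_high * 1e9:.2f} nm, {c_high:.4f} F/m^2")
```

Both layers come out with the same capacitance per unit area, but the high-κ layer is thicker by the ratio of the two permittivities.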
The leakage issues are all mentioned in the 2001 ITRS roadmap, the earliest edition that is available online. One example from the Executive Summary:
From reading the reports, it is hard to make out whether the implications of these issues were correctly understood, and I have had to draw on a lot of other literature to get a better sense of where the industry stood on this. Getting hold of earlier editions (the 1999 one in particular) and talking to industry insiders might shed a lot more light on the weight that was given to the different issues flagged as part of the "Red Brick Wall" I've mentioned above, i.e. issues that had no known manufacturable solutions. (I did not receive an answer to my inquiry about this from the contact person at the ITRS website.) The Executive Summary of the 2001 edition states:
Two accessible articles from 2000 give a clearer impression of how this Red Brick Wall was perceived in the industry at the time. Both particularly emphasise gate oxide leakage.
(cont'd)
(cont'd from previous comment)
As I have mentioned at the beginning, the reports up to 2005 contained highly overoptimistic projections for on-chip frequency and supply voltage, which became dramatically more pessimistic in the 2007 edition. The reports clearly state, however, that these numbers are meant as targets and are not necessarily "on the road to sure implementation", especially where it has been highlighted that solutions were needed and not yet known. They can therefore not necessarily serve as a clear indictment of the ITRS' predictive powers, but I remain puzzled by some of their projections and comments on these before 2007. Getting clarification on this from industry insiders was the next thing I had planned for this project before we paused it.
Specifically, tables 4c and 4d in the Overall Roadmap Technology Characteristics, found in a subsection of the Executive Summary titled Performance of Packaged Chips, contain on-chip frequency forecasts in MHz, which became dramatically more pessimistic in 2007 than they had been in the previous 3 editions. A footnote in the 2007 edition states:
Later editions seem to have reduced the expected scaling factor even further (1.04 in the 2011 edition), but there were also changes made to the metric employed, so I am not sure how to interpret the numbers (though I would expect the scaling factor to be unaffected by those changes).
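To get a feel for what a scaling factor of 1.04 implies, here is a rough sketch that compounds it over time. It assumes the factor applies per year, which is my reading rather than something stated above, and the function names are hypothetical.

```python
import math

def cumulative_growth(factor_per_period: float, periods: int) -> float:
    """Total multiplicative growth after compounding the factor over `periods`."""
    return factor_per_period ** periods

def doubling_time(factor_per_period: float) -> float:
    """Number of periods needed for the quantity to double."""
    return math.log(2) / math.log(factor_per_period)

factor = 1.04  # figure quoted for the 2011 edition; "per year" is an assumption

for years in (5, 10, 15):
    growth = cumulative_growth(factor, years)
    print(f"after {years:2d} years: x{growth:.2f}")

print(f"doubling time: {doubling_time(factor):.1f} years")
```

Under that assumption, on-chip frequency would take well over a decade to double, in contrast to the doubling per generation mentioned for earlier expectations.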
Relatedly, a paragraph in the System Drivers document titled Maximum on-chip (global) clock frequency states that the on-chip clock frequency would not continue scaling at a factor of 2 per generation for several reasons. The 2001 edition gives 3 reasons for this; the 2003 and 2005 editions give 4. But only in 2007 was the limitation from maximum allowable power dissipation added to this list of reasons. This strikes me as very puzzling. The paragraph, as it appears in the 2007 edition, is (emphasis added):
Finally, the Overall Roadmap Technology Characteristics tables 6a and 6b (found in a subsection titled Power Supply and Power Dissipation in the Executive Summary) contain projected values of the power supply voltage (V_dd), which also became dramatically more pessimistic in the 2007 edition.
I expressed my puzzlement about these points in an email I sent to a number of industry insiders, in which I asked:
I have received some very kind replies to those emails, but most have focused on the technical reasons for the "breakdown" in Dennard scaling. The only comment I have received on this last question was from Robert Dennard, who sent me a particularly thoughtful email that came with 4 attachments (which mainly provided more technical detail on transistor design, however). At the end of his email, he wrote:
Indeed, which bets it is most rational to make depends on expected payoff ratios as well as on probability estimates. This distinction between targets and mere predictions complicates the question quite a bit.
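A toy calculation may help illustrate the point about payoff ratios. All numbers below are made up for illustration: an ambitious target that is unlikely to be met can still be the better bet than a cautious one if its payoff is large enough relative to the cost of pursuing it.

```python
def expected_value(p_success: float, payoff: float, cost: float) -> float:
    """Expected net gain of a bet: probability-weighted payoff minus the cost."""
    return p_success * payoff - cost

# Hypothetical bets; probabilities, payoffs and costs are invented.
ambitious = expected_value(p_success=0.3, payoff=10.0, cost=1.0)
cautious = expected_value(p_success=0.9, payoff=2.0, cost=1.0)

print(f"ambitious target: expected net gain {ambitious:.2f}")
print(f"cautious target:  expected net gain {cautious:.2f}")
print("better bet:", "ambitious" if ambitious > cautious else "cautious")
```

Here the ambitious target has the higher expected net gain despite being expected to fail most of the time, which is why a roadmap built around targets cannot be scored the same way as a forecast.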
This was an interesting project, and it would be great to pick it up again.
Update: see also Erik DeBenedictis' comments on this topic.