Tutorial Summary:
Radiation-induced soft errors are a growing concern for digital systems manufactured in advanced technologies.
Stringent data integrity and availability requirements of enterprise computing and networking applications demand special attention to soft errors in sequential elements and combinational logic.
This tutorial will discuss the impact of technology scaling on soft error rates, circuit-level modeling of soft errors, the architectural impact of soft errors, challenges associated with evaluating the run-time behavior of systems in the presence of soft errors, actual data on system behavior in the presence of soft errors, metrics for quantifying soft error vulnerability, the design of architectures with Built-In Soft Error Resilience techniques, and actual case studies.
Two of the presenters co-founded a new workshop on soft errors (SELSE 2005-2007). Lessons learnt from these workshops will also be included in the tutorial.
Keywords:
Soft errors, Memory soft errors, Logic soft errors, Error Correcting Codes, FITs, timing derating, logic derating, architectural derating, radiation hardening, Built-In Soft Error Resilience, error detection, recovery, data integrity, reliability, availability.
Program:
Major Topics: Causes of soft errors, circuit-level impact of soft errors, technology trends backed up with actual soft error measurement data, circuit-level soft error modeling for memories, latches, flip-flops and combinational logic, timing and logic derating, architectural vulnerability factors, derating estimation and associated challenges, circuit design techniques for soft error protection, Built-In Soft Error Resilience techniques, concurrent error detection techniques, recovery, comparisons of various techniques, actual case studies of system designs.
Strengths of the Tutorial: The unique strengths of this tutorial are: (1) the audience will learn circuit- and system-level modeling and design aspects of soft errors (not just technology and physics aspects); (2) supporting data on designs and technology, which many tutorials do not provide, will be presented; (3) technology trends of SRAM and logic soft errors will be discussed, and apparent disagreements within the soft error community will be highlighted; (4) new techniques for analyzing the circuit- and system-level impact of soft errors will be discussed; (5) new protection techniques will be presented; (6) several practical issues will be addressed so that practitioners can relate to the material covered; (7) an extensive bibliography will be provided; (8) several open challenges for researchers will be discussed; and (9) the presenters have a reputation for explaining complex concepts in simple terms, so that the audience can follow the material easily.
Tutorial Organization:
First 60 mins: Summary of the entire tutorial, covering: (1) basic concepts in reliability; (2) various reliability mechanisms; (3) soft errors; (4) soft error rate goals; (5) the basic ideas of reliability, data integrity, silent data corruption, and availability; (6) a very high-level overview of the circuit- and system-level impact of soft errors; (7) a high-level overview of protection techniques and the associated trade-offs; and (8) supporting data. This part will be covered by Prof. Subhasish Mitra. It is essential because the audience needs the "big picture" to understand the relationships among the various topics when they are covered in detail.
60 mins: Soft error modeling: Topics covered: circuit-level to chip-level estimation strategies; SER model calibration; modeling of the nominal SER of key circuit types; combinational and clock SER; timing and logic derating factors. This part will be covered by Dr. Norbert Seifert.
60 mins: System-level impact of soft errors: Component soft error rates and their contributions to system failures vs. silent errors. System derating effects. This part will be covered by Dr. Pia Sanda.
60 mins: Circuit-level soft error protection techniques: Topics covered: classical hardening techniques such as selective node engineering, RC filtering, and body biasing; latch hardening techniques; selective gate sizing; and clock network protection. This part will be covered by Dr. Norbert Seifert.
60 mins: Logic and Architectural soft error protection techniques: Built-In Soft Error Resilience, Soft Error Correcting Combinational Logic, ECC, Concurrent Error Detection, Parity Prediction, Multi-threading, Software Implemented Hardware Fault Tolerance, Application Dependent techniques, Comparisons of various techniques, Future Challenges. This part will be covered by Prof. Subhasish Mitra.
60 mins: System design case studies: IBM z990 case study of selective replication, including design and field results; IBM POWER5 case study of on-line checking. This part will be covered by Dr. Pia Sanda.
Presenters’ Biographies:
Subhasish Mitra is an Assistant Professor in the Departments of Electrical Engineering and Computer Science of Stanford University.
His research interests include robust system design, VLSI design and test, computer architecture and design for emerging nanotechnologies.
Prior to joining Stanford, he was a Principal Engineer at Intel Corporation.
He received his Ph.D. in Electrical Engineering from Stanford University.
Prof. Mitra has published more than 90 technical papers in leading conferences and journals and has invented design and test techniques that have seen widespread adoption in industry.
His X-Compact technique for test compression has been used by more than 40 Intel products including microprocessors, chipsets, and communications chips, and is supported by major CAD tools.
His most recent honors include the IEEE Circuits and Systems Society Donald O. Pederson Award, a Best Paper Award nomination at the Design Automation Conference, a Divisional Recognition Award from Intel "for a Breakthrough Soft Error Protection Technology," a Best Paper Award at the Intel Design and Test Technology Conference for his work on Built-In Soft Error Resilience, recognition as the Sundaram Seshu Scholar Lecturer at the University of Illinois at Urbana-Champaign, and the Intel Achievement Award, Intel's highest honor, "for the development and deployment of a breakthrough test compression technology that achieved an order of magnitude reduction in scan test cost."
Prof. Mitra is a 2006 Terman Fellow at Stanford.
He has held several consulting positions, and served on the organizing and program committees of several IEEE and ACM sponsored conferences.
Pia Sanda received the Ph.D. degree in physics from Cornell University, Ithaca, NY, for work on surface Raman scattering.
She was a Manager in the VLSI Design Area, IBM T. J. Watson Research Center, Yorktown Heights, NY.
She began her career at IBM in imaging science. In silicon technology, she designed and built 0.1-µm channel-length CMOS FETs using phase-shift lithography and contributed to the device and cell design for the 256-Mb DRAM.
She has been engaged in designing high-performance circuits for microprocessors and has recently explored new avenues for test and improved semiconductor manufacturability, such as the new PICA measurement technique. She is currently the Program Director of Soft Error Management at IBM.
Norbert Seifert holds an M.S. in physics from Vanderbilt University, Nashville, TN, and received his Ph.D. degree in physics from the Technical University of Vienna, Austria, in 1993.
His Ph.D. thesis focused on radiation-induced defect formation and diffusion in wide-band-gap ionic crystals. Dr. Seifert has conducted research in a wide range of physics topics, from charge transfer processes in atomic collisions as a postdoctoral associate at North Carolina State University, to computational fluid dynamics of high-power laser material processing as a postdoctoral associate at the Technical University of Vienna.
In 1997 Dr. Seifert joined the Alpha Development Group (DEC/Compaq/HP) where he worked in the fields of device physics, device reliability, and digital design.
He is currently a Staff Reliability and Design Engineer with Intel Corporation in Hillsboro, Oregon, where he is responsible for all aspects of developing accurate SER models and a coherent chip-level SER methodology. He is also deeply involved in developing methodologies for assessing the impact of NBTI on system performance.
Dr. Seifert has worked extensively on soft errors – the physics of soft errors, virtually all modeling aspects, and the soft error trend with technology scaling.
He has published more than 30 papers and holds several patents. Dr. Seifert is a senior member of the IEEE and a member of the Austrian Physical Society. He has served on numerous technical program committees and has chaired several soft error sessions at major conferences such as IRPS and the IEEE International On-Line Testing Symposium (IOLTS).
He is a frequent reviewer for IEEE Transactions on Device and Materials Reliability (TDMR) and co-edited the Special Issue on Soft Errors and Data Integrity in Terrestrial Computer Systems (TDMR, September 2005).