Dynamic reliability management based on resource-based EM modeling for multi-core microprocessors. (April 2018)
- Record Type:
- Journal Article
- Title:
- Dynamic reliability management based on resource-based EM modeling for multi-core microprocessors. (April 2018)
- Main Title:
- Dynamic reliability management based on resource-based EM modeling for multi-core microprocessors
- Authors:
- Kim, Taeyoung
Liu, Zao
Tan, Sheldon X.-D.
- Abstract:
- This article presents a new approach for system-level reliability management for multi/many core microprocessors. In the new approach, the electro-migration (EM) induced time to failure (TTF) at the system level is modeled as a resource, which is abstracted, from a recently proposed physics-based EM model, at the chip level. In this model, a single core can spend the TTF resources at different rates specified by the temperature and the related power consumption. As a result, the new resource-based EM model allows more flexible EM-reliability management for multi/many-core systems. As an application of the new model, we propose a novel task migration method to explicitly balance consumption of EM resources for all the cores. The new method aims at equalizing the probability of failure of each core, which will maximize the lifetime of the whole multi/many core system. To more efficiently regulate the lifetime of the multi/many core system, dynamic voltage and frequency scaling (by using different performance states (p-states), which can represent different operating voltages and frequencies) is further employed to compensate for the excessively consumed lifetime of all the cores when the chip is loaded with heavy tasks for a certain period of time. In this way, the TTF of all the cores could be compensated to meet the lifetime requirement, giving the multi/many core system more flexibility to handle heavy task assignment on demand. Experimental results on a 36-core processor platform show that the proposed task migration scheme could balance the lifetime consumption of all the cores, and maintain even-distributed TTF slacks across different cores, while the existing temperature-based task migration schemes lead to diverged TTF of cores. The results also show that the TTF consumption could be easily compensated for by switching to a low p-state, which could not be achieved if TTF consumption diverged across different cores.
- Is Part Of:
- Microelectronics journal. Volume 74(2018)
- Journal:
- Microelectronics journal
- Issue:
- Volume 74(2018)
- Issue Display:
- Volume 74, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 74
- Issue:
- 2018
- Issue Sort Value:
- 2018-0074-2018-0000
- Page Start:
- 106
- Page End:
- 115
- Publication Date:
- 2018-04
- Subjects:
- Electromigration -- Reliability management -- Resource-based -- Reliability modeling
Microelectronics -- Periodicals
Microélectronique -- Périodiques
Microelectronics
Electronic journals
Journals - contents and abstracts
Periodicals
621.3805
- Journal URLs:
- http://catalog.hathitrust.org/api/volumes/oclc/5877621.html
http://www.sciencedirect.com/science/journal/00262692
http://www.intute.ac.uk/sciences/cgi-bin/fullrecord.pl?handle=lesa.1012319367
http://www.elsevier.com/journals
http://www.elsevier.com/homepage/elecserv.htt
- DOI:
- 10.1016/j.mejo.2018.01.024
- Languages:
- English
- ISSNs:
- 0959-8324
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5758.973000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 6110.xml