Topology-aware network fault influence domain analysis. (January 2017)
- Record Type:
- Journal Article
- Title:
- Topology-aware network fault influence domain analysis. (January 2017)
- Main Title:
- Topology-aware network fault influence domain analysis
- Authors:
- Wu, Zhenwei
Lu, Kai
Wang, Xiaoping
Chi, Wanqing
- Abstract:
- Highlights: Failure-awareness is incorporated into the system management stack. The concept of a network fault influence domain is proposed. Rules for topology-based fault influence analysis are established.
Abstract: The extremely high performance of supercomputers derives from the coordination of a large number of compute nodes. As a consequence, the communication subsystem significantly affects overall system performance. A single router or link breakdown in the interconnection network may affect a group of tasks, and the rapid increase in system scale makes this problem worse. However, the impact of a network fault is typically highly skewed across different parts of the system. When a network fault occurs, there may be a subset of compute nodes among which the fault's influence can be ignored. With this intuition, we designed FIDA, a network fault influence domain analysis tool, which infers which part of the system suffers most severely from the fault. The influence domain given by FIDA is then delivered to the resource management subsystem as a guideline for allocating healthy nodes preferentially to achieve better performance.
- Is Part Of:
- Computers & electrical engineering. Volume 57(2017)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 57(2017)
- Issue Display:
- Volume 57, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 57
- Issue:
- 2017
- Issue Sort Value:
- 2017-0057-2017-0000
- Page Start:
- 266
- Page End:
- 280
- Publication Date:
- 2017-01
- Subjects:
- Interconnection network -- Fault influence domain -- High-performance computing -- System management
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2016.11.029
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 846.xml