Detailed Information

Fault-tolerance techniques for high-performance computing [electronic resource]

Material type
E-Book (held)
Personal authors
Herault, Thomas. Robert, Yves, 1938-.
Title / Statement of responsibility
Fault-tolerance techniques for high-performance computing [electronic resource] / Thomas Herault, Yves Robert, editors.
Publication
Cham : Springer International Publishing : Imprint: Springer, 2015.
Physical description
1 online resource (ix, 320 p.) : ill.
Series
Computer communications and networks, ISSN 1617-7975
ISBN
9783319209432
Summary
This timely text/reference presents a comprehensive overview of fault-tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to checkpoint protocols and scheduling algorithms, prediction, replication, and silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is followed by a review of general-purpose techniques, including several checkpoint and rollback-recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.
Topics and features:
- Includes self-contained contributions from an international selection of preeminent experts
- Provides a survey of resilience methods and performance models
- Examines the various sources of errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction
- Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface (MPI)
- Investigates different approaches to replication, comparing them to the traditional checkpoint-recovery approach
- Discusses the energy-consumption challenge of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption
This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.
Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the École Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
General note
Title from e-Book title page.  
Contents
Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
Bibliography note
Includes bibliographical references and index.
Other available formats
Issued also as a book.
Subjects
Fault-tolerant computing. High performance computing.
Online access
URL
000 00000nam u2200205 a 4500
001 000046038257
005 20200727163158
006 m d
007 cr
008 200723s2015 sz a ob 001 0 eng d
020 ▼a 9783319209432
040 ▼a 211009 ▼c 211009 ▼d 211009
082 0 4 ▼a 004.2 ▼2 23
084 ▼a 004.2 ▼2 DDCK
090 ▼a 004.2
245 0 0 ▼a Fault-tolerance techniques for high-performance computing ▼h [electronic resource] / ▼c Thomas Herault, Yves Robert, editors.
260 ▼a Cham : ▼b Springer International Publishing : ▼b Imprint: Springer, ▼c 2015.
300 ▼a 1 online resource(ix, 320 p.) : ▼b ill.
490 1 ▼a Computer communications and networks, ▼x 1617-7975
500 ▼a Title from e-Book title page.
504 ▼a Includes bibliographical references and index.
505 0 ▼a Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
520 ▼a This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: Includes self-contained contributions from an international selection of preeminent experts Provides a survey of resilience methods and performance models Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
530 ▼a Issued also as a book.
538 ▼a Mode of access: World Wide Web.
650 0 ▼a Fault-tolerant computing.
650 0 ▼a High performance computing.
700 1 ▼a Herault, Thomas.
700 1 ▼a Robert, Yves, ▼d 1938-.
830 0 ▼a Computer communications and networks.
856 4 0 ▼u https://oca.korea.ac.kr/link.n2s?url=http://dx.doi.org/10.1007/978-3-319-20943-2
945 ▼a KLPA
991 ▼a E-Book (held)

Holdings

No. | Location | Call number | Accession no. | Status | Due date | Reservation | Services
1 | Central Library / e-Book Collection | CR 004.2 | E14028265 | Not for loan (available for viewing) | | | M

New arrivals in related fields

Forouzan, Behrouz A. (2022)