
Providing fault tolerance for OpenMP target-based applications

Grant number: 21/09355-2
Support type: Scholarships in Brazil - Doctorate
Effective date (Start): October 01, 2021
Effective date (End): July 31, 2022
Field of knowledge: Physical Sciences and Mathematics - Computer Science - Computer Systems
Principal researcher: Guido Costa Souza de Araújo
Grantee: Pedro Henrique di Francia Rosso
Home Institution: Instituto de Computação (IC). Universidade Estadual de Campinas (UNICAMP). Campinas, SP, Brazil
Associated research grant: 13/08293-7 - CCES - Center for Computational Engineering and Sciences, AP.CEPID


High-Performance Computing (HPC) environments are widely used in scientific research. OmpCluster aims to facilitate the development of scientific applications in such environments. Because these environments involve a large amount of computational power, failures are expected to occur more frequently. For this reason, Fault Tolerance (FT) is a constant concern within OmpCluster. Part of the system is already fault tolerant with regard to the MPI (Message Passing Interface) standard used within OmpCluster; this project aims to extend FT to the context of OpenMP, the API on which OmpCluster is built. The main goal is to provide fault tolerance at the points where it is missing or where OmpCluster currently has limitations. By the end of the project, the entire system involving OmpCluster is expected to handle failures automatically, without requiring intervention from the application developer beyond configuring the desired settings. (AU)
