MOLAR: Modular Linux and Adaptive Runtime Support for HEC OS/R Research
Description: Louisiana Tech’s primary role in this project is to lead the high-availability and system management efforts, and to collaborate with ORNL, LLBL, and other partners on the monitoring, characterization, replication, and other fault-tolerance aspects that ensure well-integrated HEC OS solutions. There are seven major steps over the three-year life of this project:
1. Prepare the dual-head HA-OSCAR (1.0) and OSCAR integration for public release. Create a prototype of the n+1 head HA-OSCAR (2.0) for fSM building blocks.
2. Assess the viability and effectiveness of various techniques (i.e., statistical techniques, machine learning, time-series analysis, and visualization) in identifying RAS and system management problems.
3. Develop a prototype of the HA-OSCAR self-healing improvement, including compute node coverage. Investigate non-cluster platforms (e.g., Cray, SGI).
4. Assess interfaces, protocols, and mechanisms for inter-partition fSM management and for support of the OS-level data replication and distributed control service.
5. Develop a tool to digest this data and make near-real-time recommendations on reliability, availability, and serviceability using statistical techniques, machine learning, time-series analysis, and visualization.
6. Complete the inter-partition fSM management.
7. Create annual progress reports and a comprehensive final report, and implement prototype software.
Principal Investigator: Leangsuksun, Box -- Computer Science
Collaborators:
Funding Agencies: Department of Energy
Start Period: 02/01/2005
End Period: 01/31/2008