Theses and Dissertations


Zimin Wang

Issuing Body

Mississippi State University


Abdelwahed, Sherif

Committee Member

Jones, A. Bryan

Committee Member

Follett, F. Randolph

Date of Degree


Document Type

Graduate Thesis - Open Access


Electrical Engineering

Degree Name

Master of Science


James Worth Bagley College of Engineering


Department of Electrical and Computer Engineering


Large-scale distributed computing systems such as data centers are hosted on heterogeneous, networked servers that operate in a dynamic and uncertain environment shaped by factors such as time-varying user workloads and various failures. Achieving stringent quality-of-service goals is therefore challenging and requires a comprehensive approach to performance control, fault diagnosis, and failure recovery. This work presents a model-based approach to fault management that integrates limited lookahead control (LLC), diagnosis, and fault-tolerance concepts in order to (1) enable systems to adapt to environmental variations, (2) maintain system availability and reliability, and (3) facilitate recovery from failures. The thesis focuses on memory leak errors. A characterization function is designed to detect memory leaks. An LLC is then applied so that the computing system adapts efficiently to workload variations, recovers from memory leaks, and maintains functionality.
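
The two steps named in the abstract, leak detection followed by lookahead control, can be illustrated with a minimal sketch. It is not the thesis's implementation. The abstract does not define the characterization function, so a least-squares slope test on memory samples stands in for it here. The system model, the two actions ("run" and "restart"), the cost values, and all function names are assumptions chosen only for illustration.

```python
from itertools import product


def leak_slope(samples):
    """Least-squares slope of memory usage over equally spaced samples."""
    n = len(samples)
    mean_t = (n - 1) / 2
    mean_m = sum(samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in enumerate(samples))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def is_leaking(samples, threshold=1.0):
    """Assumed stand-in for a characterization function: flag steady growth."""
    return leak_slope(samples) > threshold


def step(memory, action, leak_rate):
    """Assumed model: a restart frees all leaked memory, running leaks more."""
    if action == "restart":
        return 0.0
    return memory + leak_rate


def cost(memory, action, capacity, restart_cost=5.0, overflow_cost=100.0):
    """Assumed cost: restarts cause downtime, exceeding capacity is far worse."""
    c = restart_cost if action == "restart" else 0.0
    if memory > capacity:
        c += overflow_cost
    return c


def llc_decide(memory, leak_rate, capacity, horizon=3):
    """Enumerate every action sequence over the horizon and return the
    first action of the cheapest one (the limited lookahead idea)."""
    best_action, best_cost = None, float("inf")
    for plan in product(("run", "restart"), repeat=horizon):
        m, total = memory, 0.0
        for action in plan:
            m = step(m, action, leak_rate)
            total += cost(m, action, capacity)
        if total < best_cost:
            best_action, best_cost = plan[0], total
    return best_action
```

Only the first action of the best plan is applied. The controller is then re-run at the next sampling instant with the newly observed state, so the horizon moves forward with time. Exhaustive enumeration grows exponentially with the horizon, which is why the lookahead is kept short in this sketch.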