Towards an Autonomic Resource Management for a Robust Execution of Scientific Applications
Mississippi State University
Science and engineering research communities are continuously interested in solving problems of increased complexity. The rapid development of computing technology has increased both the complexity of computational systems and the ability to solve larger and more complex scientific problems. Over the years, scientific computing and data analytics have benefited from research advances in architecture, hardware platforms and software environments, programming models, and algorithms, and from the many tools and techniques that evolved from these advances. In recent years, “big data”, machine learning and predictive data analytics have emerged as the fourth paradigm of science, in addition to the other three: theory, experiment and computational simulation. Big data, via data analytics, allows researchers to extract insights from scientific instruments as well as from computational simulations. Moreover, in most science domains, data-generation capabilities are growing more rapidly than compute capabilities, causing high-compute domains to become “data intensive”. The increasing need for, and relevance of, using high performance computing with big data applications has led to the birth of a “dual use model” (computing and data analysis), in which the interdependence of computational modeling and data analytics in an advanced computing system raises technical challenges.
Scientific computing and data analytics problems are often intractable (very large, complex), often exhibit irregular and stochastic behavior (data- or time-dependent), and therefore require adaptive algorithms. The resulting applications run in heterogeneous environments (clusters, grids, clouds), which are often expected to offer efficient, robust, and cost-effective (high utility and green) execution to multiple applications. Moreover, services in cloud computing are also subject to outages or even data losses that can result from hardware and/or software failures, leading to violations of service level agreements at both the user level and the various service provider levels. A number of solutions involving adaptivity have been proposed and implemented at the application and system levels. They often rely on adaptive algorithms and optimization techniques, which may draw on probabilistic analysis, queuing theory, control theory, machine learning, biologically inspired methods, and other approaches. During the last decade, autonomic computing has been proposed as a solution to system complexity, owing to the self-management capabilities that an autonomic computing system is known to exhibit.
In this talk, I will present a few of the challenges that confront the research community in addressing the issues mentioned above at the application and system levels. I will focus on a few recent steps taken towards developing a technology that would enable robust, cost-effective resource management of applications using an autonomic computing approach.
Ioana Banicescu is a professor in the Department of Computer Science and Engineering, a Director of the Center for Cloud and Autonomic Computing at Mississippi State University (MSU), and also a Co-Director of the National Science Foundation Center for Cloud and Autonomic Computing. She received the Diploma in Engineering (Electronics and Telecommunications) from Polytechnic University – Bucharest, and the M.S. and the Ph.D. degrees in Computer Science from New York University – Polytechnic Institute. Professor Banicescu’s research interests include parallel algorithms, scientific computing, scheduling theory, load balancing algorithms, performance modeling, analysis and prediction. Currently, her research focus is on autonomic computing, performance optimization for problems in computational science, and graph analytics. She has given many invited talks at universities, government laboratories, and at various national and international forums in the United States and overseas. She has authored and co-authored more than 100 articles published in journals, books, and conference proceedings.
Professor Banicescu is the recipient of a number of awards for research and scholarship from the National Science Foundation (NSF), including the prestigious NSF CAREER award, three NSF Information Technology awards, and others. She has served and continues to serve on numerous research review panels for advanced research grants in the US and Europe, on steering and program committees of a number of international conferences, symposia and workshops, and on the Executive Board and Advisory Board of the IEEE Technical Committee on Parallel Processing (TCPP). She is an Associate Editor of the Cluster Computing journal and the International Journal on Computational Science and Engineering. Over the years, Professor Banicescu has been recognized with many distinctions for her scholarly contributions, including the NSF – Stanford University – New Century Scholars, Hearin Professor of Engineering Award, Hearin Eminent Scholar Award, and others.