SC|06 Powerful Beyond Imagination
SC06 is the International Conference for High Performance Computing, Networking and Storage

SCHEDULE: NOV 11-17, 2006

Problem Diagnosis in Large-Scale Computing Environments

Session: Scalable Systems Software

Event Type: Paper

Time: 2:00pm - 2:30pm

Session Chair: Elisa Heymann

Author(s): Alexander V. Mirgorodskiy, Naoya Maruyama, Barton P. Miller

Location: 18-19

Abstract:
We describe a new approach for locating the causes of anomalies in distributed systems. Our target environment is a distributed application that contains multiple identical processes performing similar activities. We use a new, lightweight form of dynamic instrumentation to collect function-level traces from each process. If the application fails, the traces are automatically compared with one another. We find anomalies by identifying processes that stopped earlier than the rest (a sign of a fail-stop problem) or processes that behaved differently from the rest (a sign of a non-fail-stop problem). Our algorithm does not require reference data to distinguish anomalies from normal behavior. However, when such data are available, it can use them to reduce the number of false positives. Ultimately, we identify a function that is likely to explain the anomalous behavior. We demonstrated the efficacy of our approach by finding two problems in a large distributed cluster environment called SCore.
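The abstract does not spell out how the traces are compared. The sketch below illustrates one plausible way to realize the two checks it names, assuming each process's trace has already been summarized as the timestamp of its last record and the time spent in each function. The fail-stop check flags processes whose traces end well before the others (the `slack` threshold is an assumption). The non-fail-stop check normalizes each per-function time profile, compares profiles with Manhattan distance, and scores each process by the distance to its k-th nearest neighbour, so no reference data are needed. All function names, the trace format, and the distance metric are hypothetical choices for illustration, not necessarily the authors' method.

```python
from typing import Dict, List

# Sketch only: trace format, names and metric are hypothetical assumptions.
# Per-process summary: function name -> seconds spent in that function.
Profile = Dict[str, float]


def find_fail_stop(last_timestamps: Dict[str, float], slack: float) -> List[str]:
    """Return processes whose last trace record is more than `slack` seconds
    older than the most recent record of any process (possible fail-stop)."""
    latest = max(last_timestamps.values())
    return sorted(p for p, t in last_timestamps.items() if latest - t > slack)


def normalize(profile: Profile) -> Profile:
    """Convert absolute times to fractions of the process's total traced time."""
    total = sum(profile.values())
    if total == 0:
        return dict(profile)
    return {f: t / total for f, t in profile.items()}


def manhattan(a: Profile, b: Profile) -> float:
    """L1 distance between two profiles; a missing function counts as 0."""
    return sum(abs(a.get(f, 0.0) - b.get(f, 0.0)) for f in set(a) | set(b))


def suspect_scores(profiles: Dict[str, Profile], k: int = 1) -> Dict[str, float]:
    """Score each process by the distance to its k-th nearest neighbour.
    Needs at least k + 1 processes; a larger score means more unusual behaviour."""
    norm = {p: normalize(prof) for p, prof in profiles.items()}
    scores = {}
    for p in norm:
        dists = sorted(manhattan(norm[p], norm[q]) for q in norm if q != p)
        scores[p] = dists[k - 1]
    return scores


def culprit_function(profiles: Dict[str, Profile], suspect: str) -> str:
    """Pick the function whose share of time differs most between the
    suspect process and its nearest neighbour."""
    norm = {p: normalize(prof) for p, prof in profiles.items()}
    others = sorted(q for q in norm if q != suspect)
    nearest = min(others, key=lambda q: manhattan(norm[suspect], norm[q]))
    funcs = sorted(set(norm[suspect]) | set(norm[nearest]))
    return max(funcs, key=lambda f: abs(norm[suspect].get(f, 0.0) - norm[nearest].get(f, 0.0)))


# Toy data: three similar processes and one that spends half its time in "recv".
PROFILES = {
    "p0": {"compute": 8.0, "send": 1.0, "recv": 1.0},
    "p1": {"compute": 8.1, "send": 0.9, "recv": 1.0},
    "p2": {"compute": 7.9, "send": 1.1, "recv": 1.0},
    "p3": {"compute": 5.0, "recv": 5.0},
}

if __name__ == "__main__":
    # Fail-stop check: p3's trace ends far earlier than the others.
    print(find_fail_stop({"p0": 100.0, "p1": 100.2, "p2": 99.9, "p3": 42.0}, slack=5.0))
    # Non-fail-stop check: most unusual process and the function that explains it.
    scores = suspect_scores(PROFILES)
    suspect = max(scores, key=scores.get)
    print(suspect, culprit_function(PROFILES, suspect))  # prints: p3 recv
```

Choosing k > 1 keeps a small group of processes that misbehave in the same way from masking each other as nearest neighbours, at the cost of needing more processes in the comparison.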

This paper can be found in the ACM and IEEE Digital Libraries.



Chair/Author Details:

Elisa Heymann (Chair)
Universitat Autonoma de Barcelona

Alexander V. Mirgorodskiy
VMware, Inc.

Naoya Maruyama
Tokyo Institute of Technology

Barton P. Miller
University of Wisconsin

IEEE Computer Society | ACM