Available at: http://digitalcommons.calpoly.edu/theses/416
MS in Computer Science
Applications are increasingly hosted on shared, heterogeneous distributed systems. Web applications and “cloud” computing are broad examples of this, but even relatively isolated servers often depend on shared, network-accessible resources. This standardization around service layers abstracts aspects of service delivery, but often entails a large collection of component systems. The complexity that is hidden from a functional perspective can still give rise to runtime issues. Critical issues arising from this model include resource utilization (sizing, capacity testing, resource monitoring) and dependency management (identifying and monitoring dependencies between components and their potential impact on an application).
This thesis presents an automated solution called DepMap for identifying and monitoring file, network, and other communication dependencies in applications by analyzing low-level operating system activity. This work is applicable to resource utilization, diagnosis of performance issues, characterization of workloads, and systems management. This thesis discusses DepMap’s requirements, design, and implementation, and evaluates its effectiveness and performance on simulated and actual applications. DepMap can be used to create models of system behavior, based both on observations over time and on injection of delays into selected system operations to understand their impact on the larger system.
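As a rough illustration of what deriving dependencies from low-level operating system activity can look like, the sketch below parses a few strace-style system call lines and groups the files opened and network endpoints contacted by process ID. This is not DepMap's implementation: the abstract does not say which tracing facility or trace format DepMap uses, so the strace-like input, the regular expressions, and the names `build_dependency_map` and `SAMPLE_TRACE` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# ASSUMPTION: strace-style text lines prefixed with a PID. DepMap's actual
# data source and format are not described in the abstract.
OPEN_RE = re.compile(r'^(\d+)\s+open(?:at)?\((?:[A-Z_]+,\s*)?"([^"]+)"')
CONNECT_RE = re.compile(
    r'^(\d+)\s+connect\(.*sin_port=htons\((\d+)\).*inet_addr\("([^"]+)"\)'
)


def build_dependency_map(lines):
    """Map each PID to the set of (kind, resource) pairs it touched.

    Hypothetical helper for illustration; only open/openat and IPv4
    connect calls are recognized, everything else is ignored.
    """
    deps = defaultdict(set)
    for line in lines:
        match = OPEN_RE.match(line)
        if match:
            deps[int(match.group(1))].add(("file", match.group(2)))
            continue
        match = CONNECT_RE.match(line)
        if match:
            pid, port, addr = match.groups()
            deps[int(pid)].add(("net", f"{addr}:{port}"))
    return dict(deps)


# Made-up trace lines, not taken from the thesis.
SAMPLE_TRACE = [
    '1201 openat(AT_FDCWD, "/etc/app.conf", O_RDONLY) = 3',
    '1201 connect(4, {sa_family=AF_INET, sin_port=htons(5432), '
    'sin_addr=inet_addr("10.0.0.7")}, 16) = 0',
    '1202 open("/var/log/app.log", O_WRONLY|O_APPEND) = 5',
    '1201 read(3, "...", 4096) = 120',
]

if __name__ == "__main__":
    for pid, resources in sorted(build_dependency_map(SAMPLE_TRACE).items()):
        print(pid, sorted(resources))
```

Repeating such a pass over successive trace windows and comparing the resulting maps is one plausible way to notice dependencies that appear or disappear over time. The delay-injection experiments mentioned above would need a separate mechanism that this sketch does not attempt.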
When applied to two commercial systems, DepMap was able to identify changing dependencies and to characterize the behavior of network and storage dependencies. On a cluster of systems hosting a web application, DepMap showed unexpected variation in network transmission time between peers in the cluster, and helped to uncover large jumps in system clock times arising from unreliable Network Time Protocol (NTP) services. On a database server that processes hundreds of gigabytes of data each day, DepMap was able to characterize the I/O workload on storage connected by Fibre Channel and iSCSI. This work showed that serious performance limitations existed in the storage server due to fragmentation, design assumptions poorly suited to a data warehouse workload, and competition from other storage consumers. This information was used to design and validate a new server/storage platform specifically for this workload. The insight gained by using DepMap in this case has provided dramatically improved performance (10-15x faster throughput for some relevant workloads) and large cost savings relative to other options.