Postprint version. Published in International Symposium on Applications and the Internet Proceedings: Tokyo, Japan, January 26, 2004, pages 188-194.
Copyright © 2004 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The definitive version is available at http://dx.doi.org/10.1109/SAINT.2004.1266115.
User-perceived quality is the most important aspect of Internet applications. After a single negative experience, users tend to switch to one of the myriad alternatives available to them on the Internet. Two key components of Internet application quality are scalability and reliability. In this paper, we present the first general-purpose mechanism capable of maintaining reliability in the face of process, machine, and catastrophic failures. We define catastrophic failures as events that cause entire clusters of servers to become unavailable, such as network partitioning, router failures, natural disasters, or even terrorist attacks. Our mechanism uses client-side tunneling, client-side redirection, and implicit redirection triggers to deliver reliable communication channels. We build on previous work, Redirectable Sockets (RedSocks), which focuses on Internet application scalability. RedSocks are communication channels enhanced with a novel session layer aimed at modernizing network communication. We modify RedSocks to create the first fault-tolerant socket solution that can handle all server-side failures. Our mechanism is compatible with NATs and firewalls, scalable, application-independent, and backwards compatible.