Published in Proceedings of the 2006 IEEE International Symposium on Workload Characterization: San Jose, CA, October 25, 2006, pages 142-149.
Copyright © 2006 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. The definitive version is available at http://dx.doi.org/10.1109/IISWC.2006.302738.
NOTE: At the time of publication, the author John Oliver was not yet affiliated with Cal Poly.
Soft errors have become a significant concern, and recent studies have measured the "architectural vulnerability factor" of systems to such errors, or conversely, the likelihood that a soft error is masked by latches or other system behavior. We take soft-error tolerance one step further and examine when an application can tolerate errors that are not masked. For example, a video decoder or approximation algorithm can tolerate errors if the user is willing to accept degraded output. The key observation is that while the decoder can tolerate errors in its data, it cannot tolerate errors in its control. We first present a static analysis that protects most control operations. We then examine several SPEC CPU2000 and MiBench benchmarks for error tolerance, develop fidelity measures for each, and quantify the effect of errors on fidelity. We show that protecting control is crucial to achieving error tolerance: without this protection, many applications experience catastrophic errors (infinite execution time or crashing). Overall, our results indicate that with simple control protection, the error tolerance of many applications can give designers considerable added flexibility in addressing the future challenges posed by soft errors.
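The idea of measuring fidelity under data errors can be illustrated with a minimal sketch. It is not taken from the paper: the helper names (flip_bit, inject_errors, psnr), the use of PSNR as the fidelity measure, and the random single-bit-flip error model are all assumptions chosen for illustration. The sketch corrupts only data samples, which corresponds to the case the abstract describes as tolerable, and leaves control flow untouched.

```python
import math
import random

def flip_bit(value: int, bit: int) -> int:
    """Flip one bit of an 8-bit sample (hypothetical soft-error model)."""
    return (value ^ (1 << bit)) & 0xFF

def inject_errors(samples, n_errors, rng):
    """Return a copy of samples with n_errors random single-bit flips.

    Only data values are corrupted; the loop bounds and indices that
    drive control flow are never touched.
    """
    corrupted = list(samples)
    for _ in range(n_errors):
        i = rng.randrange(len(corrupted))
        corrupted[i] = flip_bit(corrupted[i], rng.randrange(8))
    return corrupted

def psnr(reference, degraded, peak=255):
    """Peak signal-to-noise ratio in dB (assumed fidelity measure)."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, degraded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

rng = random.Random(0)                        # fixed seed for repeatability
clean = [rng.randrange(256) for _ in range(4096)]
noisy = inject_errors(clean, 16, rng)
fidelity_db = psnr(clean, noisy)              # higher means closer to the clean output
```

Sweeping n_errors and recording fidelity_db would give a rough error-versus-fidelity curve for this toy data. Corrupting a loop bound or branch condition instead could hang or crash the program, which is the asymmetry between data and control that the abstract points to.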
Electrical and Computer Engineering