Postprint version. Published in Proceedings of the 5th Conference on Computing Frontiers: Ischia, Italy, May 5, 2008, pages 139-148.
The definitive version is available at https://doi.org/10.1145/1366230.1366258.
As circuit geometries continue to shrink, and supply voltages remain relatively constant, circuit wearout becomes a concern. We propose that the relative reliability of the circuits of a processor be exposed to the operating system, and be managed by a credit-based wearout monitor. This wearout monitor receives dynamic updates of the reliability of circuits through the use of stability detector circuits that are small enough to be widely deployed. We find that through the combined use of the wearout monitor and stability detectors, we can efficiently and accurately manage the reliability of a processor, and re-coup the performance of a processor that would otherwise be lost when processors are over-provisioned to meet an expected lifetime. We simulate a 16 core DSP with a wearout monitor and stability detectors on a mix of four different media algorithms. Using the wearout monitor and stability detectors, we find that by reducing average performance by only 5%, we can increase the lifetime of the processor by 46%.
Electrical and Computer Engineering
This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 5th Conference on Computing Frontiers.