Preprint version. Published in Information Retrieval in Software Engineering, International Conference on Software Maintenance (ICSM): Philadelphia, PA., September 1, 2006.
Copyright © ACM 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in International Conference on Software Maintenance.
NOTE: At the time of publication, the author Alex Dekhtyar was not yet affiliated with Cal Poly.
Seven to eight years ago, applications of Information Retrieval (IR) methods in Software Engineering were nearly nonexistent. Today, IR and text mining methods are accepted approaches to the analysis of textual artifacts generated during the software lifecycle. The incentive to try IR methods in such analysis is strong: the field comes with a reputation for proven industrial and academic success, and some important Software Engineering problems related to textual artifacts can be translated into instances of standard IR problems in a reasonably straightforward manner.
In this position paper, we observe that part of the success of IR as a field stems from its use of established, well-maintained, and almost universally accepted benchmarks for evaluating IR methods. We elaborate on the question "Is the field mature enough to talk about benchmarking?" posed by the working session organizers. Our position is that without robust, well-designed, time-tested, and, eventually, well-established and accepted benchmarks, research on the application of IR methods to problems in Software Engineering will not reach its full potential.