Postprint version. Published in Requirements Engineering, Volume 15, Issue 3, January 1, 2010, pages 313-335.
The definitive version is available at https://doi.org/10.1007/s00766-009-0096-6.
The generation of traceability links or traceability matrices is vital to many software engineering activities. It is also person-power intensive, time-consuming, error-prone, and lacks tool support. The activities that require traceability information include, but are not limited to, risk analysis, impact analysis, criticality assessment, test coverage analysis, and verification and validation of software systems. Information Retrieval (IR) techniques have been shown to assist with the automated generation of traceability links by reducing the time it takes to generate the traceability mapping. Researchers have applied techniques such as Latent Semantic Indexing (LSI), vector space retrieval, and probabilistic IR and have enjoyed some success. This paper concentrates on examining issues not previously widely studied in the context of traceability: the importance of the vocabulary base used for tracing and the evaluation and assessment of traceability mappings and methods using secondary measures. We examine these areas and perform empirical studies to understand the importance of each to the traceability of software engineering artifacts.