Date of Award


Degree Name

MS in Computer Science


Computer Science


College of Engineering


Davide Falessi

Advisor Department

Computer Science

Advisor College

College of Engineering


Software traceability establishes a network of connections between diverse artifacts such as requirements, design, and code. However, given the cost and effort of creating and maintaining trace links manually, researchers have proposed automated approaches using information retrieval techniques. Current approaches focus almost entirely upon generating links between pairs of artifacts and have not leveraged the broader network of interconnected artifacts. In this paper we investigate the use of intermediate artifacts to enhance the accuracy of the generated trace links – focus- ing on paths consisting of source, target, and intermediate artifacts. We propose and evaluate combinations of techniques for computing semantic similarity, scaling scores across multiple paths, and aggregating results from multiple paths. We report results from five projects, including one large industrial project. We find that leverag- ing intermediate artifacts improves the accuracy of end-to-end trace retrieval across all datasets and accuracy metrics. After further analysis, we discover that leveraging intermediate artifacts is only helpful when a project’s artifacts share a common vocabulary, which tends to occur in refinement and decomposition hierarchies of artifacts. Given our hybrid approach that integrates both direct and transitive links, we observed little to no loss of accuracy when intermediate artifacts lacked a shared vocabulary with source or target artifacts.