Available at: https://digitalcommons.calpoly.edu/theses/1722
Date of Award
MS in Computer Science
Code clones are duplicate fragments of code that perform the same task. As software code bases increase in size, the number of code clones also tends to increase. These code clones, possibly created through copy-and-paste methods or unintentional duplication of effort, increase maintenance cost over the lifespan of the software. Code clone detection tools exist to identify clones where a human search would prove unfeasible, however the quality of the clones found may vary. I demonstrate that the performance of such tools can be improved by normalizing the source code before usage. I developed Normalizer, a tool to transform C source code to normalized source code where the code is written as consistently as possible. By maintaining the code's function while enforcing a strict format, the variability of the programmer's style will be taken out. Thus, code clones may be easier to detect by tools regardless of how it was written.
Reordering statements, removing useless code, and renaming identifiers are used to achieve normalized code. Normalizer was used to show that more clones can be found in Introduction to Computer Networks assignments by normalizing the source code versus the original source code using a small variety of code clone detection tools.