Published in MSR 2004: International Workshop on Mining Software Repositories at ICSE’04: Edinburgh, Scotland, May 1, 2004, pages 22-26.
Copyright © 2004 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
NOTE: At the time of publication, the author Alexander Dekhtyar was not yet affiliated with Cal Poly.
Software compiles and is therefore characterized by a parseable grammar. Natural language text rarely conforms to prescriptive grammars and is therefore much harder to parse. Mining parseable structures is easier than mining less structured entities. Therefore, most work on mining repositories focuses on software, not natural language text. Here, we report experiments with mining natural language text (requirements documents) suggesting that: (a) mining natural language is not too difficult, so (b) software repositories should routinely be augmented with all the natural language text used to develop that software.