The transformation of words, locations, and human interactions into digital data forms the basis for trend detection and information extraction, both of which can be automated given the increasing availability of relatively inexpensive computer storage and processing technology. Trend detection, which focuses on what, is facilitated by the ability to apply analytics to an entire corpus of data instead of a random sample. Since the corpus essentially includes all data within a population, there is no need to apply the precautions that traditional statistical analysis requires to ensure the representativeness of a sample. Several examples are presented to validate the principle that data quality becomes less important as scale increases. Information extraction, which focuses on causality or why, is concerned with the automated extraction of meaning from unstructured and structured data. This requires examination of the entities in the context of an entire document. While some of the relationships among the recognized entities may be preserved during extraction, the overall context of a document may not be preserved. The role of information representation in the form of an ontology, as a mechanism for facilitating the collection, extraction, organization, analysis, and retrieval of the semantic content of a sizeable data corpus, is described with reference to past research findings.


Software Engineering



URL: https://digitalcommons.calpoly.edu/cadrc/104