Postprint version. Published in Journal of Intelligent Information Systems, Volume 25, Issue 3, December 1, 2005, pages 293-332. Copyright © 2005 Springer. The original publication is available at http://dx.doi.org/10.1007/s10844-005-0197-8.
NOTE: At the time of publication, the author Alex Dekhtyar was not yet affiliated with Cal Poly.
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, together with associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as operations specific to probability distributions, such as conditionalization and marginalization. The Semistructured Probabilistic Database may therefore be used as a backend for any application that manages large quantities of probabilistic information, such as the construction of stochastic models. The implementation uses an XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed, and the results of initial experiments evaluating the system are reported.
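As an informal illustration of the two probability-specific operations named above, the following sketch applies textbook marginalization and conditionalization to a small joint distribution over discrete variables with finite domains. It is not the paper's formal SP-algebra definition or its implementation: the dictionary representation, the function names `marginalize` and `conditionalize`, and the example variables and probabilities are all assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def marginalize(joint, variables, keep):
    """Sum out every variable not listed in `keep` (hypothetical helper)."""
    idx = [variables.index(v) for v in keep]
    out = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for assignment, p in joint.items():
        out[tuple(assignment[i] for i in idx)] += p
    return dict(out)


def conditionalize(joint, variables, var, value):
    """Distribution of the remaining variables given var == value.

    Textbook conditioning: select matching rows, drop the conditioned
    variable, and renormalize by the probability of the condition.
    """
    i = variables.index(var)
    selected = {a[:i] + a[i + 1:]: p for a, p in joint.items() if a[i] == value}
    total = sum(selected.values())
    if total == 0:
        raise ValueError("conditioning event has probability zero")
    return {a: p / total for a, p in selected.items()}


# Illustrative joint distribution (made-up numbers, exact arithmetic).
variables = ("weather", "traffic")
joint = {
    ("sun", "light"): Fraction(2, 5),
    ("sun", "heavy"): Fraction(1, 10),
    ("rain", "light"): Fraction(1, 5),
    ("rain", "heavy"): Fraction(3, 10),
}

weather = marginalize(joint, variables, ["weather"])
given_heavy = conditionalize(joint, variables, "traffic", "heavy")
```

Exact fractions are used so that the results can be compared without floating-point tolerance; a real system storing such tables would also need to carry the associated context and conditional information that the SPO model describes.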