Published in 8th International IEEE Conference on Intelligent Transportation Systems Proceedings: Vienna, Austria, September 13, 2005, pages 819-824.
Copyright © 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
NOTE: At the time of publication, the author Anurag Pande was not yet affiliated with Cal Poly.
Data mining is the analysis of large “observational” datasets to find unsuspected relationships that might be useful to the data owner. It typically involves analyses whose objectives had no bearing on the data collection strategy. Freeway traffic surveillance data collected through underground loop detectors form one such “observational” database, maintained for various ITS (Intelligent Transportation Systems) applications such as travel time prediction. In this research, a data mining process is used to relate this surrogate measure of traffic conditions (data from freeway loop detectors) to the occurrence of rear-end crashes on freeways. The results of this analysis are envisioned as the first step in the development of a functional proactive traffic management system.
The dataset under consideration includes information on crashes and the corresponding traffic data collected from detectors neighboring the crash locations just prior to the time of each crash. The problem is set up as a binary classification problem: whether a crash is rear-end or not. Three types of classification trees, each using a different splitting criterion, were tried for variable selection. The classification tree with the chi-square test as the splitting criterion produced the most inclusive list of variables. Variable selection was followed by two neural network architectures, namely the RBF (radial basis function) and the MLP (multi-layer perceptron), to model the binary target variable. The two neural network models were then combined based on their outputs to test for any possible improvement in classification accuracy. It was found, however, that the classification tree model with the chi-square test as the splitting criterion (more than 65% classification accuracy) performed better than any of the individual or combined neural network models (54-55% classification accuracy). Since the decision tree model also provides simple, interpretable rules for classifying the data in a real-time application, it was recommended as the final classification model.
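To illustrate what a chi-square splitting criterion does, the following is a minimal sketch rather than the authors' implementation. It scores each candidate threshold on one traffic variable by the Pearson chi-square statistic of the resulting 2x2 table (side of the split vs. rear-end or not) and keeps the highest-scoring threshold. The function names, the variable `speeds`, and all data values are hypothetical and chosen only for illustration. Tree software typically compares splits by the p-value of the test; for binary splits of a binary target the degrees of freedom are fixed at one, so ranking by the statistic gives the same ordering.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # A degenerate table (an empty row or column) carries no evidence of association.
    if 0 in (row1, row2, col1, col2):
        return 0.0
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_split(values, labels):
    """Return (threshold, statistic) for the split 'value <= threshold'
    that maximizes the chi-square statistic against the binary labels."""
    best_threshold, best_stat = None, 0.0
    # The largest value is skipped because it would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        pairs = list(zip(values, labels))
        a = sum(1 for v, y in pairs if v <= threshold and y == 1)
        b = sum(1 for v, y in pairs if v <= threshold and y == 0)
        c = sum(1 for v, y in pairs if v > threshold and y == 1)
        d = sum(1 for v, y in pairs if v > threshold and y == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:
            best_threshold, best_stat = threshold, stat
    return best_threshold, best_stat


# Hypothetical data: average speed before each crash, and 1 = rear-end, 0 = other.
speeds = [20, 25, 30, 35, 50, 55, 60, 65]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

threshold, statistic = best_split(speeds, labels)
print(threshold, round(statistic, 3))
```

A full tree would apply this search to every candidate variable at every node and recurse on the two branches; the variables that end up being used in splits form the selected list.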
Civil and Environmental Engineering