DOI: https://doi.org/10.15368/theses.2014.141
Available at: https://digitalcommons.calpoly.edu/theses/1289
Date of Award
8-2014
Degree Name
MS in Computer Science
Department/Program
Computer Science
Advisor
Foaad Khosmood
Abstract
Comic books are a unique and increasingly popular form of entertainment that combines visual and textual elements of communication. This work aims to make comic books more accessible. Specifically, this paper explains how we detect elements such as speech bubbles in Japanese comic book panels. Applications of this work include automatic detection of text and its conversion into audio or translation into other languages. Automatic detection of elements can also allow reasoning and analysis at a deeper semantic level than what is possible today. Our approach uses an expert system and a machine learning system. The expert system processes information from images and detects speech bubbles based on heuristics; it also inspires the feature sets used to train the machine learning system. The machine learning system uses Naive Bayes, Maximum Entropy, and support vector machine (SVM) classifiers to detect speech bubbles. The algorithms are trained in both a fully supervised and a semi-supervised way. Both the expert system and the machine learning system achieved high accuracy, and we are able to train the machine learning algorithms to detect speech bubbles just as accurately as the expert system. We also applied the same approach to detecting the eyes of characters in the panels, and are able to detect the majority of the eyes, but with low precision. However, we are able to improve the performance of our eye detection system significantly by combining the SVM with either the Naive Bayes or the AdaBoost classifier.