Available at: https://digitalcommons.calpoly.edu/theses/2779
Date of Award
3-2024
Degree Name
MS in Electrical Engineering
Department/Program
Electrical Engineering
College
College of Engineering
Advisor
Jane Zhang
Advisor Department
Electrical Engineering
Advisor College
College of Engineering
Abstract
Breast cancer is one of the deadliest cancers for women. In the US, 1 in 8 women will be diagnosed with breast cancer within their lifetimes. Detection and diagnosis play an important role in saving lives. To this end, many classifiers with varying structures have been designed to classify breast cancer histopathological images. However, randomly partitioning data, as many previous works have done, can lead to artificially inflated accuracies and classifiers that do not generalize. Data leakage occurs when researchers assume that the images in a dataset are independent of one another, which is often not the case for medical datasets, where multiple images are taken of each patient. This work focuses on convolutional neural network binary classifiers using the BreakHis dataset. Previous works are reviewed, and classifiers from the literature are tested with patient partitioning, where each patient is placed in only one of the training, validation, and testing sets so that no patient's images appear in more than one set. A classifier that previously achieved 93% accuracy consistently achieved only 79% accuracy with the new patient partition. Robust data augmentation, a sigmoid output layer, and a different form of min-max normalization were utilized to achieve an accuracy of 89.38%. These improvements were shown to be effective with the architectures used. Sigmoid Model 1.1 is shown to perform well compared to much deeper architectures found in the literature.
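The patient partitioning described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `split_by_patient`, the 70/15/15 split fractions, the fixed seed, and the synthetic patient and image identifiers are all assumptions made for illustration. The idea is to shuffle patients rather than images, so that every image from a given patient lands in exactly one set.

```python
import random
from collections import defaultdict


def split_by_patient(samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Split (image_id, patient_id) pairs so that no patient spans two sets.

    The fractions and seed are illustrative assumptions, not values
    taken from the thesis.
    """
    # Group image ids under the patient they belong to.
    by_patient = defaultdict(list)
    for image_id, patient_id in samples:
        by_patient[patient_id].append(image_id)

    # Shuffle the patients (not the images) reproducibly.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Reserve at least one patient for each held-out set.
    n_test = max(1, round(len(patients) * test_frac))
    n_val = max(1, round(len(patients) * val_frac))
    groups = {
        "test": patients[:n_test],
        "val": patients[n_test:n_test + n_val],
        "train": patients[n_test + n_val:],
    }

    # Expand each patient group back into its (image_id, patient_id) pairs.
    return {
        name: [(img, pid) for pid in ids for img in by_patient[pid]]
        for name, ids in groups.items()
    }


# Synthetic example: 10 hypothetical patients with 5 images each.
samples = [(f"img_{p}_{i}", f"patient_{p}") for p in range(10) for i in range(5)]
splits = split_by_patient(samples)
```

A random split over `samples` directly would likely place images from the same patient in both the training and testing sets, which is the leakage the abstract warns about. Grouping by patient first avoids that, at the cost of set sizes that depend on how many images each patient contributes.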