Date of Award

6-2024

Degree Name

MS in Statistics

Department/Program

Statistics

College

College of Science and Mathematics

Advisor

Trevor Ruiz

Advisor Department

Statistics

Advisor College

College of Science and Mathematics

Abstract

Understanding marine mammal populations and how they are affected by human activity and ocean conditions is vital, especially for tracking population declines and monitoring endangered species. However, tracking marine mammal populations and their distribution is challenging because direct observation is difficult and costly. Environmental DNA (eDNA) from the surrounding plankton has the potential to provide an indirect means of monitoring cetacean abundance based on ecological associations. This project applies statistical methods to assess the relationship between sighting-based abundances of common baleen whale species and amplicon sequence variants (ASVs) in plankton eDNA samples from the NOAA-CalCOFI Ocean Genomics (NCOG) project. Modeling the relationship between eDNA and marine mammal sightings may greatly improve the ability to predict the abundance of whales in the ocean.

There are several key challenges associated with the analysis of the NCOG data. Plankton eDNA samples are an example of compositional data, in which the proportions of the ASVs in each sample must sum to one; this constraint complicates statistical analysis and interpretation. The high dimensionality (the number of parameters exceeds the number of observations) and sparsity (many observed zeros) of the genetic sequencing data also pose challenges for parameter estimation. Finally, the modeled associations should be adjusted for related factors, including seasonality and oceanographic conditions, the latter of which is beyond this project's scope.

This thesis develops and fits models to estimate cetacean abundance from plankton eDNA by leveraging methods from compositional data analysis and high-dimensional regression. It applies log-ratio data transformations and corresponding log-contrast models to address the compositional nature of eDNA reads. Regression methods for high-dimensional data typically rely on dimensionality reduction or regularization; this project implements both through sparse partial least squares (sPLS) regression. In addition to the predictive objective of using plankton eDNA to predict baleen whale abundances, this project also identifies ecological correlations between whale abundance and plankton eDNA.
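
As a rough illustration of the log-ratio idea mentioned above, the following Python sketch applies a centered log-ratio (CLR) transform to a small, made-up table of ASV read counts. The abstract does not say which log-ratio transform or zero-handling strategy the thesis uses, so the choice of CLR, the function name `clr`, the `pseudocount` parameter, and the example counts are all assumptions made purely for illustration.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples-by-ASVs count table.

    `pseudocount` is a hypothetical zero-handling choice: it is added to
    every count so that the logarithm is defined for unobserved ASVs.
    """
    counts = np.asarray(counts, dtype=float)
    shifted = counts + pseudocount
    # Close each sample so that its ASV proportions sum to one.
    props = shifted / shifted.sum(axis=1, keepdims=True)
    log_props = np.log(props)
    # Subtract each sample's mean log-proportion, i.e. the log of its
    # geometric mean, so every transformed row sums to zero.
    return log_props - log_props.mean(axis=1, keepdims=True)


# Made-up read counts: 2 samples by 4 ASVs, including zeros (sparsity).
reads = np.array([[120, 0, 30, 50],
                  [10, 5, 0, 85]])
z = clr(reads)
```

Each transformed row sums to zero, which mirrors the zero-sum constraint placed on the coefficients of a log-contrast regression. The transformed values could then be passed to a high-dimensional regression method such as sPLS.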
