Date of Award

6-2023

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Franz Kurfess

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

Using deep learning to synthetically generate music is a research domain that has gained increasing public attention in the past few years. A subproblem of music generation is music extension: the task of taking existing music and extending it. This work proposes the Continuer Pipeline, a novel technique that uses deep learning to take music and extend it in 5-second increments. It does this by treating music generation as an image generation problem; we use latent diffusion models (LDMs) to generate spectrograms, which are image representations of music. The Continuer Pipeline receives a waveform as input and outputs its prediction of what the next five seconds might sound like. We trained the Continuer Pipeline using the extensive diffusion model functionality provided by the HuggingFace platform, and our dataset consisted of 256x256 spectrogram images representing 5-second snippets of various hip-hop songs from Spotify. The musical waveforms generated by the Continuer Pipeline are currently of much lower quality than human-generated music, but we affirm that the Continuer Pipeline still has many uses in its current state, and we describe many avenues for future improvement to this technology.
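The thesis code is not reproduced on this page. As a rough illustration of the approach the abstract describes, the minimal Python sketch below samples a 256x256 spectrogram image from a diffusion pipeline loaded via HuggingFace diffusers and inverts it to a waveform with Griffin-Lim. The checkpoint path, pixel-to-decibel scaling, sample rate, and hop length are assumptions for illustration only, and the step where the Continuer Pipeline conditions generation on the preceding waveform is omitted.

import numpy as np
import librosa
from diffusers import DiffusionPipeline

# Hypothetical checkpoint trained on 256x256 spectrograms of 5-second clips;
# not the actual Continuer Pipeline weights.
pipeline = DiffusionPipeline.from_pretrained("path/to/spectrogram-ldm")

# Sample one spectrogram image (assumed 256x256 grayscale).
image = pipeline(batch_size=1).images[0].convert("L")

# Map pixel intensities back to mel power. The 80 dB dynamic range is an
# assumption; the real mapping depends on how the training spectrograms
# were encoded.
db = (np.array(image).astype(np.float32) / 255.0) * 80.0 - 80.0
mel_power = librosa.db_to_power(db)

# Invert the mel spectrogram to audio with Griffin-Lim. The hop length is
# chosen so that 256 frames span roughly 5 seconds at 22.05 kHz.
waveform = librosa.feature.inverse.mel_to_audio(
    mel_power, sr=22050, n_fft=2048, hop_length=431
)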

A Novel Approach to Extending Music Using Latent Diffusion.pdf (4584 kB)
