Date of Award

6-2026

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Stephen Beard

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

Machine learning training and inference workloads run at massive scale, where the efficiency of generated machine code directly affects throughput and energy consumption. The compilers that lower model specifications to hardware instructions rely on optimization passes that can only exploit information visible in the representations they operate on. MLIR, the intermediate representation framework underlying production machine learning compilers such as IREE and Triton, is designed so that each abstraction level can encode semantics that lower levels cannot represent. One example is memref.subview, which partitions a buffer into typed regions with explicit offsets and sizes, making structural non-overlap provable from the IR. Standard MLIR-to-LLVM lowering discards this structure, reducing subviews to pointer arithmetic that LLVM alias analysis cannot distinguish from potentially overlapping accesses. This loss blocks optimizations such as loop-invariant code motion, redundancy elimination, and vectorization, even when accesses are provably disjoint at the MLIR level.
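The structural non-overlap argument can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the thesis: both subviews are taken from the same source buffer, have static offsets and sizes, use unit strides, and the source has an identity layout. Under those assumptions each subview is a hyper-rectangle in index space, and two subviews are disjoint if their index ranges are separated along at least one dimension. The names `Subview` and `provably_disjoint` are hypothetical.

```python
from typing import NamedTuple, Sequence


class Subview(NamedTuple):
    """Hypothetical model of a unit-stride subview of one source buffer."""
    offsets: Sequence[int]  # static start index per dimension
    sizes: Sequence[int]    # static extent per dimension


def provably_disjoint(a: Subview, b: Subview) -> bool:
    """Return True only when the two regions cannot share an element.

    The check is conservative: False means "not proven disjoint",
    not "definitely overlapping".
    """
    if len(a.offsets) != len(b.offsets):
        raise ValueError("subviews must have the same rank")
    for a_off, a_size, b_off, b_size in zip(a.offsets, a.sizes,
                                            b.offsets, b.sizes):
        # Half-open ranges [off, off + size) separated in this dimension
        # are enough to separate the whole hyper-rectangles.
        if a_off + a_size <= b_off or b_off + b_size <= a_off:
            return True
    return False


# Two 4x8 tiles stacked along dimension 0 of the same buffer.
top = Subview(offsets=(0, 0), sizes=(4, 8))
bottom = Subview(offsets=(4, 0), sizes=(4, 8))
# A tile that straddles both of the above.
middle = Subview(offsets=(2, 0), sizes=(4, 8))

print(provably_disjoint(top, bottom))  # True
print(provably_disjoint(top, middle))  # False
```

Once both tiles are reduced to a base pointer plus computed offsets, this per-dimension comparison is no longer directly available, which is the information loss the abstract describes.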

This thesis presents an out-of-tree MLIR pass pipeline that identifies provably disjoint buffer regions before lowering and preserves the structural proof as LLVM !alias.scope and !noalias metadata. Rather than introducing a new alias analysis, the approach communicates existing information in a form LLVM optimization passes already understand. Across six case-study kernels, the pipeline eliminates all 32 alias-related optimization misses. It achieves a 3.76x speedup on Apple M4 under -force-vector-width=4 by enabling NEON SIMD vectorization. On benchmark-derived kernels from PolyBench/C, IMEX, and IREE, speedups reach 1.51x across platforms and exceed 2x on out-of-order ARM cores, while neutral and regressing cases establish the practical limits of the approach.
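As a rough illustration of the target encoding, the sketch below prints LLVM scoped-noalias metadata for a set of mutually disjoint regions: one domain node, one scope node per region, and for each region an `!alias.scope` list naming its own scope and a `!noalias` list naming every other scope. The node layout follows the general LLVM metadata syntax. The function name, the domain label `subview-domain`, and the region names are hypothetical, and this is not the thesis pipeline, which operates on MLIR rather than on text.

```python
def scoped_noalias_metadata(region_names):
    """Build LLVM metadata text for mutually disjoint regions.

    Returns (lines, attachments): `lines` are the metadata node
    definitions, `attachments` maps each region name to the suffix
    that would be appended to loads and stores touching that region.
    """
    # Node !0 is the alias domain shared by every scope (label is illustrative).
    lines = ['!0 = distinct !{!0, !"subview-domain"}']
    scope_ids = {}
    next_id = 1

    # One self-referential scope node per region, all in domain !0.
    for name in region_names:
        scope_ids[name] = next_id
        lines.append(f'!{next_id} = distinct !{{!{next_id}, !0, !"{name}"}}')
        next_id += 1

    attachments = {}
    for name in region_names:
        # Scope list containing only this region's scope.
        own_list = next_id
        lines.append(f'!{own_list} = !{{!{scope_ids[name]}}}')
        next_id += 1
        # Scope list containing every other region's scope.
        others = ", ".join(f"!{scope_ids[o]}" for o in region_names if o != name)
        other_list = next_id
        lines.append(f'!{other_list} = !{{{others}}}')
        next_id += 1
        attachments[name] = f"!alias.scope !{own_list}, !noalias !{other_list}"

    return lines, attachments


lines, attachments = scoped_noalias_metadata(["A", "B"])
print("\n".join(lines))
print(attachments["A"])
```

An access tagged with region A's attachment tells LLVM that it cannot alias any access whose `!alias.scope` list is covered by A's `!noalias` list, which is how existing passes can consume the disjointness fact without a new alias analysis.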
