Date of Award

6-2022

Degree Name

MS in Computer Science

Department/Program

Computer Science

College

College of Engineering

Advisor

Foaad Khosmood

Advisor Department

Computer Science

Advisor College

College of Engineering

Abstract

Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets. It can generate fluent, human-like summaries. However, the unreliability of the existing evaluation metrics hinders its practical usage and slows down its progress. To address this issue, we propose an automatic reference-less text summarization evaluation system with dynamically generated synthetic facts. We hypothesize that if a system guarantees a summary that has all the facts that are 100% known in the synthetic document, it can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, our system is the first system that measures the overarching quality of the text summarization models with factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment with existing N-gram overlap-based metrics such as ROUGE and BLEU and a BERT-based evaluation metric, BERTScore. Our system's experimental evaluation of PEGASUS, BART, and T5 outperforms the current evaluation metrics in measuring factual consistency with a noticeable margin and demonstrates its statistical significance in measuring comprehensiveness and overall summary quality.

Share

COinS