Faculty Advisor(s)

Fanchao Meng

Abstract

Pathological speech data is scarce in Speech-Language Pathology (SLP), which makes synthetic data an appealing alternative: it expands the pool of usable data without the privacy risks that naturally occurring data can carry. This work studies a collection of Large-Language-Model-based methods for generating synthetic pathological speech data. SLP experts, acting as judges, gave negative assessments of the quality of the synthetic data produced by a variety of prompt engineering methods. Their judgements indicate that the generated data only weakly reflects the characteristics of the target disorders. Further research will involve refining the prompt with assistance from an expert. Once this is complete, a random selection of the synthetic data will be mixed with naturally occurring data and given to SLP experts to determine whether they can distinguish synthetic examples from natural ones. The outcome will indicate whether synthetically generated SLP data is viable.

Publication Date

2025

Document Type

Poster

Department

Computer Science and Mathematics

Keywords

LLM, AI, speech language pathology, large language models

Disciplines

Artificial Intelligence and Robotics | Computer Sciences | Physical Sciences and Mathematics

On the Applicability of Generating Pathological Speech Data Using Large Language Models
