Faculty Advisor(s)
Fanchao Meng
Abstract
Pathological speech data is scarce in Speech-Language Pathology (SLP), which makes synthetic data an appealing alternative: it increases the amount of usable data without the privacy risks that naturally occurring data can carry. In this work, a collection of Large-Language-Model-based methods for generating synthetic pathological speech data is studied. SLP experts, acting as judges, gave negative assessments of the quality of the synthetic data generated by a variety of prompt engineering methods. Their judgements indicated that the generated data only weakly reflected the characteristics of the target disorders. Further research will involve refining the prompt with assistance from an expert. Once this is completed, a random selection of the synthetic data will be mixed with naturally occurring data and given to SLP experts to determine whether they can distinguish the synthetic examples from the natural ones. The outcome will indicate whether synthetically generated SLP data is viable.
Publication Date
2025
Document Type
Poster
Department
Computer Science and Mathematics
Keywords
LLM, AI, speech language pathology, large language models
Disciplines
Artificial Intelligence and Robotics | Computer Sciences | Physical Sciences and Mathematics
Recommended Citation
Hopkins, Caroline, "On the Applicability of Generating Pathological Speech Data Using Large Language Models" (2025). SURF Posters 2025. 5.
https://digitalcommons.misericordia.edu/surf2025/5