Computer Vision News - July 2024

Congrats, Doctor Philipp!

Philipp Klumpp completed his PhD just a few weeks ago. He worked with the team for speech processing and understanding of the Pattern Recognition Lab at FAU Erlangen-Nürnberg. Under the supervision of Elmar Nöth, his research focused on the automated analysis of pathological speech and language using modern ML techniques. Philipp is now working as a Data Scientist for DATEV.

Speech and language technology has become ubiquitous and significantly more powerful in recent years. These advancements came with highly complex models that demand enormous amounts of training data. In the domain of pathological speech, such amounts are unheard of. For a robust modern speech recognizer, we are talking about many thousands of hours of transcribed audio samples from at least 10,000 different speakers. Good luck trying to adopt this for pathological speech!

In his thesis, Philipp proposed a solution to this problem which may appear counterintuitive at first: no pathological data was used during the optimization of any of the large models. Instead, his algorithms rely exclusively on off-the-shelf speech recognition datasets collected from healthy speakers.

But how could such a model help to analyze pathological speech? This is where phonetics, the science of speech production, transmission and perception, comes into play. It helps explain how and why the speech of patients with a particular medical condition deviates from the healthy reference. Not only could the presented approach solve the problem of data scarcity in the medical domain, but it also yields highly explainable outputs which are much easier for a clinical expert to understand and interpret. For example, it was possible to show
