
Hybrid system of multi-modal signal acquisition and processing in the analysis of sigmatism in children

Title: Hybrid system of multi-modal signal acquisition and processing in the analysis of sigmatism in children
Number: 2018/30/E/ST7/00525
Duration: 2019–2024
Funding: National Science Centre, Sonata Bis 8
Principal investigator: dr hab. inż. Paweł Badura, prof. PŚ
Sigmatism is one of the most common speech disorders

Sigmatism (lisping) is a speech disorder in which sibilant consonants (in Polish orthography: s, z, c, dz, sz, ż, cz, dż, ś, ź, ć, dź) are misarticulated. It often results from incorrect tongue positioning or from the child's anatomical conditions. There are many types of lisping, and diagnosis is based mainly on observing how the speech organs (articulators) perform. Diagnosis and therapy of sigmatism are difficult because, as speech-language therapists report, it is hard to assess precisely what is happening inside the oral cavity.

Goal: to design advanced methods for sigmatism diagnosis in children

The project's main achievements include designing and building a multimodal data acquisition device (MDAD). It records a 15-channel acoustic signal and a dual-camera stereovision stream of the speaker's mouth region. We used the device to examine preschool children (aged 4–8), which resulted in a database of 201 records. Each record comprises audiovisual recordings of 51 words and 17 logatomes containing all 12 Polish sibilants, together with the corresponding diagnoses from two independent speech therapy experts. This extensive database allowed us to examine the relationships between specific acoustic and image features and the manner of articulation.
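For illustration only, a single database record can be sketched as a small Python data structure; the field names and labels below are hypothetical and do not reflect the project's actual file format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Utterance:
    text: str                    # one of the 51 words or 17 logatomes
    n_audio_channels: int = 15   # multichannel acoustic signal recorded by the MDAD
    n_camera_views: int = 2      # dual-camera stereovision stream of the mouth region

@dataclass
class SpeakerRecord:
    speaker_id: str
    age_years: int                                              # preschool children aged 4-8
    utterances: List[Utterance] = field(default_factory=list)
    expert_diagnoses: List[str] = field(default_factory=list)   # two independent speech therapy assessments

# Hypothetical usage:
record = SpeakerRecord(speaker_id="child_001", age_years=5)
record.utterances.append(Utterance(text="szafa"))
record.expert_diagnoses = ["normative", "non-normative"]
print(record.speaker_id, len(record.utterances), record.expert_diagnoses)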

Analysis of the material

Development of new signal processing methods

including articulators' segmentation and acoustic and image feature extraction

Analysis of spatial-domain acoustic features

especially in the band above 2 kHz, which contains the frication noise typical of sibilants (a feature sketch follows this list)

Detection and segmentation of speech organs using deep learning

including the use of YOLOv6 and DeepLab v3+ models to delineate the lips, teeth, and tongue (a segmentation sketch follows this list)

Analysis of 2D and 3D image features

describing the texture and shape of the articulators

Statistical analysis and classification

using acoustic and image features to examine the relationship between parameters and selected articulation aspects and to classify variants of non-normative articulation (a classification sketch appears below)

4D multimodal speaker model for remote speech diagnosis

allowing articulator movements to be tracked over time
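The noise band mentioned above can be illustrated with a simple spectral feature: the fraction of signal energy located above 2 kHz in a sibilant segment. A minimal sketch, assuming a single-channel excerpt and a 44.1 kHz sampling rate (both assumptions; this is not the project's actual feature set):

import numpy as np
from scipy.signal import welch

def high_band_energy_ratio(x, fs, cutoff_hz=2000.0):
    # Welch power spectral density estimate; ratio of power above the cutoff.
    f, pxx = welch(x, fs=fs, nperseg=1024)
    return pxx[f >= cutoff_hz].sum() / pxx.sum()

fs = 44100
t = np.arange(fs) / fs
vowel_like = np.sin(2 * np.pi * 300 * t)        # tonal signal, energy well below 2 kHz
sibilant_like = np.random.randn(fs)             # broadband noise, most energy above 2 kHz

print(high_band_energy_ratio(vowel_like, fs))    # close to 0
print(high_band_energy_ratio(sibilant_like, fs)) # roughly 0.9 for white noise at 44.1 kHz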
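The segmentation step can be made concrete with a minimal sketch based on the segmentation_models_pytorch implementation of DeepLab v3+; the encoder, number of classes, and input size below are assumptions rather than the project's actual configuration, and a random tensor stands in for the mouth-region crop that the YOLOv6 detector would provide.

import torch
import segmentation_models_pytorch as smp

# DeepLab v3+ with an assumed ResNet-34 encoder; 4 classes: background, lips, teeth, tongue.
model = smp.DeepLabV3Plus(encoder_name="resnet34", encoder_weights=None,
                          in_channels=3, classes=4)
model.eval()

frame = torch.rand(1, 3, 256, 256)   # placeholder for one cropped camera frame

with torch.no_grad():
    logits = model(frame)            # shape: (1, 4, 256, 256)
mask = logits.argmax(dim=1)          # per-pixel label map of lips, teeth, and tongue
print(mask.shape)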

Read more about the PAVSig dataset

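Finally, the statistical analysis and classification step can be sketched as a conventional feature-based classifier; the pooled feature vector, labels, and the choice of a random forest with 5-fold cross-validation are illustrative assumptions, not the project's reported pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical feature matrix: each row pools acoustic features (e.g., high-band
# energy ratio) and image features (e.g., articulator shape descriptors) for one child.
X = rng.normal(size=(201, 10))        # 201 recorded children, 10 pooled features
y = rng.integers(0, 2, size=201)      # 0 = normative, 1 = non-normative articulation (random here)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())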
Publications
Sage, A. (2025). Performance analysis of 2D and 3D image features for computer-assisted speech diagnosis of dental sibilants in Polish children. Computer Methods and Programs in Biomedicine, 108716. https://doi.org/10.1016/j.cmpb.2025.108716
 
Sage, A., & Badura, P. (2024). Detection and segmentation of mouth region in stereo stream using YOLOv6 and DeepLab v3+ models for computer-aided speech diagnosis in children. Applied Sciences, 14(16), Article 7146. https://doi.org/10.3390/app14167146
 
Sage, A., Miodońska, Z., Kręcichwost, M., & Badura, P. (2024). Hybridization of acoustic and visual features of Polish sibilants produced by children for computer speech diagnosis. Sensors, 24(16), Article 5360. https://doi.org/10.3390/s24165360
 
Miodońska, Z., Kręcichwost, M., Kwaśniok, E., Sage, A., & Badura, P. (2024). Frication noise features of Polish voiceless dental fricative and affricate produced by children with and without speech disorder. In Proceedings of INTERSPEECH 2024 (pp. 3125–3129). ISCA. https://doi.org/10.21437/interspeech.2024-1731
 
Trzaskalik, J., Kwaśniok, E., Miodońska, Z., Kręcichwost, M., Sage, A., & Badura, P. (2023). Hybrid system for acquisition and processing of multimodal signal: population study on normal and distorted pronunciation of sibilants in Polish preschool children. In P. Strumiłło, A. Klepaczko, M. Strzelecki, & D. Bociąga (Eds.), XXXIII Polish Conference on Biocybernetics and Biomedical Engineering: Book of Abstracts (p. 81).
 
Miodońska, Z., Levelt, C., Moćko, N., Kręcichwost, M., Sage, A., & Badura, P. (2023). Are retroflex-to-dental sibilant substitutions in Polish children’s speech an example of a covert contrast? A preliminary acoustic study. In Proceedings of INTERSPEECH 2023 (pp. 3122–3126). ISCA. https://doi.org/10.21437/Interspeech.2023-2046
 
Kręcichwost, M., Sage, A., Miodońska, Z., & Badura, P. (2022). 4D multimodal speaker model for remote speech diagnosis. IEEE Access, 10, 93187–93202. https://doi.org/10.1109/access.2022.3203572
 
Miodońska, Z., Badura, P., & Moćko, N. (2022). Noise-based acoustic features of Polish retroflex fricatives in children with normal pronunciation and speech disorder. Journal of Phonetics, 92, 1–16. https://doi.org/10.1016/j.wocn.2022.101149
 
Sage, A., Miodońska, Z., Kręcichwost, M., Trzaskalik, J., Kwaśniok, E., & Badura, P. (2021). Deep learning approach to automated segmentation of tongue in camera images for computer-aided speech diagnosis. In E. Piętka, P. Badura, J. Kawa, & W. Więcławek (Eds.), Information Technology in Biomedicine (Vol. 1186, pp. 41–51). Springer. https://doi.org/10.1007/978-3-030-49666-1_4
 
Kręcichwost, M., Moćko, N., & Badura, P. (2021). Automated detection of sigmatism using deep learning applied to multichannel speech signal. Biomedical Signal Processing and Control, 68, 1–11. https://doi.org/10.1016/j.bspc.2021.102612
 
Kręcichwost, M., Miodońska, Z., Trzaskalik, J., & Badura, P. (2020). Multichannel speech acquisition and analysis for computer-aided sigmatism diagnosis in children. IEEE Access, 8, 98647–98658. https://doi.org/10.1109/ACCESS.2020.2996413
RESEARCH TEAM

dr hab. inż. Paweł Badura, prof. PŚ

Principal Investigator

dr inż. Zuzanna Miodońska

Head of Acoustic and Visual Research

dr Joanna Trzaskalik

Head of Speech-Language Pathology Research

dr inż. Michał Kręcichwost

Chief Biomedical Engineer

mgr inż. Agata Sage

Biomedical Engineer

mgr Ewa Kwaśniok

Speech-Language Therapist
