Politechnika Śląska | Hybrid system of multi-modal signal acquisition and processing in the analysis of sigmatism in children: the dataset

Hybrid system of multi-modal signal acquisition and processing in the analysis of sigmatism in children

Multi-modal Polish child speech dataset

The dataset is publicly available in accordance with the principle "as open as possible, as closed as necessary" (a principle of the Polish National Science Centre). To access the data, submit a written request to the Principal Investigator. The request should contain a justification of the need for access and the applicant's details. The Project Manager reserves the right to refuse access in justified cases, such as the need to protect the privacy of study participants, data confidentiality, or intellectual property rights.

Data on articulation, acoustics, and visual appearance of the articulators

The data cover normal and distorted child speech, with a focus on sigmatism. We collected the data in six kindergarten and school facilities in Poland during speech therapy examinations of 201 children aged 4-8. The material includes 15-channel spatial audio signals and a dual-camera stereovision stream of the speaker's oral region, as well as the speech therapy diagnosis. The data record comprises audiovisual recordings of 51 words and 17 logatomes containing all 12 Polish sibilants, together with the corresponding speech therapy diagnoses from two independent speech therapy experts. In total, we gathered 66 781 audio-video segments, including 12 830 words and 53 951 phonemes (12 576 of them sibilants).

Data organization

The structure of the database (main folder) is shown in the picture below. Each participant has a separate folder with the audio, video, and speech diagnosis data. Speaker folders are named 00XXX, where XXX stands for the anonymized three-digit ID of a participant. The database also includes five CSV files with dataset specifications and a PDF file presenting the diagnosis dictionary.
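The folder naming convention described above can be used to enumerate participants programmatically. The following is a minimal sketch in Python, assuming only that speaker folders sit directly under the main folder and follow the 00XXX pattern; the function name list_speaker_folders is hypothetical and not part of the dataset.

```python
import re
from pathlib import Path

# Assumption: speaker folders are named "00XXX", where XXX is a three-digit ID.
SPEAKER_DIR = re.compile(r"00(\d{3})")


def list_speaker_folders(root):
    """Return {participant_id: Path} for every 00XXX folder directly under root."""
    speakers = {}
    for entry in sorted(Path(root).iterdir()):
        match = SPEAKER_DIR.fullmatch(entry.name)
        # Skip the CSV/PDF files and any folder that does not match the pattern.
        if match and entry.is_dir():
            speakers[match.group(1)] = entry
    return speakers
```

Calling list_speaker_folders on the path of the main folder returns a dictionary keyed by the three-digit ID, which can then be used to locate the audio, video, and diagnosis data of each participant.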

To get more details regarding the participants and the language material, you can access the two summaries below. The CSV file named participantSummary gathers the anonymized data of the children examined (including age, sex, and folder and file structure), while the file segmentSummary describes all audio-visual segments available in the dataset (including words, logatomes, and phonemes; a quality validation is provided for each file).
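A summary file of this kind could be inspected with a few lines of Python. The sketch below assumes the file is a comma-separated, UTF-8 encoded CSV with a header row; the helper names load_summary and count_by are hypothetical, and the actual column names should be taken from the file itself, since they are not listed here.

```python
import csv
from collections import Counter


def load_summary(csv_path):
    """Read a summary CSV into a list of dicts keyed by the header row."""
    # Assumption: comma-delimited, UTF-8, first line holds the column names.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def count_by(rows, column):
    """Count how many rows share each value of the given column."""
    return Counter(row[column] for row in rows)
```

For example, after rows = load_summary(path_to_participant_summary), a call such as count_by(rows, "sex") would give the number of children per sex, provided the file has a column with that name (the name "sex" is an assumption used for illustration).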