Author: Tomasz Strzoda     Published At: 26.01.2026

Seminar: Dawid Zamojski

On January 21, 2026, Dawid Zamojski, BEng, MSc, gave a presentation entitled “Training NLP Models Using a Polish Medical Language Corpus” at our department’s seminar. The talk focused on adapting modern language models to the medical domain.

The main contribution of this work is a novel and comprehensive Polish medical corpus composed of domain-specific medical texts. Using this dataset, the BERT model was adapted to the medical domain, yielding a Polish version of BERT that better captures the terminology, syntax, and semantics characteristic of medical language.

The presentation covered the key stages of the work: data collection and cleaning, tokenization, model pre-training, and evaluation on NLP tasks relevant to processing medical documentation.

This work was carried out under the supervision of Michal Marczyk, BEng, PhD, DSc, in collaboration with Martyna Szyszka, whose contribution was an important part of corpus development and analysis.

© Silesian University of Technology

