Seminar: Dawid Zamojski
On January 21, 2026, Dawid Zamojski, BEng, MSc, presented a talk entitled “Training NLP Models Using a Polish Medical Language Corpus” at our department’s seminar. The talk focused on adapting modern language models to the medical domain.
The main contribution of this work was a new, comprehensive corpus of Polish medical texts. This corpus was used for domain adaptation of the BERT model, producing a Polish BERT variant that better captures the terminology, syntax, and semantics characteristic of medical language.
The presentation covered the key stages of the work, including data collection and cleaning, tokenization, model pre-training, and evaluation on NLP tasks relevant to processing medical documentation.
The work was carried out under the supervision of Michal Marczyk, BEng, PhD, DSc, in collaboration with Martyna Szyszka, who made an important contribution to developing and analyzing the corpus.