Wu flu, virus couronné, chiński wirus: A multilingual corpus study of COVID-19 neologisms

Natalia Zawadzka-Paluektau and Aleksandra Tomaszewska (University of Warsaw and University of Sevilla)


Aims: The study aims to contribute to a growing understanding of COVID-19’s impact on Polish, French, and English by identifying and examining neologisms that have come into use during the pandemic. Given that corpus-based neologism research has predominantly been monolingual (e.g., Würschinger et al., 2016; Svanlund, 2018), another objective of the study is to offer a new perspective on corpus methods and to develop a neologism extraction procedure suitable for multilingual corpora.
Corpus and Methods: The analysis is conducted on a corpus of 2,196 news articles (1,600,417 tokens) covering COVID-19, published during the first six months of the pandemic in market-leading newspapers in Poland, France, and the UK. The press is chosen as the study material because it has been shown to play a crucial role in disseminating new words (e.g., Loingsigh, 2018). Two methods of automatic and semi-automatic identification of neologisms are employed and integrated into Sketch Engine (Kilgarriff et al., 2014). The first method is based on lexical and punctuational discriminants (Paryzek, 2008; Svanlund, 2018). As discriminants are language-independent, they allow neologisms to be identified in multilingual corpora. The second method consists in comparing the focus corpora to reference corpora to retrieve keywords.
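The two extraction methods described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the marker list `LEXICAL_MARKERS`, the quotation-mark pattern `QUOTED`, and the function names are hypothetical, and the keyword score assumes the "simple maths" ratio documented for Sketch Engine, (fpm_focus + n) / (fpm_ref + n), with an arbitrary smoothing value, since the abstract does not state which score or parameters were applied.

```python
import re
from collections import Counter

# Hypothetical lexical discriminants (signalling phrases); not the study's list.
LEXICAL_MARKERS = ["so-called", "tzw.", "soi-disant"]

# Punctuational discriminant: a short span enclosed in quotation marks
# (straight, English, Polish, or French style).
QUOTED = re.compile(r'["“„«]\s*([^"“”„»«]{2,40}?)\s*["”»]')


def discriminant_candidates(text):
    """Return spans flagged by quotation marks or by a signalling phrase."""
    candidates = [m.group(1) for m in QUOTED.finditer(text)]
    for marker in LEXICAL_MARKERS:
        # Capture the single word that follows the signalling phrase.
        pattern = re.compile(re.escape(marker) + r"\s+([\w-]+)", re.IGNORECASE)
        candidates += [m.group(1) for m in pattern.finditer(text)]
    return candidates


def keyness(focus_tokens, reference_tokens, smoothing=1.0):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_ref + n)."""
    focus = Counter(focus_tokens)
    reference = Counter(reference_tokens)
    focus_total = max(len(focus_tokens), 1)      # guard against empty input
    reference_total = max(len(reference_tokens), 1)
    scores = {}
    for word, count in focus.items():
        fpm_focus = count * 1_000_000 / focus_total
        fpm_ref = reference[word] * 1_000_000 / reference_total
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    # Highest score first: words most typical of the focus corpus.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = 'The so-called lockdown began. Critics spoke of a "covidiot" wave.'
    print(discriminant_candidates(sample))
    print(keyness(["lockdown", "the", "lockdown", "virus"],
                  ["the", "the", "virus", "house"]))
```

In both functions the output is only a list of candidates: words that are quoted, flagged, or unusually frequent in the focus corpus still have to be checked by hand before they can be counted as neologisms.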
The lists of potential neologisms obtained at the two stages of analysis are then subjected to manual verification. The neologisms thus identified are subsequently examined with regard to their thematic categories, embedding, and internationalization.
Results: The two corpus procedures have allowed for the identification of 672 potential neologisms. Manual verification of the neologism candidates has reduced their overall number to 300 (105 in the French, 104 in the UK, and 91 in the Polish subcorpus). The identified neologisms are predominantly related to health (e.g., superspreader and its Polish and French equivalents: superroznosiciel and super-épandeur/superpropagateur), as well as to personal and institutional measures for mitigating the effects of the pandemic (e.g., social distancing and its Polish and French equivalents: dystansowanie społeczne and distanciation sociale). Internationalisms (e.g., lockdown, contact tracing) are over 30% more numerous than country-specific neologisms (e.g., TGV médicalisé, tarcza antykryzysowa, mask diplomacy) in the corpus, which provides evidence of the significant internationalization of COVID-19 media discourses in the analyzed period. The study of embedding has revealed the frequent use of signaling and distancing devices (such as so-called and its Polish and French equivalents).
Conclusions: The study demonstrates that the COVID-19 health crisis has had a considerable impact on the analyzed languages, which has manifested itself in a surge of new lexis. The global reach of the pandemic, among other factors, explains the significant number of internationalisms among the identified neologisms. At the same time, however, the presence of country-specific items attests to the linguistic need to account for the diverging social, legal, institutional, medical, and other circumstances during the pandemic. The use of embedding strategies indicates that the new words have not yet become fully integrated into the analyzed languages. Further research is therefore recommended to establish whether COVID-19 neologisms are ephemeral or whether they are becoming integral elements of the linguistic resources of English, Polish, and French.


Congreso Cilc 2021


