Language: Spanish
Date: Uploaded: 2021-04-15T00:00:00+02:00
Duration: 19m 06s
Venue: Conference

The ORD Corpus of Russian Everyday Speech from the Perspective of Pragmatic Markers

Tatiana Sherstinova and Natalia Bogdanova-Beglarian (St. Petersburg State University)


The ORD corpus of Russian everyday speech, also known as the “One Day of Speech” corpus, is the largest linguistic resource of present-day spoken Russian [1]. The corpus is being created to study Russian spontaneous speech and spoken conversations in natural everyday settings [2]. To collect the data, volunteers of both sexes, aged 16 to 83 and of different occupations, were asked to spend a day with an audio recorder switched on in order to record all their spoken interactions [ibid.]. The ORD collection now exceeds 1,250 hours of recordings and presents the speech of 128 respondents (66 men and 62 women) and hundreds of their interlocutors. So far, 2,850 macro episodes of everyday spoken communication have been annotated, and the speech transcripts add up to 1 million tokens [3]. The unique recordings of the ORD corpus provide broad opportunities to study Russian speech from linguistic, sociolinguistic, and pragmatic perspectives [4].
The report describes a project based on the ORD data that aims to investigate pragmatic markers in everyday communication. The term “pragmatic markers” denotes specific items of spoken language that are not part of the propositional content of the sentence but perform various (often multiple) pragmatic functions [5; 6]. Pragmatic markers help speakers to initiate or close discourse; to attract the attention of the hearer; to mark a discourse boundary; to help the speaker find the proper words; to repair one’s own or others’ discourse; to serve as a filler; to express a response, reaction, or attitude towards the discourse (including “back-channel” signals of understanding); and so on [7; 8]. Pragmatic markers are difficult to research because they often look like “standard” words, depend heavily on context, and are usually difficult to translate.
The ORD corpus was specially adapted for research on pragmatic markers (PMs). We expanded the number of annotation levels for the spoken data by introducing levels for pragmatic markers, their functions, their variants, and some other features of their occurrences (e.g., phonetic realization). Continuous annotation of pragmatic markers has been carried out for a subcorpus of 300,000 tokens. Based on these data, we have obtained preliminary statistics on the distribution of pragmatic markers in everyday speech. The total share of pragmatic markers in speech turned out to be 27,753 ipm (instances per million words), or roughly 2.8%. However, in the speech of individual speakers their share can reach up to 6% of the total number of words. The full variety of Russian PM variants derives from 59 standard (basic) forms. The most frequent PM variants in spoken Russian turned out to be the following: “vot” (well), “tam” (there), “da” (yes), “kak by” (as if), “tak” (so), “znachit” (it means), “govorit” (he says), “nu vot” (there you are), “znaesh” (you know), “slushay” (listen). As for types, the most common ones turned out to be hesitation markers (8921 ipm), metacommunicative markers (3362 ipm), and boundary markers, which include starting, finalizing, and navigational ones (3108 ipm). In addition, we obtained statistics on the use of pragmatic markers in different communication settings (at home, in the office, at university, in a restaurant, in a shop, etc.). By the end of 2020, online access to this resource will be provided, so it will become possible to study Russian pragmatic markers online. The research is supported by the Russian Science Foundation (project #18-18-00242 “Pragmatic Markers in Russian Everyday Speech”).
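The frequencies above are given in ipm, which is read here as instances per million words. As a minimal sketch of how such figures relate to percentages and to raw counts, the Python snippet below converts between them. The function names are hypothetical, and the subcorpus size is taken as exactly 300,000 tokens (the abstract gives only this round figure), so any raw count derived from it is an approximation and not a number reported by the authors.

```python
def ipm(count: float, total_tokens: int) -> float:
    """Normalize a raw count to instances per million tokens."""
    return count / total_tokens * 1_000_000


def ipm_to_percent(ipm_value: float) -> float:
    """Convert an ipm value to a percentage of all tokens."""
    return ipm_value / 10_000


def estimated_count(ipm_value: float, total_tokens: int) -> float:
    """Approximate raw count implied by an ipm value in a corpus of a given size."""
    return ipm_value * total_tokens / 1_000_000


# Assumption: the annotated subcorpus holds exactly 300,000 tokens.
SUBCORPUS_TOKENS = 300_000

# ipm figures quoted in the abstract.
reported_ipm = {
    "all pragmatic markers": 27_753,
    "hesitation markers": 8_921,
    "metacommunicative markers": 3_362,
    "boundary markers": 3_108,
}

for label, value in reported_ipm.items():
    share = ipm_to_percent(value)
    approx = estimated_count(value, SUBCORPUS_TOKENS)
    print(f"{label}: {value} ipm = {share:.2f}% (about {approx:.0f} occurrences)")
```

Normalizing to ipm makes frequencies comparable across subcorpora of different sizes, which is useful when comparing marker use across speakers or communication settings.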


CILC 2021 Conference


Series: CILC2021: Discurso, análisis literario y corpus / Discourse, literary analysis and corpora