Language: Spanish
Date: Uploaded: 2021-04-13T00:00:00+02:00
Duration: 18m 37s
Venue: Conference
Views: 1,318 views

Per Aspera Ad Astra: The Compilation Process of the First Turkish Learner Corpus

Anna Golynskaia (Istanbul University-Cerrahpaşa)

Description

Although learner corpus research has a rather long history, it is an entirely new field of
linguistic studies in Turkey. The literature review shows that all currently existing Turkish corpora
represent the use of Turkish as a mother tongue. One of the first Turkish corpora is the METU (Middle
East Technical University) Corpus, an offline parsed corpus of two million words (Say
et al., 2002). Another is the TNC (Turkish National Corpus), designed to be a balanced,
large-scale (50 million words), general-purpose corpus of contemporary Turkish (Aksan et al., 2012,
p. 3223). Besides these, there are the BOUN Corpus, the TS Corpus, and the Spoken
Turkish Corpus, all of them native speaker corpora (Sak et al., 2011; Sezer & Sezer, 2013;
Ruhi et al., 2010). Although teaching Turkish as a foreign language is gaining momentum both
inside and outside Turkey, there are no written or spoken Turkish learner corpora.
The only theoretical research conducted in this field is the article titled “Learner Corpora:
Scope, Design, and Applications” by Çalışkan (2016).
The aim of our study is to compile a corpus of texts by advanced learners of Turkish from different
linguistic backgrounds. The sample of our study was identified in two stages. In the first stage,
Turkish Teaching Centers willing to share the writing part of the Turkish Proficiency Exam and/or C1
and C2 exam papers were identified through voluntary sampling. For this purpose, an official
request was sent to thirty-seven Turkish Teaching Centers at both public and private universities
in Turkey. Data were collected from the eight universities that agreed to take part in the study. Since
this is an individual research project, it took six months to collect and keyboard the texts. In the
upcoming second stage, the texts to be included in the core corpus will be selected, error annotation
will be carried out, and the error-annotated data will be uploaded to Sketch Engine. Thus, the Turkish
Learner Corpus will be a small (up to 300,000 words), error-annotated corpus of texts
written by advanced learners of Turkish.
Alongside keyboarding the texts, and in preparation for the subsequent data analysis, the technical
team of the Sketch Engine platform was contacted and information about error annotation systems was
obtained. In addition, the error annotation systems developed for the Cambridge Learner Corpus
and the International Corpus of Learner English were examined, and the codes that could be used in
the Turkish Learner Corpus were identified. Preliminary error tags were divided into four
categories: spelling, morphology, syntax, and vocabulary. However, as our study progresses, other
categories may be added as well. The completed version of the corpus is expected to be released
in summer 2021.
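
The four preliminary error categories could be represented in code roughly as follows. This is only an illustrative sketch, not the project's actual annotation scheme: the class names, the character-offset span format, and the sample sentence are all hypothetical and were invented for this example. Only the four category names come from the description above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorCategory(Enum):
    # The four preliminary categories named in the abstract.
    SPELLING = "spelling"
    MORPHOLOGY = "morphology"
    SYNTAX = "syntax"
    VOCABULARY = "vocabulary"


@dataclass(frozen=True)
class ErrorAnnotation:
    # Hypothetical span format: character offsets into the learner text.
    start: int
    end: int
    category: ErrorCategory
    correction: str


def count_by_category(annotations):
    """Return the number of annotated errors for every category."""
    counts = Counter(a.category for a in annotations)
    return {c.value: counts.get(c, 0) for c in ErrorCategory}


# Invented learner sentence (not taken from the corpus).
sample_text = "Ben kitab okuyor"
sample_annotations = [
    ErrorAnnotation(4, 9, ErrorCategory.SPELLING, "kitap"),
    ErrorAnnotation(10, 16, ErrorCategory.MORPHOLOGY, "okuyorum"),
]

if __name__ == "__main__":
    for a in sample_annotations:
        erroneous = sample_text[a.start:a.end]
        print(f"{a.category.value}: {erroneous} -> {a.correction}")
    print(count_by_category(sample_annotations))
```

Keeping the categories in an enumeration would make it straightforward to add further categories later, as the abstract anticipates.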

Owners

Congreso Cilc 2021

Series: CILC2021: Diseño, compilación y tipos de corpus / Corpus design, compilation and types