Language: Spanish
Date: Uploaded 2021-04-23
Duration: 20 min 27 s
Venue: Conference

USAS semantic annotation for single and multiword financial terms: method and strategies used

Chelo Vargas Sierra (Universidad de Alicante) and Antonio Moreno Sandoval (Universidad Autónoma de Madrid)

Description

Automatic semantic annotation of general lexical units is not a simple task, since the tagger must identify their specific meaning in cases of polysemy. Specialized lexical units can be even harder to handle, because the system must identify their precise meaning, which can be challenging even for subject-field experts and linguists. The process is tougher still in the social sciences, where terms are frequently created by taking words from the general language and assigning them a new specialized meaning (terminologization). As a result, the tagger cannot formally distinguish a word from a term.
This paper presents the work carried out on the semantic annotation of single and multiword financial terms in the FinT-esp corpus (22 million words). The semantic tagset used was USAS (Rayson et al. 2004), which was originally designed to annotate general English vocabulary but has since been extended to other languages, including Spanish (Piao et al. 2015). Our specific goal was to apply it to financial terms. This annotation could not be performed automatically, because the system first had to be given a financial lexicon to be trained on. We therefore manually annotated terms automatically extracted from the FinT-esp corpus with the USAS tagset. So far, 182 single-word terms and 497 multiword terms have been semantically annotated. This paper describes the annotation process and the strategies used to increase the consistency of the semantic annotation of our financial corpus.
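The lexicon-first idea can be illustrated with a minimal Python sketch of a dictionary-lookup tagger. It assumes a hypothetical lexicon that maps tokenized single and multiword terms to USAS tags; the entries and tag choices below are illustrative only and are not taken from the FinT-esp annotation. The greedy longest-match strategy and the function name `tag_terms` are likewise assumptions for illustration, not the authors' method. Tokens without a lexicon entry receive Z99, the USAS tag for unmatched items.

```python
# Hypothetical lexicon: entries and USAS tags are illustrative only,
# not taken from the FinT-esp annotation.
LEXICON = {
    ("deuda",): "I1.2",                 # Money: debts
    ("deuda", "pública"): "I1.2",       # multiword term, same tag
    ("tipo", "de", "interés"): "I1.3",  # Money: price
    ("activo",): "I1",                  # Money generally (ambiguous in general language)
}
# Longest term length in the lexicon, used to bound the lookup window.
MAX_LEN = max(len(key) for key in LEXICON)


def tag_terms(tokens, lexicon=LEXICON, max_len=MAX_LEN):
    """Greedy longest-match lookup (hypothetical helper).

    At each position, try the longest possible token window first so that
    multiword terms win over their single-word components. Tokens with no
    match are tagged 'Z99' (USAS: unmatched).
    """
    lowered = [t.lower() for t in tokens]
    result = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                # Keep the original surface form, joined for multiword terms.
                result.append((" ".join(tokens[i:i + n]), lexicon[key]))
                i += n
                break
        else:
            # No window starting at position i matched the lexicon.
            result.append((tokens[i], "Z99"))
            i += 1
    return result


print(tag_terms(["La", "deuda", "pública", "sube"]))
```

Preferring the longest match keeps a multiword term such as "deuda pública" from being split into separately tagged words. A pure lookup like this cannot resolve polysemy, however: "activo" always receives the same tag whether it means a financial asset or the general adjective "active", which is exactly the difficulty the abstract raises for terminologized words.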
The resulting semantically annotated corpus could serve several purposes. First, within the FinT-esp project, we aim to contribute to the development of existing methodologies for subject-field corpus processing in computational linguistics. Second, we intend to help the finance and accounting fields process, classify and analyse large amounts of financial text. Third, the project aims to develop a semantically based term extractor.

Owners

CILC 2021 Conference

Series: CILC2021: Lingüística computacional basada en corpus / Corpus-based computational linguistics