Language: Spanish
Date: Uploaded 2021-04-15T00:00:00+02:00
Duration: 22m 35s
Venue: Conference
Views: 1,226

Identifying formal markers of sarcasm on Twitter: #CatsMovie vs. #TheRiseOfSkywalker

Antonio Moreno-Ortiz, María García-Gámez and Chantal Pérez-Hernández (Universidad de Málaga)

Description

Over the last decade, sentiment analysis has received increased attention as a Natural Language Processing task that attempts to automatically classify the semantic orientation of a sentence or document. Although the methodologies employed have grown in number and sophistication, from simple rule-based systems to machine learning and neural networks, the fact that the text is the only data source for the analysis means that higher-order means of expression, such as figurative language and rhetorical figures, pose a severe problem whose solution is far from trivial. Among such linguistic mechanisms, irony and sarcasm have been recognized as particularly hard to tackle (Riloff et al., 2013; Barbieri et al., 2014; Van Hee, 2017). Furthermore, the role that such tropes play in sentiment analysis tasks can range from close to anecdotal, as in consumer product reviews, where the vast majority of users employ fairly straightforward language, to highly recurrent, as in social networks (Bouazizi & Ohtsuki, 2015).
Irony is a form of figurative language through which the literal meaning of a sentence is substituted by the opposite, and sarcasm is considered to be a subtype of verbal irony that conveys a negative attitude in a more aggressive, bitter and offensive way than irony (Attardo, 2000; Kreuz & Roberts, 1993; Sperber & Wilson, 1981; Wilson, 2013). Although the aforementioned terms have often been used interchangeably in the literature, their entailments are not the same: sarcasm is used for purposes such as being funny, expressing annoyance, or avoiding giving a clear answer. Furthermore, although sarcasm invariably has a negative implied sentiment, it may carry a negative surface sentiment, positive surface sentiment, or no surface sentiment at all (Joshi, Bhattacharyya, & Carman, 2017).
Different theories have attempted to explain the phenomenon of irony from a linguistic perspective, such as Grice’s (1975) conversational implicature theory, Wilson and Sperber’s (1992) echoic mention theory, Clark and Gerrig’s (1984) pretense theory, Giora’s (1997) graded salience hypothesis, and the cognitive theories proposed by Ruiz de Mendoza (2011). Nevertheless, none of these theories has yet proved useful for identifying the formal mechanisms that are required for automatic sarcasm detection.
Accurate automatic sarcasm detection is highly dependent on the availability of high-quality annotated datasets. However, most existing datasets have been created in a semi-supervised manner, by annotating a text as ironic or sarcastic simply because it contains a hashtag such as "#sarcasm". In this paper we describe the creation of a manually annotated dataset where detailed text markers are included. This dataset is a sample from a larger corpus of tweets (n=100,000) on two highly awaited and potentially controversial motion pictures released almost simultaneously (Cats and Star Wars: The Rise of Skywalker). We took two different samples for each movie, the first before and during their premieres, and the second during the weeks after. We compared reception between these samples using the following procedure: first, we used a lexicon-based sentiment analysis tool, and then we manually measured the impact that the lack of sarcasm detection strategies had on its results. In doing so, we generated a high-quality dataset where formal markers of sarcasm are identified, and, when these were not present, the sarcasm generation mechanism (e.g., contrast with reality) was made explicit. We believe the resulting annotated dataset can be extremely useful for supervised machine learning approaches to sarcasm detection.
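As a rough illustration of the contrast drawn above, the following Python sketch compares hashtag-based labeling with a toy lexicon-based polarity score, and shows how a manual sarcasm annotation could change the outcome. It is not the tool or annotation scheme used in the study: the word list, the function names, the example tweets, and the rule that a sarcastic tweet is forced to a negative score are all assumptions made for this example.

```python
import re

# Marker hashtags for distant labeling; the abstract only names "#sarcasm".
SARCASM_TAGS = {"#sarcasm"}

# Toy polarity lexicon, invented for illustration only.
LEXICON = {"great": 1, "love": 1, "masterpiece": 1, "awful": -1, "boring": -1}


def hashtag_label(tweet: str) -> bool:
    """Distant label: treat a tweet as sarcastic only if it carries a marker hashtag."""
    tags = {tag.lower() for tag in re.findall(r"#\w+", tweet)}
    return bool(tags & SARCASM_TAGS)


def lexicon_score(tweet: str) -> int:
    """Surface polarity: sum of lexicon values over lowercase word tokens."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def adjusted_score(tweet: str, is_sarcastic: bool) -> int:
    """Assumed rule: sarcasm implies negative sentiment whatever the surface polarity."""
    score = lexicon_score(tweet)
    return min(-1, -abs(score)) if is_sarcastic else score


# Hypothetical tweets; the third has no hashtag, so only a human annotator
# would flag it as sarcastic.
tagged = "What a masterpiece, I love it #CatsMovie #sarcasm"
untagged = "Two hours of cats. Sure."

print(hashtag_label(tagged), lexicon_score(tagged), adjusted_score(tagged, True))
print(hashtag_label(untagged), lexicon_score(untagged), adjusted_score(untagged, True))
```

In this sketch the untagged tweet is missed by the hashtag rule and scored as neutral by the lexicon, which is the kind of case a manually annotated dataset is meant to capture.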

Owners

CILC 2021 Conference

Series: CILC2021: Lingüística computacional basada en corpus / Corpus-based computational linguistics