Part-of-speech tagging (POS tagging henceforth) is the process of assigning a sequence of tags to a sequence of words in order to mark word classes. Traditional methodologies involve rule-based and statistical POS taggers, and more recently machine learning algorithms. Rule-based POS taggers (Brill, 1992, 1994; Sadredini et al., 2018) rely on a set of deterministic transformation rules, such as the association of a word with a POS. The ruleset is often coupled with a set of constraints the tagger must follow, e.g., an article cannot be followed by another article. Rule-based approaches are especially suitable for building multilingual and non-English taggers (Garg et al., 2012; Megyiesi, 1998; Rashel et al., 2014), which often cannot benefit from annotated corpora: any additional language requires a specific set of rules, yet neither data nor training is needed. While powerful enough to achieve high accuracy on benchmark datasets, rule-based taggers show inherent limitations in uncontrolled experimental environments, because they do not model context and are rigid when faced with unexpected cases.

Statistical POS taggers work by finding the sequence of POS tags that most likely fits the input sentences by means of hidden Markov models (Brants, 2000; Carlberger & Kann, 1999; Cutting et al., 1992) or entropy maximization (Ratnaparkhi, 1996; Toutanova & Manning, 2000). POS taggers built upon machine learning algorithms, such as SVM (Giménez & Marquez, 2004) and neural networks (Schmid, 1994), are very powerful; however, many machine learning algorithms are not interpretable, which means that it is not possible to understand what motivated the POS tagger’s choices.

The usefulness of POS tagging lies in automating, and thus speeding up, the search for specific word classes (e.g. nouns, pronouns, verbs, adverbs) and combinations of them. To enable automatic POS searches across the 32 film dialogues collected for the anglophone section of the Pavia Corpus of Film Dialogue (PCFD henceforth), we chose to conduct a pilot study on the dialogues of the film Thelma & Louise (Ridley Scott, 1991), which at the time was the latest film to be added to the corpus. The POS tagger CLAWS4 was selected among the available software (e.g. Penn Treebank, Stanford POS tagger, CLL-Tagger) because it was freely accessible through the online interface and is common to other reference corpora of English such as the BNC, COCA and COHA. This makes it convenient for comparability in future studies of film dialogue and spoken language.

Since the PCFD is a corpus of orthographically transcribed film dialogues, we expected some difficulty in tagging the texts. POS tagging relies exclusively on word-class assignment based on morpho-syntactic criteria, whereas the nature of film dialogue requires the pragmatic dimension to be taken into account. This is because film dialogue closely resembles spoken language (Bednarek, 2010; Forchini, 2012; Quaglio, 2009; Valdeón, 2009), as it represents a kind of language that is “written to be spoken as if not written” (Gregory, 1967: 191-192).
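To make the statistical approach concrete, the following is a minimal sketch of how a hidden Markov model tagger can pick the most likely tag sequence with the Viterbi algorithm. It is not the method used by CLAWS4 or by any of the taggers cited here. The three-tag tagset, the tiny vocabulary and every probability in `START`, `TRANS` and `EMIT` are invented for illustration and were not estimated from the PCFD or any other corpus. The very low DET-to-DET transition probability plays a role loosely similar to the rule-based constraint that an article cannot be followed by another article.

```python
import math

# Toy model: all names and numbers below are illustrative assumptions,
# not values estimated from a real corpus.
TAGS = ["DET", "NOUN", "VERB"]

# P(tag at sentence start)
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# P(next tag | previous tag)
TRANS = {
    "DET":  {"DET": 0.01, "NOUN": 0.90, "VERB": 0.09},
    "NOUN": {"DET": 0.10, "NOUN": 0.30, "VERB": 0.60},
    "VERB": {"DET": 0.50, "NOUN": 0.40, "VERB": 0.10},
}

# P(word | tag); "walk", "dogs" and "park" are ambiguous between NOUN and VERB
EMIT = {
    "DET":  {"the": 0.7, "a": 0.3},
    "NOUN": {"dogs": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"dogs": 0.1, "walk": 0.5, "park": 0.4},
}

SMOOTH = 1e-6  # small probability for word/tag pairs missing from EMIT


def viterbi(words):
    """Return the most likely tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    words = [w.lower() for w in words]

    def emit(tag, word):
        return math.log(EMIT[tag].get(word, SMOOTH))

    # best[i][tag]: log-probability of the best path ending in `tag` at word i
    best = [{t: math.log(START[t]) + emit(t, words[0]) for t in TAGS}]
    back = []  # back[i][tag]: best previous tag for `tag` at word i + 1

    for word in words[1:]:
        scores, pointers = {}, {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + math.log(TRANS[p][tag]))
            scores[tag] = best[-1][prev] + math.log(TRANS[prev][tag]) + emit(tag, word)
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dogs", "walk"]))  # ['DET', 'NOUN', 'VERB']
print(viterbi(["the", "walk"]))          # ['DET', 'NOUN']
```

In this toy model the word "walk" is tagged as a verb after a noun and as a noun directly after an article, which shows how the decision depends on the whole sequence rather than on the word alone. Working in log space avoids numerical underflow on longer sentences.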