Identifying characteristics and difficulties when using gazetteers to annotate Geographical Named Entities

Authors

  • Afonso Xavier Canosa, Universidade de Santiago de Compostela (Spain)

DOI:

https://doi.org/10.21814/diacritica.5157

Keywords:

Geographical Named Entities, NERC, Toponyms, Corpus annotation, Historical corpus

Abstract

When gazetteers are used to annotate geographical named entities, annotation has to deal with ambiguities and with contexts in which the geographical value of a given expression is not clear. In this paper, an index of place names is used to examine the main problems encountered in producing an annotated corpus of Mendes Pinto’s Pilgrimage. The difficulties found are used to classify the types of errors that occur when place names are resolved by simple string matching. They also serve to introduce criteria for identifying geographical entities. This identification task should precede automatic annotation and has a direct impact on its results.
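To make the idea of errors produced by simple string matching concrete, the following minimal Python sketch annotates a sentence against a toy gazetteer. Everything in it is a hypothetical illustration rather than the author's index, tool or error taxonomy. That includes the gazetteer entries, the sample sentence, the function names `naive_lookup` and `classify`, and the error labels. The sketch shows three failure modes that exact matching tends to produce:

* a common noun tagged as a place, because "porto" also means "harbour";
* a name with several candidate referents;
* a spelling variant that is not recognized at all.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: lowercased name -> candidate places it may denote.
# These entries are illustrative only, not taken from the paper's index.
GAZETTEER: dict[str, list[str]] = {
    "malaca": ["Malacca (Malay Peninsula)"],
    "porto": ["Porto (Portugal)"],
    "santiago": ["Santiago de Compostela (Galicia)", "Santiago (Cape Verde)"],
}


@dataclass
class Match:
    token: str             # the word as it appears in the text
    start: int             # character offset of the word
    candidates: list[str]  # gazetteer referents for that word


def naive_lookup(text: str) -> list[Match]:
    """Tag every word whose lowercased form is a gazetteer key (simple string match)."""
    matches = []
    for m in re.finditer(r"\w+", text):
        key = m.group().lower()
        if key in GAZETTEER:
            matches.append(Match(m.group(), m.start(), GAZETTEER[key]))
    return matches


def classify(match: Match) -> str:
    """Crude, hypothetical heuristics for labelling a single match."""
    if match.token.islower():
        # A lowercase hit is often a common noun (e.g. "porto" = "harbour").
        return "possible geo/non-geo ambiguity"
    if len(match.candidates) > 1:
        # One name, several places: the gazetteer alone cannot decide.
        return "geo/geo ambiguity"
    return "unambiguous"


if __name__ == "__main__":
    text = "De Malaca fomos ao porto de Santiago; outros escrevem Mallaca."
    for match in naive_lookup(text):
        print(match.token, "->", classify(match))
    # Malaca -> unambiguous
    # porto -> possible geo/non-geo ambiguity
    # Santiago -> geo/geo ambiguity
    # "Mallaca" is never reported: exact matching misses spelling variants.
```

The capitalization test is deliberately naive. It would mislabel a common noun at the start of a sentence. This illustrates why criteria for identifying geographical entities are needed before the lookup step rather than after it.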

Published

2020-03-24

How to Cite

Canosa, A. X. (2020). Identifying characteristics and difficulties when using gazetteers to annotate Geographical Named Entities. Diacrítica, 32(3), 87–103. https://doi.org/10.21814/diacritica.5157