Minority languages and syntactic annotations of corpora
research experiences in scientific initiation
DOI:
https://doi.org/10.15448/1984-7726.2023.1.44734
Keywords:
computational linguistics, descriptive linguistics, Tupían languages, universal dependencies, treebanks.
Abstract
Many of Brazil's indigenous languages are endangered. In most cases, revitalization and conservation strategies for these languages are essential (Crystal, 2002; Harrison, 2007) and require the continuous promotion of language policies and actions focused on indigenous school education. This article presents the use of linguistic tools associated with the construction of treebanks (corpora of texts with syntactic and morphological annotations) and with the description of two minority indigenous languages of the Tupían linguistic family spoken in the southwestern Amazon, Brazil. The treebanks, part of the Universal Dependencies project (De Marneffe et al., 2021; Duran et al., 2022), form the basis of experiments conducted in the Institutional Program for Scientific Initiation Scholarships at the Federal University of Paraíba (2021-2022), entitled "Education, Linguistics, History, and Indigenous Communities." We discuss the application of these tools in linguistic description and their relationship with the study of indigenous language typology. We also explore the intersection of computational linguistics with descriptive linguistics.
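To make the notion of a treebank concrete, the sketch below reads one sentence in CoNLL-U, the ten-column, tab-separated format used by Universal Dependencies, and lists its dependency relations. The sentence is a hypothetical English toy example and is not drawn from the Tupían treebanks discussed in the article; the helper names `parse_sentence` and `dependency_edges` are likewise illustrative assumptions, not part of any tool cited here.

```python
# Toy CoNLL-U fragment: a hypothetical English sentence, NOT data from the article.
# Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
CONLLU = "\n".join([
    "# text = The dog barks",
    "1\tThe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\t_\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\t_",
])

FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def parse_sentence(block):
    """Parse one CoNLL-U sentence into a list of token dictionaries."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        token = dict(zip(FIELDS, cols))
        if not token["id"].isdigit():
            continue  # skip multiword ranges ("1-2") and empty nodes ("1.1")
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        # Morphological features: "A=x|B=y" becomes {"A": "x", "B": "y"}
        token["feats"] = (
            {} if token["feats"] == "_"
            else dict(f.split("=", 1) for f in token["feats"].split("|"))
        )
        tokens.append(token)
    return tokens


def dependency_edges(tokens):
    """Return (head form, relation, dependent form) triples; head 0 is ROOT."""
    forms = {t["id"]: t["form"] for t in tokens}
    forms[0] = "ROOT"
    return [(forms[t["head"]], t["deprel"], t["form"]) for t in tokens]


tokens = parse_sentence(CONLLU)
for head, rel, dep in dependency_edges(tokens):
    print(f"{head} --{rel}--> {dep}")
```

Each token line carries both the morphological annotation (the FEATS column) and the syntactic annotation (HEAD and DEPREL), which is what allows a treebank to support descriptive and typological queries over the same data.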
References
ARAGON, Carolina. A Grammar of Akuntsú, a Tupian language. 2014. Thesis (Doctor of Philosophy in Linguistics) – University of Hawaii at Manoa, Honolulu, 2014. Available at: http://etnolinguistica.wdfiles.com/local--files/tese%3Aaragon2014/CarolinaAragonFinal.pdf. Accessed: 29 Sep. 2022.
ARAGON, Carolina. Fonologia e aspectos morfológicos e sintáticos da língua Akuntsú. 2008. Dissertation (Master's in Linguistics) – Departamento de Linguística, Português e Línguas Clássicas, Universidade de Brasília, Brasília (DF), 2008. Available at: https://repositorio.unb.br/bitstream/10482/5135/1/2008_CarolinaCoelhoAragon.pdf. Accessed: 29 Sep. 2022.
ARAGON, Carolina; ALGAYER, Altair. A história contada pelos Akuntsú: ocupação territorial e perdas populacionais. Revista Brasileira de Linguística Antropológica, [S. l.], v. 12, n. 1, p. 223-234, 2020. Available at: https://periodicos.unb.br/index.php/ling/article/view/29633. Accessed: 16 Oct. 2022.
ALTENHOFEN, Cléo V. Bases para uma política linguística das línguas minoritárias no Brasil. In: NICOLAIDES, C.; SILVA, K. A.; TÍLIO, R.; ROCHA, C. H. (eds.). Política e Políticas Linguísticas. Campinas: Pontes Editores, 2013. p. 93-116.
BRAGA, Alzerinda. Aspects morphosyntaxiques de la langue Makurap-tupi. 2005. Thesis (Doctorat en Sciences du Langage) – Université de Toulouse - Le Mirail, Toulouse, 2005. Available at: http://www.etnolinguistica.org/tese:braga-2005. Accessed: 16 Oct. 2022.
CRYSTAL, David. Language death. Cambridge University Press, 2002.
DE ALENCAR, Leonel Figueiredo. Yauti: A Tool for Morphosyntactic Analysis of Nheengatu within the Universal Dependencies Framework. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 14., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 135-145. https://doi.org/10.5753/stil.2023.234131.
DE MARNEFFE, Marie-Catherine; MANNING, Christopher; NIVRE, Joakim; ZEMAN, Daniel. Universal Dependencies. Computational Linguistics, [S. l.], v. 47, n. 2, p. 255-308, Jun. 2021.
DURAN, Magali Sanches et al. Manual de anotação como recurso de Processamento de Linguagem Natural: o modelo Universal Dependencies em língua portuguesa. Domínios de Lingu@gem, [S. l.], v. 16, n. 4, p. 1608-1643, 2022.
ETHNOLOGUE. languages vitality count. Ethnologue: Languages of the World, 2023. Available at: https://www.ethnologue.com. Accessed: 16 Oct. 2022.
FERRAZ GERARDI, Fabrício et al. TuDeT: Tupían Dependency Treebank. [S. l.], 19 May 2022. Zenodo. TuDeT: Tupían Dependency Treebank, 2022a.
FERRAZ GERARDI, Fabrício et al. TuLeD. Tupían Lexical Database. [S. l.], 23 May 2022. Zenodo. TuLeD: Tupían lexical database. Max Planck Institute for Evolutionary Anthropology: Leipzig, 2022b.
FREITAS, Cláudia. Linguística computacional. São Paulo: Parábola, 2022.
HARRISON, David. When languages die: The extinction of the world's languages and the erosion of human knowledge. Oxford University Press, 2007.
HAWKINS, John A. Efficiency and complexity in grammars. OUP Oxford, 2004.
HAWKINS, John A. Cross-linguistic variation and efficiency. OUP Oxford, 2014.
IPHAN. Inventário Nacional da Diversidade Linguística. In: Portal Iphan. [S. l.], c2014. Available at: http://portal.iphan.gov.br/indl. Accessed: 1 Oct. 2022.
MAHER, Terezinha M. Ecos de resistência: políticas linguísticas e línguas minoritárias no Brasil. In: NICOLAIDES, C.; SILVA, K. A.; TÍLIO, R.; ROCHA, C. H. (eds.). Política e Políticas Linguísticas. Campinas: Pontes, 2013. p. 117-134.
MALDI, Denise. O complexo cultural do Marico: sociedades indígenas dos rios Branco, Colorado e Mequens, afluentes do Médio Guaporé. In: FURTADO, L. G. Boletim do Museu Paraense Emílio Goeldi, Belém, v. 7, n. 2, p. 209-269, 1991.
MEZACASA, Roseline. Por histórias indígenas: o povo Makurap e o ocupar seringalista na Amazônia. 2021. Thesis (Doctorate in History) – Universidade Federal de Santa Catarina, Florianópolis, 2021. Available at: https://repositorio.ufsc.br/handle/123456789/226949. Accessed: 29 Sep. 2022.
MOORE, Denny; GALUCIO, Ana Vilacy; GABAS JR., Nilson. O desafio de documentar e preservar as línguas amazônicas. Belém: Museu Paraense Emílio Goeldi, 2008.
MORELLO, Rosângela. Diversidade no Brasil: línguas e políticas sociais. Synergies Brésil, [S. l.], v. 7, p. 27-36, 2009. Available at: http://gerflint.fr/Base/Bresil7/bresil7.html. Accessed: 16 Oct. 2022.
NETTLE, Daniel; ROMAINE, Suzanne. Vanishing voices: The extinction of the world's languages. Oxford University Press on Demand, 2000.
NIVRE, Joakim; DE MARNEFFE, Marie-Catherine; GINTER, Filip; HAJIC, Jan; MANNING, Christopher; PYYSALO, Sampo; SCHUSTER, Sebastian; TYERS, Francis; ZEMAN, Daniel. Universal dependencies: An evergrowing multilingual treebank collection. European Language Resources and Evaluation, Marseille, v. 2, p. 4034-4043, May 2020. Available at: https://aclanthology.org/2020.lrec-1.497.pdf. Accessed: 26 Oct. 2022.
RADEMAKER, Alexandre et al. Universal dependencies for Portuguese. In: INTERNATIONAL CONFERENCE ON DEPENDENCY LINGUISTICS, 4., 2017, Pisa. Proceedings [...]. Pisa: Linköping University Electronic Press, 2017. p. 197-206.
RAMOS, Alcida. Vivos, afinal! Povos indígenas do Brasil enfrentam o genocídio. In: Série Antropologia. Brasília: DAN/UnB, 2018. v. 461.
RODRIGUES, Aryon. As línguas indígenas no Brasil. In: RICARDO, F.; RICARDO, B. Povos Indígenas no Brasil. São Paulo: Instituto Socioambiental, 2006. p. 58-63.
RODRÍGUEZ, Lorena; MERZHEVICH, Tatiana; SILVA, Wellington; TRESOLDI, Tiago; ARAGON, Carolina; GERARDI, Fabrício. Tupían Language Resources: Data, Tools, Analyses. In: ANNUAL MEETING OF THE ELRA/ISCA SPECIAL INTEREST GROUP ON UNDER RESOURCED LANGUAGES, 1., 2022, Marseille. Proceedings [...]. Paris: European Language Resources Association, 2022. p. 48-58.
SANTOS, Marcelo; ALGAYER, Altair. Índios Isolados do Vale do Corumbiara. Brasília: Fundação Nacional do Índio, 1995. (Technical report).
STORTO, Luciana R. Línguas indígenas: tradição, universais e diversidade. São Paulo: Mercado de Letras, 2019.
THOMAS, Guillaume. Universal dependencies for Mbyá Guaraní. In: WORKSHOP ON UNIVERSAL DEPENDENCIES, 3., 2019, Paris. Proceedings [...]. Paris: The Association for Computational Linguistics, 2019. p. 70-77.
TYERS, Francis; SHEYANOVA, Mariya; WASHINGTON, Jonathan. UD Annotatrix: An annotation tool for Universal Dependencies. In: INTERNATIONAL WORKSHOP ON TREEBANKS AND LINGUISTIC THEORIES, 16., 2018, Prague, Czech Republic. Proceedings [...]. Prague: Jan Hajič, 2017. p. 10-17.
WILKINSON, Mark; DUMONTIER, Michel; AALBERSBERG, IJsbrand; APPLETON, Gabrielle; AXTON, Myles; BAAK, Arie; MONS, Barend. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, [S. l.], v. 3, n. 1, p. 1-9, 2016. Available at: https://www.nature.com/articles/sdata201618#citeas. Accessed: 17 Oct. 2022.
Copyright (c) 2024 Letras de Hoje
This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright
The submission of originals to Letras de Hoje implies the transfer by the authors of the right for publication. Authors retain copyright and grant the journal right of first publication. If the authors wish to include the same data into another publication, they must cite Letras de Hoje as the site of original publication.
Creative Commons License
Except where otherwise specified, material published in this journal is licensed under a Creative Commons Attribution 4.0 International license, which allows unrestricted use, distribution and reproduction in any medium, provided the original publication is correctly cited.