Minorities languages and syntactic annotations of corpora

research experiences in scientific initiation


DOI:

https://doi.org/10.15448/1984-7726.2023.1.44734

Keywords:

computational linguistics, descriptive linguistics, Tupían languages, Universal Dependencies, treebanks.

Abstract

Many Brazilian indigenous languages are endangered. In most cases, strategies for their revitalization and conservation are essential (Crystal, 2002; Harrison, 2007) and require the continuous promotion of language policies and of actions focused on indigenous school education. This article presents linguistic tools used to build treebanks (corpora of texts with syntactic and morphological annotations) and to describe two minority indigenous languages of the Tupían family spoken in the southwestern Amazon, Brazil. The treebanks, which are part of the Universal Dependencies project (De Marneffe et al., 2021; Duran et al., 2022), form the basis of experiments conducted in the Institutional Program for Scientific Initiation Scholarships at the Federal University of Paraíba (2021-2022), in the project "Education, Linguistics, History, and Indigenous Communities." We discuss how these tools are applied to linguistic description and how they relate to the typological study of indigenous languages. We also explore the intersection of computational linguistics and descriptive linguistics.
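Universal Dependencies treebanks are distributed in the plain-text CoNLL-U format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines beginning with `#`, and a blank line after each sentence. The Python sketch below is a minimal, hypothetical reader showing how the morphological layer (UPOS, FEATS) and the syntactic layer (HEAD, DEPREL) of such an annotation can be recovered. The English sentence is an invented illustration, not data from the Tupían treebanks, and the function names (`parse_conllu`, `parse_feats`, `arcs`) are hypothetical; in practice an established parser such as the `conllu` Python package would be the more robust choice.

```python
from dataclasses import dataclass

# Column order fixed by the CoNLL-U format used in Universal Dependencies.
FIELDS = ("id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc")


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str              # universal part-of-speech tag (morphological layer)
    feats: dict[str, str]  # morphological features, e.g. {"Number": "Plur"}
    head: int              # ID of the syntactic head; 0 marks the root
    deprel: str            # universal dependency relation to the head


def parse_feats(raw: str) -> dict[str, str]:
    """Turn 'Mood=Ind|Tense=Pres' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_conllu(text: str) -> list[list[Token]]:
    """Hypothetical minimal reader: returns one list of tokens per sentence."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():        # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):    # sentence-level comments (sent_id, text, ...)
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        if "-" in row["id"] or "." in row["id"]:
            continue                # skip multiword-token ranges and empty nodes
        current.append(Token(int(row["id"]), row["form"], row["lemma"], row["upos"],
                             parse_feats(row["feats"]), int(row["head"]), row["deprel"]))
    if current:                     # the input may not end with a blank line
        sentences.append(current)
    return sentences


def arcs(sentence: list[Token]) -> list[str]:
    """List each dependency as relation(head, dependent)."""
    forms = {0: "ROOT", **{tok.id: tok.form for tok in sentence}}
    return [f"{tok.deprel}({forms[tok.head]}, {tok.form})" for tok in sentence]


# Invented English example (not from any Tupían treebank), built with explicit tabs.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "Mood=Ind|Tense=Pres", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
    "",
])

for sentence in parse_conllu(SAMPLE):
    print(arcs(sentence))
```

Running the script prints `['nsubj(bark, Dogs)', 'root(ROOT, bark)', 'punct(bark, .)']`: each token points to its head, so the sentence forms a single dependency tree rooted at the verb, which is the kind of structure annotators build when they add a sentence to a treebank.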

 




Author Biographies

Luana Luiza Santos, Universidade Federal da Paraíba (UFPB), João Pessoa, PB, Brazil.

Undergraduate student in Letters (Portuguese) at the Federal University of Paraíba (UFPB), João Pessoa, PB, Brazil.

Carolina Coelho Aragon, Universidade Federal da Paraíba (UFPB), João Pessoa, PB, Brazil.

PhD from the University of Hawaii at Manoa, Honolulu, HI, United States, and master's degree from the University of Brasília, DF, Brazil. Adjunct Professor in the Department of Portuguese Language and Linguistics at the Federal University of Paraíba (UFPB), João Pessoa, PB, Brazil.

Fabrício Gerardi, Eberhard Karls Universität Tübingen, Tübingen, Germany.

PhD in Linguistics and master's degree in Computational Linguistics from the University of Tübingen, Germany; he also holds a master's degree in Hebrew from the University of São Paulo (USP), São Paulo, Brazil. Professor and researcher at the University of Tübingen, Tübingen, Germany.

References

ALTENHOFEN, Cléo V. Bases para uma política linguística das línguas minoritárias no Brasil. In: NICOLAIDES, C.; SILVA, K. A.; TÍLIO, R.; ROCHA, C. H. (org.). Política e Políticas Linguísticas. Campinas: Pontes Editores, 2013. p. 93-116.

ARAGON, Carolina. Fonologia e aspectos morfológicos e sintáticos da língua Akuntsú. 2008. Master's thesis (Linguistics) – Departamento de Linguística, Português e Línguas Clássicas, Universidade de Brasília, Brasília (DF), 2008. Available at: https://repositorio.unb.br/bitstream/10482/5135/1/2008_CarolinaCoelhoAragon.pdf. Accessed on: 29 Sept. 2022.

ARAGON, Carolina. A Grammar of Akuntsú, a Tupian language. 2014. Doctoral thesis (Linguistics) – University of Hawaii at Manoa, Honolulu, 2014. Available at: http://etnolinguistica.wdfiles.com/local--files/tese%3Aaragon2014/CarolinaAragonFinal.pdf. Accessed on: 29 Sept. 2022.

ARAGON, Carolina; ALGAYER, Altair. A história contada pelos Akuntsú: ocupação territorial e perdas populacionais. Revista Brasileira de Linguística Antropológica, [S. l.], v. 12, n. 1, p. 223-234, 2020. Available at: https://periodicos.unb.br/index.php/ling/article/view/29633. Accessed on: 16 Oct. 2022.

BRAGA, Alzerinda. Aspects morphosyntaxiques de la langue Makurap-tupi. 2005. Doctoral thesis (Sciences du Langage) – Université de Toulouse-Le Mirail, Toulouse, 2005. Available at: http://www.etnolinguistica.org/tese:braga-2005. Accessed on: 16 Oct. 2022.

CRYSTAL, David. Language death. Cambridge University Press, 2002.

DE ALENCAR, Leonel Figueiredo. Yauti: A Tool for Morphosyntactic Analysis of Nheengatu within the Universal Dependencies Framework. In: SIMPÓSIO BRASILEIRO DE TECNOLOGIA DA INFORMAÇÃO E DA LINGUAGEM HUMANA (STIL), 14., 2023, Belo Horizonte/MG. Proceedings [...]. Porto Alegre: Sociedade Brasileira de Computação, 2023. p. 135-145. DOI: https://doi.org/10.5753/stil.2023.234131.

DE MARNEFFE, Marie-Catherine; MANNING, Christopher; NIVRE, Joakim; ZEMAN, Daniel. Universal Dependencies. Computational Linguistics, [S. l.], v. 47, n. 2, p. 255-308, June 2021.

DURAN, Magali Sanches et al. Manual de anotação como recurso de Processamento de Linguagem Natural: o modelo Universal Dependencies em língua portuguesa. Domínios de Lingu@gem, [S. l.], v. 16, n. 4, p. 1608-1643, 2022.

ETHNOLOGUE. Languages vitality count. Ethnologue: Languages of the World, 2023. Available at: https://www.ethnologue.com. Accessed on: 16 Oct. 2022.

FERRAZ GERARDI, Fabrício et al. TuDeT: Tupían Dependency Treebank. [S. l.]: Zenodo, 19 May 2022a.

FERRAZ GERARDI, Fabrício et al. TuLeD: Tupían Lexical Database. Leipzig: Max Planck Institute for Evolutionary Anthropology; Zenodo, 23 May 2022b.

FREITAS, Cláudia. Linguística computacional. São Paulo: Parábola, 2022.

HARRISON, David. When languages die: The extinction of the world's languages and the erosion of human knowledge. Oxford University Press, 2007.

HAWKINS, John A. Efficiency and complexity in grammars. Oxford: Oxford University Press, 2004.

HAWKINS, John A. Cross-linguistic variation and efficiency. Oxford: Oxford University Press, 2014.

IPHAN. Inventário Nacional da Diversidade Linguística. In: Portal Iphan. [S. l.], c2014. Available at: http://portal.iphan.gov.br/indl. Accessed on: 1 Oct. 2022.

MAHER, Terezinha M. Ecos de resistência: políticas linguísticas e línguas minoritárias no Brasil. In: NICOLAIDES, C.; SILVA, K. A.; TÍLIO, R.; ROCHA, C. H. (org.). Política e Políticas Linguísticas. Campinas: Pontes, 2013. p. 117-134.

MALDI, Denise. O complexo cultural do Marico: sociedades indígenas dos rios Branco, Colorado e Mequens, afluentes do Médio Guaporé. In: FURTADO, L. G. Boletim do Museu Paraense Emílio Goeldi, Belém, v. 7, n. 2, p. 209-269, 1991.

MEZACASA, Roseline. Por histórias indígenas: o povo Makurap e o ocupar seringalista na Amazônia. 2021. Doctoral thesis (History) – Universidade Federal de Santa Catarina, Florianópolis, 2021. Available at: https://repositorio.ufsc.br/handle/123456789/226949. Accessed on: 29 Sept. 2022.

MOORE, Denny; GALUCIO, Ana Vilacy; GABAS JR., Nilson. O desafio de documentar e preservar as línguas amazônicas. Belém: Museu Paraense Emílio Goeldi, 2008.

MORELLO, Rosângela. Diversidade no Brasil: línguas e políticas sociais. Synergies Brésil, [S. l.], v. 7, p. 27-36, 2009. Available at: http://gerflint.fr/Base/Bresil7/bresil7.html. Accessed on: 16 Oct. 2022.

NETTLE, Daniel; ROMAINE, Suzanne. Vanishing voices: The extinction of the world's languages. Oxford: Oxford University Press, 2000.

NIVRE, Joakim; DE MARNEFFE, Marie-Catherine; GINTER, Filip; HAJIČ, Jan; MANNING, Christopher; PYYSALO, Sampo; SCHUSTER, Sebastian; TYERS, Francis; ZEMAN, Daniel. Universal Dependencies v2: An evergrowing multilingual treebank collection. In: LANGUAGE RESOURCES AND EVALUATION CONFERENCE (LREC), 12., 2020, Marseille. Proceedings [...]. Marseille: European Language Resources Association, 2020. p. 4034-4043. Available at: https://aclanthology.org/2020.lrec-1.497.pdf. Accessed on: 26 Oct. 2022.

RADEMAKER, Alexandre et al. Universal dependencies for Portuguese. In: INTERNATIONAL CONFERENCE ON DEPENDENCY LINGUISTICS, 4., 2017, Pisa. Proceedings [...]. Pisa: Linköping University Electronic Press, 2017. p. 197-206.

RAMOS, Alcida. Vivos, afinal! Povos indígenas do Brasil enfrentam o genocídio. In: Série Antropologia. Brasília: DAN/UnB, 2018. v. 461.

RODRIGUES, Aryon. As línguas indígenas no Brasil. In: RICARDO, F.; RICARDO, B. Povos Indígenas no Brasil. São Paulo: Instituto Socioambiental, 2006. p. 58-63.

RODRÍGUEZ, Lorena; MERZHEVICH, Tatiana; SILVA, Wellington; TRESOLDI, Tiago; ARAGON, Carolina; GERARDI, Fabrício. Tupían Language Resources: Data, Tools, Analyses. In: ANNUAL MEETING OF THE ELRA/ISCA SPECIAL INTEREST GROUP ON UNDER-RESOURCED LANGUAGES, 1., 2022, Marseille. Proceedings [...]. Paris: European Language Resources Association, 2022. p. 48-58.

SANTOS, Marcelo; ALGAYER, Altair. Índios Isolados do Vale do Corumbiara. Brasília: Fundação Nacional do Índio, 1995. (Relatório Técnico).

STORTO, Luciana R. Línguas indígenas: tradição, universais e diversidade. São Paulo: Mercado de Letras, 2019.

THOMAS, Guillaume. Universal Dependencies for Mbyá Guaraní. In: WORKSHOP ON UNIVERSAL DEPENDENCIES, 3., 2019, Paris. Proceedings [...]. Paris: The Association for Computational Linguistics, 2019. p. 70-77.

TYERS, Francis; SHEYANOVA, Mariya; WASHINGTON, Jonathan. UD Annotatrix: An annotation tool for Universal Dependencies. In: INTERNATIONAL WORKSHOP ON TREEBANKS AND LINGUISTIC THEORIES, 16., 2018, Prague, Czech Republic. Proceedings [...]. Prague: Jan Hajič, 2017. p. 10-17.

WILKINSON, Mark; DUMONTIER, Michel; AALBERSBERG, IJsbrand; APPLETON, Gabrielle; AXTON, Myles; BAAK, Arie; MONS, Barend. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, [S. l.], v. 3, n. 1, p. 1-9, 2016. Available at: https://www.nature.com/articles/sdata201618#citeas. Accessed on: 17 Oct. 2022.

Published

2024-01-11

How to Cite

Luiza Santos, L., Coelho Aragon, C., & Gerardi, F. (2024). Minorities languages and syntactic annotations of corpora: research experiences in scientific initiation. Letras de Hoje, 59(1), e44734. https://doi.org/10.15448/1984-7726.2023.1.44734
