Lexical Bundles across Levels of Proficiency in Portuguese as a Second Language: an examination of bundle function

Received: 05/6/2020 Accepted: 07/12/2020 Published: 09/02/2021
Abstract: Formulaic sequences are known to be measures of learners' fluency in a foreign language. Research in language processing suggests that native speakers as well as learners process these sequences as a single word (ELLIS, 1996). Nevertheless, little is known about the use of formulaic sequences in Portuguese, and even fewer studies have examined their use by learners of Portuguese. Therefore, in this study, we sought to investigate the textual function of lexical bundles extracted from a corpus of learners of Portuguese as a Second Language (PSL). Lexical bundles are sequences of three or more words that occur with higher-than-expected frequency in a specific corpus. In this study, we used corpus linguistics tools to extract lexical bundles that occur frequently at two levels of proficiency in Portuguese (beginner and intermediate). These bundles were then classified according to their textual function. Results indicate that beginner-level students use more bundles associated with concrete references, while intermediate learners use more bundles associated with textual organization and stance. This study contributes to the description of Portuguese acquisition at these two levels of proficiency. In addition, the results can inform classroom activities through which PSL teachers introduce new functions of lexical bundles to students. Finally, we hope that this study motivates more research describing the language used at different stages of Portuguese acquisition.


Introduction
There has been extensive research on formulaic sequences (WRAY, 2013), especially on how important and difficult they are for learners of any foreign language, regardless of their proficiency level (PAQUOT; GRANGER, 2012). The overarching term formulaic language covers several different kinds of word sequences, such as collocations, idioms, lexical phrases, and lexical bundles, the last of which are the object of study of the present investigation.
Considering that mastering formulaic sequences, including lexical bundles, is closely related to language proficiency, it is imperative to understand how language learners use these linguistic features across levels of development. However, we know very little about which linguistic patterns, namely lexical bundles (LBs), learners of Portuguese use, since most studies of LBs have described their use across levels in English as a second language (L2).

Ferreira (2014) has investigated how LBs in Portuguese appear in textbooks, Sardinha, Teixeira and Ferreira (2014) have focused on LBs in different registers, and Goulart (in press) has analyzed their structure. Nevertheless, such studies are scarce; hence the urgent need to explore LBs in Portuguese further. Unlike Goulart (in press), who analyzed structural patterns across levels of development, this study focuses on the functional patterns of the LBs found in that study and relates the structure of these word sequences to their function. We thus hope to contribute to a further understanding of both the structure and the function of LBs in Portuguese.
This study is divided into five sections, of which this introduction is the first. It is followed by a description of what lexical bundles are and of some findings of previous research on the topic. Then, in section three, the corpus and the methods are described. The results, accompanied by the discussion, are presented in section four, and the fifth and last section is dedicated to the conclusion.

Lexical bundles
Biber et al. (1999) [...]
On the one hand, Chen and Baker (2016) found that learners at lower levels of proficiency tend to use more bundles associated with conversation. A similar pattern was found in Staples et al. (2013), for whom lower-level learners use bundles more frequently than their more advanced counterparts, but these bundles are used in the prompts.
Few studies have investigated lexical bundles in languages other than English. Tracy-Ventura, Cortes and Biber (2007) [...]
1) What differences, if any, are there in the types and tokens of lexical bundles in beginner and intermediate levels?
2) To what extent do the functions of the bundles extracted vary at each level of proficiency?

The Corpus of Written Productions of Portuguese as a Second Language
In order to answer the research questions posed [...] was excluded from the analysis due to its small size. In addition, texts with fewer than 100 words were excluded from the analysis.
Table 2 shows that most of the texts in the corpus were written as a response to texts related to the individual. Nevertheless, environment-related topics become more frequent at the intermediate level.
In this section, the corpus and subcorpora used for the analysis were described. In the following section, the method for bundle identification and classification will be presented in detail.

Bundle extraction
This study draws on the findings of previous research examining learner language development in lexical bundles (see GOULART, in press). Therefore, bundle size and bundle extraction followed that investigation. Three-word bundles were selected as the most appropriate bundle size because the texts are short, varying from 100 to 600 words. In addition, an initial analysis showed that four-word bundles resulted in variable slots at the final bundle position (eu gosto de *); thus, three-word bundles provided the same grammatical and functional information as four-word bundles.
For extraction criteria, the researchers piloted different solutions in order to guarantee that the bundles were representative of the two levels being investigated. Tracy-Ventura, Cortes and Biber (2007) [...] analysis. In addition, for the purposes of this study, dispersion was more critical than frequency. When examining the patterns of language development, the researchers wanted to guarantee that the bundles found were representative of that level, rather than of an individual learner's idiolect. Therefore, bundles had to occur in at least 5% of the texts in each subcorpus in order to be extracted. This guaranteed that the bundles had a frequency of at least 12 occurrences in each subcorpus, without compromising the number of bundles extracted.
Bundles were extracted using the n-gram function in AntConc. After bundle extraction, their raw frequencies were normalized per thousand words.
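The extraction criteria above can be illustrated with a minimal Python sketch. It is not the procedure implemented in AntConc, which the study actually used; the tokenizer, the function names, and the parameter names (min_words, min_range, per) are assumptions made for illustration, and normalization is assumed to be per 1,000 words of the subcorpus. The sketch counts three-word sequences, keeps only those that occur in at least a given proportion of texts (dispersion), and reports raw and normalized frequencies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, accented ones included.
    # This is an assumed tokenizer, not AntConc's own token definition.
    return re.findall(r"[^\W\d_]+", text.lower())


def ngrams(tokens, n=3):
    # All contiguous n-word sequences in one text.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_bundles(texts, n=3, min_words=100, min_range=0.05, per=1000):
    # Drop texts shorter than min_words, as the study excluded short texts.
    kept = [toks for toks in map(tokenize, texts) if len(toks) >= min_words]
    total_words = sum(len(toks) for toks in kept)

    freq = Counter()        # raw frequency of each n-gram in the subcorpus
    text_range = Counter()  # number of texts in which each n-gram occurs
    for toks in kept:
        grams = ngrams(toks, n)
        freq.update(grams)
        text_range.update(set(grams))

    # Dispersion criterion: the n-gram must occur in at least
    # min_range (here 5%) of the texts in the subcorpus.
    min_texts = math.ceil(min_range * len(kept))
    return {
        gram: {
            "raw": freq[gram],
            "texts": text_range[gram],
            "per_1000": freq[gram] / total_words * per,
        }
        for gram in freq
        if text_range[gram] >= min_texts
    }


# Toy usage with invented sentences; thresholds are relaxed so that
# the three very short texts are not filtered out.
texts = ["eu gosto de ler", "eu gosto de música", "ela mora aqui"]
result = extract_bundles(texts, min_words=0, min_range=0.5)
```

In the toy example only eu gosto de meets the dispersion threshold, since it occurs in two of the three texts. With the study's settings, the function would be run once per subcorpus with the default thresholds.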

Bundle classification
This study seeks to explore specifically how bundle functions vary across two levels of proficiency in Portuguese. Previous studies had already examined structural development but lacked an analysis of functional development and of the relationship between function and form. Even though it is not the focus of this study, bundle structure was classified according to the categories presented in Table 3. It is worth noting that, although Hyland's (2008) and Biber, Cortes and Conrad's (2004) categories have been widely used in previous studies, a functional taxonomy should emerge from the bundles found in the corpus rather than be imposed on the data. After an initial survey of the data, the following functional taxonomy was created for the bundles extracted in this corpus.

In this section, we will briefly introduce the results of the structural patterns found across levels. Then, the functional patterns for each level will be discussed and compared. Finally, the relationship between function and structure will be examined.

The structural types of lexical bundles across levels
As explained in the section above, the structural classification used in a previous investigation of the same corpus was adapted to combine the A1 and A2 corpora into our beginner corpus, and the B1 and B2 corpora into our intermediate corpus.

While these are appropriate forms to respond to the prompt, we can see in Excerpt 3b how an advanced student responds to the same prompt.

In Excerpt 8, we can see that instead of using the verb gostar to express preferences, students use na minha opinião. That is, we see an increase in the repertoire of devices students use to indicate preferences. We can also notice that the use of place referential bundles might be an outcome of the writing prompt "do you like to live in the city?".

The relationship between forms and function across levels
In this section, we examine the possible relationship between form and function at these two levels of proficiency. For this comparison, we have combined all referential bundles into a single variable. In addition, we only considered bundle type. Figure 4 illustrates the patterns found for beginner levels.
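The comparison described above amounts to a cross-tabulation of structure by function over bundle types. The following Python sketch shows one way to compute it. The rows and the category labels are invented for illustration and are not the study's data or its taxonomy, and the helper names are hypothetical. Referential subcategories are collapsed into a single value, and each bundle is counted once regardless of how often it occurs.

```python
from collections import Counter


def collapse_function(label):
    # Merge every referential subcategory into one "referential" value.
    return "referential" if label.startswith("referential") else label


def form_function_table(classified):
    # Count bundle types (not tokens): each distinct bundle contributes once.
    seen = set()
    table = Counter()
    for bundle, structure, function in classified:
        if bundle in seen:
            continue
        seen.add(bundle)
        table[(structure, collapse_function(function))] += 1
    return table


# Invented rows with hypothetical labels, for illustration only.
classified = [
    ("eu gosto de", "verb-based", "stance"),
    ("na minha opinião", "preposition-based", "stance"),
    ("na cidade de", "preposition-based", "referential: place"),
    ("no ano de", "preposition-based", "referential: time"),
    ("eu gosto de", "verb-based", "stance"),  # repeated token, ignored
]
table = form_function_table(classified)
```

Running the same function separately on the beginner and intermediate bundle lists would give one table per level, which can then be compared.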