Event-duration semantics in online sentence processing

Several experiments in psycholinguistics have found evidence of Iterative Coercion, an effect in which punctual events used in durative contexts are reanalyzed and receive an iterative meaning. We argue that this effect is not related to aspectual features, and that event-duration semantics is accessed during sentence processing. We ran a self-paced reading experiment in Brazilian Portuguese whose sentences contain events with an average duration of a few minutes. These sentences were inserted in durative contexts that became the experiment's conditions following a Latin Square design: control condition (minutes), subtractive (seconds), iterative (hours) and habitual (days). Higher RTs were measured at the critical segments of all experimental conditions, except for the habitual context. The results corroborated our hypothesis while challenging the psychological reality of habitual coercion. To better observe the habitual coercion condition, we also present a reanalysis of the data from Sampaio et al. (2014). This reanalysis confirms the results of our tests.


Introduction
Long before the glamorous Hollywood release of Denis Villeneuve's 'Arrival' (2016) tried to establish a link between language and time, linguists and psychologists had produced several works aiming at identifying time properties in the cognitive domain of language (REICHENBACH, 1947; ALFORD, 1981; HORNSTEIN, 1993; MANI et al., 2005; COLL-FLORIT; GENNARI, 2011; SAMPAIO, 2015; 2016; FABER; GENNARI, 2015; GAUTHIER; VAN WASSENHOVE, 2016; BYLUND; ATHANASOPOULOS, 2017). Language forms complex objects that relate form and meaning. It also seems to be inextricably connected to time, since speech becomes acoustic waves that are linearly or serially absorbed by the receiver. This paper tests the hypothesis that the semantics of time affects online sentence processing, based on studies of the so-called aspectual coercion effect (SAMPAIO; FRANÇA; MAIA, 2014; SAMPAIO, 2015). Aspectual coercion is a covert mechanism of the linguistic system that allows us to understand the difference between two durative sentences that have the same structure but different verbal aspects. Consider the following examples.
(1) a. The child cried for two minutes
    b. The child sneezed for two minutes
(1a) has a durative event composed of a durative verb (to cry) in a durative context (for two minutes). In this example, verb and context share the same aspectual properties, yielding a single durative crying event. On the other hand, (1b) is composed of a punctual verb (to sneeze) and a durative context. The result is an aspectual mismatch that triggers a multiple-event reading in which a series of sneezes lasts for two minutes. This effect is Aspectual Coercion. Along these lines, Pustejovsky (1995), Jackendoff (1997) and De Swart (1998) proposed the coercion hypothesis. De Swart (1998) defines Aspectual Coercion in the following terms: "Typically, coercion is triggered if there is a conflict between the aspectual character of the eventuality description and the aspectual constraints of some other element in the context. The felicity of an aspectual reinterpretation is strongly dependent on linguistic context and knowledge of the world" (DE SWART, 1998, p. 360).
Dölling (2014), however, defines coercion in different terms. He proposes his own Event Classification, in which verbs have no fixed classes and pragmatics can change the aspectual properties of an event depending on its context. Sampaio et al. (2014) and Sampaio (2015; 2016) follow a different logic to explain coercion effects, linking sentence processing to time perception studies in cognitive psychology. When one reads a sentence, the average subjective duration of the event it describes is activated. If the sentence has an overt duration (e.g., for some minutes) that mismatches the event duration, a coercion effect can be observed in sentence processing experiments.
In this paper, we present a self-paced reading test to investigate the psychological reality of the proposal by Sampaio and colleagues. Our first experiment, described in Section 3, presents evidence that aspectual coercion can be observed using durative events in different durative contexts. In Section 4, we present a reanalysis of Sampaio, França & Maia (2014), in which we had reported different and inconsistent effects.

Aspectual Coercion
In the last century, Descriptive Linguistics identified some properties related to the temporal frame of events, namely telicity, progression and duration. For instance, encoded in the temporal frame of events might be the existence of an inherent endpoint (telicity; 'eat an apple', the apple measures the event), its progression (gerunds: 'eating an apple') or its duration (punctual or durative: 'to sneeze' × 'to eat'). These properties are the ingredients of Aspect and they are present both in the verb meaning (the meaning of sing in 'John sang a song') and in the adjoined temporal context ('for three minutes'). Aspectual mismatches between the verb and its temporal context lead to Aspectual Coercion (PUSTEJOVSKY, 1995;JACKENDOFF, 1997;DE SWART, 1998).
The most likely hypothesis for the existence of aspectual coercion is a clash between punctual and durative events. Brennan & Pylkkänen (2008) appropriately renamed it the Iterative Coercion Hypothesis (ICH). Imagine a punctual event such as 'John sneezed', combined with a durative context such as 'for three minutes'. As a punctual event, one single sneeze cannot last for three minutes. In this case, we are forced to reanalyze the verb as referring to an iterative event: "John sneezed [several times] for three minutes". In this view, durative events in durative contexts ('John worked all day long') and punctual events in punctual contexts ('John sneezed right now') would not present any coercion effects because they share aspectual properties with their temporal predicate.
However, the clash between punctual and durative events posited by the ICH has been challenged in the last few years. For instance, Dölling (2014) developed what we call here the Event-Classification-Driven hypothesis [2]. In this author's view, verbs have no fixed event classification and move back and forth among his 13 event classes and subclasses (DÖLLING, 2014; Figure 3). The author describes 9 changes of class [3], each considered a different type of aspectual coercion. For example, a moment, such as the one at which John slept, can be turned into an event by stretching coercion (2a); the state of "be clever" can be used in a specific occurrence by agentive coercion (2b); and an event can be interpreted as an incomplete process when the sentence context is not enough for event completion (2c).
[1] In our view, effects in Todorova et al. (2000a,b) are not caused by aspectual coercion, but by distributional properties of the complements. Aspectual coercion is possible in non-resultative punctual verbs (semelfactives) such as 'sneeze all day long' or 'blink for two minutes'. Todorova's stimuli are resultative punctuals (achievements) such as 'send a large check for many years'. If someone has already sent a large check to his daughter, it is obvious that the same check cannot be sent twice.
[2] Roughly, Event Classification is a typology of linguistic event types in Philosophy of Language. Aristotle proposed the first event classification in the 9th book of Metaphysics. The most influential event classification in Linguistics is Vendler's (1957). See Rosen (1999) or Sampaio & França (2010) for a review. Dölling (2014) proposes his own classification.
[3] Agentive, iterative, ingressive, inchoative, additive, subtractive, completive, stretching and habitual coercion.
[4] Coll-Florit & Gennari (2011) and Faber & Gennari (2015) also propose that time perception can be observed in language processing. Unlike us, however, these authors do not discuss aspectual coercion; they focus on other facets of language processing, such as tense and the influence of discourse contexts on sentence processing.
(2) a. John slept at 3 o'clock => John slept for 3 hours (stretching coercion)
    b. John is clever => John is being clever (agentive coercion)
    c. The musician played a sonata for 2 minutes (subtractive coercion)
Sampaio's proposal (SAMPAIO et al., 2014; SAMPAIO, 2015; 2016) treats coercion as a kind of time perception phenomenon [4] and will be called the "Time-Perception-Driven hypothesis". If punctual events are coerced into repetition in durative contexts, the same should happen to durative events when they are inserted in shorter or longer durative contexts, as shown in Figure 1 and in example (3) below. Note that, following the logic of pragmatic studies, subtractive and habitual coercion were also proposed by Dölling (2014), so the main differences here are (i) the nature of each proposal (is coercion a pragmatic or a semantic effect?) and (ii) the number of coercion effects. The Time-Perception-Driven hypothesis is summed up in Figure 1.
(3) a. During some hours, the girl filled out the form (iterative coercion)
    b. During some seconds, the girl presented the project (subtractive coercion)
    c. During some days, the girl ate cereal for breakfast (habitual coercion)
Both the Event-Classification-Driven and the Time-Perception-Driven hypotheses are compatible with the experimental results reported to date, and both expand the scope of their predictions by positing other types of aspectual coercion. The present paper tests the Time-Perception-Driven hypothesis, fully explained in Sampaio (2015). Section 3 presents a self-paced reading experiment on subtractive, iterative and habitual coercion. Since Subtractive Coercion and Habitual Coercion have also been proposed by Dölling (2014), the test will provide important evidence for the predictions of both hypotheses.
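The predictions of the Time-Perception-Driven hypothesis for examples (1) and (3) can be sketched as a comparison between the typical duration of an event and the duration named by its temporal context. The sketch below is only an illustration: the unit scale, the set of cyclical units and the function name `predicted_reading` are assumptions introduced here, not part of the original proposal or materials.

```python
# Illustrative sketch only. The ordinal scale and the names below are
# assumptions made for this example, not the authors' implementation.
UNIT_RANK = {"punctual": 0, "seconds": 1, "minutes": 2, "hours": 3}

# Contexts treated as cycles rather than durations (assumed grouping).
CYCLIC_UNITS = {"days", "months", "years"}


def predicted_reading(event_unit: str, context_unit: str) -> str:
    """Return the reading predicted for an event in a temporal context.

    event_unit   -- typical duration class of the event (a key of UNIT_RANK)
    context_unit -- unit named by the durative context
    """
    if context_unit in CYCLIC_UNITS:
        # The event is expected to recur across cycles.
        return "habitual coercion"
    event_rank = UNIT_RANK[event_unit]
    context_rank = UNIT_RANK[context_unit]
    if context_rank == event_rank:
        return "no coercion"
    if context_rank < event_rank:
        # The context is too short for the event to be completed.
        return "subtractive coercion"
    # The context is longer than a single occurrence of the event.
    return "iterative coercion"


# A "minutes" event in each of the four contexts of example (3).
for context in ["seconds", "minutes", "hours", "days"]:
    print(context, "->", predicted_reading("minutes", context))
```

Under these assumptions, a punctual verb such as 'sneeze' in a "minutes" context also comes out as iterative, which matches the reading described for (1b).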

Experiment 1: Coercion of durative events
In order to test the Time-Perception-Driven hypothesis, we ran a self-paced reading experiment in Brazilian Portuguese to verify its main prediction: that different temporal contexts influence the reading times of durative events.
Participants. Thirty-two native speakers of Brazilian Portuguese (18-25 years old), all right-handed with normal or corrected-to-normal vision, participated in the experiment.
All participants were volunteers and students at UFRJ, and the experiment followed the Declaration of Helsinki (2008) of ethical principles for research involving human participants.
Pretest. Prior to the main experiment, we normed 285 verbs by their subjective duration in a simple categorization test. A verb was presented at the center of a screen and ten participants (5 females) judged its duration as punctual, seconds, minutes or hours, options presented at the top of the screen [5]. Responses were given by pressing the numbers from 1 (punctual) to 4 (hours) or the spacebar (to skip the verb). All spacebar responses and all RTs below 200 ms or above 10 s were removed from the categorization analysis, eliminating about 0.03% of the data. Only the verbs with at least 50% of responses for minutes (≈70%) were selected for the main test.
[5] The pretest was applied to another 10 participants (5 females
[Figure 1: The Time-Perception-Driven hypothesis. Events have their own average duration in the real world. A song usually lasts for three minutes. Listening to a song for some seconds or for some hours will trigger subtractive and iterative coercion, regardless of aspectual features. Duration changes also imply a different event classification. Prediction: semantic coercion when the temporal context is not enough for, or exceeds, the verb's subjective duration.]
Materials. Sixteen transitive sentences were constructed using these verbs (Appendix 1). Since the sentences are the same across conditions, except for the word for duration (seconds, minutes, hours and days), we did not control the verbs for lexical or semantic variables. All sentences were inserted in one of four durative contexts: some seconds (subtractive condition), some minutes (control condition), some hours (iterative condition) and some days (habitual condition). Our stimuli are exemplified in (4).
Procedure. Sentences were presented on an external 17" IPS 60 Hz monitor placed about 60 cm from the participants. Our stimuli used a white 28-point Arial font on a black background. The experiment was coded in MathWorks MATLAB 7.14 (R2012a) for Mac (OS X 10.9, Mavericks) using Psychtoolbox v3 (BRAINARD, 1997; KLEINER et al., 2007). Ten practice sentences were presented prior to the main test. All 32 participants reached accuracy rates above 80% in the practice trials. Each trial started with a fixation dot, presented for one second at the center of the screen. Participants were instructed to read the sentences at their own pace, pressing the spacebar to advance through the sentence. At the end of each trial, an interpretation question was presented in red font. Each participant took about 20 minutes to finish the experiment.
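The distribution of the sixteen sentences over the four durative contexts in a Latin Square design can be sketched as follows. This is a minimal illustration, assuming the usual rotation scheme with one presentation list per condition; the text does not describe how the lists were actually built, and the function name `latin_square_lists` is hypothetical.

```python
# Minimal sketch of a Latin Square item-to-condition rotation.
# Assumption: one list per condition, built by rotating the conditions
# across items; the original list construction is not described.
CONDITIONS = ["seconds", "minutes", "hours", "days"]


def latin_square_lists(n_items, conditions):
    """Build one presentation list per condition.

    Each list maps an item index to a condition, so that every item
    appears in exactly one condition per list and in every condition
    across lists.
    """
    n_conditions = len(conditions)
    lists = []
    for list_index in range(n_conditions):
        assignment = {
            item: conditions[(item + list_index) % n_conditions]
            for item in range(n_items)
        }
        lists.append(assignment)
    return lists


lists = latin_square_lists(16, CONDITIONS)

# Each participant would see a single list: 16 sentences, 4 per condition.
for list_index, assignment in enumerate(lists, start=1):
    counts = {c: list(assignment.values()).count(c) for c in CONDITIONS}
    print("list", list_index, counts)
```

With this scheme no participant reads the same sentence in two conditions, which is why lexical differences between sentences do not need to be matched across conditions.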
Results. Three participants did not reach 80% accuracy in the main experiment and were excluded from the analysis. For the 29 remaining participants, we trimmed the data following the outlier labeling rule (TUKEY, 1977) using g = 2.2 (HOAGLIN; IGLEWICZ; TUKEY, 1986). The procedure eliminated about 6% of the data. The reading times for each segment are shown in Figure 2.
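The outlier labeling rule keeps the observations that fall between Q1 − g(Q3 − Q1) and Q3 + g(Q3 − Q1), where Q1 and Q3 are the first and third quartiles. A minimal sketch with g = 2.2 is given below. The quartile method, the function names and the reading times are assumptions made for illustration; the text does not state how the quartiles were computed or whether trimming was applied per condition or per segment.

```python
import statistics


def outlier_labeling_bounds(values, g=2.2):
    """Return the (lower, upper) bounds of the outlier labeling rule."""
    # Assumption: 'inclusive' quartiles; other quartile definitions
    # give slightly different bounds.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - g * spread, q3 + g * spread


def trim_outliers(values, g=2.2):
    """Keep only the values inside the outlier labeling bounds."""
    lower, upper = outlier_labeling_bounds(values, g)
    return [v for v in values if lower <= v <= upper]


# Hypothetical reading times in milliseconds (not data from the experiment).
reading_times_ms = [310, 295, 330, 350, 305, 320, 340, 315, 2900]
lower, upper = outlier_labeling_bounds(reading_times_ms)
trimmed = trim_outliers(reading_times_ms)
print(lower, upper, trimmed)
```

Because the bounds are based on quartiles, a single extreme reading time does not inflate them, unlike cut-offs based on the mean and standard deviation.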
The remaining reading times were not normally distributed according to the Shapiro-Wilk test for normality (W < .960, p < .01), leading us to use the Wilcoxon Signed-Rank test for each of our paired comparisons between conditions.
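The logic of the Wilcoxon Signed-Rank test for one paired comparison can be sketched as below. This is a simplified illustration that uses the normal approximation without tie or continuity corrections, so its p-values may differ slightly from those of statistical packages (for instance `scipy.stats.wilcoxon`, with `scipy.stats.shapiro` for the normality check). The function name and the reading times are hypothetical.

```python
from statistics import NormalDist


def wilcoxon_signed_rank(sample_a, sample_b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (statistic, p_value), where the statistic is the smaller
    of the positive and negative rank sums.
    """
    # Paired differences; zero differences are discarded.
    diffs = [a - b for a, b in zip(sample_a, sample_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging the ranks of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while (end + 1 < n
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)

    # Normal approximation of the null distribution of the statistic.
    mean = n * (n + 1) / 4
    sd = (n * (n + 1) * (2 * n + 1) / 24) ** 0.5
    z = (statistic - mean) / sd
    p_value = 2 * NormalDist().cdf(z)
    return statistic, p_value


# Hypothetical per-participant mean RTs (ms) at one critical segment.
control = [400, 420, 390, 410, 430, 405, 415, 395]
subtractive = [450, 440, 430, 470, 445, 460, 425, 455]
statistic, p_value = wilcoxon_signed_rank(control, subtractive)
print(statistic, round(p_value, 4))
```

The test works on the differences within each pair, so both samples must come from the same participants (or items) in the two conditions being compared.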
Three regions of interest were defined. First, segment 2 contains the word describing the magnitude of event duration. This region presents no significant effect.
Discussion. The results of Experiment 1 were consistent with our initial hypothesis. Duration incongruities between events and their temporal contexts are observed online as a significant increase in the reading times at the region of the verb and of the direct object. However, only subtractive [some seconds] and iterative [some hours] coercion were found.
Our analysis does not suggest a habitual coercion effect and fails to reject the null hypothesis. Nevertheless, it is important to note that visual inspection shows an increase in the average reading times at the critical segments of the habitual coercion condition similar to the ones observed for iterative and subtractive coercion.
For now, we can consider at least three hypotheses: (1) habitual coercion does not exist; (2) habitual coercion exists but cannot be observed by our methods; and (3) habitual coercion exists but our analysis fails to detect a significant effect. New experiments are necessary to provide evidence for the existence or the absence of this effect.

Discussion on Sampaio et al. (2014)
For now, another way to shed light on the results for habitual coercion is to compare them with the first experiment on the Time-Perception-Driven hypothesis, presented in Sampaio, França and Maia (2014). In that paper, we presented a similar experiment in which we reported inconsistent effects between different habitual conditions [days, months and years]. The results present significant effects only between [minutes] and [years], at the word describing duration [e.g., minutes/years] and in the task. Significant effects were also found at the end of the sentence, which can be related to wrap-up effects (Figure 3).
However, there were some methodological and statistical problems with this test and with its first analysis. First, in the stimuli, our sentences used quantified temporal contexts that kept the same number across conditions. For instance, sentences such as "walk for 10 minutes" were compared to others such as "walk for 10 years", which is too long a time, inducing a huge temporal difference between conditions. This problem was solved in the new experiment presented in the previous section by using "some [Δt]". As for the statistical problems, the data were trimmed using 6 standard deviations, which is basically a visual inspection. Also, there was no test for normality, which would have indicated whether an ANOVA was suitable.
In Section 4, we present a new standardized analysis of this experiment, which makes it comparable to the experiment described in this paper.

A reanalysis of Sampaio et al. 2014: Habitual Coercion of durative events
We reanalyze the data presented in Sampaio, França and Maia (2014), in which we had reported different and inconsistent effects between the three paired comparisons (minutes-days, minutes-months, minutes-years). Since the data analyses of the two tests followed different methods, the first step to understand the real differences between them was to unify the methods of data analysis.
Participants. Thirty-six native speakers of Brazilian Portuguese, 19 females (18-25 years old), right-handed with normal or corrected-to-normal vision, participated in the experiment. All participants were volunteers and students at UFRJ, and the experiment followed the Declaration of Helsinki (2008) of ethical principles for research involving human participants.
Materials. The materials consist of twelve durative sentences (Appendix 2) distributed in four temporal contexts [minutes, days, months and years]. Since the sentences were identical across conditions, except for the word describing the temporal context, we did not control the verbs for lexical or semantic variables. Each sentence was followed by an interpretation question in blue font, as exemplified in (5).
Procedures. The word-by-word self-paced reading test was coded and run using Psyscope B57 (COHEN et al., 1993) on a MacBook White 15" with a 60 Hz screen running OS X 10.5.8 (Leopard). Stimuli were presented in a white 24-point Times New Roman font on a black background. Questions were presented in a blue font. Ten practice trials were presented to the participants prior to the test. Sentences were randomized by the software and were preceded by a fixation cross presented for one second. Then a series of 5 hash marks (#) indicated that the keyboard was ready. Participants used the [spacebar] to advance through the sentence. The interpretation questions were answered yes [k], in green, or no [l], in red. Three volunteers did not reach 80% accuracy on the interpretation questions and were replaced by another three participants. The mean accuracy for the total of 36 participants was 94%. Each participant took about 15 minutes to finish the test.
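The 80% accuracy criterion applied to the interpretation questions can be sketched as a simple filter over participants. The data structure, the participant labels and the function names below are hypothetical, and the sketch assumes that a participant with exactly 80% accuracy is kept.

```python
def accuracy(responses):
    """Proportion of correct answers in a list of booleans."""
    return sum(responses) / len(responses)


def participants_to_keep(responses_by_participant, threshold=0.80):
    """Return the IDs of participants whose accuracy reaches the threshold."""
    return sorted(
        participant
        for participant, responses in responses_by_participant.items()
        if accuracy(responses) >= threshold
    )


# Hypothetical answers to twelve interpretation questions per participant.
example = {
    "P01": [True] * 11 + [False],      # 11/12 correct, kept
    "P02": [True] * 9 + [False] * 3,   # 9/12 correct, below 80%
    "P03": [True] * 12,                # 12/12 correct, kept
}
kept = participants_to_keep(example)
print(kept)
```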
Results. For this paper, the original raw data were reanalyzed using the same methods described for the first experiment. Data were trimmed by the outlier labeling rule (TUKEY, 1977) using g = 2.2 (HOAGLIN; IGLEWICZ; TUKEY, 1986). The procedure eliminated about 6% of the data. Figure 4 presents the average reading times for each word of the sentence.
The remaining data were not normally distributed (Shapiro-Wilk; W < .932, p < .006), which led us to apply the Wilcoxon Signed-Rank test to each of our paired comparisons. Visual inspection suggests a slight difference in the reading times at segments 5 and 7. However, no relevant differences were found at segment 5, at segment 7 or in the RTs (Table 3).
Discussion. This reanalysis fixed several problems we had with the analysis of our 2014 test. Once again, it did not point to any significant effect of habitual coercion, and thus we failed to reject the null hypothesis.
There were also some differences between the earlier test and the one presented in Section 3. In the earlier test, reanalyzed here, we had not controlled the duration of the events in a pretest. Also, instead of using a general quantifier (some) before the different cycles (days, months, etc.), we used a specific number. So, we compared sentences such as "During 10 minutes Carla walked in Ipanema Beach", which is quite plausible, with another condition in which "Carla walked for 10 years in Ipanema Beach", which is a very long period. We thought that this use of a specific number might have biased the test. However, even with the general quantifier (some) in all sentences, as in the experiment in Section 3, we did not find any difference.

General Discussion
As we can observe, both hypotheses, the Time-Perception-Driven and the Event-Classification-Driven, predict habitual coercion, but we could not find it. However, we did find differences between days, on the one hand, and seconds, minutes and hours, on the other.
To sum up our results, we found that temporal words such as [seconds], [minutes] and [hours] refer to durations and tend to be applied to a single event. On the other hand, [days], [months] and [years] are not durations. They are cyclical time periods in which the event can happen once a day or even be randomly distributed during some days.
Using linguistic theory alone, it is possible to say that durations are different from habits and thus require different cognitive processes, which raises the question of why they are cognitively distinct. Similarly, it is possible to argue that our hypothesis, which links time-perception studies with event-duration semantics, is compatible with a sensitivity to the two different natures of time perception: durations and cycles (FRAISSE, 1984; BUHUSI; MECK, 2005, p. 756). This perhaps also brings it closer to a full-fledged explanation of the phenomenon. Even if sentence processing differs in nature from time-perception processing, it is plausible that time perception is related to the acquisition of the mean duration of events, as explained in detail in Sampaio, França & Maia (2014, p. 146) and Sampaio (2015).
Nevertheless, more experiments are required for us to fully comprehend the reality and the mechanisms of habitual coercion. At this point, the question guiding our further work is which interdisciplinary methods can provide direct evidence of the habitual coercion effect.